Tag Archives: AWS Clean Rooms

Automate AWS Clean Rooms querying and dashboard publishing using AWS Step Functions and Amazon QuickSight – Part 2

Post Syndicated from Venkata Kampana original https://aws.amazon.com/blogs/big-data/automate-aws-clean-rooms-querying-and-dashboard-publishing-using-aws-step-functions-and-amazon-quicksight-part-2/

Public health organizations need access to data insights that they can quickly act upon, especially in times of health emergencies, when data needs to be updated multiple times daily. For example, during the COVID-19 pandemic, access to timely data insights was critically important for public health agencies worldwide as they coordinated emergency response efforts. Up-to-date information and analysis empowered organizations to monitor the rapidly changing situation and direct resources accordingly.

This is the second post in this series; we recommend that you read the first post before diving deep into this solution. In our first post, Enable data collaboration among public health agencies with AWS Clean Rooms – Part 1, we showed how public health agencies can create AWS Clean Rooms collaborations, invite other stakeholders to join the collaboration, and run queries on their collective data without either party having to share or copy underlying data with each other. As mentioned in the previous post, AWS Clean Rooms enables multiple organizations to analyze their data and unlock insights they can act upon, without having to share sensitive, restricted, or proprietary records.

However, public health organization leaders and decision-making officials don’t directly access data collaboration outputs from their Amazon Simple Storage Service (Amazon S3) buckets. Instead, they rely on up-to-date dashboards that help them visualize data insights to make informed decisions quickly.

To ensure these dashboards showcase the most current insights, builders and data architects need to catalog and update AWS Clean Rooms collaboration outputs on an ongoing basis. This often involves repetitive, manual processes that, if not done well, can delay the organization’s access to the latest data insights.

Manually handling repetitive daily tasks at scale poses risks like delayed insights, miscataloged outputs, or broken dashboards. At large volumes, it would require around-the-clock staffing, straining budgets. This manual approach could expose decision-makers to inaccurate or outdated information.

Automating repetitive workflows, validation checks, and programmatic dashboard refreshes removes human bottlenecks and helps decrease inaccuracies. Automation helps ensure continuous, reliable processes that deliver the most current data insights to leaders without delays, all while streamlining resources.

In this post, we explain an automated workflow using AWS Step Functions and Amazon QuickSight to help organizations access the most current results and analyses, without delays from manual data handling steps. This workflow implementation will empower decision-makers with real-time visibility into the evolving collaborative analysis outputs, ensuring they have up-to-date, relevant insights that they can act upon quickly.

Solution overview

The following reference architecture illustrates the foundational components of clean rooms query automation and dashboard publishing using AWS services. We automate running queries using Step Functions with Amazon EventBridge schedules, catalog the query outputs in the AWS Glue Data Catalog, and publish dashboards using QuickSight so they automatically refresh with new data. This allows public health teams to monitor the most recent insights without manual updates.

The architecture consists of the following components, as numbered in the preceding figure:

  1. A scheduled event rule on EventBridge triggers a Step Functions workflow.
  2. The Step Functions workflow initiates the run of a query using the StartProtectedQuery AWS Clean Rooms API. The submitted query runs securely within the AWS Clean Rooms environment, ensuring data privacy and compliance. The results of the query are then stored in a designated S3 bucket, with a unique protected query ID serving as the prefix for the stored data. This unique identifier is generated by AWS Clean Rooms for each query run, maintaining clear segregation of results.
  3. When the AWS Clean Rooms query completes successfully, the Step Functions workflow calls the AWS Glue API to update the table location in the AWS Glue Data Catalog with the Amazon S3 location to which the query results were uploaded in Step 2.
  4. Amazon Athena uses the AWS Glue Data Catalog to query the information using standard SQL.
  5. QuickSight is used to query, build visualizations, and publish dashboards using the data from the query results.
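Steps 1–3 of this flow can be sketched in Python. This is a minimal illustration rather than the deployed Step Functions definition: the Clean Rooms and AWS Glue clients are passed in so the logic can be exercised without AWS credentials, the calls mirror the boto3 `start_protected_query`/`get_protected_query` and `get_table`/`update_table` APIs but the response shapes are simplified (a real `update_table` call generally needs the full `TableInput`), and all names are illustrative.

```python
import time

def run_and_catalog_query(cleanrooms, glue, membership_id, query,
                          database, table, poll_seconds=30, max_polls=20):
    """Run a Clean Rooms protected query, wait for it to finish, then
    point the Glue table at the S3 location holding the new results."""
    # Step 2: submit the protected query. Clean Rooms writes the results
    # to S3 under a prefix derived from the protected query ID.
    started = cleanrooms.start_protected_query(
        type="SQL",
        membershipIdentifier=membership_id,
        sqlParameters={"queryString": query},
    )
    query_id = started["protectedQuery"]["id"]

    # Poll until the query reaches a terminal state.
    for _ in range(max_polls):
        state = cleanrooms.get_protected_query(
            membershipIdentifier=membership_id,
            protectedQueryIdentifier=query_id,
        )["protectedQuery"]
        if state["status"] == "SUCCESS":
            break
        if state["status"] in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"protected query {query_id} ended as {state['status']}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"protected query {query_id} did not finish in time")

    # Step 3: repoint the Glue table so Athena reads the newest results.
    # (Simplified: Glue's update_table expects the full TableInput in practice.)
    location = state["result"]["output"]["s3"]["location"]
    existing = glue.get_table(DatabaseName=database, Name=table)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput={
            "Name": existing["Name"],
            "StorageDescriptor": {**existing["StorageDescriptor"],
                                  "Location": location},
        },
    )
    return location
```

In the actual solution, the Python polling loop is replaced by a Wait/Choice cycle inside the state machine that the CloudFormation template provisions.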

Prerequisites

For this walkthrough, you need the following:

Launch the CloudFormation stack

In this post, we provide a CloudFormation template to create the following resources:

  • An EventBridge rule that triggers the Step Functions state machine on a schedule
  • An AWS Glue database and a catalog table
  • An Athena workgroup
  • Three S3 buckets:
    • For AWS Clean Rooms to upload the results of query runs
    • For Athena to upload the results for the queries
    • For storing access logs of other buckets
  • A Step Functions workflow designed to run the AWS Clean Rooms query, upload the results to an S3 bucket, and update the table location with the S3 path in the AWS Glue Data Catalog
  • An AWS Key Management Service (AWS KMS) customer-managed key to encrypt the data in S3 buckets
  • AWS Identity and Access Management (IAM) roles and policies with the necessary permissions

To create the necessary resources, complete the following steps:

  1. Choose Launch Stack:
  2. Enter cleanrooms-query-automation-blog for Stack name.
  3. Enter the membership ID from the AWS Clean Rooms collaboration you created in Part 1 of this series.
  4. Choose Next.
  5. Choose Next again.
  6. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
  7. Choose Create stack.

After you run the CloudFormation template and create the resources, you can find the following information on the stack Outputs tab on the AWS CloudFormation console:

  • AthenaWorkGroup – The Athena workgroup
  • EventBridgeRule – The EventBridge rule triggering the Step Functions state machine
  • GlueDatabase – The AWS Glue database
  • GlueTable – The AWS Glue table storing metadata for AWS Clean Rooms query results
  • S3Bucket – The S3 bucket where AWS Clean Rooms uploads query results
  • StepFunctionsStateMachine – The Step Functions state machine

Test the solution

The EventBridge rule named cleanrooms_query_execution_Stepfunctions_trigger is scheduled to trigger every hour. When this rule is triggered, it initiates the run of the CleanRoomsBlogStateMachine-XXXXXXX Step Functions state machine. Complete the following steps to test the end-to-end flow of this solution:

  1. On the Step Functions console, navigate to the state machine you created.
  2. On the state machine details page, locate the latest query run.

The details page lists the completed steps:

  • The state machine submits a query to AWS Clean Rooms using the startProtectedQuery API. The output of the API includes the query run ID and its status.
  • The state machine waits for 30 seconds before checking the status of the query run.
  • After 30 seconds, the state machine checks the query status using the getProtectedQuery API. When the status changes to SUCCESS, it proceeds to the next step to retrieve the AWS Glue table metadata information. The output of this step contains the S3 location to which the query run results are uploaded.
  • The state machine retrieves the metadata of the AWS Glue table named patientimmunization, which was created via the CloudFormation stack.
  • The state machine updates the S3 location (the location to which AWS Clean Rooms uploaded the results) in the metadata of the AWS Glue table.
  • After a successful update of the AWS Glue table metadata, the state machine is complete.
  3. On the Athena console, switch the workgroup to CustomWorkgroup.
  4. Run the following query:

SELECT * FROM "cleanrooms_patientdb"."patientimmunization" LIMIT 10;

Visualize the data with QuickSight

Now that you can query your data in Athena, you can use QuickSight to visualize the results. Let’s start by granting QuickSight access to the S3 bucket where your AWS Clean Rooms query results are stored.

Grant QuickSight access to Athena and your S3 bucket

First, grant QuickSight access to the S3 bucket:

  1. Sign in to the QuickSight console.
  2. Choose your user name, then choose Manage QuickSight.
  3. Choose Security and permissions.
  4. For QuickSight access to AWS services, choose Manage.
  5. For Amazon S3, choose Select S3 buckets, and choose the S3 bucket named cleanrooms-query-execution-results-XX-XXXX-XXXXXXXXXXXX (where XX-XXXX-XXXXXXXXXXXX represents the AWS Region and account number where the solution is deployed).
  6. Choose Save.

Create your datasets and publish visuals

Before you can analyze and visualize the data in QuickSight, you must create datasets for your Athena tables.

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Select Athena.
  4. Enter a name for your dataset.
  5. Choose Create data source.
  6. Choose the AWS Glue database cleanrooms_patientdb and select the table PatientImmunization.
  7. Select Directly query your data.
  8. Choose Visualize.

  9. On the Analysis tab, choose the visual type of your choice and add visuals.

Clean up

Complete the following steps to clean up your resources when you no longer need this solution:

  1. Manually delete the S3 buckets and the data stored in them.
  2. Delete the CloudFormation stack.
  3. Delete the QuickSight analysis.
  4. Delete the data source.

Conclusion

In this post, we demonstrated how to automate running AWS Clean Rooms queries using an API call from Step Functions. We also showed how to update the query results information on the existing AWS Glue table, query the information using Athena, and create visuals using QuickSight.

The automated workflow solution delivers real-time insights from AWS Clean Rooms collaborations to decision makers through automated checks for new outputs, processing, and Amazon QuickSight dashboard refreshes. This eliminates manual handling tasks, enabling faster data-driven decisions based on the latest analyses. Additionally, automation frees up staff resources to focus on more strategic initiatives rather than repetitive updates.

Contact the public sector team directly to learn more about how to set up this solution, or reach out to your AWS account team to engage on a proof of concept of this solution for your organization.

About AWS Clean Rooms

AWS Clean Rooms helps companies and their partners more easily and securely analyze and collaborate on their collective datasets—without sharing or copying one another’s underlying data. With AWS Clean Rooms, you can create a secure data clean room in minutes, and collaborate with any other company on the AWS Cloud to generate unique insights about advertising campaigns, investment decisions, and research and development.

The AWS Clean Rooms team is continually building new features to help you collaborate. Watch this video to learn more about privacy-enhanced collaboration with AWS Clean Rooms.

Check out more AWS Partners or contact an AWS Representative to know how we can help accelerate your business.


About the Authors

Venkata Kampana is a Senior Solutions Architect in the AWS Health and Human Services team and is based in Sacramento, CA. In that role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.

Jim Daniel is the Public Health lead at Amazon Web Services. Previously, he held positions with the United States Department of Health and Human Services for nearly a decade, including Director of Public Health Innovation and Public Health Coordinator. Before his government service, Jim served as the Chief Information Officer for the Massachusetts Department of Public Health.

AWS Clean Rooms proof of concept scoping part 1: media measurement

Post Syndicated from Shaila Mathias original https://aws.amazon.com/blogs/big-data/aws-clean-rooms-proof-of-concept-scoping-part-1-media-measurement/

Companies are increasingly seeking ways to complement their data with external business partners’ data to build, maintain, and enrich their holistic view of their business at the consumer level. AWS Clean Rooms helps companies more easily and securely analyze and collaborate on their collective datasets—without sharing or copying each other’s underlying data. With AWS Clean Rooms, you can create a secure data clean room in minutes and collaborate with any other company on Amazon Web Services (AWS) to generate unique insights.

One way to quickly get started with AWS Clean Rooms is with a proof of concept (POC) between you and a priority partner. AWS Clean Rooms supports multiple industries and use cases, and this blog is the first of a series on types of proof of concepts that can be conducted with AWS Clean Rooms.

In this post, we outline planning a POC to measure media effectiveness in a paid advertising campaign. The collaborators are a media owner (“CTV.Co,” a connected TV provider) and a brand advertiser (“Coffee.Co,” a quick service restaurant company) that are analyzing their collective data to understand the impact of an advertising campaign on sales. We chose to start this series with media measurement because “Results & Measurement” was the top-ranked use case for data collaboration by customers in a recent survey the AWS Clean Rooms team conducted.

Important to keep in mind

  • AWS Clean Rooms is generally available, so any AWS customer can sign in to the AWS Management Console and start using the service today without additional paperwork.
  • With AWS Clean Rooms, you can perform two types of analyses: SQL queries and machine learning. For the purposes of this post, we focus only on SQL queries. You can learn more about both types of analyses and their cost structures on the AWS Clean Rooms Features and Pricing webpages. The AWS Clean Rooms team can help you estimate the cost of a POC and can be reached at [email protected].
  • While AWS Clean Rooms supports multiparty collaboration, we assume two members in the AWS Clean Rooms POC collaboration in this blog post.

Overview

Setting up a POC helps you test AWS Clean Rooms with your partners against an existing problem in a specific use case. After you’ve determined who you want to collaborate with, we recommend three steps to set up your POC:

  • Defining the business context and success criteria – Determine which partner to work with, which use case should be tested, and what the success criteria are for the AWS Clean Rooms collaboration.
  • Aligning on the technical choices for this test – Make the technical decisions: who sets up the clean room, who analyzes the data, which datasets are used, which join keys are used, and which analysis is run.
  • Outlining the workflow and timing – Create a workback plan, decide on synthetic data testing, and align on production data testing.

In this post, we walk through an example of how a quick service restaurant (QSR) coffee company (Coffee.Co) would set up a POC with a connected TV provider (CTV.Co) to determine the success of an advertising campaign.

Business context and success criteria for the POC

Define the use case to be tested

The first step in setting up the POC is defining the use case being tested with your partner in AWS Clean Rooms. For example, Coffee.Co wants to run a measurement analysis to determine the media exposure on CTV.Co that led to sign-ups for Coffee.Co’s loyalty program. AWS Clean Rooms allows Coffee.Co and CTV.Co to collaborate and analyze their collective datasets without copying each other’s underlying data.

Success criteria

It’s important to determine the metrics of success and the acceptance criteria for moving the POC to production upfront. For example, Coffee.Co’s goal is to achieve a sufficient match rate between their dataset and CTV.Co’s dataset to ensure the efficacy of the measurement analysis. Additionally, Coffee.Co wants existing team members to find it easy to set up the collaboration and act on the insights derived from it, optimizing future media spend toward tactics on CTV.Co that will drive more loyalty members.

Technical choices for the POC

Determine the collaboration creator, AWS account IDs, query runner, payer, and results receiver

Each AWS Clean Rooms collaboration is created by a single AWS account inviting other AWS accounts. The collaboration creator specifies which accounts are invited to the collaboration, who can run queries, who pays for the compute, who can receive the results, and the optional query logging and cryptographic computing settings. The creator is also able to remove members from a collaboration. In this POC, Coffee.Co initiates the collaboration by inviting CTV.Co. Additionally, Coffee.Co runs the queries and receives the results, but CTV.Co pays for the compute.

Query logging setting

If logging is enabled in the collaboration, AWS Clean Rooms allows each collaboration member to receive query logs. The collaborator running the queries, Coffee.Co, gets logs for all data tables while the other collaborator, CTV.Co, only sees the logs if their data tables are referenced in the query.

Decide the AWS Region

The underlying Amazon Simple Storage Service (Amazon S3) and AWS Glue resources for the data tables used in the collaboration must be in the same AWS Region as the AWS Clean Rooms collaboration. For example, Coffee.Co and CTV.Co agree on the US East (Ohio) Region for their collaboration.

Join keys

To join datasets in an AWS Clean Rooms query, each side of the join must share a common key, and the join comparison must use the equality operator (=). AND and OR logical operators can be used in the inner join for matching on multiple join columns. Keys such as email address, phone number, or UID2 are often considered. Third-party identifiers from LiveRamp, Experian, or Neustar can be used in the join through AWS Clean Rooms-specific workflows with each partner.

If sensitive data is being used for join keys, we recommend an obfuscation technique, such as hashing, to mitigate the risk of exposing sensitive data if it is mishandled. Both parties must use a technique that produces the same obfuscated join key values. Cryptographic Computing for Clean Rooms can be used for this purpose.

In this POC, Coffee.Co and CTV.Co are joining on hashed email or hashed mobile number. Both collaborators apply the SHA-256 hash to their plaintext email addresses and phone numbers when preparing their datasets for the collaboration.
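A minimal sketch of the kind of preparation both collaborators would run is shown below. The trim-and-lowercase normalization is an assumed agreement between the partners (AWS Clean Rooms does not prescribe one); without identical normalization on both sides, the hashed keys will not match in the join.

```python
import hashlib

def hash_join_key(value: str) -> str:
    """Normalize an identifier, then return its SHA-256 hex digest.
    The normalization rules (strip whitespace, lowercase) are an
    assumption both collaborators must agree on in advance."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same email in different formatting hashes to the same join key.
key = hash_join_key("  User@Example.com ")
```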

Data schema

The exact data schema must be determined by collaborators to support the agreed upon analysis. In this POC, Coffee.Co is running a conversion analysis to measure media exposures on CTV.Co that led to sign-up for Coffee.Co’s loyalty program. Coffee.Co’s schema includes hashed email, hashed mobile, loyalty sign up date, loyalty membership type, and birthday of member. CTV.Co’s schema includes hashed email, hashed mobile, impressions, clicks, timestamp, ad placement, and ad placement type.

Analysis rule applied to each configured table associated to the collaboration

An AWS Clean Rooms configured table is a reference to an existing table in the AWS Glue Data Catalog that’s used in the collaboration. It contains an analysis rule that determines how the data can be queried in AWS Clean Rooms. Configured tables can be associated to one or more collaborations.

AWS Clean Rooms offers three types of analysis rules: aggregation, list, and custom.

  • Aggregation allows you to run queries that generate an aggregate statistic within the privacy guardrails set by each data owner. For example, how large the intersection of two datasets is.
  • List allows you to run queries that extract the row level list of the intersection of multiple data sets. For example, the overlapped records on two datasets.
  • Custom allows you to create custom queries and reusable templates using most industry standard SQL, as well as review and approve queries prior to your collaborator running them. For example, authoring an incremental lift query that’s the only query permitted to run on your data tables. You can also use AWS Clean Rooms Differential Privacy by selecting a custom analysis rule and then configuring your differential privacy parameters.

In this POC, CTV.Co uses the custom analysis rule and authors the conversion query. Coffee.Co adds this custom analysis rule to their data table, configuring the table for association to the collaboration. Coffee.Co is running the query, and can only run queries that CTV.Co authors on the collective datasets in this collaboration.

Planned query

Collaborators should define the query that will be run by the collaborator chosen to run queries. In this POC, Coffee.Co runs the custom analysis rule query CTV.Co authored to understand who signed up for their loyalty program after being exposed to an ad on CTV.Co. Coffee.Co can specify their desired time window parameter to analyze when the membership sign-up took place within a specific date range, because that parameter has been enabled in the custom analysis rule query.
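A hypothetical sketch of such a template follows, held in a Python string with a small helper that lists its parameters. Every table and column name here is invented for illustration, and the `:time_window_start`/`:time_window_end` placeholders reflect the named-parameter style of Clean Rooms analysis templates; verify the exact parameter syntax in the AWS Clean Rooms documentation before relying on it.

```python
import re

# Hypothetical conversion query of the kind CTV.Co might author as a
# custom analysis rule template. Table and column names are invented;
# :time_window_start and :time_window_end are the template parameters
# Coffee.Co supplies at run time.
CONVERSION_QUERY = """
SELECT ctv.ad_placement_type,
       COUNT(DISTINCT coffee.hashed_email) AS loyalty_signups
FROM ctv_impressions ctv
INNER JOIN coffee_loyalty coffee
    ON ctv.hashed_email = coffee.hashed_email
    OR ctv.hashed_mobile = coffee.hashed_mobile
WHERE coffee.loyalty_signup_date BETWEEN :time_window_start AND :time_window_end
  AND ctv.impression_timestamp <= coffee.loyalty_signup_date
GROUP BY ctv.ad_placement_type
"""

def template_parameters(query: str) -> set:
    """Return the set of :named parameters a query template expects."""
    return set(re.findall(r":(\w+)", query))
```

Note the OR across the two hashed keys, matching the multi-column join behavior described under "Join keys" above.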

Workflow and timeline

To determine the workflow and timeline for setting up the POC, the collaborators should set dates for the following activities.

  1. Coffee.Co and CTV.Co align on business context, success criteria, and technical details, and prepare their data tables.
    • Example deadline: January 10
  2. [Optional] Collaborators work to generate representative synthetic datasets for non-production testing prior to production data testing.
    • Example deadline: January 15
  3. [Optional] Each collaborator uses synthetic datasets to create an AWS Clean Rooms collaboration between two of their owned AWS non-production accounts and finalizes the analysis rules and queries they want to run in production.
    • Example deadline: January 30
  4. [Optional] Coffee.Co and CTV.Co create an AWS Clean Rooms collaboration between non-production accounts and test the analysis rules and queries with the synthetic datasets.
    • Example deadline: February 15
  5. Coffee.Co and CTV.Co create a production AWS Clean Rooms collaboration and run the POC queries on production data.
    • Example deadline: February 28
  6. Evaluate POC results against success criteria to determine when to move to production.
    • Example deadline: March 15

Conclusion

After you’ve defined the business context and success criteria for the POC, aligned on the technical details, and outlined the workflow and timing, the goal of the POC is to run a successful collaboration using AWS Clean Rooms to validate moving to production. After you’ve validated that the collaboration is ready to move to production, AWS can help you identify and implement automation mechanisms to programmatically run AWS Clean Rooms for your production use cases. Watch this video to learn more about privacy-enhanced collaboration and contact an AWS Representative to learn more about AWS Clean Rooms.

About AWS Clean Rooms

AWS Clean Rooms helps companies and their partners more easily and securely analyze and collaborate on their collective datasets—without sharing or copying one another’s underlying data. With AWS Clean Rooms, customers can create a secure data clean room in minutes, and collaborate with any other company on AWS to generate unique insights about advertising campaigns, investment decisions, and research and development.


About the authors

Shaila Mathias is a Business Development lead for AWS Clean Rooms at Amazon Web Services.

Allison Milone is a Product Marketer for the Advertising & Marketing Industry at Amazon Web Services.

Ryan Malecky is a Senior Solutions Architect at Amazon Web Services. He is focused on helping customers gain insights from their data, especially with AWS Clean Rooms.

AWS Clean Rooms Differential Privacy enhances privacy protection of your users’ data (preview)

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-clean-rooms-differential-privacy-enhances-privacy-protection-of-your-users-data-preview/

Starting today, you can use AWS Clean Rooms Differential Privacy (preview) to help protect the privacy of your users with mathematically backed and intuitive controls in a few steps. As a fully managed capability of AWS Clean Rooms, no prior differential privacy experience is needed to help you prevent the reidentification of your users.

AWS Clean Rooms Differential Privacy obfuscates the contribution of any individual’s data in generating aggregate insights in collaborations so that you can run a broad range of SQL queries to generate insights about advertising campaigns, investment decisions, clinical research, and more.

Quick overview on differential privacy
Differential privacy is not new. It is a strong, mathematical definition of privacy compatible with statistical and machine learning-based analysis, and it has been used by the United States Census Bureau as well as companies with vast amounts of data.

Differential privacy helps with a wide variety of use cases involving large datasets, where adding or removing a few individuals has a small impact on the overall result, such as population analyses using count queries, histograms, benchmarking, A/B testing, and machine learning.

The following illustration shows how differential privacy works when it is applied to SQL queries.

When an analyst runs a query, differential privacy adds a carefully calibrated amount of error (also referred to as noise) to query results at run-time, masking the contribution of individuals while still keeping the query results accurate enough to provide meaningful insights. The noise is carefully fine-tuned to mask the presence or absence of any possible individual in the dataset.

Differential privacy also has another component called privacy budget. The privacy budget is a finite resource consumed each time a query is run and thus controls the number of queries that can be run on your datasets, helping ensure that the noise cannot be averaged out to reveal any private information about an individual. When the privacy budget is fully exhausted, no more queries can be run on your tables until it is increased or refreshed.
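The noise mechanism described above can be illustrated with a toy implementation: the textbook Laplace mechanism for a counting query. This is not how AWS Clean Rooms computes noise internally; the epsilon value and counts are invented, and the sketch exists only to make the noise/accuracy trade-off concrete.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from a Laplace(0, scale) distribution via inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Return a differentially private count. For a counting query the
    sensitivity is 1 (one person changes the count by at most 1), so
    the Laplace scale is 1 / epsilon."""
    return round(true_count + laplace_noise(1.0 / epsilon, rng))

# Each individual query result is perturbed, masking any single person,
# but averaged over many runs the results stay close to the truth.
rng = random.Random(42)
results = [noisy_count(3_227_643, epsilon=1.0, rng=rng) for _ in range(10_000)]
mean = sum(results) / len(results)
```

This also shows why the privacy budget matters: if an analyst could repeat the same query without limit, averaging the noisy results would recover the true count.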

However, differential privacy is not easy to implement because this technique requires an in-depth understanding of mathematically rigorous formulas and theories to apply it effectively. Configuring differential privacy is also a complex task because customers need to calculate the right level of noise in order to preserve the privacy of their users without negatively impacting the utility of query results.

Customers also want to enable their partners to conduct a wide variety of analyses including highly complex and customized queries on their data. This requirement is hard to support with differential privacy because of the intricate nature of the calculations involved in calibrating the noise while processing various query components such as aggregations, joins, and transformations.

We created AWS Clean Rooms Differential Privacy to help you protect the privacy of your users with mathematically backed controls in a few clicks.

How differential privacy works in AWS Clean Rooms
While differential privacy is quite a sophisticated technique, AWS Clean Rooms Differential Privacy makes it easy for you to apply it and protect the privacy of your users with mathematically backed, flexible, and intuitive controls. You can begin using it with just a few steps after starting or joining an AWS Clean Rooms collaboration as a member with abilities to contribute data.

You create a configured table, which is a reference to your table in the AWS Glue Data Catalog, and choose to turn on differential privacy while adding a custom analysis rule to the configured table.

Next, you associate the configured table to your AWS Clean Rooms collaboration and configure a differential privacy policy in the collaboration to make your table available for querying. You can use a default policy to quickly complete the setup or customize it to meet your specific requirements. As part of this step, you will configure the following:

Privacy budget
Quantified as a value that we call epsilon, the privacy budget controls the level of privacy protection. It is a common, finite resource that is applied for all of your tables protected with differential privacy in the collaboration because the goal is to preserve the privacy of your users whose information can be present in multiple tables. The privacy budget is consumed every time a query is run on your tables. You have the flexibility to increase the privacy budget value any time during the collaboration and automatically refresh it each calendar month.

Noise added per query
Measured in terms of the number of users whose contributions you want to obscure, this input parameter governs the rate at which the privacy budget is depleted.

In general, you need to balance your privacy needs against the number of queries you want to permit and the accuracy of those queries. AWS Clean Rooms makes it easy for you to complete this step by helping you understand the resulting utility you are providing to your collaboration partner. You can also use the interactive examples to understand how your chosen settings would impact the results for different types of SQL queries.
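The budget mechanics can be sketched as a simple ledger. The accounting below is deliberately simplified (real differential privacy accounting across queries is more subtle, and the numbers are invented); it only illustrates the behavior described above: a finite resource consumed per query, refusing queries when exhausted, refreshed each calendar month.

```python
class PrivacyBudget:
    """Toy ledger for a collaboration-wide epsilon budget."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total - self.spent

    def charge(self, epsilon_per_query: float) -> None:
        """Consume budget for one query, or refuse if it would overdraw."""
        if epsilon_per_query > self.remaining:
            raise RuntimeError("privacy budget exhausted; no more queries until refresh")
        self.spent += epsilon_per_query

    def monthly_refresh(self) -> None:
        """Reset the ledger at the start of a new calendar month."""
        self.spent = 0.0
```

For example, a hypothetical budget of 20 with a charge of 1.0 per query permits 20 queries before the next refresh.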

Now that you have successfully enabled differential privacy protection for your data, let’s see AWS Clean Rooms Differential Privacy in action. For this demo, let’s assume I am your partner in the AWS Clean Rooms collaboration.

Here, I’m running a query to count the number of overlapping customers and the result shows there are 3,227,643 values for tv.customer_id.

Now, if I run the same query again after removing records about an individual from the coffee_customers table, it shows a different result: 3,227,604 values for tv.customer_id. This variability prevents me from identifying individuals by observing differences in query results.

I can also see the impact of differential privacy, including the remaining queries I can run.

Available for preview
Join this preview and start protecting the privacy of your users with AWS Clean Rooms Differential Privacy. During this preview period, you can use AWS Clean Rooms Differential Privacy wherever AWS Clean Rooms is available. To learn more on how to get started, visit the AWS Clean Rooms Differential Privacy page.

Happy collaborating!
Donnie

AWS Clean Rooms ML helps customers and partners apply ML models without sharing raw data (preview)

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-clean-rooms-ml-helps-customers-and-partners-apply-ml-models-without-sharing-raw-data-preview/

Today, we’re introducing AWS Clean Rooms ML (preview), a new capability of AWS Clean Rooms that helps you and your partners apply machine learning (ML) models on your collective data without copying or sharing raw data with each other. With this new capability, you can generate predictive insights using ML models while continuing to protect your sensitive data.

During this preview, AWS Clean Rooms ML introduces its first model specialized to help companies create lookalike segments for marketing use cases. With AWS Clean Rooms ML lookalike, you can train your own custom model, and you can invite partners to bring a small sample of their records to collaborate and generate an expanded set of similar records while protecting everyone’s underlying data.

In the coming months, AWS Clean Rooms ML will release a healthcare model. This will be the first of many models that AWS Clean Rooms ML will support next year.

AWS Clean Rooms ML helps you unlock a variety of opportunities to generate insights. For example:

  • Airlines can take signals about loyal customers, collaborate with online booking services, and offer promotions to users with similar characteristics.
  • Auto lenders and car insurers can identify prospective auto insurance customers who share characteristics with a set of existing lease owners.
  • Brands and publishers can model lookalike segments of in-market customers and deliver highly relevant advertising experiences.
  • Research institutions and hospital networks can find candidates similar to existing clinical trial participants to accelerate clinical studies (coming soon).

AWS Clean Rooms ML lookalike modeling helps you apply an AWS managed, ready-to-use model that is trained in each collaboration to generate lookalike datasets in a few clicks, saving months of development work to build, train, tune, and deploy your own model.

How to use AWS Clean Rooms ML to generate predictive insights
Today I will show you how to use lookalike modeling in AWS Clean Rooms ML. I assume you have already set up a data collaboration with your partner; if you want to learn how to do that, check out the AWS Clean Rooms Now Generally Available — Collaborate with Your Partners without Sharing Raw Data post.

With your collective data in the AWS Clean Rooms collaboration, you can work with your partners to apply ML lookalike modeling to generate a lookalike segment. It works by taking a small sample of representative records from your data, creating a machine learning (ML) model, and then applying that model to identify an expanded set of similar records in your business partner's data.

The following screenshot shows the overall workflow for using AWS Clean Rooms ML.

By using AWS Clean Rooms ML, you don’t need to build complex and time-consuming ML models on your own. AWS Clean Rooms ML trains a custom, private ML model, which saves months of your time while still protecting your data.
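The service's model internals aren't public, but the general idea behind lookalike modeling can be sketched as a similarity ranking: summarize the seed profiles, then rank candidate profiles by how close they are to that summary. The following toy example uses a cosine-similarity centroid and is purely illustrative of the concept, not the algorithm AWS Clean Rooms ML actually trains:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def lookalike_segment(seed, candidates, top_k):
    # Represent the seed audience by its centroid, then rank every
    # candidate profile by similarity to that centroid.
    dims = len(seed[0])
    centroid = [sum(p[i] for p in seed) / len(seed) for i in range(dims)]
    ranked = sorted(candidates, key=lambda cid: cosine(centroid, candidates[cid]), reverse=True)
    return ranked[:top_k]

# Hypothetical feature vectors (e.g., watch time, purchase frequency):
seed = [[1.0, 0.0], [0.9, 0.1]]
candidates = {"a": [1.0, 0.05], "b": [0.0, 1.0], "c": [0.8, 0.2]}
segment = lookalike_segment(seed, candidates, top_k=2)  # profiles most similar to the seed
```

In the managed service, this computation happens inside the collaboration boundary, so neither party ever sees the other's profile-level feature data.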

Eliminating the need to share data
Because ML models are built natively within the service, AWS Clean Rooms ML helps you protect your dataset and your customers' information: you don't need to share your data to build the ML model.

You can specify the training dataset using the AWS Glue Data Catalog table, which contains user-item interactions.

Under Additional columns to train, you can define numerical and categorical data. This is useful if you need to add more features to your dataset, such as the number of seconds spent watching a video, the topic of an article, or the product category of an e-commerce item.

Applying custom-trained AWS-built models
Once you have defined your training dataset, you can now create a lookalike model. A lookalike model is a machine learning model used to find similar profiles in your partner’s dataset without either party having to share their underlying data with each other.

When creating a lookalike model, you need to specify the training dataset. From a single training dataset, you can create many lookalike models. You also have the flexibility to define the date window in your training dataset using Relative range or Absolute range. This is useful when you have data that is constantly updated within AWS Glue, such as articles read by users.

Easy-to-tune ML models
After you create a lookalike model, you need to configure it to use in AWS Clean Rooms collaboration. AWS Clean Rooms ML provides flexible controls that enable you and your partners to tune the results of the applied ML model to garner predictive insights.

On the Configure lookalike model page, you can choose which Lookalike model you want to use and define the Minimum matching seed size you need. This seed size defines the minimum number of profiles in your seed data that overlap with profiles in the training data.

You also have the flexibility to choose whether the partner in your collaboration receives metrics in Metrics to share with other members.

With your lookalike models properly configured, you can now make the ML models available for your partners by associating the configured lookalike model with a collaboration.

Creating lookalike segments
Once the lookalike models have been associated, your partners can now start generating insights by selecting Create lookalike segment and choosing the associated lookalike model for your collaboration.

Here on the Create lookalike segment page, your partners need to provide the Seed profiles. Examples of seed profiles include your top customers or all customers who purchased a specific product. The resulting lookalike segment will contain profiles from the training data that are most similar to the profiles from the seed.

Lastly, your partner receives Relevance metrics for the generated lookalike segment. At this stage, they can use the Score to decide whether the segment fits their needs.

Export data and use programmatic API
You also have the option to export the lookalike segment data. Once it’s exported, the data is available in JSON format and you can process this output by integrating with AWS Clean Rooms API and your applications.

Join the preview
AWS Clean Rooms ML is now in preview and available via AWS Clean Rooms in US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Seoul, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, London). Support for additional models is in the works.

Learn how to apply machine learning with your partners without sharing underlying data on the AWS Clean Rooms ML page.

Happy collaborating!
— Donnie

Using Experian identity resolution with AWS Clean Rooms to achieve higher audience activation match rates

Post Syndicated from Omar Gonzalez original https://aws.amazon.com/blogs/big-data/using-experian-identity-resolution-with-aws-clean-rooms-to-achieve-higher-audience-activation-match-rates/

This is a guest post co-written with Tyler Middleton, Experian Senior Partner Marketing Manager, and Jay Rakhe, Experian Group Product Manager.

As the data privacy landscape continues to evolve, companies are increasingly seeking ways to collect and manage data while protecting privacy and intellectual property. First-party data is more important than ever for companies to understand their customers and improve how they interact with them, such as in digital advertising across channels. Companies struggle to build a complete view of their customers as they engage across different channels and devices, and to incorporate third-party data that could complement their own and generate richer customer insights. This has driven companies to build identity graph solutions or use identity resolution from established providers such as Experian. It has also driven companies to grow their first-party consumer-consented data and collaborate with other companies and partners to create better-informed advertising campaigns.

AWS Clean Rooms allows companies to collaborate securely with their partners on their collective datasets without sharing or copying one another’s underlying data. Combining Experian’s identity resolution with AWS Clean Rooms can help you achieve higher match rates with your partners on your collective datasets when you run an AWS Clean Rooms collaboration. You can achieve higher match rates by using Experian’s diverse offline and digital ID database.

In this post, we walk through an example of a retail advertiser collaborating with a connected television (CTV) provider, with Experian providing identity resolution and AWS Clean Rooms facilitating a secure collaboration for an audience activation use case.

Use case overview

Retail advertisers recognize that consumers increasingly favor streaming TV services over traditional TV channels. Because of this, you may want to use your customer tiering and past purchase history datasets to target your audience in CTV.

The following example advertiser dataset includes the audience to be targeted on the CTV platform.

Advertiser

| ID  | First    | Last  | Address               | City     | State | Zip   | Customer Tier | LTV  | Last Purchase Date |
|-----|----------|-------|-----------------------|----------|-------|-------|---------------|------|--------------------|
| 123 | Tyler    | Smith | 4128 Et Street        | Franklin | OK    | 82736 | Gold          | $823 | 8/1/21             |
| 456 | Karleigh | Jones | 2588 Nibh Street      | Clinton  | RI    | 38947 | Gold          | $741 | 2/2/22             |
| 984 | Alex     | Brown | 6556 Tincidunt Avenue | Madison  | WI    | 10975 | Silver        | $231 | 1/17/22            |

The following sample CTV provider dataset has email addresses and subscription status.

| Email Address     | Status       |
|-------------------|--------------|
| [email protected] | Subscribed   |
| [email protected] | Free Ad Tier |
| [email protected] | Trial        |

Experian performs identity resolution on each dataset by matching against Experian’s attributes on 250 million consumers and 126 million households. Experian assigns a unique and synthetic Experian ID referred to as a Living Unit ID (LUID) to each matched record.

The Experian LUIDs for an advertiser and CTV provider are unique per consumer record. For example, LU_ADV_123 in the advertiser table corresponds to LU_CTV_135 in the CTV table. To allow the CTV provider and advertiser to match identities across the datasets, Experian generates a collaboration LUID, as shown in the following figure. This allows a double-blind join to be performed against both tables in AWS Clean Rooms.

 Advertiser and CTV Provider Double Blind Join

The following figure illustrates the workflow in our example AWS Clean Rooms collaboration.

Experian identity resolution with AWS Clean Rooms workflow

We walk you through the following high-level steps:

  1. Prepare the data tables with Experian IDs, load the data to Amazon Simple Storage Service (Amazon S3), and catalog the data with AWS Glue.
  2. Associate the configured tables, define the analysis rules, and collaborate with privacy-enhancing controls joining between the Experian LUID encodings using the match table.
  3. Use AWS Clean Rooms to validate that the query conforms to the analysis rules and returns query results that meet all restrictions.

Prepare data tables with Experian IDs, load data to Amazon S3, and catalog data with AWS Glue

First, the advertiser and CTV provider engage with Experian directly to assign Experian LUIDs to their consumer records. During this process, both parties provide identity components to Experian as an input. Experian processes their input data and returns an Experian LUID when a matched identity is found. New and existing Experian customers can start this process by reaching out to Experian Marketing Services.

After the tables are prepared with Experian LUIDs, the advertiser, CTV provider, and Experian join an AWS Clean Rooms collaboration. A collaboration is a secure logical boundary in AWS Clean Rooms in which members perform SQL queries on configured tables. Any participant can create an AWS Clean Rooms collaboration. In this example, the CTV provider has created a collaboration in AWS Clean Rooms and invited the advertiser and Experian to join and contribute data, without sharing their underlying data with each other. The advertiser and Experian will log in to each of their respective AWS accounts and join the collaboration as a member.

The next step is to upload and catalog the data to be queried in AWS Clean Rooms. Each collaborator will upload their dataset to Amazon S3 object storage in their respective accounts. Next, the data is cataloged in the AWS Glue Data Catalog.

Associate the configured tables, define analysis rules, and collaborate with privacy enhancing controls

After the table is cataloged in the AWS Glue Data Catalog, it can be associated with an AWS Clean Rooms configured table. A configured table defines which columns can be used in the collaboration and contains an analysis rule that determines how the data can be queried.

In this step, Experian adds two configured tables that include the collaboration LUIDs that allow the CTV provider and advertiser to match across their datasets.

The advertiser has defined a list analysis rule that allows the CTV provider to run queries that return a row-level list of the collective data. They have also configured their unique Experian advertiser LUIDs as the join keys. In AWS Clean Rooms, join key columns can be used to join datasets, but the values can’t be returned in the result.

{
 "joinColumns": [
   "experian_luid_adv"
 ],
 "listColumns": [
   "ltv",
   "customer_tier"
 ]
}

The CTV provider can perform queries against the datasets. They must duplicate the CTV LUID column to use it as a join key and query dimension, as shown in the following code. This is an important step when configuring a collaboration with Experian as an ID provider.

{
 "joinColumns": [
   "experian_luid_ctv"
 ],
 "listColumns": [
   "experian_luid_ctv_2",
   "sub_status"
 ]
}

Use AWS Clean Rooms to validate the query matches the analysis rule type, expected query structure, and columns and tables defined in the analysis rule

The CTV provider can now perform a SQL query against the datasets using the AWS Clean Rooms console or the AWS Clean Rooms StartProtectedQuery API.
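For programmatic runs, the request sent to StartProtectedQuery might be assembled as follows. This is a hedged sketch: the membership identifier, bucket, and key prefix are hypothetical placeholders, and you should consult the AWS Clean Rooms API reference for the authoritative request shape:

```python
def build_protected_query(membership_id: str, sql: str, bucket: str) -> dict:
    # Assemble a StartProtectedQuery request; results are written to S3.
    return {
        "type": "SQL",
        "membershipIdentifier": membership_id,
        "sqlParameters": {"queryString": sql},
        "resultConfiguration": {
            "outputConfiguration": {
                "s3": {
                    "resultFormat": "CSV",
                    "bucket": bucket,
                    "keyPrefix": "clean-rooms/results/",  # hypothetical prefix
                }
            }
        },
    }

# Usage (requires AWS credentials and an active collaboration membership):
# import boto3
# client = boto3.client("cleanrooms")
# client.start_protected_query(**build_protected_query("membership-id", sql_text, "my-results-bucket"))
```

AWS Clean Rooms validates the submitted SQL against every participant's analysis rules before running it, so a query that violates any rule is rejected rather than executed.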

The following sample list query returns the customer tier and LTV (lifetime value) for matched CTV identities:

SELECT DISTINCT ctv.experian_luid_ctv_2,
       ctv.sub_status,
       adv.customer_tier,
       adv.ltv
FROM ctv
   JOIN experian_ctv
       ON ctv.experian_luid_ctv = experian_ctv.experian_luid_ctv
   JOIN experian_adv
       ON experian_ctv.experian_luid_collab = experian_adv.experian_luid_collab
   JOIN adv
       ON experian_adv.experian_luid_adv = adv.experian_luid_adv

The following figure illustrates the results.

AWS Clean Rooms List Query Output

Conclusion

In this post, we showed how a retail advertiser can enrich their data with CTV provider data using Experian in an AWS Clean Rooms collaboration, without sharing or exposing raw data with each other. The advertiser can now use the CTV customer tiering and subscription data to activate specific segments on the CTV platform. For example, if the retail advertiser wants to offer membership to their loyalty program, they can now target their high LTV customers that have a CTV paid subscription. With AWS Clean Rooms, this use case can be expanded further to include additional collaborators to further enrich your data. AWS Clean Rooms partners include identity resolution providers, such as Experian, who can help you more easily join data using Experian identifiers. To learn more about the benefits of Experian identity resolution, refer to Identity resolution solutions. New and existing customers can contact Experian Marketing Services to authorize an AWS Clean Rooms collaboration. Visit the AWS Clean Rooms User Guide to get started using AWS Clean Rooms today.


About the Authors

Omar Gonzalez is a Senior Solutions Architect at Amazon Web Services in Southern California with more than 20 years of experience in IT. He is passionate about helping customers drive business value through the use of technology. Outside of work, he enjoys hiking and spending quality time with his family.

Matt Miller is a Business Development Principal at AWS. In his role, Matt drives customer and partner adoption for the AWS Clean Rooms service specializing in advertising and marketing industry use cases. Matt believes in the primacy of privacy enhanced data collaboration and interoperability underpinning data-driven marketing imperatives from customer experience to addressable advertising. Prior to AWS, Matt led strategy and go-to market efforts for ad technologies, large agencies, and consumer data products purpose-built to inform smarter marketing and deliver better customer experiences.

Enable data collaboration among public health agencies with AWS Clean Rooms – Part 1

Post Syndicated from Venkata Kampana original https://aws.amazon.com/blogs/big-data/part-1-enable-data-collaboration-among-public-health-agencies-with-aws-clean-rooms/

In this post, we show how you can use AWS Clean Rooms to enable data collaboration between public health agencies. Public health governmental agencies need to understand trends related to a variety of health conditions and care across populations in order to create policies and treatments with the goal of improving the well-being of the various communities they serve.

To do this, these agencies need to analyze data from many sources, such as clinical organizations, non-clinical community organizations, and administrative data from other government agencies, so they can identify trends around health conditions and treatments across populations. Public health agencies need to understand what is happening to populations within the communities they serve.

Because they are looking at populations at risk, they need the flexibility of a line list of cases, stripped of personally identifiable information (PII). With this information, they can assess risk based on a variety of demographic and social factors available in the data sources without divulging PII. The list also gives them the flexibility to apply more complex analyses, such as regression, on the linked data. Programs like MENDS, MDPHnet, and CODI have for years explored using clinical data in distributed networks to understand the burden of chronic diseases in communities. Challenges facing these programs include complex data sharing rules and distributed analytics approaches across networks of data providers. MENDS and MDPHnet, for example, run analytics at the organization level without deduplicating across sites. Individual queries are pushed to each site, where they are processed and reviewed by humans, and the combined output is sent to the public health agency.

AWS Clean Rooms offers an opportunity to reduce the burden on data providers in programs like these, while enabling public health agencies to analyze data using their own queries and mitigate risks to data privacy by preventing access to the underlying raw data.

Overview of AWS Clean Rooms

AWS Clean Rooms was first announced at AWS re:Invent 2022, and is now generally available. AWS Clean Rooms allows customers and their partners to more easily and securely collaborate on their collective datasets—without sharing or copying the underlying data with each other. AWS Clean Rooms provides a broad set of privacy-enhancing controls that help protect sensitive data, including query controls, query output restrictions, query logging, and cryptographic computing tools.

With AWS Clean Rooms, you can collaborate and analyze data with other parties in the collaboration without either party having to share or copy the raw data. AWS Clean Rooms is a stateless service; it doesn’t store the data. Instead, it reads the data from where it lives, applies restrictions that protect each participant’s underlying data at query runtime, and returns the results. Queries can be written to intersect and analyze data sources using common metadata elements (for example, geography, shared identifiers, or other demographic factors), generating row-level lists of the overlap between the data sources or aggregated counts by population, condition, or other strata.

AWS Clean Rooms helps public health agencies analyze collective data to gain a more complete view of the health and well-being of their communities, while maintaining the security and privacy of the data.

Solution overview

Before we get started with AWS Clean Rooms, let’s first talk about some of the service’s key concepts:

  • Collaborations – This is a secure logical boundary in AWS Clean Rooms created by the collaboration creator. When creating the collaboration, the creator can invite additional members to join the collaboration. Invited participants can see the list of collaboration members before they accept the invitation to join the collaboration.
  • Members – This refers to AWS customers who are participants in a collaboration. All collaboration members can contribute data; however, only one member per collaboration can query and receive results, and that designation is fixed when the collaboration is created.
  • Analysis rules – AWS Clean Rooms supports two types of analysis rules:
    • Aggregation – Members can run queries that aggregate statistics using COUNT, SUM, or AVG functions along optional dimensions. Aggregation queries won’t reveal row-level data.
    • List – Members can run queries that output row-level data of the overlap between two tables.
  • Configured tables – Members can configure existing AWS Glue tables for use in AWS Clean Rooms. This data is stored in Amazon Simple Storage Service (Amazon S3) in open data formats and cataloged in the AWS Glue Data Catalog. Each configured table contains an analysis rule that determines how the data can be queried. After it’s configured, members can associate the configured table to one or more collaborations.

Getting started with AWS Clean Rooms is a four-step process:

  1. The creator configures a collaboration and invites one or more members to the collaboration.
  2. The invited member joins the collaboration.
  3. Members can configure the existing AWS Glue tables for use in AWS Clean Rooms.
  4. Members with permission to do so can run queries in the collaboration.
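The four steps above can also be scripted. As a hedged sketch, step 1 through the AWS SDK for Python might assemble a CreateCollaboration request like the following; the display names and account ID are placeholders, and you should verify exact field names against the current AWS Clean Rooms API reference:

```python
def build_collaboration_request(invited_account_id: str) -> dict:
    # The creator invites one member who contributes data; the creator
    # retains the ability to query and receive results.
    return {
        "name": "Demo collaboration",
        "description": "Public health data collaboration",
        "creatorDisplayName": "Agency A",
        "creatorMemberAbilities": ["CAN_QUERY", "CAN_RECEIVE_RESULTS"],
        "members": [
            {
                "accountId": invited_account_id,  # invited member's AWS account ID
                "displayName": "Agency B",
                "memberAbilities": [],  # data contributor only
            }
        ],
        "queryLogStatus": "ENABLED",
    }

# import boto3
# boto3.client("cleanrooms").create_collaboration(**build_collaboration_request("111122223333"))
```

Because only one member can query and receive results, the abilities lists above encode the same choice you make in the console's Member abilities section.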

Prerequisites

For this walkthrough, you need the following:

Create a collaboration and invite one or more members

You can define your collaboration configuration on the AWS Clean Rooms console, via the AWS Command Line Interface (AWS CLI), or with an AWS SDK. We demonstrate how to configure this on the console.

  1. On the AWS Clean Rooms console, choose Create collaboration.

  2. For Name, enter a name (for example, Demo collaboration).
  3. For Description, add an optional description.
  4. In the Members section, add the following members:
    1. Member 1 – Enter a member display name (your AWS account ID is automatically populated).
    2. Member 2 – Enter a member display name and the AWS account ID for the member you want to invite.
    3. Choose Add another member to add more members.
  5. In the Member abilities section, choose one member who will query and receive results.
  6. In the Query logging section, select Support query logging for this collaboration to log the queries in Amazon CloudWatch Logs.
  7. Choose Next.
  8. In the Collaboration membership section, select the storage option you prefer for CloudWatch.
  9. Choose Next.
  10. On the Review and create page, choose Create collaboration and membership after reviewing the details to ensure accuracy.

Congratulations on creating your first collaboration! You can see the collaboration details on the Collaborations page.

Join the collaboration

Each collaboration member can log in to the AWS Clean Rooms console, review the invitation, and decide whether to join the collaboration by following these steps:

  1. On the AWS Clean Rooms console, choose Collaborations in the navigation pane.
  2. On the Available to join tab, choose the collaboration you were invited to.

On the details page, you can review the member abilities.

  1. Select your preferred log storage option and choose Create membership.
  2. On the confirmation page, verify that the members listed align with your data sharing agreements, then choose Create membership.

After you create your membership, your member status is changed to Active on the collaboration dashboard.

Configure existing AWS Glue tables for use in AWS Clean Rooms

AWS Clean Rooms doesn’t require you to make a copy of the data because it reads the data from Amazon S3. This eliminates the need to copy and load your data into destinations outside your respective AWS account, or use third-party services to facilitate data sharing.

Each collaboration member can create configured tables, an AWS Clean Rooms resource that references an AWS Glue Data Catalog table and defines how the underlying data can be used. A configured table can be used across many collaborations.

  1. On the AWS Clean Rooms console, choose Configured tables in the navigation pane.
  2. Choose Configure new table.
  3. Choose the database to populate the list of AWS Glue tables, and choose the table you want to associate with the collaboration.

For each selected table, you can determine which columns can be accessed in the collaboration.

  1. Select All columns or select Custom list to choose a subset of columns to be available in the collaboration.
  2. Enter a name for the configured table.
  3. Choose Configure new table.

In addition to column-level access controls, AWS Clean Rooms provides fine-grained query controls called analysis rules. With built-in and flexible analysis rules, you can tailor queries to specific business needs. As discussed earlier, AWS Clean Rooms provides two types of analysis rules:

  • Aggregation analysis rules – These allow queries that aggregate data without revealing row-level information. Available functions include COUNT, SUM, and AVG, along optional dimensions.
  • List analysis rules – These allow queries that output row-level attribute analyses of the overlap between the tables in the collaboration space.

Both rule types allow data owners to mandate a join between their datasets and the datasets of the collaborator running the query. This limits the results to the intersection of the collaborators' datasets.
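Conceptually, a list analysis rule acts as a column-level allowlist evaluated at query time: join keys may only appear in the join condition, and output columns must come from the approved list. The following sketch shows that validation logic in miniature; it is illustrative only, not the service's actual validator, and the column names are hypothetical:

```python
def check_list_query(rule: dict, selected: list[str], join_on: list[str]) -> bool:
    # A join on an approved key is mandatory, and join keys
    # themselves can never be returned in the output.
    if not join_on or any(c not in rule["joinColumns"] for c in join_on):
        return False
    return all(c in rule["listColumns"] for c in selected)

# Hypothetical rule for a patient table in this collaboration:
rule = {"joinColumns": ["patient_id"], "listColumns": ["immunization_status", "zip"]}
```

For example, selecting `immunization_status` while joining on `patient_id` is permitted, but selecting `patient_id` itself, or running a query with no join at all, is rejected.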

  1. On the configured table, choose Configure analysis rule to configure the analysis rules.
  2. For this post, we select List because we want to query patients’ immunization status by joining with immunization data from other contributors.
  3. Select the creation method and select Next.
  4. To define the criteria for the table joins, in the Join controls section, choose the column names appropriate for the join.
  5. To specify which columns can appear in the output, identify them in the List controls section.
  6. Choose Next.
  7. Choose Configure analysis rule on the Review and configure page.

You will see the message Successfully configured list analysis rule on the configured tables page.

  1. Choose Associate to collaboration to link this table to the collaboration you created.
  2. Review the details on the Associate table page and choose Associate table.

The collaboration page will display a list of tables that are associated by you to the collaboration.

Each member of the collaboration must repeat the aforementioned steps to associate their AWS Glue Data Catalog tables to the collaboration. For this post, the other members of the collaboration follow these same steps to associate their data to the collaboration. Then the collaboration will list all tables associated by other members.

After defining the analysis rules on the configured tables and associating them to the collaboration, the members who can query and receive results can start writing queries according to the restrictions defined by each participating collaboration member. The following section includes example collaboration queries.

Run queries in the collaboration

The following screenshot is an example of a query that won’t be successful because * is not supported. Column names must be specified in the query.

The following screenshot is an example of a query that won’t be successful because you can’t link columns that members restricted in your joins.

The following screenshot is an example of a query that will be successful because it uses permitted columns (columns that are part of the list analysis rule) in the select clause and join condition.

The sample datasets (Patient and Immunization) used in this post include a unique identifier (patient ID). However, in a real-world scenario, this might not be the case. In those situations, you may consider using privacy-preserving record linkage (PPRL) to create a unique deidentified token. For example, the CDC’s CODI program deduplicates across data owners by obfuscating PII behind each organization’s firewall in a standardized way. That obfuscated information is joined to create a unique deidentified token for each individual that is analyzed across data sources. If public health agencies want to conduct analyses based on individually linked longitudinal data, they could apply PPRL to each data source and use that metadata element to link the data sources in AWS Clean Rooms before conducting their analytics.
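A common building block for PPRL is a keyed hash computed behind each organization's firewall: every data owner applies the same normalization and the same shared secret key, so records for the same individual produce the same token without PII ever leaving the organization. The following is a minimal sketch under assumed field choices and a hypothetical key; production programs like CODI use vetted protocols rather than this simplified scheme:

```python
import hashlib
import hmac

def pprl_token(first: str, last: str, dob: str, key: bytes) -> str:
    # Normalize PII consistently across all data owners, then apply
    # a keyed hash so tokens can't be reversed without the key.
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

shared_key = b"collaboration-secret"  # distributed out of band to each data owner
token_a = pprl_token("Tyler", "Smith", "1990-01-01", shared_key)
token_b = pprl_token(" tyler ", "SMITH", "1990-01-01", shared_key)
# token_a == token_b: both records link to the same individual
```

Each data owner would publish only the token column alongside their analytic attributes, and the tokens then serve as the join keys in the AWS Clean Rooms collaboration.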

Clean up

As part of this walkthrough, you provisioned an AWS Clean Rooms collaboration, invited other members to join the collaboration, and configured tables. To delete these resources, refer to Leaving the collaboration and Disassociating configured tables.

Conclusion

In this post, we showed you how to create a collaboration, invite other members to the collaboration, configure existing AWS Glue Catalog tables, apply analysis rules, and run sample queries on the AWS Clean Rooms console. In Part 2 of this series, we demonstrate how to automate query runs using AWS Lambda, query the results using Amazon Athena, and publish dashboards using Amazon QuickSight.


About the Authors

Venkata Kampana is a Senior Solutions Architect in the AWS Health and Human Services team and is based in Sacramento, CA. In that role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.

Dr. Dawn Heisey-Grove is the public health analytics leader for Amazon Web Services’ state and local government team. In this role, she’s responsible for helping state and local public health agencies think creatively about how to achieve their analytics challenges and long-term goals. She’s spent her career finding new ways to use existing or new data to support public health surveillance and research.

Jim Daniel is the Public Health lead at Amazon Web Services. Previously, he held positions with the United States Department of Health and Human Services for nearly a decade, including Director of Public Health Innovation and Public Health Coordinator. Before his government service, Jim served as the Chief Information Officer for the Massachusetts Department of Public Health.

Managing data confidentiality for Scope 3 emissions using AWS Clean Rooms

Post Syndicated from Sundeep Ramachandran original https://aws.amazon.com/blogs/architecture/managing-data-confidentiality-for-scope-3-emissions-using-aws-clean-rooms/

Scope 3 emissions are indirect greenhouse gas emissions that result from a company’s activities but occur outside the company’s direct control or ownership. Measuring these emissions requires collecting data from a wide range of external sources, like raw material suppliers, transportation providers, and other third parties. One of the main challenges with Scope 3 data collection is ensuring data confidentiality when sharing proprietary information between third-party suppliers. Organizations are hesitant to share information that could potentially be used by competitors. This can make it difficult for companies to accurately measure and report on their Scope 3 emissions, which in turn limits their ability to manage climate-related impacts and risks.

In this blog, we show how to use AWS Clean Rooms to share Scope 3 emissions data between a reporting company and two of their value chain partners (a raw material purchased goods supplier and a transportation provider). Data confidentiality requirements are specified by each organization before participating in the AWS Clean Rooms collaboration (see Figure 1).

Data confidentiality requirements of reporting company and value chain partners

Figure 1. Data confidentiality requirements of reporting company and value chain partners

Each account has confidential data described as follows:

  • Column 1 lists the raw material Region of origin. This is business confidential information for the supplier.
  • Column 2 lists the emission factors at the raw material level. This is sensitive information for the supplier.
  • Column 3 lists the mode of transportation. This is business confidential information for the transportation provider.
  • Column 4 lists the emissions in transporting individual items. This is sensitive information for the transportation provider.
  • Rows in column 5 list the product recipe at the ingredient level. This is trade secret information for the reporting company.

Overview of solution

In this architecture, AWS Clean Rooms is used to analyze and collaborate on emission datasets without sharing, moving, or revealing underlying data to collaborators (shown in Figure 2).

Architecture for AWS Clean Rooms Scope 3 collaboration

Figure 2. Architecture for AWS Clean Rooms Scope 3 collaboration

Three AWS accounts are used to demonstrate this approach. The Reporting Account creates a collaboration in AWS Clean Rooms and invites the Purchased Goods Account and Transportation Account to join as members. All accounts can protect their underlying data with privacy-enhancing controls and contribute data directly from Amazon Simple Storage Service (Amazon S3) using AWS Glue tables.

The Purchased Goods Account includes users who can update the purchased goods bucket. Similarly, the Transportation Account has users who can update the transportation bucket. The Reporting Account can run SQL queries on the configured tables. AWS Clean Rooms only returns results complying with the analysis rules set by all participating accounts.
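Conceptually, an aggregation analysis rule acts like a filter on query output: any aggregate group whose join column contains fewer distinct values than the configured minimum is withheld. The following minimal Python sketch illustrates that idea on toy data (the function and data are ours for illustration, not the AWS Clean Rooms implementation):

```python
from collections import defaultdict

def enforce_min_distinct(rows, group_key, join_key, min_distinct):
    """Drop aggregate groups whose join column has fewer than
    min_distinct distinct values (toy model of a Clean Rooms
    aggregation constraint; not the actual service code)."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[group_key]].add(row[join_key])
    allowed = {g for g, items in groups.items() if len(items) >= min_distinct}
    return [row for row in rows if row[group_key] in allowed]

rows = [
    {"product": "granola", "item": "oats"},
    {"product": "granola", "item": "honey"},
    {"product": "syrup",   "item": "honey"},  # only one distinct item
]
filtered = enforce_min_distinct(rows, "product", "item", min_distinct=2)
print(sorted({r["product"] for r in filtered}))  # ['granola']
```

With a minimum of two distinct values, the "syrup" group is suppressed because it aggregates only one item, which is the same protection the walkthrough configures later for the item and ingredient columns.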

Prerequisites

For this walkthrough, you should have the following prerequisites:

Although Amazon S3 and AWS Clean Rooms are free-tier eligible, a low fee applies to AWS Glue. Clean-up actions are provided later in this blog post to minimize costs.

Configuration

We configured the S3 buckets for each AWS account as follows:

  • Reporting Account: reportingcompany.csv
  • Purchased Goods Account: purchasedgoods.csv
  • Transportation Account: transportation.csv

Create an AWS Glue Data Catalog for each S3 data source following the method in the Glue Data Catalog Developer Guide. The AWS Glue tables should match the schema detailed previously in Figure 1, for each respective account (see Figure 3).

Configured AWS Glue table for ‘Purchased Goods’

Figure 3. Configured AWS Glue table for ‘Purchased Goods’

Data consumers can be configured to ingest, analyze, and visualize queries (refer back to Figure 2). We will name the Reporting Account’s AWS Glue database “reporting-db” and its AWS Glue table “reporting.” Likewise, the Purchased Goods Account will use “purchase-db” and “purchase.”

Security

Additional actions are recommended to secure each account in a production environment, such as configuring encryption, AWS Identity and Access Management (IAM) roles, and Amazon CloudWatch monitoring. For encryption guidance, review the Further Reading section at the end of this post.

Walkthrough

This walkthrough consists of four steps:

  1. The Reporting Account creates the AWS Clean Rooms collaboration and invites the Purchased Goods Account and Transportation Account to share data.
  2. The Purchased Goods Account and Transportation Account accept this invitation.
  3. Analysis rules are applied in each collaboration account to restrict how data is shared between AWS Clean Rooms collaboration accounts.
  4. The SQL query is created and run in the Reporting Account.

1. Create the AWS Clean Rooms collaboration in the Reporting Account

(The steps covered in this section require you to be logged into the Reporting Account.)

  • Navigate to the AWS Clean Rooms console and click Create collaboration.
  • In the Details section, type “Scope 3 Clean Room Collaboration” in the Name field.
  • Scroll to the Member 1 section. Enter “Reporting Account” in the Member display name field.
  • In Member 2 section, enter “Purchased Goods Account” for your first collaboration member name, with their account number in the Member AWS account ID box.
  • Click Add another member and add “Transportation Account” as the third collaborator with their AWS account number.
  • Choose the “Reporting Account” as the Member who can query and receive results in the Member abilities section. Click Next.
  • Select Yes, join by creating membership now. Click Next.
  • Verify the collaboration settings on the Review and Create page, then select Create and join collaboration and create membership.
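The console steps above can also be scripted. The request below sketches the shape of the underlying CreateCollaboration API call (parameter names are based on the AWS Clean Rooms API at the time of writing and should be verified against the current API reference; the account IDs are placeholders):

```python
# Request payload for the AWS Clean Rooms CreateCollaboration API.
# Verify field names against the current API reference before use;
# account IDs below are placeholders, not real accounts.
params = {
    "name": "Scope 3 Clean Room Collaboration",
    "description": "Scope 3 emissions data collaboration",
    "creatorDisplayName": "Reporting Account",
    # Only the collaboration creator can query and receive results.
    "creatorMemberAbilities": ["CAN_QUERY", "CAN_RECEIVE_RESULTS"],
    "members": [
        {"accountId": "111111111111",
         "displayName": "Purchased Goods Account",
         "memberAbilities": []},
        {"accountId": "222222222222",
         "displayName": "Transportation Account",
         "memberAbilities": []},
    ],
    "queryLogStatus": "DISABLED",
}

# With credentials configured, the call would look like:
# import boto3
# boto3.client("cleanrooms").create_collaboration(**params)
print(len(params["members"]))  # 2
```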

Both accounts will then receive an invitation to accept the collaboration (see Figure 4). The console shows each member’s status as “Invited” until they accept. Next, we will show how the invited members apply query restrictions on their data.

New collaboration created in AWS Clean Rooms

Figure 4. New collaboration created in AWS Clean Rooms

2. Accept invitations and configure table collaboration rules

Steps in this section are applied to the Purchased Goods Account and Transportation Account following collaboration environment setup. For brevity, we will demonstrate steps using the Purchased Goods Account. Differences for the Transportation Account are noted.

  • Log in to the AWS account owning the Purchased Goods Account and accept the collaboration invitation.
  • Open the AWS Clean Rooms console and select Collaborations on the left-hand navigation pane, then click Available to join.
  • You will see an invitation from the Scope 3 Clean Room Collaboration. Click on Scope 3 Clean Room Collaboration and then Create membership.
  • Select Tables, then Associate table. Click Configure new table.

The next action is to associate the Glue table created from the purchasedgoods.csv file. This sequence restricts access to the origin_region column (transportation_mode for the Transportation Account table) in the collaboration.

  • In the Scope 3 Clean Room Collaboration, select Configured tables in the left-hand pane, then Configure new table. Select the AWS Glue table associated with purchasedgoods.csv (shown in Figure 5).
  • Select the AWS Glue Database (purchase-db) and AWS Glue Table (purchase).
  • Verify that the correct table is selected by toggling the View schema option.
  • In the Columns allowed in collaboration section, select all fields except for origin_region. This prevents the origin_region column from being accessed or viewed in the collaboration.
  • Complete this step by selecting Configure new table.
Purchased Goods account table configuration

Figure 5. Purchased Goods account table configuration

  • Select Configure analysis rule (see Figure 6).
  • Select Aggregation type then Next.
  • Select SUM as the Aggregate function and s3_upstream_purchased_good for the column.
  • Under Join controls, select Specify Join column. Select “item” from the list of options. This permits SQL join queries to execute on the “item” column. Click Next.
Table rules for the Purchased Goods account

Figure 6. Table rules for the Purchased Goods account

  • The next page specifies the minimum number of distinct values that must be aggregated for a join. Select “item” for Column name and “2” for the Minimum number of distinct values. Click Next.
  • To confirm the table configuration query rules, click Configure analysis rule.
  • The final step is to click Associate to collaboration and select Scope 3 Clean Room Collaboration in the pulldown menu. Select Associate table after page refresh.

The procedure in this section is repeated for the Transportation Account, with the following exceptions:

  1. The columns shared in this collaboration are item, s3_upstream_transportation, and unit.
  2. The Aggregation function is a SUM applied on the s3_upstream_transportation column.
  3. The item column has an Aggregation constraint minimum of two distinct values.

3. Configure table collaboration rules inside the Reporting Account

At this stage, member account tables are created and shared in the collaboration. The next step is to configure the Reporting Account tables in the Reporting Account’s AWS account.

  • Navigate to AWS Clean Rooms. Select Configured tables, then Configure new table.
  • Select the Glue database and table associated with the file reportingcompany.csv.
  • Under Columns allowed in collaboration, select All columns, then Configure new table.
  • Configure collaboration rules by clicking Configure analysis rule using the Guided workflow.
  • Select Aggregation type, then Next.
  • Select SUM as the Aggregate function and ingredient for the column (see Figure 7).
  • In the Specify join columns section, select ingredient. This restricts SQL join queries to the ingredient column.
  • In the Dimension controls, select product. This option permits grouping by product name in the SQL query. Select Next.
  • Select None in the Scalar functions section. Click Next. Read more about scalar functions in the AWS Clean Rooms User Guide.
Table rules for the Reporting account

Figure 7. Table rules for the Reporting account

  • On the next page, select ingredient for Column name and 2 for the Minimum number of distinct values. Click Next. To confirm query control submission, select Configure analysis rule on the next page.
  • Validate the setting in the Review and Configure window, then select Next.
  • Inside the Configured tables tab, select Associate to collaboration. Assign the table to the Scope 3 Clean Room Collaboration.
  • Select the Scope 3 Clean Room Collaboration in the dropdown menu. Select Choose collaboration.
  • On the Scope 3 Clean Room Collaboration page, select reporting, then Associate table.

4. Create and run the SQL query

Queries can now be run inside the Reporting Account (shown in Figure 8).

Query results in the Clean Rooms Reporting Account

Figure 8. Query results in the Clean Rooms Reporting Account

  • Select an S3 destination to output the query results. Select Action, then Set results settings.
  • Enter the S3 bucket name, then click Save changes.
  • Paste this SQL snippet inside the query text editor (see Figure 8):

SELECT
  r.product AS "Product",
  SUM(p.s3_upstream_purchased_good) AS "Scope_3_Purchased_Goods_Emissions",
  SUM(t.s3_upstream_transportation) AS "Scope_3_Transportation_Emissions"
FROM
  reporting r
  INNER JOIN purchase p ON r.ingredient = p.item
  INNER JOIN transportation t ON p.item = t.item
GROUP BY
  r.product

  • Click Run query. The initial query may take a few minutes to return results; subsequent queries typically complete faster.
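As a sanity check, the join-and-aggregate logic of the query can be reproduced on a few toy rows in Python. Table and column names follow the walkthrough; the emission values are invented purely for illustration:

```python
reporting = [  # reporting table: product recipe at ingredient level
    {"product": "granola", "ingredient": "oats"},
    {"product": "granola", "ingredient": "honey"},
]
purchase = [  # purchased goods: emissions per item (made-up numbers)
    {"item": "oats",  "s3_upstream_purchased_good": 1.5},
    {"item": "honey", "s3_upstream_purchased_good": 0.5},
]
transportation = [  # transportation: emissions per item (made-up numbers)
    {"item": "oats",  "s3_upstream_transportation": 0.3},
    {"item": "honey", "s3_upstream_transportation": 0.2},
]

p = {row["item"]: row for row in purchase}
t = {row["item"]: row for row in transportation}

totals = {}
for r in reporting:  # INNER JOIN on ingredient = item, GROUP BY product
    key = r["ingredient"]
    if key in p and key in t:
        g = totals.setdefault(r["product"], [0.0, 0.0])
        g[0] += p[key]["s3_upstream_purchased_good"]
        g[1] += t[key]["s3_upstream_transportation"]

print(totals)  # {'granola': [2.0, 0.5]}
```

Each product row sums the purchased-goods and transportation emissions of its ingredients, which is exactly what the SQL query returns per product.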

Conclusion

This example shows how AWS Clean Rooms can aggregate data across collaborators to produce total Scope 3 emissions for each product from purchased goods and transportation. The query was performed across three organizations without revealing underlying emission factors or proprietary product recipes to one another. This alleviates data confidentiality concerns and improves sustainability reporting transparency.

Clean Up

The following steps are taken to clean up all resources created in this walkthrough:

  • Member and Collaboration Accounts:
    1. AWS Clean Rooms: Disassociate and delete collaboration tables
    2. AWS Clean Rooms: Remove member account in the collaboration
    3. AWS Glue: Delete the crawler, database, and tables
    4. AWS IAM: Delete the AWS Clean Rooms service policy
    5. Amazon S3: Delete the CSV file storage buckets
  • Collaboration Account only:
    1. Amazon S3: delete the SQL query bucket
    2. AWS Clean Rooms: delete the Scope 3 Clean Room Collaboration

Further Reading:

Security Practices

Share and query encrypted data in AWS Clean Rooms

Post Syndicated from Jonathan Herzog original https://aws.amazon.com/blogs/security/share-and-query-encrypted-data-in-aws-clean-rooms/

In this post, we’d like to introduce you to the cryptographic computing feature of AWS Clean Rooms. With AWS Clean Rooms, customers can run collaborative data-query sessions on sensitive data sets that live in different AWS accounts, and can do so without having to share, aggregate, or replicate the data. When customers also use the cryptographic computing feature, their data remains cryptographically protected even while it is being processed by an AWS Clean Rooms collaboration.

Where would AWS Clean Rooms be useful? Consider a scenario where two different insurance companies want to identify duplicate claims so they can identify potential fraud. This would be simple if they could compare their claims with each other, but they might not be able to do so due to privacy constraints.

Alternatively, consider an advertising network and a client that want to measure the effectiveness of an advertising campaign. To that end, they would like to know how many of the people who saw the campaign (exposures) went on to make a purchase from the client (purchasers). However, confidentiality concerns might prevent the advertising network from sharing their list of exposures with the client or prevent the client from sharing their list of purchasers with the advertising network.

As these examples show, there can be many situations in which different organizations want to collaborate on a joint analysis of their pooled data, but cannot share their individual datasets directly. One solution to this problem is a data clean room, which is a service trusted by a collaboration’s participants to do the following:

  • Hold the data of individual parties
  • Enforce access-control rules that collaborators specify regarding their data
  • Perform analyses over the pooled data

To serve customers with these needs, AWS recently launched a new data clean-room service called AWS Clean Rooms. This service provides AWS customers with a way to collaboratively analyze data (stored in other AWS services as SQL tables) without having to replicate the data, move the data outside of the AWS Cloud, or allow their collaborators to see the data itself.

Additionally, AWS Clean Rooms provides a feature that gives customers even more control over their data: cryptographic computing. This feature allows AWS Clean Rooms to operate over data that customers encrypt themselves and that the service cannot actually read. Specifically, customers can use this feature to select which portions of their data should be encrypted and to encrypt that data themselves. Collaborators can continue to analyze that data as if it were in the clear, however, even though the data in question remains encrypted while it is being processed in AWS Clean Rooms collaborations. In this way, customers can use AWS Clean Rooms to securely collaborate on data they may not have been able to share due to internal policies or regulations.

Cryptographic computing

Using the cryptographic computing feature of AWS Clean Rooms involves these steps:

  • Users create AWS Clean Rooms collaborations and set collaboration-wide encryption settings. They then invite collaborators to support the analysis process.
  • Outside of AWS Clean Rooms, those collaborators agree on a shared secret: a common, secret, cryptographic key.
  • Collaborators individually encrypt their tables outside of the AWS Cloud (typically on their own premises) using the shared secret, the collaboration ID of the intended collaboration, and the Cryptographic Computing for Clean Rooms (C3R) encryption client (which AWS provides as an open-source package). Collaborators then provide the encrypted tables to AWS Clean Rooms, just as they would have provided plaintext tables.
  • Collaborators continue to use AWS Clean Rooms for their data analysis. They impose access-control rules on their tables, submit SQL queries over the tables in the collaboration, and retrieve results.
  • These results might contain encrypted columns, and so collaborators decrypt the results by using the shared secret and the C3R encryption client.

As a result, data that enters AWS Clean Rooms in encrypted format will remain encrypted from input tables to intermediate values to result sets. AWS Clean Rooms will be unable to decrypt or read the data even while performing the queries.

Note: For those interested in the academic aspects of this process, the cryptographic computing feature of AWS Clean Rooms is based on server-aided private set intersection (PSI). Server-aided PSI allows two or more participants to submit sets of values to a server and learn which elements are found in all sets, but without (1) allowing the participants to learn anything about the other (non-shared) elements, or (2) allowing the server to learn anything about the underlying data (aside from the degrees to which the sets overlap). PSI is just one example of the field of cryptographic computing, which provides a variety of new methods by which encrypted data can be processed for various purposes and without decryption. These techniques allow our customers to use the scale and power of AWS systems on data that AWS will not be able to read. See our Cryptographic Computing webpage for more about our work in this area.

Let’s dive deeper into each new step in the process for using cryptographic computing in AWS Clean Rooms.

Key agreement. Each collaboration needs its own shared secret: a secure cryptographic secret (of at least 256 bits). Customers sometimes have a regulatory need to maintain ownership of their encryption keys. Therefore, the cryptographic computing feature supports the case where customers generate, distribute, and store their collaboration’s secret themselves. In this way, customers’ encryption keys are never stored on an AWS system.
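Because the collaboration secret never touches AWS, collaborators must generate and distribute it themselves. A suitable 256-bit secret can be produced with Python's standard library (how the secret is exchanged securely between organizations is left to your own key-management process):

```python
import base64
import secrets

# Generate a 256-bit (32-byte) cryptographically secure shared secret.
shared_secret = secrets.token_bytes(32)

# Base64-encode it for out-of-band exchange between collaborators.
encoded = base64.b64encode(shared_secret).decode("ascii")
print(len(shared_secret) * 8)  # 256
```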

Encryption. AWS Clean Rooms allows table owners to control how tables are encrypted on a column-by-column basis. In particular, each column in an encrypted table will be one of three types: cleartext, sealed, or fingerprint. These types map directly to both how columns are used in queries and how they are protected with cryptography, described as follows:

  • Cleartext columns are not cryptographically processed at all. They are copied to encrypted tables verbatim, and can be used anywhere in a SQL query.
  • Sealed columns are encrypted. The encryption scheme used (AES-GCM) is randomized, meaning that encrypting the same value multiple times yields different ciphertexts each time. This helps prevent the statistical analysis of these columns, but also means that these columns cannot be used in JOIN clauses. They can be used in SELECT clauses, however, which allows them to appear in query results.
  • Fingerprint columns are hashed using the Hash-based Message Authentication Code (HMAC) algorithm. There is no way to decrypt these values, and therefore no reason for them to appear in the SELECT clause of a query. They can, however, be used in JOIN clauses: HMAC will map a given value to the same fingerprint every time, meaning that JOINs will be able to unify common values across different fingerprint columns.
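The key difference between the two protected column types can be illustrated with the standard library. Fingerprints below use HMAC-SHA256 (deterministic, so equal values remain joinable), while the "sealed" step is mocked with a random nonce only to show why randomized encryption yields a different ciphertext each time. This is a toy illustration of the column-type semantics, not the C3R client's actual encoding (which uses AES-GCM for sealed columns):

```python
import hashlib
import hmac
import os

key = os.urandom(32)  # stands in for the collaboration's shared secret

def fingerprint(value: str) -> str:
    # Deterministic: the same input always yields the same fingerprint,
    # which is what makes JOINs on fingerprint columns possible.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def seal(value: str) -> bytes:
    # Toy stand-in for randomized encryption: a fresh nonce means two
    # "encryptions" of the same value differ. NOT real encryption --
    # it exists only to demonstrate the randomization property.
    nonce = os.urandom(12)
    return nonce + value.encode()

print(fingerprint("New York") == fingerprint("New York"))  # True
print(seal("New York") == seal("New York"))                # False
```

This is exactly why joining on sealed columns finds no matches while joining on fingerprint columns behaves as expected.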

Encryption settings. This last point—that fingerprint values will always map a given plaintext value to the same fingerprint—might give pause to some readers. If this is true, won’t the encrypted table be vulnerable to statistical analysis? That is absolutely correct: it will. For this reason, users might wish to set collaboration-wide encryption settings to control these forms of analysis.

To see how statistical analysis might be a concern, imagine a table where one fingerprint column is named US_State. In this case, a simple frequency analysis will reverse-engineer the plaintext values relatively quickly: the most common fingerprint is almost certain to be “California”, followed by “Texas”, “Florida”, and so on. Also, imagine that the same table has another fingerprint column called US_City, and that a given fingerprint appears in both columns. In that case, the fingerprint in question is almost certain to be “New York”. If a row has a fingerprint in the US_City column but a NULL in the US_State column, furthermore, it’s very likely that the fingerprint is for “District of Columbia”. And finally, imagine that the table has a cleartext column called Time_Zone. In this case, values of “HST” (Hawaii standard time) or “AKST” (Alaska standard time) reveal the value in the US_State column regardless of the cryptography.
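The frequency-analysis risk described above is easy to demonstrate: because fingerprints are deterministic, the most frequent fingerprint in a column corresponds to the most frequent plaintext. A toy sketch (hypothetical data; any deterministic keyed hash has this property, and a plain hash with a fixed prefix is used here only to keep the example short):

```python
import hashlib
from collections import Counter

def fingerprint(value: str) -> str:
    # Stand-in for a deterministic keyed hash; the determinism is
    # what enables frequency analysis, regardless of the algorithm.
    return hashlib.sha256(b"demo-key" + value.encode()).hexdigest()

# Hypothetical fingerprint column, skewed the way US_State would be.
states = ["California"] * 5 + ["Texas"] * 3 + ["Vermont"]
column = [fingerprint(s) for s in states]

most_common_fp, count = Counter(column).most_common(1)[0]
# An attacker who knows the population distribution can now test
# candidate plaintexts against the most common fingerprint.
print(most_common_fp == fingerprint("California"))  # True
```

The collaboration-wide settings below exist precisely to let customers decide whether this class of inference is acceptable for their data.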

Not all datasets will be vulnerable to these kinds of statistical analysis, but some will. Only customers can determine which types of analysis may reveal their data and which may not. Because of this, the cryptographic computing feature allows the customer to decide which protections will be needed. At the time of collaboration creation, that is, the creator of the AWS Clean Rooms collaboration can configure the following collaboration-wide encryption settings:

  • Whether or not fingerprint columns can contain duplicate plaintext values (addressing the “California” example)
  • Whether or not fingerprint columns with different names should fingerprint values in the same way (addressing the “New York” example)
  • Whether or not NULL values in the plaintext table should be left as NULL in the encrypted table (addressing the “District of Columbia” example)
  • Whether or not encrypted tables should be allowed to have cleartext columns at all (addressing the time zone example)

Security is maximized when all of these options are set to “no,” but each “no” will limit the queries that C3R will be able to support. For example, the choice of whether or not encrypted tables should be allowed to have cleartext columns will determine which WHERE clauses will be supported: If cleartext columns are not supported, then the Time_Zone column must be cryptographically processed — meaning that the clause WHERE Time_Zone="EST" will not act as intended. There might be reasons to set these options to “yes” in order to enable a wider variety of queries, which we discuss in the Query behavior section later in this post.

Decryption. AWS Clean Rooms will write query results to an Amazon Simple Storage Service (Amazon S3) bucket. The recipient copies these results from the bucket to some on-premises storage and then runs the C3R encryption client. The client will find encrypted elements of the output and decrypt them. Note that the client can only decrypt elements from sealed columns. If the output contains elements from a fingerprint column, the client will warn you, but will also leave these elements untouched, as cryptographic fingerprints can’t be decrypted.

Having finished our overview, let’s return to the discussion regarding how encryption can affect the behavior of queries.

Query behavior

Implicit in the discussion so far is something worth calling out explicitly: AWS Clean Rooms runs queries over the data that is provided to it. If the data given to AWS Clean Rooms is encrypted, therefore, queries will be run on the ciphertexts and not the plaintexts. This will not affect the results returned, so long as the columns are used for their intended purposes:

  • Fingerprint columns are used in JOIN clauses
  • Sealed columns are used in SELECT clauses

(Cleartext columns can be used anywhere.) Queries might produce unexpected results, however, if the columns are used outside of their intended purposes:

  • Sometimes queries will fail when they would have succeeded on the plaintext. For example, ciphertexts and fingerprints will be string values, even if the original plaintext values were another type. Therefore, SUM() or AVG() calls on fingerprint or sealed columns will yield errors even if the corresponding plaintext columns were numeric.
  • Sometimes queries will omit results that would have been found by querying the plaintext. For example, attempting to JOIN on sealed columns will yield empty result sets: no two ciphertexts will be the same, even if they encrypt the same plaintext value. (Also, performing a JOIN on fingerprint columns with different names will exhibit the same behavior, if the collaboration-wide encryption settings specified that fingerprint columns of different names should fingerprint values differently.)
  • Sometimes results will include rows that would not be found by querying the plaintext. As mentioned, ciphertexts and fingerprints will be string values—base64 encodings of random-looking bytes, specifically. This means that a clause such as WHERE 'US_State' CONTAINS 'CA' will match some ciphertexts or fingerprints even when they would not match the plaintext.
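The first failure mode is easy to reproduce outside SQL as well: once a numeric column has been fingerprinted or sealed, its values are strings, so numeric aggregates no longer apply. Python is used here purely to illustrate the type mismatch (the ciphertext strings are placeholders):

```python
plaintext_amounts = [10, 25, 5]
encrypted_amounts = ["a3f1", "9bc0", "77de"]  # ciphertexts are strings

print(sum(plaintext_amounts))  # 40

try:
    sum(encrypted_amounts)  # SUM over a sealed/fingerprint column
except TypeError as err:
    # Aggregating strings fails, just as SUM() fails on encrypted columns.
    print("aggregation failed:", type(err).__name__)
```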

To avoid these issues, fingerprint and sealed columns should only be used for their intended purposes (JOIN and SELECT clauses, respectively).

Conclusion

In this blog post, you have learned how AWS Clean Rooms can help you harness the power of AWS services to query and analyze your most-sensitive data. By using cryptographic computing, you can work with collaborators to perform joint analyses over pooled data without sharing your “raw” data with each other—or with AWS. If you believe that you can benefit from cryptographic computing (in AWS Clean Rooms or elsewhere), we’d like to hear from you. Please contact us with any questions or feedback. Also, we invite you to learn more about AWS Clean Rooms (including its use of cryptographic computing). Finally, the C3R client is open source, and can be downloaded from its GitHub page.

 

Jonathan Herzog

Jonathan is a Principal Security Engineer in AWS Cryptography and has worked in cryptography for 25 years. He received his PhD in cryptography from MIT, and has developed cryptographic systems for the US Air Force, the National Security Agency, Akamai Technologies, and (now) Amazon.

Top 2022 AWS data protection service and cryptography tool launches

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/top-2022-aws-data-protection-service-and-cryptography-tool-launches/

Given the pace of Amazon Web Services (AWS) innovation, it can be challenging to stay up to date on the latest AWS service and feature launches. AWS provides services and tools to help you protect your data, accounts, and workloads from unauthorized access. AWS data protection services provide encryption capabilities, key management, and sensitive data discovery. Last year, we saw growth and evolution in AWS data protection services as we continue to give customers features and controls to help meet their needs. Protecting data in the AWS Cloud is a top priority because we know you trust us to help protect your most critical and sensitive asset: your data. This post will highlight some of the key AWS data protection launches in the last year that security professionals should be aware of.

AWS Key Management Service
Create and control keys to encrypt or digitally sign your data

In April, AWS Key Management Service (AWS KMS) launched hash-based message authentication code (HMAC) APIs. This feature introduced the ability to create AWS KMS keys that can be used to generate and verify HMACs. HMACs are a powerful cryptographic building block that incorporate symmetric key material within a hash function to create a unique keyed message authentication code. HMACs provide a fast way to tokenize or sign data such as web API requests, credit card numbers, bank routing information, or personally identifiable information (PII). This technology is used to verify the integrity and authenticity of data and communications. HMACs are often a higher performing alternative to asymmetric cryptographic methods like RSA or elliptic curve cryptography (ECC) and should be used when both message senders and recipients can use AWS KMS.
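Conceptually, the KMS HMAC APIs (GenerateMac and VerifyMac) perform the standard HMAC operations, with the difference that the key material never leaves AWS KMS. The local analogue with Python's standard library looks like this (the KMS operation names and behavior should be checked against the AWS KMS API reference; this sketch only mirrors the cryptographic primitive, with a fixed demo key standing in for KMS-held key material):

```python
import hashlib
import hmac

key = b"\x00" * 32  # demo key; in KMS this material never leaves the service

def generate_mac(message: bytes) -> bytes:
    # Local analogue of the KMS GenerateMac operation (HMAC-SHA-256).
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_mac(message: bytes, tag: bytes) -> bool:
    # Local analogue of KMS VerifyMac; constant-time comparison.
    return hmac.compare_digest(generate_mac(message), tag)

tag = generate_mac(b"api-request-body")
print(verify_mac(b"api-request-body", tag))   # True
print(verify_mac(b"tampered-request", tag))   # False
```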

At AWS re:Invent in November, AWS KMS introduced the External Key Store (XKS), a new feature for customers who want to protect their data with encryption keys that are stored in an external key management system under their control. This capability brings new flexibility for customers to encrypt or decrypt data with cryptographic keys, independent authorization, and audit in an external key management system outside of AWS. XKS can help you address your compliance needs where encryption keys for regulated workloads must be outside AWS and solely under your control. To provide customers with a broad range of external key manager options, AWS KMS developed the XKS specification with feedback from leading key management and hardware security module (HSM) manufacturers as well as service providers that can help customers deploy and integrate XKS into their AWS projects.

AWS Nitro System
A combination of dedicated hardware and a lightweight hypervisor enabling faster innovation and enhanced security

In November, we published The Security Design of the AWS Nitro System whitepaper. The AWS Nitro System is a combination of purpose-built server designs, data processors, system management components, and specialized firmware that serves as the underlying virtualization technology that powers all Amazon Elastic Compute Cloud (Amazon EC2) instances launched since early 2018. This new whitepaper provides you with a detailed design document that covers the inner workings of the AWS Nitro System and how it is used to help secure your most critical workloads. The whitepaper discusses the security properties of the Nitro System, provides a deeper look into how it is designed to eliminate the possibility of AWS operator access to a customer’s EC2 instances, and describes its passive communications design and its change management process. Finally, the paper surveys important aspects of the overall system design of Amazon EC2 that provide mitigations against potential side-channel vulnerabilities that can exist in generic compute environments.

AWS Secrets Manager
Centrally manage the lifecycle of secrets

In February, AWS Secrets Manager added the ability to schedule secret rotations within specific time windows. Previously, Secrets Manager supported automated rotation of secrets within the last 24 hours of a specified rotation interval. This new feature added the ability to limit a given secret rotation to specific hours on specific days of a rotation interval. This helps you avoid having to choose between the convenience of managed rotations and the operational safety of application maintenance windows. In November, Secrets Manager also added the capability to rotate secrets as often as every four hours, while providing the same managed rotation experience.

In May, Secrets Manager started publishing secrets usage metrics to Amazon CloudWatch. With this feature, you have a streamlined way to view how many secrets you are using in Secrets Manager over time. You can also set alarms for an unexpected increase or decrease in number of secrets.

At the end of December, Secrets Manager added support for managed credential rotation for service-linked secrets. This feature helps eliminate the need for you to manage rotation Lambda functions and enables you to set up rotation without additional configuration. Amazon Relational Database Service (Amazon RDS) has integrated with this feature to streamline how you manage your master user password for your RDS database instances. Using this feature can improve your database’s security by preventing the RDS master user password from being visible during the database creation workflow. Amazon RDS fully manages the master user password’s lifecycle and stores it in Secrets Manager whenever your RDS database instances are created, modified, or restored. To learn more about how to use this feature, see Improve security of Amazon RDS master database credentials using AWS Secrets Manager.

AWS Private Certificate Authority
Create private certificates to identify resources and protect data

In September, AWS Private Certificate Authority (AWS Private CA) launched as a standalone service. AWS Private CA was previously a feature of AWS Certificate Manager (ACM). One goal of this launch was to help customers differentiate between ACM and AWS Private CA. ACM and AWS Private CA have distinct roles in the process of creating and managing the digital certificates used to identify resources and secure network communications over the internet, in the cloud, and on private networks. This launch coincided with the launch of an updated console for AWS Private CA, which includes accessibility improvements to enhance screen reader support and additional tab key navigation for people with motor impairment.

In October, AWS Private CA introduced a short-lived certificate mode, a lower-cost mode of AWS Private CA that is designed for issuing short-lived certificates. With this new mode, public key infrastructure (PKI) administrators, builders, and developers can save money when issuing certificates where a validity period of 7 days or fewer is desired. To learn more about how to use this feature, see How to use AWS Private Certificate Authority short-lived certificate mode.
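A minimal sketch of creating a CA in this mode follows; `UsageMode` is the documented switch between the default general-purpose mode and the new short-lived mode, while the subject name and algorithm choices below are illustrative placeholders:

```python
# Sketch: parameters for creating a private CA in short-lived certificate
# mode. UsageMode selects the mode (vs. the default GENERAL_PURPOSE);
# the subject and key/signing algorithms are example choices.
ca_params = {
    "CertificateAuthorityType": "ROOT",
    "UsageMode": "SHORT_LIVED_CERTIFICATE",
    "CertificateAuthorityConfiguration": {
        "KeyAlgorithm": "EC_prime256v1",
        "SigningAlgorithm": "SHA256WITHECDSA",
        "Subject": {"CommonName": "example-short-lived-ca"},  # placeholder
    },
}

# With boto3: boto3.client("acm-pca").create_certificate_authority(**ca_params)
```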

Additionally, AWS Private CA supported the launches of certificate-based authentication with Amazon AppStream 2.0 and Amazon WorkSpaces, which removes the logon prompt for the Active Directory domain password. Certificate-based authentication in AppStream 2.0 and WorkSpaces integrates with AWS Private CA to automatically issue short-lived certificates when users sign in to their sessions. When you configure your private CA as a third-party root CA in Active Directory, or as a subordinate to your Active Directory Certificate Services enterprise CA, AppStream 2.0 or WorkSpaces can rapidly deploy end-user certificates to seamlessly authenticate users.

AWS Certificate Manager
Provision and manage SSL/TLS certificates with AWS services and connected resources

In early November, ACM launched the ability to request and use Elliptic Curve Digital Signature Algorithm (ECDSA) P-256 and P-384 TLS certificates to help secure your network traffic. You can use ACM to request ECDSA certificates and associate the certificates with AWS services like Application Load Balancer or Amazon CloudFront. Previously, you could only request certificates with an RSA 2048 key algorithm from ACM. Now, AWS customers who need to use TLS certificates with at least 120-bit security strength can use these ECDSA certificates to help meet their compliance needs. The ECDSA certificates have a higher security strength—128 bits for P-256 certificates and 192 bits for P-384 certificates—when compared to 112-bit RSA 2048 certificates that you can also issue from ACM. The smaller file footprint of ECDSA certificates makes them ideal for use cases with limited processing capacity, such as small Internet of Things (IoT) devices.
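As a hedged sketch, the request below asks ACM for a P-256 certificate; `KeyAlgorithm` accepts `EC_prime256v1` (P-256) and `EC_secp384r1` (P-384) in addition to the `RSA_2048` default, and the domain name is a placeholder:

```python
# Sketch: requesting an ECDSA P-256 TLS certificate from ACM.
# The domain name is a placeholder; DNS validation is one of the
# supported validation methods.
request_params = {
    "DomainName": "example.com",       # placeholder domain
    "ValidationMethod": "DNS",
    "KeyAlgorithm": "EC_prime256v1",   # ~128-bit security strength
}

# With boto3: boto3.client("acm").request_certificate(**request_params)
```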

Amazon Macie
Discover and protect your sensitive data at scale

Amazon Macie introduced two major features at AWS re:Invent. The first is a new capability that allows for one-click, temporary retrieval of up to 10 samples of sensitive data found in Amazon Simple Storage Service (Amazon S3). With this new capability, you can more readily view and understand which contents of an S3 object were identified as sensitive, so you can review, validate, and quickly take action as needed without having to review every object that a Macie job returned. Sensitive data samples captured with this new capability are encrypted by using customer-managed AWS KMS keys and are temporarily viewable within the Amazon Macie console after retrieval.

Additionally, Amazon Macie introduced automated sensitive data discovery, a new feature that provides continual, cost-efficient, organization-wide visibility into where sensitive data resides across your Amazon S3 estate. With this capability, Macie automatically samples and analyzes objects across your S3 buckets, inspecting them for sensitive data such as personally identifiable information (PII) and financial data; builds an interactive data map of where your sensitive data in S3 resides across accounts; and provides a sensitivity score for each bucket. Macie uses multiple automated techniques, including resource clustering by attributes such as bucket name, file types, and prefixes, to minimize the data scanning needed to uncover sensitive data in your S3 buckets. This helps you continuously identify and remediate data security risks without manual configuration and lowers the cost to monitor for and respond to data security risks.
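Turning the feature on is a single account-level call; the sketch below assumes the Macie2 `UpdateAutomatedDiscoveryConfiguration` operation and shows only the parameters it takes rather than a live request:

```python
# Sketch: enabling Macie automated sensitive data discovery for the
# account. The status field toggles the feature on or off.
discovery_params = {"status": "ENABLED"}

# With boto3:
#   boto3.client("macie2").update_automated_discovery_configuration(**discovery_params)
```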

Support for new open source encryption libraries

In February, we announced the availability of s2n-quic, an open source Rust implementation of the QUIC protocol, in our AWS encryption open source libraries. QUIC is a transport layer network protocol that many web services use to achieve lower latencies than classic TCP. AWS has long supported open source encryption libraries for network protocols; in 2015 we introduced s2n-tls, a library implementing TLS, the protocol that secures HTTP traffic as HTTPS. The name s2n is short for signal to noise and is a nod to the act of encryption: disguising meaningful signals, like your critical data, as seemingly random noise. Like s2n-tls, s2n-quic is designed to be small and fast, with simplicity as a priority. It is written in Rust, so it inherits that language's performance and its memory and thread safety guarantees.

Cryptographic computing for AWS Clean Rooms (preview)

At re:Invent, we also announced AWS Clean Rooms, currently in preview, which includes a cryptographic computing feature that allows you to run a subset of queries on encrypted data. Clean rooms help customers and their partners to match, analyze, and collaborate on their combined datasets—without sharing or revealing underlying data. If you have data handling policies that require encryption of sensitive data, you can pre-encrypt your data by using a common collaboration-specific encryption key so that data is encrypted even when queries are run. With cryptographic computing, data that is used in collaborative computations remains encrypted at rest, in transit, and in use (while being processed).

If you’re looking for more opportunities to learn about AWS security services, read our AWS re:Invent 2022 Security recap post or watch the Security, Identity, and Compliance playlist.

Looking ahead in 2023

With AWS, you control your data by using powerful AWS services and tools to determine where your data is stored, how it is secured, and who has access to it. In 2023, we will further the AWS Digital Sovereignty Pledge, our commitment to offering AWS customers the most advanced set of sovereignty controls and features available in the cloud.

You can join us at our security learning conference, AWS re:Inforce 2023, in Anaheim, CA, June 13–14, for the latest advancements in AWS security, compliance, identity, and privacy solutions.

Stay updated on launches by subscribing to the AWS What’s New RSS feed and reading the AWS Security Blog.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Marta Taggart

Marta is a Seattle-native and Senior Product Marketing Manager in AWS Security Product Marketing, where she focuses on data protection services. Outside of work you’ll find her trying to convince Jack, her rescue dog, not to chase squirrels and crows (with limited success).