Post Syndicated from Rajkumar Irudayaraj original https://aws.amazon.com/blogs/big-data/harness-zero-copy-data-sharing-from-salesforce-data-cloud-to-amazon-redshift-for-unified-analytics-part-2/
In the era of digital transformation and data-driven decision making, organizations must rapidly harness insights from their data to deliver exceptional customer experiences and gain competitive advantage. Salesforce and Amazon have collaborated to help customers unlock value from unified data and accelerate time to insights with bidirectional Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift.
In the Part 1 of this series, we discussed how to configure data sharing between Salesforce Data Cloud and customers’ AWS accounts in the same AWS Region. In this post, we discuss the architecture and implementation details of cross-Region data sharing between Salesforce Data Cloud and customers’ AWS accounts.
Solution overview
Salesforce Data Cloud provides a point-and-click experience to share data with a customer’s AWS account. On the AWS Lake Formation console, you can accept the datashare, create the resource link, mount Salesforce Data Cloud objects as data catalog views, and grant permissions to query the live and unified data in Amazon Redshift. Cross-Region data sharing between Salesforce Data Cloud and a customer’s AWS accounts is supported for two deployment scenarios: Amazon Redshift Serverless and Redshift provisioned clusters (RA3).
Cross-Region data sharing with Redshift Serverless
The following architecture diagram depicts the steps for setting up a cross-Region datashare between a Data Cloud instance in US-WEST-2
with Redshift Serverless in US-EAST-1
.
Cross-Region data sharing set up consists of the following steps:
- The Data Cloud admin identifies the objects to be shared and creates a Data Share in the data cloud provisioned in the
US-WEST-2
- The Data Cloud admin links the Data Share with the Amazon Redshift Data Share target. This creates an AWS Glue Data Catalog view and a cross-account Lake Formation resource share using the AWS Resource Access Manager (RAM) with the customer’s AWS account in
US-WEST-2
. - The customer’s Lake Formation admin accepts the datashare invitation in
US-WEST-2
from the Lake Formation console and grants default (select and describe) permissions to an AWS Identity and Access Management (IAM) principal. - The Lake Formation admin switches to
US-EAST-1
and creates a resource link pointing to the shared database in theUS-WEST-2
Region. - The IAM principal can log in to the Amazon Redshift query editor in
US-EAST-1
and creates an external schema referencing the datashare resource link. The data can be queried through these external tables.
Cross-Region data sharing with a Redshift provisioned cluster
Cross-Region data sharing across Salesforce Data Cloud and a Redshift provisioned cluster requires additional steps on top of the Serverless set up. Based on the Amazon Redshift Spectrum considerations, the provisioned cluster and the Amazon Simple Storage Service (Amazon S3) bucket must be in the same Region for Redshift external tables. The following architecture depicts a design pattern and steps to share data with Redshift provisioned clusters.
Steps 1–5 in the set up remain the same across Redshift Serverless and provisioned cluster cross-Region sharing. Encryption must be enabled on both Redshift Serverless and the provisioned cluster. Listed below are the additional steps:
- Create a table from datashare data with the
CREATE TABLE AS SELECT
Create a datashare in Redshift serverless and grant access to the Redshift provisioned cluster. - Create a database in the Redshift provisioned cluster and grant access to the target IAM principals. The datashare is ready for query.
The new table needs to be refreshed periodically to get the latest data from the shared Data Cloud objects with this solution.
Considerations when using data sharing in Amazon Redshift
For a comprehensive list of considerations and limitations of data sharing, refer to Considerations when using data sharing in Amazon Redshift. Some of the important ones for Zero Copy data sharing includes:
- Data sharing is supported for all provisioned RA3 instance types (ra3.16xlarge, ra3.4xlarge, and ra3.xlplus) and Redshift Serverless. It isn’t supported for clusters with DC and DS node types.
- For cross-account and cross-Region data sharing, both the producer and consumer clusters and serverless namespaces must be encrypted. However, they don’t need to share the same encryption key.
- Data Catalog multi-engine views are generally available in commercial Regions where Lake Formation, the Data Catalog, Amazon Redshift, and Amazon Athena are available.
- Cross-Region sharing is available in all LakeFormation supported regions.
Prerequisites
The prerequisites remain the same across same-Region and cross-Region data sharing, which are required before proceeding with the setup.
Configure cross-Region data sharing
The steps to create a datashare, create a datashare target, link the datashare target to the datashare, and accept the datashare in Lake Formation remain the same across same-Region and cross-Region data sharing. Refer to Part 1 of this series to complete the setup.
Cross-Region data sharing with Redshift Serverless
If you’re using Redshift Serverless, complete the following steps:
- On the Lake Formation console, choose Databases in the navigation pane.
- Choose Create database.
- Under Database details¸ select Resource link.
- For Resource link name, enter a name for the resource link.
- For Shared database’s region, choose the Data Catalog view source Region.
- The Shared database and Shared database’s owner ID fields are populated manually from the database metadata.
- Choose Create to complete the setup.
The resource link appears on the Databases page on the Lake Formation console, as shown in the following screenshot.
- Launch Redshift Query Editor v2 for the Redshift Serverless workspace The cross-region data share tables are auto-mounted and appear under
awsdatacatalog
. To query, run the following command and create an external schema. Specify the resource link as the Data Catalog database, the Redshift Serverless Region, and the AWS account ID. - Refresh the schemas to view the external schema created in the
dev
database
- Run the
show tables
command to check the shared objects under the external database: - Query the datashare as shown in the following screenshot.
Cross-Region data sharing with Redshift provisioned cluster
This section is a continuation of the previous section with additional steps needed for data sharing to work when the consumer is a provisioned Redshift cluster. Refer to Sharing data in Amazon Redshift and Sharing datashares for a deeper understanding of concepts and the implementation steps.
- Create a new schema and table in the Redshift Serverless in the consumer Region:
- Get the namespace for the Redshift Serverless (producer) and Redshift provisioned cluster (consumer) by running the following query in each cluster:
- Create a datashare in the Redshift Serverless (producer) and grant usage to the Redshift provisioned cluster (consumer). Set the datashare, schema, and table names to the appropriate values, and set the namespace to the consumer namespace.
- Log in as a superuser in the Redshift provisioned cluster, create a database from the datashare, and grant permissions. Refer to managing permissions for Amazon Redshift datashare for detailed guidance.
The datashare is now ready for query.
You can periodically refresh the table you created to get the latest data from the data cloud based on your business requirement.
Conclusion
Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift represents a significant advancement in how organizations can use their customer 360 data. By eliminating the need for data movement, this approach offers real-time insights, reduced costs, and enhanced security. As businesses continue to prioritize data-driven decision-making, Zero Copy data sharing will play a crucial role in unlocking the full potential of customer data across platforms.
This integration empowers organizations to break down data silos, accelerate analytics, and drive more agile customer-centric strategies. To learn more, refer to the following resources:
- Sharing data in Amazon Redshift
- Sharing datashares
- Data sharing in AWS Lake Formation
- Building AWS Glue Data Catalog views
- Lake Formation supported regions
- Transform Your Data Strategy with the Power of Salesforce Data Cloud’s Zero Copy Integration to Amazon Redshift
About the Authors
Rajkumar Irudayaraj is a Senior Product Director at Salesforce with over 20 years of experience in data platforms and services, with a passion for delivering data-powered experiences to customers.
Sriram Sethuraman is a Senior Manager in Salesforce Data Cloud product management. He has been building products for over 9 years using big data technologies. In his current role at Salesforce, Sriram works on Zero Copy integration with major data lake partners and helps customers deliver value with their data strategies.
Jason Berkowitz is a Senior Product Manager with AWS Lake Formation. He comes from a background in machine learning and data lake architectures. He helps customers become data-driven.
Ravi Bhattiprolu is a Senior Partner Solutions Architect at AWS. Ravi works with strategic ISV partners, Salesforce and Tableau, to deliver innovative and well-architected products and solutions that help joint customers achieve their business and technical objectives.
Avijit Goswami is a Principal Solutions Architect at AWS specialized in data and analytics. He supports AWS strategic customers in building high-performing, secure, and scalable data lake solutions on AWS using AWS managed services and open source solutions. Outside of his work, Avijit likes to travel, hike, watch sports, and listen to music.
Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.
Michael Chess is a Technical Product Manager at AWS Lake Formation. He focuses on improving data permissions across the data lake. He is passionate about enabling customers to build and optimize their data lakes to meet stringent security requirements.
Mike Patterson is a Senior Customer Solutions Manager in the Strategic ISV segment at AWS. He has partnered with Salesforce Data Cloud to align business objectives with innovative AWS solutions to achieve impactful customer experiences. In his spare time, he enjoys spending time with his family, sports, and outdoor activities.