This post was co-written with Amit Shah, Principal Consultant at Atos.
Customers across industries seek meaningful insights from the data captured in their Customer Relationship Management (CRM) systems. To achieve this, they combine their CRM data with a wealth of information already available in their data warehouse, enterprise systems, or other software as a service (SaaS) applications. One widely used approach is getting the CRM data into your data warehouse and keeping it up to date through frequent data synchronization.
Integrating third-party SaaS applications is often complicated and requires significant effort and development. Developers need to understand the application APIs, write implementation and test code, and maintain the code for future API changes. Amazon AppFlow, which is a low-code/no-code AWS service, addresses this challenge.
Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications, like Salesforce, SAP, Zendesk, Slack, and ServiceNow, and AWS services like Amazon Simple Storage Service (Amazon S3) and Amazon Redshift in just a few clicks. With Amazon AppFlow, you can run data flows at enterprise scale at the frequency you choose—on a schedule, in response to a business event, or on demand.
In this post, we focus on synchronizing your data from Salesforce to Snowflake (on AWS) without writing code. This post walks you through the steps to set up a data flow to address full and incremental data load using an example use case.
Solution overview
Our use case involves the synchronization of the Account object from Salesforce into Snowflake. In this architecture, you use Amazon AppFlow to filter and transfer the data to your Snowflake data warehouse.
You can configure Amazon AppFlow to run your data ingestion in three different ways:
- On-demand – You can manually run the flow through the AWS Management Console, API, or SDK call.
- Event-driven – Amazon AppFlow can subscribe and listen to change data capture (CDC) events from the source SaaS application.
- Scheduled – Amazon AppFlow can run schedule-triggered flows based on a pre-defined schedule rule. With scheduled flows, you can choose either full or incremental data transfer:
- With full transfer, Amazon AppFlow transfers a snapshot of all records at the time of the flow run from the source to the destination.
- With incremental transfer, Amazon AppFlow transfers only the records that have been added or changed since the last successful flow run. To determine the incremental delta of your data, you specify a source timestamp field that Amazon AppFlow uses to identify new or updated records.
We use the on-demand trigger for the initial load of data from Salesforce to Snowflake, because it pulls all the records regardless of when they were created. After this initial on-demand run, we configure a scheduled trigger with incremental transfer to synchronize data with Snowflake periodically. With this approach, Amazon AppFlow pulls records from the Salesforce Account object based on a chosen timestamp field, at the time interval specified in the flow.
The Account_Staging table is created in Snowflake to act as temporary storage that can be used to identify the data change events. The permanent table (Account) is then updated from the staging table by running a SQL stored procedure that contains the incremental update logic. The following figure depicts the various components of the architecture and the data flow from the source to the target.
The data flow contains the following steps:
1. First, the flow is run with on-demand and full transfer mode to load the full data into Snowflake.
2. The Amazon AppFlow Salesforce connector pulls the data from Salesforce and stores it in the Account Data S3 bucket in CSV format.
3. The Amazon AppFlow Snowflake connector loads the data into the Account_Staging table.
4. A scheduled task, running at regular intervals in Snowflake, triggers a stored procedure.
5. The stored procedure starts an atomic transaction that loads the data into the Account table and then deletes the data from the Account_Staging table.
6. After the initial data is loaded, you update the flow to capture incremental updates from Salesforce. The flow trigger configuration is changed to scheduled, to capture data changes in Salesforce. This enables Snowflake to get all updates, deletes, and inserts in Salesforce at configured intervals.
7. The flow uses the configured LastModifiedDate field to determine incremental changes.
8. Steps 3, 4, and 5 are run again to load the incremental updates into the Snowflake Account table.
Prerequisites
To get started, you need the following prerequisites:
- A Salesforce user account with sufficient privileges to install connected apps. Amazon AppFlow uses a connected app to communicate with Salesforce APIs. If you don’t have a Salesforce account, you can sign up for a developer account.
- A Snowflake account with sufficient permissions to create and configure the integration, external stage, table, stored procedures, and tasks.
- An AWS account with access to AWS Identity and Access Management (IAM), Amazon AppFlow, and Amazon S3.
Set up Snowflake configuration and Amazon S3 data
Complete the following steps to configure Snowflake and set up your data in Amazon S3:
- Create two S3 buckets in your AWS account: one for holding the data coming from Salesforce, and another for holding error records.
A best practice when creating your S3 buckets is to block public access so that your data isn't accessible to unauthorized users.
- Create an IAM policy named snowflake-access that allows listing the bucket contents and reading S3 objects inside the bucket.
Follow the instructions for steps 1 and 2 in Configuring a Snowflake Storage Integration to Access Amazon S3 to create an IAM policy and role. Replace the placeholders with your S3 bucket names.
- Log in to your Snowflake account and create a new warehouse called SALESFORCE and a database called SALESTEST.
- Specify the format in which data will be available in Amazon S3 for Snowflake to load (for this post, CSV):
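For example, a CSV file format along these lines can be created in the SALESTEST database (the format name my_csv_format is an assumption; adjust the options to match how your CSV files are written):

```sql
-- Hypothetical file format name; adjust options to match your CSV output
CREATE OR REPLACE FILE FORMAT SALESTEST.PUBLIC.my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1                       -- drop this if your files have no header row
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  EMPTY_FIELD_AS_NULL = TRUE;
```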
- Amazon AppFlow uses the Snowflake COPY command to move data using an S3 bucket. To configure this integration, follow steps 3–6 in Configuring a Snowflake Storage Integration to Access Amazon S3.
These steps create a storage integration with your S3 bucket, update the IAM role with your Snowflake account and user details, and create an external stage.
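As a rough sketch of what those steps produce (the integration name, role ARN, bucket name, and stage name below are placeholders; follow the linked Snowflake documentation for the authoritative steps):

```sql
-- Placeholder names and ARN; replace with your own values
CREATE OR REPLACE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://your-account-data-bucket/');

-- DESC STORAGE INTEGRATION s3_int; returns the IAM user and external ID
-- to add to your IAM role's trust policy before you create the stage.

CREATE OR REPLACE STAGE SALESTEST.PUBLIC.my_s3_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://your-account-data-bucket/'
  FILE_FORMAT = (FORMAT_NAME = 'SALESTEST.PUBLIC.my_csv_format');
```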
This completes the integration setup between Snowflake and your S3 bucket. In the next section, you create the required objects in Snowflake.
Create schemas and procedures in Snowflake
In your Snowflake account, complete the following steps to create the tables, stored procedures, and tasks for implementing the use case:
- In your Snowflake account, open a worksheet and run the following DDL scripts to create the Account and Account_Staging tables:
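The following is a minimal sketch of the two tables, assuming column names that mirror the Salesforce fields mapped later in this post; your actual DDL may differ:

```sql
-- Staging table; ACCOUNT_NUMBER and ACTIVE carry NOT NULL constraints,
-- which the AppFlow validations configured later in this post rely on
CREATE OR REPLACE TABLE SALESTEST.PUBLIC.ACCOUNT_STAGING (
  ACCOUNT_NUMBER     VARCHAR(40)  NOT NULL,
  ACCOUNT_NAME       VARCHAR(255),
  ACCOUNT_TYPE       VARCHAR(80),
  ANNUAL_REVENUE     NUMBER(18,2),
  ACTIVE             VARCHAR(10)  NOT NULL,
  DELETED            BOOLEAN,
  LAST_MODIFIED_DATE TIMESTAMP_NTZ
);

-- Permanent table with the same structure
CREATE OR REPLACE TABLE SALESTEST.PUBLIC.ACCOUNT
  LIKE SALESTEST.PUBLIC.ACCOUNT_STAGING;
```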
- Create a stored procedure in Snowflake to load data from the staging table to the Account table:
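The following is a minimal Snowflake Scripting sketch of such a procedure, assuming the tables sketched above and a MERGE-based upsert; your actual implementation may differ:

```sql
CREATE OR REPLACE PROCEDURE SALESTEST.PUBLIC.sp_account_load()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  BEGIN TRANSACTION;

  -- Insert new records, update changed ones, and remove records
  -- that Salesforce marked as deleted
  MERGE INTO SALESTEST.PUBLIC.ACCOUNT a
  USING SALESTEST.PUBLIC.ACCOUNT_STAGING s
    ON a.ACCOUNT_NUMBER = s.ACCOUNT_NUMBER
  WHEN MATCHED AND s.DELETED = TRUE THEN DELETE
  WHEN MATCHED THEN UPDATE SET
    a.ACCOUNT_NAME       = s.ACCOUNT_NAME,
    a.ACCOUNT_TYPE       = s.ACCOUNT_TYPE,
    a.ANNUAL_REVENUE     = s.ANNUAL_REVENUE,
    a.ACTIVE             = s.ACTIVE,
    a.DELETED            = s.DELETED,
    a.LAST_MODIFIED_DATE = s.LAST_MODIFIED_DATE
  WHEN NOT MATCHED AND COALESCE(s.DELETED, FALSE) = FALSE THEN INSERT
    (ACCOUNT_NUMBER, ACCOUNT_NAME, ACCOUNT_TYPE, ANNUAL_REVENUE,
     ACTIVE, DELETED, LAST_MODIFIED_DATE)
    VALUES
    (s.ACCOUNT_NUMBER, s.ACCOUNT_NAME, s.ACCOUNT_TYPE, s.ANNUAL_REVENUE,
     s.ACTIVE, s.DELETED, s.LAST_MODIFIED_DATE);

  -- Clear the staging table after a successful merge
  DELETE FROM SALESTEST.PUBLIC.ACCOUNT_STAGING;

  COMMIT;
  RETURN 'Account load completed';
END;
$$;
```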
This stored procedure determines whether the data contains new records that need to be inserted or existing records that need to be updated or deleted. After a successful run, the stored procedure clears any data from your staging table.
- Create a task in Snowflake to trigger the stored procedure. Make sure that the time interval for this task is more than the time interval configured in Amazon AppFlow for pulling the incremental changes from Salesforce. The time interval should be sufficient for data to be processed.
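For example, a task that calls the procedure every 10 minutes might look like the following (the warehouse name and schedule are assumptions; the task name matches the one referenced in the cleanup section):

```sql
CREATE OR REPLACE TASK SALESTEST.PUBLIC.TASK_ACCOUNT_LOAD
  WAREHOUSE = SALESFORCE
  SCHEDULE = '10 MINUTE'   -- longer than the 5-minute AppFlow schedule used later
AS
  CALL SALESTEST.PUBLIC.sp_account_load();
```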
- Provide the required permissions to run the task and resume it:
  - A newly created task is in a suspended state, so you need to resume it manually the first time.
  - If the role you're using doesn't have the privileges needed to resume or run tasks, grant the EXECUTE TASK privilege to that role.
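A sketch of those statements, assuming the task above and a role named SYSADMIN (substitute the role you actually use):

```sql
-- EXECUTE TASK is an account-level privilege; an ACCOUNTADMIN typically grants it
GRANT EXECUTE TASK ON ACCOUNT TO ROLE SYSADMIN;

-- Tasks are created in a suspended state; resume it so the schedule takes effect
ALTER TASK SALESTEST.PUBLIC.TASK_ACCOUNT_LOAD RESUME;
```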
This completes the Snowflake part of configuration and setup.
Create a Salesforce connection
First, let’s create a Salesforce connection that can be used by AppFlow to authenticate and pull records from your Salesforce instance. On the AWS console, make sure you are in the same Region where your Snowflake instance is running.
- On the Amazon AppFlow console, choose Connections in the navigation pane.
- From the list of connectors, select Salesforce.
- Choose Create connection.
- For Connection name, enter a name of your choice (for example, Salesforce-blog).
- Leave the rest of the fields as default and choose Continue.
- You’re redirected to a sign-in page, where you need to log in to your Salesforce instance.
- After you allow Amazon AppFlow access to your Salesforce account, your connection is successfully created.
Create a Snowflake connection
Complete the following steps to create your Snowflake connection:
- On the Connections menu, choose Snowflake.
- Choose Create connection.
- Provide information for the Warehouse, Stage name, and Bucket details fields.
- Enter your credential details.
- For Region, choose the same Region where Snowflake is running.
- For Connection name, name your connection Snowflake-blog.
- Leave the rest of the fields as default and choose Connect.
Create a flow in Amazon AppFlow
Now you create a flow in Amazon AppFlow to load the data from Salesforce to Snowflake. Complete the following steps:
- On the Amazon AppFlow console, choose Flows in the navigation pane.
- Choose Create flow.
- On the Specify flow details page, enter a name for the flow (for example, AccountData-SalesforceToSnowflake).
- Optionally, provide a description for the flow and tags.
- Choose Next.
- On the Configure flow page, for Source name, choose Salesforce.
- Choose the Salesforce connection we created in the previous step (Salesforce-blog).
- For Choose Salesforce object, choose Account.
- For Destination name, choose Snowflake.
- Choose the newly created Snowflake connection.
- For Choose Snowflake object, choose the staging table you created earlier (SALESTEST.PUBLIC.ACCOUNT_STAGING).
- In the Error handling section, provide your error S3 bucket.
- For Choose how to trigger the flow, select Run on demand.
- Choose Next.
- Select Manually map fields to map the fields between your source and destination.
- Choose the fields Account Number, Account Name, Account Type, Annual Revenue, Active, Deleted, and Last Modified Date.
- Map each source field to its corresponding destination field.
- Under Additional settings, leave Import deleted records unchecked (the default setting).
- In the Validations section, add validations for the data you’re pulling from Salesforce.
Because the schema for the Account_Staging table in the Snowflake database has a NOT NULL constraint for the fields Account_Number and Active, records containing a null value for these fields should be ignored.
- Choose Add Validation to configure validations for these fields.
- Choose Next.
- Leave everything else as default, proceed to the final page, and choose Create Flow.
- After the flow is created, choose Run flow.
When the flow run completes successfully, it will bring all records into your Snowflake staging table.
Verify data in Snowflake
The data will be loaded into the Account_Staging table. To verify that data is loaded in Snowflake, complete the following steps:
- Validate the number of records by querying the ACCOUNT_STAGING table in Snowflake (see the example queries after this list).
- Wait for your Snowflake task to run based on the configured schedule.
- Verify that all the data is transferred to the ACCOUNT table and the ACCOUNT_STAGING table is truncated.
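For example, you can run simple counts before and after the task runs (assuming the fully qualified table names used in this post):

```sql
-- Before the task runs, the staging table holds the transferred records
SELECT COUNT(*) FROM SALESTEST.PUBLIC.ACCOUNT_STAGING;

-- After the task runs, the records should be in ACCOUNT and the staging table should be empty
SELECT COUNT(*) FROM SALESTEST.PUBLIC.ACCOUNT;
SELECT COUNT(*) FROM SALESTEST.PUBLIC.ACCOUNT_STAGING;
```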
Configure an incremental data load from Salesforce
Now let’s configure an incremental data load from Salesforce:
- On the Amazon AppFlow console, select your flow, and choose Edit.
- Go to the Edit configuration step and change to Run flow on schedule.
- Set the flow to run every 5 minutes, and provide a start date of Today, with a start time in the future.
- Choose Incremental transfer and choose the LastModifiedDate field.
- Choose Next.
- In the Additional settings section, select Import deleted records.
This ensures that deleted records from the source are also ingested.
- Choose Save and then choose Activate flow.
Now your flow is configured to capture all incremental changes.
Test the solution
Log in to your Salesforce account, and edit any record in the Account object.
Within 5 minutes, the scheduled flow picks up your change, writes the changed record into your Snowflake staging table, and triggers the synchronization process.
You can see the details of the run, including number of records transferred, on the Run History tab of your flow.
Clean up
Clean up the resources in your AWS account by completing the following steps:
- On the Amazon AppFlow console, choose Flows in the navigation pane.
- From the list of flows, select the flow AccountData-SalesforceToSnowflake and delete it.
- Enter delete to delete the flow.
- Choose Connections in the navigation pane.
- Choose Salesforce from the list of connectors, select Salesforce-blog, and delete it.
- Enter delete to delete the connector.
- On the Connections page, choose Snowflake from the list of connectors, select Snowflake-blog, and delete it.
- Enter delete to delete the connector.
- On the IAM console, choose Roles in the navigation pane, then select the role you created for Snowflake and delete it.
- Choose Policies in the navigation pane, select the policy you created for Snowflake, and delete it.
- On the Amazon S3 console, search for the data bucket you created, choose Empty to delete the objects, then delete the bucket.
- Search for the error bucket you created, choose Empty to delete the objects, then delete the bucket.
- Clean up resources in your Snowflake account (see the SQL sketch after this list):
  - Delete the task TASK_ACCOUNT_LOAD.
  - Delete the stored procedure sp_account_load.
  - Delete the tables ACCOUNT_STAGING and ACCOUNT.
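The following is a minimal sketch of those cleanup statements, assuming the object names used earlier in this post:

```sql
DROP TASK IF EXISTS SALESTEST.PUBLIC.TASK_ACCOUNT_LOAD;
DROP PROCEDURE IF EXISTS SALESTEST.PUBLIC.sp_account_load();
DROP TABLE IF EXISTS SALESTEST.PUBLIC.ACCOUNT_STAGING;
DROP TABLE IF EXISTS SALESTEST.PUBLIC.ACCOUNT;
```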
Conclusion
In this post, we walked you through how to integrate and synchronize your data from Salesforce to Snowflake using Amazon AppFlow. This demonstrates how you can set up your ETL jobs without having to learn new programming languages, by using Amazon AppFlow and your familiar SQL language. This is a proof of concept; you can extend it to handle edge cases such as Snowflake task failures, or explore how incremental transfer behaves when you make multiple changes to a Salesforce record within the scheduled time interval.
For more information on Amazon AppFlow, visit Amazon AppFlow.
About the authors
Ramesh Ranganathan is a Senior Partner Solution Architect at AWS. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, application modernization and cloud native development. He is passionate about technology and enjoys experimenting with AWS Serverless services.
Kamen Sharlandjiev is an Analytics Specialist Solutions Architect and Amazon AppFlow expert. He’s on a mission to make life easier for customers who are facing complex data integration challenges. His secret weapon? Fully managed, low-code AWS services that can get the job done with minimal effort and no coding.
Amit Shah is an expert in cloud-based modern data architecture and currently leads the AWS Data Analytics practice at Atos. Based in Pune, India, he has 20+ years of experience in data strategy, architecture, design, and development. He is on a mission to help organizations become data-driven.