
Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

Post Syndicated from Valdiney Gomes original https://aws.amazon.com/blogs/big-data/migrate-amazon-redshift-from-dc2-to-ra3-to-accommodate-increasing-data-volumes-and-analytics-demands/

This is a guest post by Valdiney Gomes, Hélio Leal, Flávia Lima, and Fernando Saga from Dafiti.

As businesses strive to make informed decisions, the amount of data being generated and required for analysis is growing exponentially. Dafiti, an ecommerce company that recognizes the importance of using data to drive strategic decision-making, is no exception to this trend. With the ever-increasing volume of data available, Dafiti faces the challenge of effectively managing and extracting valuable insights from this vast pool of information to gain a competitive edge and make data-driven decisions that align with the company’s business objectives.

Amazon Redshift is widely used for Dafiti’s data analytics, supporting approximately 100,000 daily queries from over 400 users across three countries. These queries include both extract, transform, and load (ETL) and extract, load, and transform (ELT) processes and one-time analytics. Dafiti’s data infrastructure relies heavily on ETL and ELT processes, with approximately 2,500 unique processes run daily. These processes retrieve data from around 90 different data sources and update roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3).

The growing need for storage space to maintain data from over 90 sources and the functionality available on the new Amazon Redshift node types, including managed storage, data sharing, and zero-ETL integrations, led us to migrate from DC2 to RA3 nodes.

In this post, we share how we handled the migration process and provide further impressions of our experience.

Amazon Redshift at Dafiti

Amazon Redshift is a fully managed data warehouse service, and was adopted by Dafiti in 2017. Since then, we’ve had the opportunity to follow many innovations and have gone through three different node types. We started with 115 dc2.large nodes. With the launch of Redshift Spectrum and the migration of our cold data to the data lake, we considerably improved our architecture and migrated to four dc2.8xlarge nodes. RA3 then introduced many features, allowing us to scale and pay for compute and storage independently. This is what brought us to the current moment, where we have eight ra3.4xlarge nodes in the production environment and a single-node ra3.xlplus cluster for development.

Given our scenario, where we have many data sources and a lot of new data being generated every moment, we came across a problem: the 10 TB we had available in our cluster was insufficient for our needs. Although most of our data is currently in the data lake, more storage space was needed in the data warehouse. This was solved by RA3, which scales compute and storage independently. Also, with zero-ETL, we simplified our data pipelines, ingesting tons of data in near real time from our Amazon Relational Database Service (Amazon RDS) instances, while data sharing enables a data mesh approach.

Migration process to RA3

Our first step towards migration was to understand how the new cluster should be sized; for this, AWS provides a recommendation table.

Given the configuration of our cluster, consisting of four dc2.8xlarge nodes, the recommendation was to switch to ra3.4xlarge.

At this point, one concern we had was the reduction in vCPUs and memory. With DC2, our four nodes provided a total of 128 vCPUs and 976 GiB of memory; with RA3, even with eight nodes, these values dropped to 96 vCPUs and 768 GiB. However, performance improved, with workloads processed 40% faster in general.
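The totals quoted above follow directly from the per-node specifications of the two node types. The following is a small sketch of that arithmetic; the per-node figures (32 vCPUs and 244 GiB for dc2.8xlarge, 12 vCPUs and 96 GiB for ra3.4xlarge) are the published node specifications rather than numbers stated in this post, and the function name is only illustrative.

```python
# Per-node specifications as published for these node types
# (not stated in the post itself, but consistent with its totals).
NODE_SPECS = {
    "dc2.8xlarge": {"vcpu": 32, "memory_gib": 244},
    "ra3.4xlarge": {"vcpu": 12, "memory_gib": 96},
}


def cluster_totals(node_type: str, node_count: int) -> dict:
    """Return total vCPUs and memory (GiB) for a cluster of identical nodes."""
    spec = NODE_SPECS[node_type]
    return {
        "vcpu": spec["vcpu"] * node_count,
        "memory_gib": spec["memory_gib"] * node_count,
    }


dc2 = cluster_totals("dc2.8xlarge", 4)
ra3 = cluster_totals("ra3.4xlarge", 8)
print(dc2)  # {'vcpu': 128, 'memory_gib': 976}
print(ra3)  # {'vcpu': 96, 'memory_gib': 768}
```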

AWS offers Redshift Test Drive to validate whether the configuration chosen for Amazon Redshift is ideal for your workload before migrating the production environment. At Dafiti, given the particularities of our workload, which gives us some flexibility to make changes to specific windows without affecting the business, it wasn’t necessary to use Redshift Test Drive.

We carried out the migration as follows:

  1. We created a new cluster with eight ra3.4xlarge nodes from the snapshot of our four-node dc2.8xlarge cluster. This process took around 10 minutes to create the new cluster with 8.75 TB of data.
  2. We turned off our internal ETL and ELT orchestrator, to prevent our data from being updated during the migration period.
  3. We changed the DNS pointing to the new cluster in a transparent way for our users. At this point, only one-time queries and those made by Amazon QuickSight reached the new cluster.
  4. After the read query validation stage was complete and we were satisfied with the performance, we reconnected our orchestrator so that the data transformation queries could be run in the new cluster.
  5. We removed the DC2 cluster and completed the migration.
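Step 1 can be sketched with the Redshift RestoreFromClusterSnapshot API, which accepts a target node type and node count. This is a minimal sketch and not necessarily how we ran the restore: the snapshot and cluster identifiers are hypothetical, and the actual call (shown only as a comment) assumes boto3 is installed and credentials are configured.

```python
def build_restore_request(snapshot_id: str, new_cluster_id: str) -> dict:
    """Build keyword arguments for Redshift's RestoreFromClusterSnapshot API.

    The node type and count match the target cluster described in Step 1.
    """
    return {
        "ClusterIdentifier": new_cluster_id,
        "SnapshotIdentifier": snapshot_id,
        "NodeType": "ra3.4xlarge",
        "NumberOfNodes": 8,
    }


# Hypothetical identifiers, used only for illustration.
request = build_restore_request("dc2-final-snapshot", "dw-ra3")
print(request)

# With boto3 available, the request could be submitted like this:
#   import boto3
#   boto3.client("redshift").restore_from_cluster_snapshot(**request)
```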

The following diagram illustrates the migration architecture.

Migrate architecture

During the migration, we defined some checkpoints at which a rollback would be performed if something unwanted happened. The first checkpoint was in Step 3, where a reduction in the performance of user queries would lead to a rollback. The second checkpoint was in Step 4, where a rollback would be triggered if the ETL and ELT processes presented errors or lost performance compared to the metrics collected from the same processes run on DC2. In both cases, the rollback would simply consist of changing the DNS to point to DC2 again, because it would still be possible to rebuild all processes within the defined maintenance window.
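The checkpoint logic can be expressed as a simple comparison against the metrics collected on DC2. The sketch below is only an illustration: comparing median runtimes and the `tolerance` parameter are assumptions, not the exact criteria we applied.

```python
from statistics import median


def should_roll_back(baseline_seconds, observed_seconds,
                     had_errors=False, tolerance=1.0):
    """Return True when a checkpoint fails and DNS should point back to DC2.

    baseline_seconds: runtimes collected on the DC2 cluster.
    observed_seconds: runtimes of the same workload on the RA3 cluster.
    tolerance: hypothetical allowed slowdown factor (1.0 means no slowdown).
    """
    if had_errors:
        return True
    return median(observed_seconds) > median(baseline_seconds) * tolerance


# Illustrative runtimes only; not measurements from the migration.
print(should_roll_back([100, 120, 110], [60, 70, 66]))     # False
print(should_roll_back([100, 120, 110], [130, 150, 140]))  # True
```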

Results

The RA3 family introduced many features and allowed us to scale and pay for compute and storage independently, which changed the game at Dafiti. Before, we had a cluster that performed as expected, but limited us in terms of storage, requiring daily maintenance to keep disk space under control.

The RA3 nodes performed better, and workloads ran 40% faster in general. This represents a significant decrease in the delivery time of our critical data analytics processes.

This improvement became even more pronounced in the days following the migration, due to the ability of Amazon Redshift to optimize caching and statistics and to apply performance recommendations. Additionally, Amazon Redshift is able to provide recommendations for optimizing our cluster based on our workload demands through Amazon Redshift Advisor recommendations, and offers automatic table optimization, which played a key role in achieving a seamless transition.

Moreover, the storage capacity leap from 10 TB to multiple PB solved Dafiti’s primary challenge of accommodating growing data volumes. This substantial increase in storage capabilities, combined with the unexpected performance enhancements, demonstrated that the migration to RA3 nodes was a successful strategic decision that addressed Dafiti’s evolving data infrastructure requirements.

Data sharing has been used since the moment of migration, to share data between the production and development environment, but the natural evolution is to enable the data mesh at Dafiti through this resource. The limitation we had was the need to activate case sensitivity, which is a prerequisite for data sharing, and which forced us to change some broken processes. But that was nothing compared to the benefits we’re seeing from migrating to RA3.

Conclusion

In this post, we discussed how Dafiti handled migrating to Redshift RA3 nodes, and the benefits of this migration.

Do you want to know more about what we’re doing in the data area at Dafiti? Check out the following resources:

 The content and opinions in this post are those of Dafiti’s authors and AWS is not responsible for the content or accuracy of this post.


About the Authors

Valdiney Gomes is Data Engineering Coordinator at Dafiti. He worked for many years in software engineering, migrated to data engineering, and currently leads an amazing team responsible for the data platform for Dafiti in Latin America.

Hélio Leal is a Data Engineering Specialist at Dafiti, responsible for maintaining and evolving the entire data platform at Dafiti using AWS solutions.

Flávia Lima is a Data Engineer at Dafiti, responsible for sustaining the data platform and providing data from many sources to internal customers.

Fernando Saga is a data engineer at Dafiti, responsible for maintaining Dafiti’s data platform using AWS solutions.

How Dafiti made Amazon QuickSight its primary data visualization tool

Post Syndicated from Valdiney Gomes original https://aws.amazon.com/blogs/big-data/how-dafiti-made-amazon-quicksight-its-primary-data-visualization-tool/

This is a guest post by Valdiney Gomes, Hélio Leal, and Flávia Lima from Dafiti.

Data and its various uses are increasingly evident in companies, and each professional has their own preferences about which technologies to use to visualize data, which aren’t necessarily in line with the technological needs and infrastructure of a company. At Dafiti, a Brazilian fashion and style e-commerce retailer, it was no different. Five tools were used by different sectors of the company, which caused misalignment and management overhead, spreading our resources thin to support them. Looking for a tool that would enable us to democratize our data, we chose Amazon QuickSight as our corporate solution for data visualization. QuickSight is a cloud-native, serverless business intelligence (BI) service that powers interactive dashboards that let us make better data-driven decisions.

In this post, we discuss why we chose QuickSight and how we implemented it.

Why we chose QuickSight

We had specific requirements for our BI solution and looked at many different options. The following factors guided our decision:

  • Tool close to data – It was important to have the data visualization tool as close to the data as possible. At Dafiti, the entire infrastructure is on AWS, and we use Amazon Redshift as our Data Warehouse. QuickSight, when using SPICE (Super-fast, Parallel, In-memory Calculation Engine), extracts data from Amazon Redshift as efficiently as possible using UNLOAD, which optimizes the use of Amazon Redshift.
  • Highly available and accessible solution – We wanted to be able to access the tool through a web or mobile interface, in addition to being able to do almost anything through API calls.
  • Serverless solution – All the other data visualization solutions that were used at Dafiti were on premises, which created unnecessary cost and effort to maintain these services, taking the focus away from what was most important to us: data.
  • Flexible pricing model – We needed a pricing model that would allow us to provide access to everyone in the company and at a price defined by usage and not by license. Thanks to AWS pay-as-you-go pricing, with more than double the number of users we had on our previous main data visualization solution, our cost with QuickSight is about 10 times lower.
  • Robust documentation – The material provided by AWS proved to be helpful, allowing our team to put the project into production.

Unifying our solution

We were previously using Qlikview, Sisense, Tableau, SAP, and Excel to analyze our data across different teams. We were already using other AWS services and learning about QuickSight when we hosted a Data Battle with AWS, a hybrid event for more than 230 Dafiti employees. This event had a hands-on approach with a workshop followed by a friendly QuickSight competition. Participants had to get information in their own dashboard to answer correctly. This 5-hour event flew by, accelerated the learning path of technical and business teams, and proved that QuickSight was the right tool for us.

QuickSight has brought all of our teams into one tool, while lowering costs by 80% and enabling us to do so much more together. Currently, over 400 employees, including our CEO, across nine different business units are using QuickSight as their sole source of truth on a daily basis. This includes human resources, auditing, and customer service, which previously had their analyses spread across several sources.

Data democratization

Data democratization is one of Dafiti’s main objectives. We believe that allowing everyone to analyze the data, following Brazilian, Argentinean, and Colombian privacy laws, unlocks potential for improving decision-making processes by extracting value from the data generated by the company. However, the democratization of data comes with the responsible use of resources. Yes, we want all users to be able to access and extract value from the data, but the cost can never be greater than the value that this generates.

How we organized the project

Data democratization drives Dafiti’s strategy. When implementing QuickSight, the obsession with becoming an even more data-driven company (we talked about this at the AWS Summit SP 2022) and with making data increasingly accessible was what guided the project.

We organized QuickSight by folders, as can be seen in the following figure, and each folder represents a business area. This makes it easier to grant access and ensures that all people from the same area have access to exactly the same set of data and reports.

model of Dafiti's QuickSight folders

In this model, people from the corporate data area can view and edit any resource from any area, while customer service users can view and edit resources only for customer service.

Expanding the model a bit, the reports created by one area can be shared with others, as can be seen in the following figure, in which the SAC report was shared with Support, creating what we call a reporting portfolio.

an expansion of the folders

In this way, all users who join any of the groups will have exactly the same view as any of their peers, eliminating privileges in accessing data. In addition, the portfolio is enriched every day with reports that are created and maintained by one area but may be of interest to areas other than the one responsible for creating them.

For this to work correctly, a certain rigidity is necessary in relation to the few naming and documentation standards that have been defined. On the other hand, designers have complete freedom to define the characteristics of their reports.

Another highlight in this model is that no report can be shared directly with a specific user; this restriction was defined using custom permissions in QuickSight. Therefore, the reports are always shared only through the folders. After all, we want the data to be accessible equally to everyone in the company.
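The folder model described in this section can be illustrated with a small simulation. This is not QuickSight code; it is a sketch of the access rule only, assuming that the corporate data area is the single unrestricted area and that sharing a report means placing it in another area's folder. The area and report names are illustrative.

```python
def visible_reports(user_area, folders, unrestricted_areas=frozenset({"DATA"})):
    """Return the reports a user sees, derived only from their area's folder.

    folders maps an area name to the set of reports in that area's folder.
    Users never receive a report directly; access comes from the folder.
    """
    if user_area in unrestricted_areas:
        # The corporate data area can see every folder.
        return set().union(*folders.values())
    return set(folders.get(user_area, set()))


folders = {
    "SAC": {"sac_tickets"},
    # The SAC report is shared with Support by placing it in that folder too.
    "SUPPORT": {"support_backlog", "sac_tickets"},
}

print(sorted(visible_reports("SAC", folders)))      # ['sac_tickets']
print(sorted(visible_reports("SUPPORT", folders)))  # ['sac_tickets', 'support_backlog']
print(sorted(visible_reports("DATA", folders)))     # ['sac_tickets', 'support_backlog']
```

Because access is derived from the area alone, two users in the same area always get the same result, which is the property the folder model is meant to guarantee.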

Technical configurations

QuickSight offers a comprehensive API, and all the activities we carry out on a daily basis take place through these APIs. Among these activities, we highlight the granting of access and the monitoring of various aspects of the tool.

Most of the tool’s maintenance activities can be performed through the QuickSight visual interface, and integration with Active Directory or the use of AWS Identity and Access Management (IAM) users is possible, but we concluded that neither would be the ideal way to grant access. Therefore, we defined an access grant flow for users and groups based on the QuickSight API, as can be seen in the following figure. In this model, the creation and removal of users is done through a JSON file with the following structure:

{
 "Version":"1.0.0",
 "Namespace":"default",
 "AwsAccountId":"<AwsAccountId>",
 "AwsRegion":"<AwsRegion>",
 "Permission":{
  "GroupList":[
   {"GroupName":"QUICKSIGHT_DATA_EDITOR"},
   {"GroupName":"QUICKSIGHT_DATA_VIEWER"},
   {"GroupName":"QUICKSIGHT_DATA_DESIGNER"},
   {"GroupName":"QUICKSIGHT_SAC_VIEWER"},
   {"GroupName":"QUICKSIGHT_SAC_DESIGNER"},
    ...
  ],
  "UserList":[
   {"UserName":"<UserEmail>","Active":"True","GroupList":[{"GroupName":"QUICKSIGHT_DATA_EDITOR"}]},
   {"UserName":"<UserEmail>","Active":"True","GroupList":[{"GroupName":"QUICKSIGHT_SAC_VIEWER"}]},
   ...
  ]
 }
}
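A file with this structure can be turned into the set of user and group memberships it requests. The following is a minimal sketch and not our production code: the function name is hypothetical, the sample addresses are placeholders, and treating `"Active"` values other than `"True"` as "no memberships" is an assumption about the field's meaning.

```python
import json


def desired_memberships(access_file_text: str) -> set:
    """Return the (user, group) pairs requested by the access file."""
    permission = json.loads(access_file_text)["Permission"]
    known_groups = {group["GroupName"] for group in permission["GroupList"]}
    pairs = set()
    for user in permission["UserList"]:
        if user["Active"] != "True":
            continue  # Assumption: inactive users keep no group memberships.
        for group in user["GroupList"]:
            name = group["GroupName"]
            if name not in known_groups:
                raise ValueError(f"Unknown group {name!r} for {user['UserName']}")
            pairs.add((user["UserName"], name))
    return pairs


# Placeholder data following the structure shown above.
sample = json.dumps({
    "Version": "1.0.0",
    "Namespace": "default",
    "Permission": {
        "GroupList": [
            {"GroupName": "QUICKSIGHT_DATA_EDITOR"},
            {"GroupName": "QUICKSIGHT_SAC_VIEWER"},
        ],
        "UserList": [
            {"UserName": "analyst@example.com", "Active": "True",
             "GroupList": [{"GroupName": "QUICKSIGHT_DATA_EDITOR"}]},
            {"UserName": "former@example.com", "Active": "False",
             "GroupList": [{"GroupName": "QUICKSIGHT_SAC_VIEWER"}]},
        ],
    },
})

pairs = desired_memberships(sample)
print(pairs)  # {('analyst@example.com', 'QUICKSIGHT_DATA_EDITOR')}
```

Rejecting groups that are not declared in `GroupList` catches typos at review time, before anything reaches QuickSight.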

Whenever a user needs to be added or changed, the file is edited and a pull request is submitted to GitHub. If the request is approved, an action is triggered to send the file to an Amazon Simple Storage Service (Amazon S3) bucket. The upload triggers an AWS Lambda function that performs two activities: it maintains users and groups, and it sends an invitation through Amazon Simple Email Service (Amazon SES) for users to join QuickSight. In our case, we opted for a personalized invitation model that emphasizes the data democratization initiative that is being conducted.
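The first part of such a Lambda function can be sketched as two pure steps: locating the uploaded file from the S3 event, and computing which memberships to add or remove. This is a hedged sketch rather than our actual function; the function names, bucket, and key are hypothetical, and the mapping of the plan to QuickSight operations such as CreateGroupMembership and DeleteGroupMembership is an assumption about one reasonable implementation.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded in the event, so they are decoded here.
    """
    return [
        (record["s3"]["bucket"]["name"],
         unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


def plan_changes(desired: set, current: set) -> dict:
    """Compare desired and current (user, group) pairs.

    Assumption: "add" entries would map to CreateGroupMembership calls and
    "remove" entries to DeleteGroupMembership calls in the QuickSight API.
    """
    return {
        "add": sorted(desired - current),
        "remove": sorted(current - desired),
    }


# Hypothetical event and memberships, for illustration only.
event = {"Records": [{"s3": {"bucket": {"name": "access-bucket"},
                             "object": {"key": "quicksight/access+file.json"}}}]}
objects = s3_objects_from_event(event)
plan = plan_changes(
    desired={("a@example.com", "QUICKSIGHT_DATA_EDITOR")},
    current={("b@example.com", "QUICKSIGHT_SAC_VIEWER")},
)
print(objects)  # [('access-bucket', 'quicksight/access file.json')]
print(plan)
```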

an architecture diagram from JSON to QuickSight

To monitor the tool, we implemented the architecture shown in the following figure, in which we used AWS CloudTrail to pull out the QuickSight logs and the QuickSight API to extract information from the tool’s resources, such as reports, users, datasets, data sources, and more. All of this data is processed by Glove, our data integration tool, stored in Amazon Redshift, and analyzed in QuickSight itself. This allows us to understand the behavior of our users and concentrate efforts on the most-used resources, in addition to allowing optimal cost control and the use of SPICE.
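As an illustration of the kind of analysis this enables, the sketch below counts QuickSight events by name from CloudTrail records. CloudTrail records carry `eventSource` and `eventName` fields, and QuickSight events use the source `quicksight.amazonaws.com`; the sample records are invented, and in our setup this processing is done by Glove and Amazon Redshift rather than by a script like this one.

```python
from collections import Counter


def quicksight_event_counts(records: list) -> Counter:
    """Count CloudTrail records emitted by QuickSight, grouped by event name."""
    return Counter(
        record["eventName"]
        for record in records
        if record.get("eventSource") == "quicksight.amazonaws.com"
    )


# Invented sample records; real CloudTrail records contain many more fields.
records = [
    {"eventSource": "quicksight.amazonaws.com", "eventName": "GetDashboard"},
    {"eventSource": "quicksight.amazonaws.com", "eventName": "GetDashboard"},
    {"eventSource": "quicksight.amazonaws.com", "eventName": "GetAnalysis"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
]

counts = quicksight_event_counts(records)
print(counts.most_common())  # [('GetDashboard', 2), ('GetAnalysis', 1)]
```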

an architecture diagram from QuickSight to Redshift

To update the datasets, we don’t use the QuickSight internal scheduler, due to the large volume of data and the complexity of the DAGs. We prefer updating the datasets within our ETL (extract, transform, and load) and ELT process orchestration flow. For this purpose, we use Hanger, our orchestration tool. This approach allows the datasets to be updated only when the data source is changed and the data quality processes are executed. This model is represented by the following figure.
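The gating described above, and the request an orchestrator could send, can be sketched as follows. The QuickSight CreateIngestion API starts a SPICE refresh for a dataset; whether Hanger uses exactly this call is an assumption, the function names and identifiers are hypothetical, and the boto3 call is shown only as a comment.

```python
import uuid


def should_refresh(source_changed: bool, quality_checks_passed: bool) -> bool:
    """Refresh a dataset only after its source changed and quality checks ran."""
    return source_changed and quality_checks_passed


def build_refresh_request(account_id: str, dataset_id: str) -> dict:
    """Build keyword arguments for QuickSight's CreateIngestion API."""
    return {
        "AwsAccountId": account_id,
        "DataSetId": dataset_id,
        # Each ingestion needs its own ID; a UUID keeps them unique.
        "IngestionId": f"refresh-{uuid.uuid4()}",
        "IngestionType": "FULL_REFRESH",
    }


# Hypothetical account and dataset identifiers.
if should_refresh(source_changed=True, quality_checks_passed=True):
    request = build_refresh_request("111122223333", "sales-dataset")
    print(request)
    # With boto3 available, the refresh could be started like this:
    #   import boto3
    #   boto3.client("quicksight").create_ingestion(**request)
```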

an architecture diagram with Redshift, Hanger, and QuickSight API

Conclusion

Choosing a data visualization tool is not a simple task. It involves many considerations, and several aspects must be analyzed in order for the choice to fit the characteristics of the company and to be consistent with the profile of business users.

For Dafiti, QuickSight was a natural choice from the moment we learned about its features. We needed a service that was in the same cloud as our main data sources, was extremely fast using SPICE, and solved the maintenance and cost problems of on-premises applications. In terms of the functionality our business requires, it met our needs perfectly.

Do you want to know more about what we are doing in the data area here at Dafiti? Check out the following videos:


About the Authors

Valdiney Gomes is Data Engineering Coordinator at Dafiti. He worked for many years in software engineering, migrated to data engineering, and currently leads an amazing team responsible for the data platform for Dafiti in Latin America.

Hélio Leal is a Data Engineering Specialist at Dafiti, responsible for maintaining and evolving the entire data platform at Dafiti using AWS solutions.

Flávia Lima is a Data Engineer at Dafiti, responsible for sustaining the data platform and providing the data from many sources to internal customers.