Post Syndicated from Sreenivasa Munagala original https://aws.amazon.com/blogs/big-data/how-fanduel-adopted-a-modern-amazon-redshift-architecture-to-serve-critical-business-workloads/
This post is co-written with Sreenivasa Mungala and Matt Grimm from FanDuel.
In this post, we share how FanDuel moved from a DC2 nodes architecture to a modern Amazon Redshift architecture, which includes Redshift provisioned clusters using RA3 instances, Amazon Redshift data sharing, and Amazon Redshift Serverless.
Part of Flutter Entertainment, FanDuel Group is a gaming company that offers sportsbooks, daily fantasy sports, horse racing, and online casinos. The company operates sportsbooks in a number of US and Canadian states. Fanduel first carved out a niche in the US through daily fantasy sports, such as their most popular fantasy sport: NFL football.
As FanDuel’s business footprint grew, so too did the complexity of their analytical needs. More and more of FanDuel’s community of analysts and business users looked for comprehensive data solutions that centralized the data across the various arms of their business. Their individual, product-specific, and often on-premises data warehouses soon became obsolete. FanDuel’s data team solved the problem of creating a new massive data store for centralizing the data in one place, with one version of the truth. At the heart of this new Global Data Platform was Amazon Redshift, which fast became the trusted data store from which all analysis was derived. Users could now assess risk, profitability, and cross-sell opportunities not only for piecemeal divisions or products, but also globally for the business as a whole.
FanDuel’s journey on Amazon Redshift
FanDuel’s first Redshift cluster was launched using Dense Compute (DC2) nodes. This was chosen over Dense Storage (DS2) nodes in order to take advantage of the greater compute power for the complex queries in their organization. As FanDuel grew, so did their data workloads. This meant that there was a constant challenge to scale and overcome contention while providing the performance their user community needed for day-to-day decision-making. FanDuel met this challenge initially by continuously adding nodes and experimenting with workload management (WLM), but it became abundantly obvious that they needed to take a more significant step to meet the needs of their users.
In 2021, FanDuel’s workloads almost tripled since they first started using Amazon Redshift in 2018, and they started evaluating Redshift RA3 nodes vs. DC2 nodes to take advantage of the storage and compute separation and deliver better performance at lower costs. FanDuel wanted to make the move primarily to separate storage and compute, and evaluate data sharing in the hopes of bringing different compute to the data to alleviate user contention on their primary cluster. FanDuel decided to launch a new RA3 cluster when they were satisfied that the performance matched that of their existing DC2 architecture, providing them the ability to scale storage and compute independently.
In 2022, FanDuel shifted their focus to using data sharing. Data sharing allows you to share live data securely across Redshift data warehouses for read and write (in preview) purposes. This means that workloads can be isolated to individual clusters, allowing for a more streamlined schema design, WLM configuration, and right-sizing for cost optimization. The following diagram illustrates this architecture.
To achieve a data sharing architecture, the plan was to first spin up consumer clusters for development and testing environments for their data engineers that were moving key legacy code to dbt. FanDuel wanted their engineers to have access to production datasets to test their new models and match the results from their legacy SQL-based code sets. They also wanted to ensure that they had adequate compute to run many jobs concurrently. After they saw the benefits of data sharing, they spun up their first production consumer cluster in the spring of 2022 to handle other analytics use cases. This was sharing most of the schemas and their tables from the main producer cluster.
Benefits of moving to a data sharing architecture
FanDuel saw a lot of benefits from the data sharing architecture, where data engineers had access to real production data to test their jobs without impacting the producer’s performance. Since splitting the workloads through a data sharing architecture, FanDuel has doubled their query concurrency and reduced the query queuing, resulting in a better end-to-end query time. FanDuel received positive feedback on the new environment and soon reaped the rewards of increased engineering velocity and reduced performance issues in production after deployments. Their initial venture into the world of data sharing was definitely considered a success.
Given the successful rollout of their first consumer in a data sharing architecture, they looked for opportunities to meet other users’ needs with new targeted consumers. With the assistance of AWS, FanDuel initiated the development of a comprehensive strategy aimed at safeguarding their extract, load, and transform (ELT) jobs. This approach involved implementing workload isolation and allocating dedicated clusters for these workloads, designated as the producer cluster within the data sharing architecture. Simultaneously, they planned to migrate all other activities onto one or more consumer clusters, apart from the existing cluster utilized by their data engineering team.
They spun up a second consumer in the summer of 2022 with the hopes of moving some of their more resource-intensive analytical processes off the main cluster. In order to empower their analysts over time, they had allowed a pattern in which users other than data engineers could create and share their own objects.
As the calendar flipped from 2022 to 2023, several developments changed the landscape of architecture at FanDuel. For one, FanDuel launched their initial event-based streaming work for their sportsbook data, which allowed them to micro-batch data into Amazon Redshift at a much lower latency than their previous legacy batch approach. This allowed them to generate C-Suite revenue reports at a much earlier SLA, which was a big win for the data team, because this was never achieved before the Super Bowl.
FanDuel introduced a new internal KPI called Query Efficiency, a measure to capture the amount of time users spent waiting for their queries to run. As the workload started increasing exponentially, FanDuel also noticed an increase in this KPI, specifically for risk and trading workloads.
Working with AWS Enterprise Support and the Amazon Redshift service team, FanDuel soon realized that the risk and trading use case was a perfect opportunity to move it to Amazon Redshift Serverless. Redshift Serverless offers scalability across dimensions such a data volume changes, concurrent users and query complexity, enabling you to automatically scale compute up or down to manage demanding and unpredictable workloads. Because billing is only accrued while queries are run, it also means that you no longer need to cover costs for compute you’re not utilizing. Redshift Serverless also manages workload management (WLM) entirely, allowing you to focus only on the query monitoring rules (QMRs) you want and usage limits, further limiting the need for you to manage your data warehouses. This adoption also complimented data sharing, where Redshift Serverless endpoints can read and write (in preview) from provisioned clusters during peak hours, offering flexible compute scalability and workload isolation and avoiding the impact on other mission-critical workloads. Seeing the benefits of what Redshift Serverless offers for their risk and trading workloads, they also moved some of their other workloads like business intelligence (BI) dashboards and risk and trading (RT) to a Redshift Serverless environment.
Benefits of introducing Redshift Serverless in a data sharing architecture
Through a combination of data sharing and a serverless architecture, FanDuel could elastically scale their most critical workloads on demand. Redshift Serverless Automatic WLM allowed users to get started without the need to configure WLM. With the intelligent and automated scaling capabilities of Redshift Serverless, FanDuel could focus on their business objectives without worrying about the data warehouse capacity. This architecture alleviated the constraints of a single predefined Redshift provisioned cluster and reduced the need for FanDuel to manage data warehouse capacity and any WLM configuration.
In terms of cost, Redshift Serverless enabled FanDuel to elegantly handle the most demanding workloads with a pay-as-you-go model, paying only when the data warehouse is in use, along with complete separation of compute and storage.
Having now introduced workload isolation and Redshift Serverless, FanDuel is able to achieve a more granular understanding of each team’s compute requirements without the noise of ELT and contending workloads all in the same environment. This allowed comprehensive analytics workloads to be conducted on consumers with vastly minimized contention while also being serviced with the most cost-efficient configuration possible.
The following diagram illustrates the updated architecture.
FanDuel’s re-architecting efforts for workload isolation with risk and trading (RT) workloads using Redshift data sharing and Redshift Serverless resulted in the most critical business SLAs finishing three times faster, along with an increase in average query efficiency of 55% for overall workloads. These SLA improvements have resulted into an overall saving of tenfold in business cost, and they have been able to deliver business insights to other verticals such as product, commercial, and marketing much faster.
By harnessing the power of Redshift provisioned clusters and serverless endpoints with data sharing, FanDuel has been able to better scale and run analytical workloads without having to manage any data warehouse infrastructure. FanDuel is looking forward to future Amazon partnerships and is excited to embark on a journey of new innovation with Redshift Serverless and continued enhancements such as machine learning optimization and auto scaling.
If you’re new to Amazon Redshift, you can explore demos, other customer stories, and the latest features at Amazon Redshift. If you’re already using Amazon Redshift, reach out to your AWS account team for support, and learn more about what’s new with Amazon Redshift.
About the authors
Sreenivasa Munagala is a Principal Data Architect at FanDuel Group. He defines their Amazon Redshift optimization strategy and works with the data analytics team to provide solutions to their key business problems.
Matt Grimm is a Principal Data Architect at FanDuel Group, moving the company to an event-based, data-driven architecture using the integration of both streaming and batch data, while also supporting their Machine Learning Platform and development teams.
Luke Shearer is a Cloud Support Engineer at Amazon Web Services for the Data Insight Analytics profile, where he is engaged with AWS customers every day and is always working to identify the best solution for each customer.
Dhaval Shah is Senior Customer Success Engineer at AWS and specializes in bringing the most complex and demanding data analytics workloads to Amazon Redshift. He has more then 20 years of experiences in different databases and data warehousing technologies. He is passionate about efficient and scalable data analytics cloud solutions that drive business value for customers.
Ranjan Burman is an Sr. Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 17 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.
Sidhanth Muralidhar is a Principal Technical Account Manager at AWS. He works with large enterprise customers who run their workloads on AWS. He is passionate about working with customers and helping them architect workloads for cost, reliability, performance, and operational excellence at scale in their cloud journey. He has a keen interest in data analytics as well.