Optimize your Tableau integration with Amazon Redshift Serverless

Post Syndicated from Nidhi Nayak original https://aws.amazon.com/blogs/big-data/optimize-your-tableau-integration-with-amazon-redshift-serverless/

This is a guest blog post co-written by Adiascar Cisneros, from Tableau at Salesforce.

Integrating Tableau with Amazon Redshift Serverless gives you high-performance analytics with serverless scaling and minimal capacity planning. Although automatic scaling handles warehouse management for you, optimization requires a strategic approach to data modeling, security, and query management.

In this post, we provide a guide to help you use Tableau’s Relationships and Amazon Redshift Serverless architecture to deliver sub-second insights while maximizing every Redshift Processing Unit (RPU). We also provide guidance on five key areas: data model architecture for optimal query performance, security configuration and access control, performance optimization through smart configuration, cost management strategies, and query and join optimization techniques.

Prerequisites

Before implementing these optimization strategies, make sure you have:

Tableau Desktop (version 2022.1 or later) or Tableau Server deployed.
An active Amazon Redshift Serverless workspace.
AWS Identity and Access Management (IAM) permissions to configure authentication and access controls.
Network connectivity configured between your Tableau environment and Amazon Redshift Serverless.
The native Amazon Redshift driver installed.

Building the foundation

The success of any analytics system begins with its data model. True scalability starts with the end-user experience. Your data model is more than a storage structure. It’s the foundation of dashboard responsiveness. By aligning your database design in Amazon Redshift with your analytical requirements, you empower Tableau to generate highly efficient queries, reducing costs and keeping your users engaged with the data.

When connecting to Amazon Redshift, we recommend using Tableau’s logical data model, specifically Relationships. With Relationship, you can preserve the native level of detail for each table, so Tableau can perform join culling and dynamically query only the specific tables needed for a particular visualization.

When designing your Amazon Redshift schema, implement a well-structured star or snowflake schema, or one big denormalized table where appropriate. This allows Tableau to optimize query execution automatically. Modern Amazon Redshift deployments benefit significantly from Automatic Table Optimization (ATO), which uses AI and machine learning (ML) to continuously monitor and adjust sort keys and distribution keys. To take advantage of ATO, keep sort keys and distribution styles at their default AUTO setting when you create tables. ATO then continuously monitors workload patterns and adjusts keys to improve query performance.

Start by implementing Relationships in your existing workbooks to take advantage of join culling and improved query performance.

Securing your connection

Native database drivers provide enhanced security features and better integration with Amazon Redshift capabilities compared to generic ODBC or JDBC alternatives.

The integrity of your analytics relies on the quality of the connection between your platforms. Use the native Amazon Redshift driver rather than generic ODBC or JDBC alternatives. The native driver is specifically engineered to use the advanced capabilities of Amazon Redshift and supports modern security protocols, such as AWS IAM Identity Center, out of the box. By prioritizing the native driver, you verify that your connection uses the latest security patches and performance optimizations, establishing a hardened and efficient entry point for your data. For more information, see Integrate Tableau and Okta with Amazon Redshift using AWS IAM Identity Center.

Connection stability for high-scale environments

In Amazon Redshift, cursors are used to retrieve a result set from a query and process the data row-by-row or in smaller chunks rather than loading the entire set into memory at once. For high-scale environments, stable connections depend on how you handle large result sets. In some high-volume scenarios, Amazon Redshift cursors can introduce resource overhead that impacts user concurrency. Monitor your workload and, if necessary, fine-tune your connection configurations using Tableau Data Customization (TDC) files. TDC files are XML configuration files that customize how Tableau connects to your database. Specifically, validate whether disabling cursors improves throughput.

Important: This configuration loads the entire dataset into memory. For large datasets, this might cause performance degradation or out-of-memory errors. Evaluate your dataset size and business requirements before you turn on this setting. This is a key step in tuning your deployment, helping verify that your Amazon Redshift resources remain available and responsive for secure, ad-hoc analysis.

Security best practices

Follow security best practices while deploying Amazon Redshift Serverless. Configure security groups to control inbound access from Tableau Server and Desktop IP ranges. IAM authentication must be the primary method, complemented by SSL/TLS encryption for all connections.

Role-based access control (RBAC) forms the backbone of your security framework:

Map IAM roles to database users.
Implement least privilege access in Amazon Redshift using database security controls.
Implement comprehensive monitoring through audit logging.
Use Amazon CloudWatch for failed login attempts.
Track API activity with AWS CloudTrail.

For authorization, implement a layered security model:

Apply explicit GRANT statements.
Create distinct database roles aligned with business functions.
Use Amazon Redshift system-defined roles judiciously.
Apply dynamic data masking for sensitive data.
Conduct regular security audits to support ongoing protection.

Audit your current connection types and migrate to the native Amazon Redshift driver if you’re using ODBC or JDBC connections.

Enhancing performance through smart configuration

Smart configuration spans how much data you query, where you push complex logic, how you design dashboards, and how you tune connections. The following sections cover each area.

Managing data volume

To maximize workbook efficiency, start by rigorously managing your data volume. Although Amazon Redshift handles large datasets well, your dashboard should query only what is strictly necessary. Use Tableau Hyper Extracts for production environments to provide a consistent, high-speed cache that offloads repetitive query processing from Amazon Redshift. If a live connection is required, strictly limit your data intake by using Data Source Filters and hiding all unused fields. This helps verify that Tableau generates leaner queries, significantly reducing network latency and processing time.

Shifting complexity to the database

Next, shift the burden of complexity away from the visualization layer. Materialize calculations within your extracts or push complex logic (especially row-level string manipulations and regex) directly down to the Amazon Redshift database level. By pre-calculating these values before the user ever loads the dashboard, you eliminate expensive runtime processing.

Simplify your logic within Tableau by using native features like CASE statements or Sets rather than complex IF/THEN statements. Testing shows these methods perform significantly faster for grouping dimensions.

Streamlining dashboard design

Additionally, optimize the rendering process by streamlining your dashboard design:

Limit the number of visualizations per dashboard.
Prioritize fixed-size dashboards to maximize server-side caching effectiveness.
Avoid high-cardinality filters (fields with thousands of unique values).
Don’t use the ‘Show Only Relevant Values’ setting on large datasets, because it forces the system to run extra background queries that slow down your dashboard.

Connection and parameter tuning

Optimize Tableau’s performance by enabling connection pooling tailored to your concurrent user count. Configure datetime handling and parallel query execution settings to match your workload patterns.

You can enhance the automatic resource management of Amazon Redshift Serverless through parameter optimization. Key parameters include:

Set enable_result_cache_for_session to OFF during development to verify you’re testing against live query performance, not cached results. Set it to ON in production.
Use AI scaling for spiky workloads.
Use queue based query resource management to set monitoring rules that control compute usage and prevent runaway queries from impacting workloads.

Choosing between extracts and live queries is a foundational architectural decision. We recommend a hybrid approach tailored to specific use cases rather than a one-size-fits-all policy.

When to use live queries

Live queries are best for real-time analytics. They use Amazon Redshift Serverless automatic scaling to query massive datasets in place. Use this approach for:

Up-to-the-minute data requirements.
Datasets too massive for extracts.
Scenarios requiring database-level row security.
Integration with Amazon Redshift Spectrum for Amazon Simple Storage Service (Amazon S3) data.

Keep in mind that live connections rely entirely on the database’s performance, so optimizing your Amazon Redshift tables and using materialization techniques within the database is important for maintaining interactivity.

When to use extracts

For scenarios when data is static or where query performance is critical, Tableau Hyper Extracts provide a high-speed cache that shifts the processing load from Amazon Redshift to Tableau’s data engine. This is valuable for dashboards with complex calculations (such as row-level string manipulations or heavy aggregations) where an extract can pre-materialize results, effectively baking in the logic before the user ever loads the view. By using extracts for these heavy workloads, you reduce the compute load on Amazon Redshift, lowering costs while delivering sub-second response times to end users.

Right-sizing your extracts

To maximize efficiency, right-size your extracts for your dashboard’s specific needs:

Avoid the SELECT * mentality.
Use data source filters to limit rows.
Hide unused fields to remove redundant columns.
For higher-level analysis, aggregate your data during the extract process. For example, summarize daily transactions into monthly trends to significantly reduce file size and query time.
Schedule refreshes during off-peak hours.
Use incremental updates to add only new rows, minimizing Amazon Redshift RPU usage and network overhead.

Balance performance and cost by aligning your connection choice with business freshness requirements and data complexity. Monitor usage patterns to refine this balance over time.

Star schema query and join optimization

Optimize your star schema joins and queries to reduce execution time and compute costs by using Tableau Relationships. Relationships keep tables separate, allowing Tableau to automatically query only the necessary tables for the fields in the view. Relationships are more flexible and often perform better than joins because they don’t force a row-level merge on all fields.

Inefficient joins and poorly optimized queries force Amazon Redshift to scan unnecessary data, increasing both query execution time and compute costs.

Query optimization best practices

Avoid Custom SQL, which forces Tableau to wrap queries in complex sub-selects. Instead, connect directly to tables or views to let the database optimizer function effectively.

Define primary and foreign keys in your Amazon Redshift schema to allow Tableau to assume referential integrity.

Important: Amazon Redshift does not enforce primary or foreign key constraints. They are informational only, and the query optimizer uses them to generate more efficient execution plans. You’re responsible for data integrity at the application or ETL layer. For more information, see Defining constraints. Assume Referential Integrity is a Tableau setting that tells the engine to trust defined key relationships without validating them at query time, reducing query complexity.

Use Materialized Views to pre-compute heavy aggregations, which reduces execution time for frequently accessed data patterns. For example, create materialized views for common date-based aggregations or customer-level summaries.

Optimize Amazon Redshift Serverless by denormalizing data to minimize complex joins. After you apply these changes, use Tableau’s Performance Recorder to regularly validate your query speeds and identify bottlenecks.

Cost optimization and monitoring

Amazon Redshift Serverless charges in RPU-hours on a per-second basis (60-second minimum), so you only pay for the workloads you run.

Optimizing query volumes and resource usage helps you control Amazon Redshift Serverless costs and maintain predictable spending. To help control compute costs, optimize Tableau queries before they reach Amazon Redshift by using Data Source Filters and ‘Hide All Unused Fields.’ This forces the generation of lean SELECT statements that scan only the necessary rows and columns. Because Amazon Redshift Serverless scales resources based on workload, reducing data volume and complexity at the Tableau source layer can help lower RPU consumption and costs.

For more information, see Amazon Redshift Serverless billing.

Using extracts as a cost buffer

Tableau Hyper Extracts act as a cost buffer for high-traffic dashboards. By extracting data into Tableau’s in-memory engine, database costs are typically incurred during scheduled refreshes rather than for every individual user interaction. For live connections, maximize Tableau’s caching architecture by setting server cache policies to “Refresh less often,” ensuring that repetitive dashboard views are served instantly from memory and avoid redundant, billable queries.

Monitoring and alerting

Monitor RPU usage patterns and set billing alerts to maintain cost control:

Combine query result caching with strategic scheduling for resource-intensive tasks.
Use scaling event data and query patterns to define thresholds.
Set up Amazon CloudWatch alarms for RPU consumption spikes.
Review Amazon Redshift query monitoring metrics weekly to identify optimization opportunities.

Clean up

To avoid incurring ongoing charges, delete the resources you created while testing the configurations described in this post.

Delete the Amazon Redshift Serverless workgroup and namespace if they were created for testing.
Remove any IAM roles, policies, and users created specifically for Tableau connectivity.
Delete security groups configured for Tableau Server or Desktop IP access.
Remove any materialized views, tables, or schemas created during testing.
Cancel any scheduled Tableau extract refreshes connected to test workgroups.
Delete Tableau data sources and workbooks that reference test environments.
Remove any CloudWatch alarms or CloudTrail configurations set up for monitoring test resources.

For more information about managing Amazon Redshift Serverless resources, see Billing for Amazon Redshift Serverless.

Conclusion

This post covered key optimization strategies for Tableau and Amazon Redshift Serverless integration: data model architecture using Relationships, security configuration with native drivers and AWS IAM, performance optimization through extracts and smart configuration, cost management with RPU monitoring, and query optimization techniques.

As AI-driven optimization evolves, staying informed about Amazon Redshift AI features and best practices, including Tableau Pulse, is key. Regularly review your configuration, performance, and security to verify that your Tableau and Amazon Redshift Serverless integration remains secure, cost-effective, and high-performing.

Optimization is an ongoing, iterative process. To keep your environment optimized, regularly review your settings, monitor performance, and adapt as workload patterns evolve. This approach maintains a cost-effective analytics environment that scales with your organization.

Ready to build a secure, high-performance analytics solution that delivers both speed and cost efficiency? Visit the Salesforce and AWS partnership webpage to start scaling your insights today.

Noise