All posts by Ziad Wali

Successfully conduct a proof of concept in Amazon Redshift

2024-03-27 Ziad Wali

Post Syndicated from Ziad Wali original https://aws.amazon.com/blogs/big-data/successfully-conduct-a-proof-of-concept-in-amazon-redshift/

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you securely access your data in operational databases, data lakes, or third-party datasets with minimal movement or copying of data. Tens of thousands of customers use Amazon Redshift to process large amounts of data, modernize their data analytics workloads, and provide insights for their business users.

In this post, we discuss how to successfully conduct a proof of concept in Amazon Redshift by going through the main stages of the process, available tools that accelerate implementation, and common use cases.

Proof of concept overview

A proof of concept (POC) is a process that uses representative data to validate whether a technology or service fulfills a customer’s technical and business requirements. By testing the solution against key metrics, a POC provides insights that allow you to make an informed decision on the suitability of the technology for the intended use case.

There are three major POC validation areas:

Workload – Take a representative portion of an existing workload and test it on Amazon Redshift, such as an extract, transform, and load (ETL) process, reporting, or management
Capability – Demonstrate how a specific Amazon Redshift feature, such as zero-ETL integration with Amazon Redshift, data sharing, or Amazon Redshift Spectrum, can simplify or enhance your overall architecture
Architecture – Understand how Amazon Redshift fits into a new or existing architecture along with other AWS services and tools

A POC is not:

Planning and implementing a large-scale migration
User-facing deployments, such as deploying a configuration for user testing and validation over extended periods (this is more of a pilot)
End-to-end implementation of a use case (this is more of a prototype)

Proof of concept process

For a POC to be successful, it is recommended to follow and apply a well-defined and structured process. For a POC on Amazon Redshift, we recommend a three-phase process of discovery, implementation, and evaluation.

Discovery phase

The discovery phase is considered the most essential among the three phases and the longest. It defines through multiple sessions the scope of the POC and the list of tasks that need to be completed and later evaluated. The scope should contain inputs and data points on the current architecture as well as the target architecture. The following items need to be defined and documented to have a defined scope for the POC:

Current state architecture and its challenges
Business goals and the success criteria of the POC (such as cost, performance, and security) along with their associated priorities
Evaluation criteria that will be used to evaluate and interpret the success criteria, such as service-level agreements (SLAs)
Target architecture (the communication between the services and tools that will be used during the implementation of the POC)
Dataset and the list of tables and schemas

After the scope has been clearly defined, you should proceed with defining and planning the list of tasks that need to be run during the next phase in order to implement the scope. Also, depending on the technical familiarity with the latest developments in Amazon Redshift, a technical enablement session on Amazon Redshift is also highly recommended before starting the implementation phase.

Optionally, a responsibility assignment matrix (RAM) is recommended, especially in large POCs.

Implementation phase

The implementation phase takes the output of the previous phase as input. It consists of the following steps:

Set up the environment by respecting the defined POC architecture.
Complete the implementation tasks such as data ingestion and performance testing.
Collect data metrics and statistics on the completed tasks.
Analyze the data and then optimize as necessary.

Evaluation phase

The evaluation phase is the POC assessment and the final step of the process. It aggregates the implementation results of the preceding phase, interprets them, and evaluates the success criteria described in the discovery phase.

It is recommended to use percentiles instead of averages whenever possible for a better interpretation.

Challenges

In this section, we discuss the major challenges that you may encounter while planning your POC.

Scope

You may face challenges during the discovery phase while defining the scope of the POC, especially in complex environments. You should focus on the crucial requirements and prioritized success criteria that need to be evaluated so you avoid ending up with a small migration project instead of a POC. In terms of technical content (such as data structures, transformation jobs, and reporting queries), make sure to identify and consider as little as possible of the content that will still provide you with all the necessary information at the end of the implementation phase in order to assess the defined success criteria. Additionally, document any assumptions you are making.

Time

A time period should be defined for any POC project to ensure it stays focused and achieves clear results. Without an established time frame, scope creep can occur as requirements shift and unnecessary features get added. This may lead to misleading evaluations about the technology or concept being tested. The duration set for the POC depends on factors like workload complexity and resource availability. If a period such as 3 weeks has been committed to already without accounting for these considerations, the scope and planned content should be scaled to feasibly fit that fixed time period.

Cost

Cloud services operate on a pay-as-you-go model, and estimating costs accurately can be challenging during a POC. Overspending or underestimating resource requirements can impact budget allocations. It’s important to carefully estimate the initial sizing of the Redshift cluster, monitor resource usage closely, and consider setting service limits along with AWS Budget alerts to avoid unexpected expenditures.

Technical

The team running the POC has to be ready for initial technical challenges, especially during environment setup, data ingestion, and performance testing. Each data warehouse technology has its own design and architecture, which sometimes requires some initial tuning at the data structure or query level. This is an expected challenge that needs to be considered in the implementation phase timeline. Having a technical enablement session beforehand can alleviate such hurdles.

Amazon Redshift POC tools and features

In this section, we discuss tools that you can adapt based on the specific requirements and nature of the POC being conducted. It’s essential to choose tools that align with the scope and technologies involved.

AWS Analytics Automation Toolkit

The AWS Analytics Automation Toolkit enables automatic provisioning and integration of not only Amazon Redshift, but database migration services like AWS Database Migration Service (AWS DMS), AWS Schema Conversion Tool (AWS SCT), and Apache JMeter. This toolkit is essential in most POCs because it automates the provisioning of infrastructure and setup of the necessary environment.

AWS SCT

The AWS SCT makes heterogeneous database migrations predictable, secure, and fast by automatically converting the majority of the database code and storage objects to a format that is compatible with the target database. Any objects that can’t be automatically converted are clearly marked so that they can be manually converted to complete the migration.

In the context of a POC, the AWS SCT becomes crucial by streamlining and enhancing the efficiency of the schema conversion process from one database system to another. Given the time-sensitive nature of POCs, the AWS SCT automates the conversion process, facilitating planning, and estimation of time and efforts. Additionally, the AWS SCT plays a role in identifying potential compatibility issues, data mapping challenges, or other hurdles at an early stage of the process.

Furthermore, the database migration assessment report summarizes all the action items for schemas that can’t be converted automatically to your target database. Getting started with AWS SCT is a straightforward process. Also, consider following the best practices for AWS SCT.

Amazon Redshift auto-copy

The Amazon Redshift auto-copy (preview) feature can automate data ingestion from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift with a simple SQL command. COPY statements are invoked and start loading data when Amazon Redshift auto-copy detects new files in the specified S3 prefixes. This also makes sure that end-users have the latest data available in Amazon Redshift shortly after the source files are available.

You can use this feature for the purpose of data ingestion throughout the POC. To learn more about ingesting from files located in Amazon S3 using a SQL command, refer to Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy (preview). The post also shows you how to enable auto-copy using COPY jobs, how to monitor jobs, and considerations and best practices.

Redshift Auto Loader

The custom Redshift Auto Loader framework automatically creates schemas and tables in the target database and continuously loads data from Amazon S3 to Amazon Redshift. You can use this during the data ingestion phase of the POC. Deploying and setting up the Redshift Auto Loader framework to transfer files from Amazon S3 to Amazon Redshift is a straightforward process.

For more information, refer to Migrate from Google BigQuery to Amazon Redshift using AWS Glue and Custom Auto Loader Framework.

Apache JMeter

Apache JMeter is an open-source load testing application written in Java that you can use to load test web applications, backend server applications, databases, and more. In a database context, it’s an extremely valuable tool for repeating benchmark tests in a consistent manner, simulating concurrency workloads, and scalability testing on different database configurations.

When implementing your POC, benchmarking Amazon Redshift is often one of the main components of evaluation and a key source of insight into the price-performance of different Amazon Redshift configurations. With Apache JMeter, you can construct high-quality benchmark tests for Amazon Redshift.

Workload Replicator

If you are currently using Amazon Redshift and looking to replicate your existing production workload or isolate specific workloads in a POC, you can use the Workload Replicator to run them across different configurations of Redshift clusters (ra3.xlplus, ra3.4xl,ra3.16xl, serverless) for performance evaluation and comparison.

This utility has the ability to mimic COPY and UNLOAD workloads and can run the transactions and queries in the same time interval as they’re run in the production cluster. However, it’s crucial to assess the limitations of the utility and AWS Identity and Access Management (IAM) security and compliance requirements.

Node Configuration Comparison utility

If you’re using Amazon Redshift and have stringent SLAs for query performance in your Amazon Redshift cluster, or you want to explore different Amazon Redshift configurations based on the price-performance of your workload, you can use the Amazon Redshift Node Configuration Comparison utility.

This utility helps evaluate performance of your queries using different Redshift cluster configurations in parallel and compares the end results to find the best cluster configuration that meets your need. Similarly, If you’re already using Amazon Redshift and want to migrate from your existing DC2 or DS2 instances to RA3, you can refer to our recommendations on node count and type when upgrading. Before doing that, you can use this utility in your POC to evaluate the new cluster’s performance by replaying your past workloads, which integrates with the Workload Replicator utility to evaluate performance metrics for different Amazon Redshift configurations to meet your needs.

This utility functions in a fully automated manner and has similar limitations as the workload replicator. However, it requires full permissions across various services for the user running the AWS CloudFormation stack.

Use cases

You have the opportunity to explore various functionalities and aspects of Amazon Redshift by defining and selecting a business use case you want to validate during the POC. In this section, we discuss some specific use cases you can explore using a POC.

Functionality evaluation

Amazon Redshift consists of a set of functionalities and options that simplify data pipelines and effortlessly integrate with other services. You can use a POC to test and evaluate one or more of those capabilities before refactoring your data pipeline and implementing them in your ecosystem. Functionalities could be existing features or new ones such as zero-ETL integration, streaming ingestion, federated queries, or machine learning.

Workload isolation

You can use the data sharing feature of Amazon Redshift to achieve workload isolation across diverse analytics use cases and achieve business-critical SLAs without duplicating or moving the data.

Amazon Redshift data sharing enables a producer cluster to share data objects with one or more consumer clusters, thereby eliminating data duplication. This facilitates collaboration across isolated clusters, allowing data to be shared for innovation and analytic services. Sharing can occur at various levels such as databases, schemas, tables, views, columns, and user-defined functions, offering fine-grained access control. It is recommended to use Workload Replicator for performance evaluation and comparison in a workload isolation POC.

The following sample architectures explain workload isolation using data sharing. The first diagram illustrates the architecture before using data sharing.

The following diagram illustrates the architecture with data sharing.

Migrating to Amazon Redshift

If you’re interested in migrating from your existing data warehouse platform to Amazon Redshift, you can try out Amazon Redshift by developing a POC on a selected business use case. In this type of POC, it is recommended to use the AWS Analytics Automation Toolkit for setting up the environment, auto-copy or Redshift Auto Loader for data ingestion, and AWS SCT for schema conversion. When the development is complete, you can perform performance testing using Apache JMeter, which provides data points to measure price-performance and compare results with your existing platform. The following diagram illustrates this process.

Moving to Amazon Redshift Serverless

You can migrate your unpredictable and variable workloads to Amazon Redshift Serverless, which enables you to scale as and when needed and pay as per usage, making your infrastructure scalable and cost-efficient. If you’re migrating your full workload from provisioned (DC2, RA3) to serverless, you can use the Node Configuration Comparison utility for performance evaluation. The following diagram illustrates this workflow.

Conclusion

In a competitive environment, conducting a successful proof of concept is a strategic imperative for businesses aiming to validate the feasibility and effectiveness of new solutions. Amazon Redshift provides you with better price-performance compared to other cloud-centered data warehouses, and a large list of features that help you modernize and optimize your data pipelines. For more details, see Amazon Redshift continues its price-performance leadership.

With the process discussed in this post and by choosing the tools needed for your specific use case, you can accelerate the process of conducting a POC. This allows you to collect the data metrics that can help you understand the potential challenges, benefits, and implications of implementing the proposed solution on a larger scale. A POC provides essential data points that evaluate price-performance as well as feasibility, which plays a vital role in decision-making.

About the Authors

Ziad WALI is an Acceleration Lab Solutions Architect at Amazon Web Services. He has over 10 years of experience in databases and data warehousing, where he enjoys building reliable, scalable, and efficient solutions. Outside of work, he enjoys sports and spending time in nature.

Omama Khurshid is an Acceleration Lab Solutions Architect at Amazon Web Services. She focuses on helping customers across various industries build reliable, scalable, and efficient solutions. Outside of work, she enjoys spending time with her family, watching movies, listening to music, and learning new technologies.

Srikant Das is an Acceleration Lab Solutions Architect at Amazon Web Services. His expertise lies in constructing robust, scalable, and efficient solutions. Beyond the professional sphere, he finds joy in travel and shares his experiences through insightful blogging on social media platforms.

Build an ETL process for Amazon Redshift using Amazon S3 Event Notifications and AWS Step Functions

2023-08-31 Ziad Wali

Post Syndicated from Ziad Wali original https://aws.amazon.com/blogs/big-data/build-an-etl-process-for-amazon-redshift-using-amazon-s3-event-notifications-and-aws-step-functions/

Data warehousing provides a business with several benefits such as advanced business intelligence and data consistency. It plays a big role within an organization by helping to make the right strategic decision at the right moment which could have a huge impact in a competitive market. One of the major and essential parts in a data warehouse is the extract, transform, and load (ETL) process which extracts the data from different sources, applies business rules and aggregations and then makes the transformed data available for the business users.

This process is always evolving to reflect new business and technical requirements, especially when working in an ambitious market. Nowadays, more verification steps are applied to source data before processing them which so often add an administration overhead. Hence, automatic notifications are more often required in order to accelerate data ingestion, facilitate monitoring and provide accurate tracking about the process.

Amazon Redshift is a fast, fully managed, cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you to securely access your data in operational databases, data lakes or third-party datasets with minimal movement or copying. AWS Step Functions is a fully managed service that gives you the ability to orchestrate and coordinate service components. Amazon S3 Event Notifications is an Amazon S3 feature that you can enable in order to receive notifications when specific events occur in your S3 bucket.

In this post we discuss how we can build and orchestrate in a few steps an ETL process for Amazon Redshift using Amazon S3 Event Notifications for automatic verification of source data upon arrival and notification in specific cases. And we show how to use AWS Step Functions for the orchestration of the data pipeline. It can be considered as a starting point for teams within organizations willing to create and build an event driven data pipeline from data source to data warehouse that will help in tracking each phase and in responding to failures quickly. Alternatively, you can also use Amazon Redshift auto-copy from Amazon S3 to simplify data loading from Amazon S3 into Amazon Redshift.

Solution overview

The workflow is composed of the following steps:

A Lambda function is triggered by an S3 event whenever a source file arrives at the S3 bucket. It does the necessary verifications and then classifies the file before processing by sending it to the appropriate Amazon S3 prefix (accepted or rejected).
There are two possibilities:
- If the file is moved to the rejected Amazon S3 prefix, an Amazon S3 event sends a message to Amazon SNS for further notification.
- If the file is moved to the accepted Amazon S3 prefix, an Amazon S3 event is triggered and sends a message with the file path to Amazon SQS.
An Amazon EventBridge scheduled event triggers the AWS Step Functions workflow.
The workflow executes a Lambda function that pulls out the messages from the Amazon SQS queue and generates a manifest file for the COPY command.
Once the manifest file is generated, the workflow starts the ETL process using stored procedure.

The following image shows the workflow.

Prerequisites

Before configuring the previous solution, you can use the following AWS CloudFormation template to set up and create the infrastructure

Give the stack a name, select a deployment VPC and define the master user for the Amazon Redshift cluster by filling in the two parameters MasterUserName and MasterUserPassword.

The template will create the following services:

An S3 bucket
An Amazon Redshift cluster composed of two ra3.xlplus nodes
An empty AWS Step Functions workflow
An Amazon SQS queue
An Amazon SNS topic
An Amazon EventBridge scheduled rule with a 5-minute rate
Two empty AWS Lambda functions
IAM roles and policies for the services to communicate with each other

The names of the created services are usually prefixed by the stack’s name or the word blogdemo. You can find the names of the created services in the stack’s resources tab.

Step 1: Configure Amazon S3 Event Notifications

Create the following four folders in the S3 bucket:

received
rejected
accepted
manifest

In this scenario, we will create the following three Amazon S3 event notifications:

Trigger an AWS Lambda function on the received folder.
Send a message to the Amazon SNS topic on the rejected folder.
Send a message to Amazon SQS on the accepted folder.

To create an Amazon S3 event notification:

Go to the bucket’s Properties tab.
In the Event Notifications section, select Create Event Notification.
Fill in the necessary properties:
- Give the event a name.
- Specify the appropriate prefix or folder (accepted/, rejected/ or received/).
- Select All object create events as an event type.
- Select and choose the destination (AWS lambda, Amazon SNS or Amazon SQS).
  Note: for an AWS Lambda destination, choose the function that starts with ${stackname}-blogdemoVerify_%

At the end, you should have three Amazon S3 events:

An event for the received prefix with an AWS Lambda function as a destination type.
An event for the accepted prefix with an Amazon SQS queue as a destination type.
An event for the rejected prefix with an Amazon SNS topic as a destination type.

The following image shows what you should have after creating the three Amazon S3 events:

Step 2: Create objects in Amazon Redshift

Connect to the Amazon Redshift cluster and create the following objects:

Three schemas:

create schema blogdemo_staging; -- for staging tables
create schema blogdemo_core; -- for target tables
create schema blogdemo_proc; -- for stored procedures

A table in the blogdemo_staging and blogdemo_core schemas:

create table ${schemaname}.rideshare
(
  id_ride bigint not null,
  date_ride timestamp not null,
  country varchar (20),
  city varchar (20),
  distance_km smallint,
  price decimal (5,2),
  feedback varchar (10)
) distkey(id_ride);

A stored procedure to extract and load data into the target schema:

create or replace procedure blogdemo_proc.elt_rideshare (bucketname in varchar(200),manifestfile in varchar (500))
as $$
begin
-- purge staging table
truncate blogdemo_staging.rideshare;

-- copy data from s3 bucket to staging schema
execute 'copy blogdemo_staging.rideshare from ''s3://' + bucketname + '/' + manifestfile + ''' iam_role default delimiter ''|'' manifest;';

-- apply transformation rules here

-- insert data into target table
insert into blogdemo_core.rideshare
select * from blogdemo_staging.rideshare;

end;
$$ language plpgsql;

Set the role ${stackname}-blogdemoRoleRedshift_% as a default role:
1. In the Amazon Redshift console, go to clusters and click on the cluster blogdemoRedshift%.
2. Go to the Properties tab.
3. In the Cluster permissions section, select the role ${stackname}-blogdemoRoleRedshift%.
4. Click on Set default then Make default.

Step 3: Configure Amazon SQS queue

The Amazon SQS queue can be used as it is; this means with the default values. The only thing you need to do for this demo is to go to the created queue ${stackname}-blogdemoSQS% and purge the test messages generated (if any) by the Amazon S3 event configuration. Copy its URL in a text file for further use (more precisely, in one of the AWS Lambda functions).

Step 4: Setup Amazon SNS topic

In the Amazon SNS console, go to the topic ${stackname}-blogdemoSNS%
Click on the Create subscription button.
Choose the blogdemo topic ARN, email protocol, type your email and then click on Create subscription.
Confirm your subscription in your email that you received.

Step 5: Customize the AWS Lambda functions

The following code verifies the name of a file. If it respects the naming convention, it will move it to the accepted folder. If it does not respect the naming convention, it will move it to the rejected one. Copy it to the AWS Lambda function ${stackname}-blogdemoLambdaVerify and then deploy it:

import boto3
import re

def lambda_handler (event, context):
    objectname = event['Records'][0]['s3']['object']['key']
    bucketname = event['Records'][0]['s3']['bucket']['name']
    
    result = re.match('received/rideshare_data_20[0-5][0-9]((0[1-9])|(1[0-2]))([0-2][1-9]|3[0-1])\.csv',objectname)
    targetfolder = ''
    
    if result: targetfolder = 'accepted'
    else: targetfolder = 'rejected'
    
    s3 = boto3.resource('s3')
    copy_source = {
        'Bucket': bucketname,
        'Key': objectname
    }
    target_objectname=objectname.replace('received',targetfolder)
    s3.meta.client.copy(copy_source, bucketname, target_objectname)
    
    s3.Object(bucketname,objectname).delete()
    
    return {'Result': targetfolder}

The second AWS Lambda function ${stackname}-blogdemonLambdaGenerate% retrieves the messages from the Amazon SQS queue and generates and stores a manifest file in the S3 bucket manifest folder. Copy the following content, replace the variable ${sqs_url} by the value retrieved in Step 3 and then click on Deploy.

import boto3
import json
import datetime

def lambda_handler(event, context):

    sqs_client = boto3.client('sqs')
    queue_url='${sqs_url}'
    bucketname=''
    keypath='none'
    
    manifest_content='{\n\t"entries": ['
    
    while True:
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            AttributeNames=['All'],
            MaxNumberOfMessages=1
        )
        try:
            message = response['Messages'][0]
        except KeyError:
            break
        
        message_body=message['Body']
        message_data = json.loads(message_body)
        
        objectname = message_data['Records'][0]['s3']['object']['key']
        bucketname = message_data['Records'][0]['s3']['bucket']['name']

        manifest_content = manifest_content + '\n\t\t{"url":"s3://' +bucketname + '/' + objectname + '","mandatory":true},'
        receipt_handle = message['ReceiptHandle']

        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle
        )
        
    if bucketname != '':
        manifest_content=manifest_content[:-1]+'\n\t]\n}'
        s3 = boto3.resource("s3")
        encoded_manifest_content=manifest_content.encode('utf-8')
        current_datetime=datetime.datetime.now()
        keypath='manifest/files_list_'+current_datetime.strftime("%Y%m%d-%H%M%S")+'.manifest'
        s3.Bucket(bucketname).put_object(Key=keypath, Body=encoded_manifest_content)

    sf_tasktoken = event['TaskToken']
    
    step_function_client = boto3.client('stepfunctions')
    step_function_client.send_task_success(taskToken=sf_tasktoken,output='{"manifestfilepath":"' + keypath + '",\"bucketname":"' + bucketname +'"}')

Step 6: Add tasks to the AWS Step Functions workflow

Create the following workflow in the state machine ${stackname}-blogdemoStepFunctions%.

If you would like to accelerate this step, you can drag and drop the content of the following JSON file in the definition part when you click on Edit. Make sure to replace the three variables:

${GenerateManifestFileFunctionName} by the ${stackname}-blogdemoLambdaGenerate% arn.
${RedshiftClusterIdentifier} by the Amazon Redshift cluster identifier.
${MasterUserName} by the username that you defined while deploying the CloudFormation template.

Step 7: Enable Amazon EventBridge rule

Enable the rule and add the AWS Step Functions workflow as a rule target:

Go to the Amazon EventBridge console.
Select the rule created by the Amazon CloudFormation template and click on Edit.
Enable the rule and click Next.
You can change the rate if you want. Then select Next.
Add the AWS Step Functions state machine created by the CloudFormation template blogdemoStepFunctions% as a target and use an existing role created by the CloudFormation template ${stackname}-blogdemoRoleEventBridge%
Click on Next and then Update rule.

Test the solution

In order to test the solution, the only thing you should do is upload some csv files in the received prefix of the S3 bucket. Here are some sample data; each file contains 1000 rows of rideshare data.

If you upload them in one click, you should receive an email because the ridesharedata2022.csv does not respect the naming convention. The other three files will be loaded in the target table blogdemo_core.rideshare. You can check the Step Functions workflow to verify that the process finished successfully.

Clean up

Go to the Amazon EventBridge console and delete the rule ${stackname}-blogdemoevenbridge%.
In the Amazon S3 console, select the bucket created by the CloudFormation template ${stackname}-blogdemobucket% and click on Empty.
Go to Subscriptions in the Amazon SNS console and delete the subscription created in Step 4.
In the AWS CloudFormation console, select the stack and delete it.

Conclusion

In this post, we showed how different AWS services can be easily implemented together in order to create an event-driven architecture and automate its data pipeline, which targets the cloud data warehouse Amazon Redshift for business intelligence applications and complex queries.

About the Author

Ziad WALI is an Acceleration Lab Solutions Architect at Amazon Web Services. He has over 10 years of experience in databases and data warehousing where he enjoys building reliable, scalable and efficient solutions. Outside of work, he enjoys sports and spending time in nature.

Proof of concept overview

Proof of concept process

Discovery phase

Implementation phase

Evaluation phase

Challenges

Scope

Time

Cost

Technical

Amazon Redshift POC tools and features

AWS Analytics Automation Toolkit

AWS SCT

Amazon Redshift auto-copy

Redshift Auto Loader

Apache JMeter

Workload Replicator

Node Configuration Comparison utility

Use cases

Functionality evaluation

Workload isolation

Migrating to Amazon Redshift

Moving to Amazon Redshift Serverless

Conclusion

About the Authors

Solution overview

Prerequisites

Step 1: Configure Amazon S3 Event Notifications

Step 2: Create objects in Amazon Redshift

Step 3: Configure Amazon SQS queue

Step 4: Setup Amazon SNS topic

Step 5: Customize the AWS Lambda functions

Step 6: Add tasks to the AWS Step Functions workflow

Step 7: Enable Amazon EventBridge rule

Test the solution

Clean up

Conclusion

About the Author

The collective thoughts of the interwebz