All posts by Jan Michael Go Tan

Use the AWS CDK with the Data Solutions Framework to provision and manage Amazon Redshift Serverless

Post Syndicated from Jan Michael Go Tan original https://aws.amazon.com/blogs/big-data/use-the-aws-cdk-with-the-data-solutions-framework-to-provision-and-manage-amazon-redshift-serverless/

In February 2024, we announced the release of the Data Solutions Framework (DSF), an opinionated open source framework for building data solutions on AWS. DSF is built using the AWS Cloud Development Kit (AWS CDK) to package infrastructure components into L3 AWS CDK constructs on top of AWS services. L3 constructs are implementations of common technical patterns and create multiple resources that are configured to work with each other.

In this post, we demonstrate how to use the AWS CDK and DSF to create a multi-data warehouse platform based on Amazon Redshift Serverless. DSF simplifies the provisioning of Redshift Serverless, initialization and cataloging of data, and data sharing between different data warehouse deployments. Using a programmatic approach with the AWS CDK and DSF allows you to apply GitOps principles to your analytics workloads and realize the following benefits:

  • You can deploy using continuous integration and delivery (CI/CD) pipelines, including the definitions of Redshift objects (databases, tables, shares, and so on)
  • You can roll out changes consistently across multiple environments
  • You can bootstrap data warehouses (table creation, ingestion of data, and so on) using code and use version control to simplify the setup of testing environments
  • You can test changes before deployment using AWS CDK built-in testing capabilities (a test sketch follows this list)
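
For example, the following is a minimal unit test sketch using the AWS CDK assertions module. It assumes the module and stack class names that cdk init generates for the project created later in this post (dsf_redshift_blog.dsf_redshift_blog_stack and DsfRedshiftBlogStack); adjust the assertion to the resources your stack actually creates.

import aws_cdk as cdk
from aws_cdk.assertions import Template

# Assumed import path; cdk init derives these names from the project folder created later in this post.
from dsf_redshift_blog.dsf_redshift_blog_stack import DsfRedshiftBlogStack


def test_stack_creates_data_lake_buckets():
    app = cdk.App()
    stack = DsfRedshiftBlogStack(app, "TestStack")
    template = Template.from_stack(stack)

    # The DataLakeStorage construct creates S3 buckets; assert that at least one is synthesized.
    assert len(template.find_resources("AWS::S3::Bucket")) > 0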

In addition, DSF’s Redshift Serverless L3 constructs provide a number of built-in capabilities that can accelerate development while helping you follow best practices. For example:

  • Running extract, transform, and load (ETL) jobs to and from Amazon Redshift is more straightforward because an AWS Glue connection resource is automatically created and configured. This means data engineers don’t have to configure this resource and can use it right away with their AWS Glue ETL jobs.
  • Similarly, with discovery of data inside Amazon Redshift, DSF provides a convenient method to configure an AWS Glue crawler to populate the AWS Glue Data Catalog for ease of discovery as well as ease of referencing tables when creating ETL jobs. The configured AWS Glue crawler uses an AWS Identity and Access Management (IAM) role that follows least privilege.
  • Sharing data between Redshift data warehouses is a common approach to improve collaboration between lines of business without duplicating data. DSF provides convenient methods for the end-to-end flow for both data producer and consumer.

Solution overview

The solution demonstrates a common pattern where a data warehouse is used as a serving layer for business intelligence (BI) workloads on top of data lake data. The source data is stored in Amazon Simple Storage Service (Amazon S3) buckets, then ingested into a Redshift producer data warehouse to create materialized views and aggregate data, and finally shared with a Redshift consumer running BI queries from the end-users. The following diagram illustrates the high-level architecture.

Solution Overview

In this post, we use Python for the example code. DSF also supports TypeScript.

Prerequisites

Because we’re using the AWS CDK, complete the steps in Getting Started with the AWS CDK before you implement the solution.

Initialize the project and provision a Redshift Serverless namespace and workgroup

Let’s start by initializing the project and including DSF as a dependency. You can run this code in your local terminal, or you can use AWS Cloud9:

mkdir dsf-redshift-blog && cd dsf-redshift-blog
cdk init --language python

Open the project folder in your IDE and complete the following steps:

  1. Open the app.py file.
  2. In this file, make sure to uncomment the first env line. This configures the AWS CDK environment depending on the AWS profile used during the deployment (a sketch of the resulting app.py appears below).
  3. Add a configuration flag in the cdk.context.json file at the root of the project (if it doesn’t exist, create the file):
    {  
        "@data-solutions-framework-on-aws/removeDataOnDestroy": true 
    }

Setting the @data-solutions-framework-on-aws/removeDataOnDestroy configuration flag to true makes sure resources that have the removal_policy parameter set to RemovalPolicy.DESTROY are destroyed when the AWS CDK stack is deleted. This is a guardrail DSF uses to prevent accidentally deleting data.
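
For reference, the app.py generated by cdk init looks similar to the following sketch once the first env line from step 2 is uncommented (the module and class names are the ones cdk init derives from the project name):

#!/usr/bin/env python3
import os

import aws_cdk as cdk

from dsf_redshift_blog.dsf_redshift_blog_stack import DsfRedshiftBlogStack

app = cdk.App()
DsfRedshiftBlogStack(app, "DsfRedshiftBlogStack",
    # With env uncommented, the stack is environment-specific and uses the account
    # and Region of the AWS profile used at deployment time.
    env=cdk.Environment(
        account=os.getenv('CDK_DEFAULT_ACCOUNT'),
        region=os.getenv('CDK_DEFAULT_REGION')),
    )

app.synth()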

Now that the project is configured, you can start adding resources to the stack.

  4. Navigate to the dsf_redshift_blog folder and open the dsf_redshift_blog_stack.py file.

This is where we configure the resources to be deployed.

  5. To get started building the end-to-end demo, add the following import statements at the top of the file, which allow you to define resources from both the AWS CDK core library and DSF:
    from aws_cdk import (
        RemovalPolicy,
        Stack
    )
    
    from aws_cdk.aws_s3 import Bucket
    from aws_cdk.aws_iam import Role, ServicePrincipal
    from constructs import Construct
    from cdklabs import aws_data_solutions_framework as dsf

We use several DSF-specific constructs to build the demo:

  • DataLakeStorage – This creates three S3 buckets, named Bronze, Silver, and Gold, to represent the different data layers.
  • S3DataCopy – This manages the copying of data from one bucket to another bucket.
  • RedshiftServerlessNamespace – This creates a Redshift Serverless namespace where database objects and users are stored.
  • RedshiftServerlessWorkgroup – This creates a Redshift Serverless workgroup that contains compute- and network-related configurations for the data warehouse. This is also the entry point for several convenient functionalities that DSF provides, such as cataloging of Redshift tables, running SQL statements as part of the AWS CDK (such as creating tables, data ingestion, merging of tables, and more), and sharing datasets across different Redshift clusters without moving data.
  6. Now that you have imported the libraries, create a set of S3 buckets following the medallion architecture best practices with bronze, silver, and gold data layers.

The high-level definitions of each layer are as follows:

  • Bronze represents raw data; this is where data from various source systems lands. No schema is needed.
  • Silver is cleaned and potentially augmented data. The schema is enforced in this layer.
  • Gold is data that’s further refined and aggregated to serve a specific business need.

Using the DataLakeStorage construct, you can create these three S3 buckets with the following best practices:

  • Encryption at rest through AWS Key Management Service (AWS KMS) is turned on
  • SSL is enforced
  • The use of S3 bucket keys is turned on
  • There’s a default S3 lifecycle rule defined to delete incomplete multipart uploads after 1 day
    data_lake = dsf.storage.DataLakeStorage(self,
        'DataLake',
        removal_policy=RemovalPolicy.DESTROY)

  7. After you create the S3 buckets, copy over the data using the S3DataCopy construct. For this demo, we land the data in the Silver bucket because it’s already cleaned:
    source_bucket = Bucket.from_bucket_name(self, 
        'SourceBucket', 
        bucket_name='redshift-immersionday-labs')
    
    data_copy = dsf.utils.S3DataCopy(self,
        'SourceData', 
        source_bucket=source_bucket, 
        source_bucket_prefix='data/amazon-reviews/', 
        source_bucket_region='us-west-2', 
        target_bucket=data_lake.silver_bucket, 
        target_bucket_prefix='silver/amazon-reviews/')

  8. In order for Amazon Redshift to ingest the data from Amazon S3, it needs an IAM role with the right permissions. This role will be associated with the Redshift Serverless namespace that you create next.
    lake_role = Role(self, 
        'LakeRole', 
        assumed_by=ServicePrincipal('redshift.amazonaws.com'))
    
    data_lake.silver_bucket.grant_read(lake_role)

  9. To provision Redshift Serverless, configure two resources: a namespace and a workgroup. DSF provides L3 constructs for both:
    1. RedshiftServerlessNamespace
    2. RedshiftServerlessWorkgroup

    Both constructs follow security best practices, including:

    • The default virtual private cloud (VPC) uses private subnets (with public access disabled).
    • Data is encrypted at rest through AWS KMS with automatic key rotation.
    • Admin credentials are stored in AWS Secrets Manager with automatic rotation managed by Amazon Redshift.
    • A default AWS Glue connection is automatically created using private connectivity. This can be used by AWS Glue crawlers as well as AWS Glue ETL jobs to connect to Amazon Redshift.

    The RedshiftServerlessWorkgroup construct is the main entry point for other capabilities, such as integration with the AWS Glue Data Catalog, Redshift Data API, and Data Sharing API.

  10. In the following example, use the defaults provided by the construct and associate the IAM role that you created earlier to give Amazon Redshift access to the data lake for data ingestion:
      namespace = dsf.consumption.RedshiftServerlessNamespace(self, 
          'Namespace', 
          db_name='defaultdb', 
          name='producer', 
          removal_policy=RemovalPolicy.DESTROY, 
          default_iam_role=lake_role)
      
      workgroup = dsf.consumption.RedshiftServerlessWorkgroup(self, 
          'Workgroup', 
          name='producer', 
          namespace=namespace, 
          removal_policy=RemovalPolicy.DESTROY)

Create tables and ingest data

To create a table, you can use the run_custom_sql method of the RedshiftServerlessWorkgroup construct. This method allows you to run arbitrary SQL statements when the resource is being created (such as create table or create materialized view) and when it’s being deleted (such as drop table or drop materialized view).

Add the following code after the RedshiftServerlessWorkgroup instantiation:

create_amazon_reviews_table = workgroup.run_custom_sql('CreateAmazonReviewsTable', 
    database_name='defaultdb', 
    sql='CREATE TABLE amazon_reviews (marketplace character varying(16383) ENCODE lzo, customer_id character varying(16383) ENCODE lzo, review_id character varying(16383) ENCODE lzo, product_id character varying(16383) ENCODE lzo, product_parent character varying(16383) ENCODE lzo, product_title character varying(16383) ENCODE lzo, star_rating integer ENCODE az64, helpful_votes integer ENCODE az64, total_votes integer ENCODE az64, vine character varying(16383) ENCODE lzo, verified_purchase character varying(16383) ENCODE lzo, review_headline character varying(max) ENCODE lzo, review_body character varying(max) ENCODE lzo, review_date date ENCODE az64, year integer ENCODE az64) DISTSTYLE AUTO;', 
    delete_sql='drop table amazon_reviews')

load_amazon_reviews_data = workgroup.ingest_data('amazon_reviews_ingest_data', 
    'defaultdb', 
    'amazon_reviews', 
    data_lake.silver_bucket, 
    'silver/amazon-reviews/', 
    'FORMAT parquet')

load_amazon_reviews_data.node.add_dependency(create_amazon_reviews_table)
load_amazon_reviews_data.node.add_dependency(data_copy)

Given the asynchronous nature of some of the resource creation, we also enforce dependencies between some resources; otherwise, the AWS CDK would try to create them in parallel to accelerate the deployment. The preceding dependency statements establish the following:

  • The data load starts only after the S3 data copy is complete, so the data exists in the source location of the ingestion
  • The data load starts only after the target table has been created in the Redshift namespace

Bootstrapping example (materialized views)

The workgroup.run_custom_sql() method provides flexibility in how you can bootstrap your Redshift data warehouse using the AWS CDK. For example, you can create a materialized view to improve query performance by pre-aggregating data from the Amazon reviews dataset:

materialized_view = workgroup.run_custom_sql('MvProductAnalysis',
    database_name='defaultdb',
    sql=f'''CREATE MATERIALIZED VIEW mv_product_analysis AS SELECT review_date, product_title, COUNT(1) AS review_total, SUM(star_rating) AS rating FROM amazon_reviews WHERE marketplace = 'US' GROUP BY 1,2;''',
    delete_sql='drop materialized view mv_product_analysis')

materialized_view.node.add_dependency(load_amazon_reviews_data)

Catalog tables in Amazon Redshift

The deployment of RedshiftServerlessWorkgroup automatically creates an AWS Glue connection resource that can be used by AWS Glue crawlers and AWS Glue ETL jobs. This is directly exposed from the workgroup construct through the glue_connection property. Using this connection, the workgroup construct exposes a convenient method to catalog the tables inside the associated Redshift Serverless namespace. The following is an example:

workgroup.catalog_tables('DefaultDBCatalog', 'mv_product_analysis')

This single line of code creates a database in the Data Catalog named mv_product_analysis and the associated crawler, with the IAM role and network configuration already set up. By default, it crawls all the tables inside the public schema of the default database indicated when the Redshift Serverless namespace was created. To override this, the third parameter in the catalog_tables method lets you define a pattern for what to crawl (following the JDBC include path format used by AWS Glue crawlers).
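
For example, the following sketch assumes the third parameter accepts an AWS Glue JDBC-style include path (database/schema/table, where % acts as a wildcard); check the DSF documentation for the exact format:

# Restrict the crawler to a single table instead of the whole public schema.
workgroup.catalog_tables(
    'DefaultDBCatalog',
    'mv_product_analysis',
    'defaultdb/public/mv_product_analysis')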

You can run the crawler from the AWS Glue console, or invoke it using the SDK, the AWS Command Line Interface (AWS CLI), or the AWS CDK (for example, with AwsCustomResource).
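
For example, the following is a minimal boto3 sketch that starts the crawler outside of the AWS CDK (the crawler name is a placeholder; look up the actual name on the AWS Glue console or in the stack outputs):

import boto3

glue = boto3.client('glue')

# Start the crawler created by catalog_tables; replace the placeholder name.
glue.start_crawler(Name='<CRAWLER_NAME>')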

Data sharing

DSF supports Redshift data sharing for both sides (producers and consumers) as well as same account and cross-account scenarios. Let’s create another Redshift Serverless namespace and workgroup to demonstrate the interaction:

namespace2 = dsf.consumption.RedshiftServerlessNamespace(self, 
    "Namespace2", 
    db_name="defaultdb", 
    name="consumer", 
    default_iam_role=lake_role, 
    removal_policy=RemovalPolicy.DESTROY)

workgroup2 = dsf.consumption.RedshiftServerlessWorkgroup(self, 
    "Workgroup2", 
    name="consumer", 
    namespace=namespace2, 
    removal_policy=RemovalPolicy.DESTROY)

For producers

For producers, complete the following steps:

  1. Create the new share and populate the share with the schema or tables:
    data_share = workgroup.create_share('DataSharing', 
        'defaultdb', 
        'defaultdbshare', 
        'public', ['mv_product_analysis'])
    
    data_share.new_share_custom_resource.node.add_dependency(materialized_view)
  2. Create access grants:
    • To grant to a cluster in the same account:
      share_grant = workgroup.grant_access_to_share("GrantToSameAccount", 
          data_share, 
          namespace2.namespace_id)
      
      share_grant.resource.node.add_dependency(data_share.new_share_custom_resource)
      share_grant.resource.node.add_dependency(namespace2)
    • To grant to a different account:
      workgroup.grant_access_to_share('GrantToDifferentAccount', 
          tpcdsShare, 
          undefined, 
          '<ACCOUNT_ID_OF_CONSUMER>', 
          true)

The last parameter in the grant_access_to_share method allows you to automatically authorize cross-account access on the data share. Omitting this parameter defaults to no authorization, which means a Redshift administrator needs to authorize the cross-account share using the AWS CLI, an SDK, or the Amazon Redshift console.
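
For reference, the following is a hedged boto3 sketch of that manual authorization step (the data share ARN and consumer account ID are placeholders); the equivalent AWS CLI command is aws redshift authorize-data-share:

import boto3

redshift = boto3.client('redshift')

# Authorize the consumer account to access the data share (producer-side action).
redshift.authorize_data_share(
    DataShareArn='<DATA_SHARE_ARN>',
    ConsumerIdentifier='<ACCOUNT_ID_OF_CONSUMER>')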

For consumers

For the same account share, to create the database from the share, use the following code:

create_db_from_share = workgroup2.create_database_from_share("CreateDatabaseFromShare", 
    "marketing", 
    data_share.data_share_name, 
    data_share.producer_namespace)

create_db_from_share.resource.node.add_dependency(share_grant.resource)
create_db_from_share.resource.node.add_dependency(workgroup2)

For cross-account grants, the syntax is similar, but you need to indicate the producer account ID:

consumerWorkgroup.create_database_from_share('CreateCrossAccountDatabaseFromShare', 
    'tpcds', 
    <PRODUCER_SHARE_NAME>, 
    <PRODUCER_NAMESPACE_ID>, 
    <PRODUCER_ACCOUNT_ID>)

To see the full working example, follow the instructions in the accompanying GitHub repository.

Deploy the resources using the AWS CDK

To deploy the resources, run the following code:

cdk deploy

You can review the resources created, as shown in the following screenshot.

Confirm the changes for the deployment to start. Wait a few minutes for the project to be deployed; you can keep track of the deployment using the AWS CLI or the AWS CloudFormation console.

When the deployment is complete, you should see two Redshift workgroups (one producer and one consumer).

Using Amazon Redshift Query Editor v2, you can log in to the producer Redshift workgroup using Secrets Manager, as shown in the following screenshot.

Producer QEV2 Login

After you log in, you can see the tables and views that you created using DSF in the defaultdb database.

QEv2 Tables

Log in to the consumer Redshift workgroup to see the shared dataset from the producer Redshift workgroup under the marketing database.

Clean up

You can run cdk destroy in your local terminal to delete the stack. Because you marked the constructs with a RemovalPolicy.DESTROY and configured DSF to remove data on destroy, running cdk destroy or deleting the stack from the AWS CloudFormation console will clean up the provisioned resources.

Conclusion

In this post, we demonstrated how to use the AWS CDK along with the DSF to manage Redshift Serverless as code. Codifying the deployment of resources helps provide consistency across multiple environments. Aside from infrastructure, DSF also provides capabilities to bootstrap (table creation, ingestion of data, and more) Amazon Redshift and manage objects, all from the AWS CDK. This means that changes can be version controlled, reviewed, and even unit tested.

In addition to Redshift Serverless, DSF supports other AWS services, such as Amazon Athena, Amazon EMR, and many more. Our roadmap is publicly available, and we look forward to your feature requests, contributions, and feedback.

You can get started using DSF by following our quick start guide.


About the authors


Jan Michael Go Tan is a Principal Solutions Architect for Amazon Web Services. He helps customers design scalable and innovative solutions with the AWS Cloud.
Vincent Gromakowski is an Analytics Specialist Solutions Architect at AWS where he enjoys solving customers’ analytics, NoSQL, and streaming challenges. He has strong expertise in distributed data processing engines and resource orchestration platforms.

Use an event-driven architecture to build a data mesh on AWS

Post Syndicated from Jan Michael Go Tan original https://aws.amazon.com/blogs/big-data/use-an-event-driven-architecture-to-build-a-data-mesh-on-aws/

In this post, we take the data mesh design discussed in Design a data mesh architecture using AWS Lake Formation and AWS Glue, and demonstrate how to initialize data domain accounts to enable managed sharing; we also go through how we can use an event-driven approach to automate processes between the central governance account and data domain accounts (producers and consumers). We build a data mesh pattern from scratch as Infrastructure as Code (IaC) using AWS CDK and use an open-source self-service data platform UI to share and discover data between business units.

The key advantage of this approach is the ability to add actions in response to data mesh events, such as permission management, tag propagation, and search index management, and to automate different processes.

Before we dive into it, let’s look at AWS Analytics Reference Architecture, an open-source library that we use to build our solution.

AWS Analytics Reference Architecture

AWS Analytics Reference Architecture (ARA) is a set of analytics solutions put together as end-to-end examples. It brings together AWS best practices for designing, implementing, and operating analytics platforms through different purpose-built patterns, handling common requirements, and solving customers’ challenges.

ARA exposes reusable core components in an AWS CDK library, currently available in TypeScript and Python. This library contains L3 AWS CDK constructs that you can use to quickly provision analytics solutions in demos, prototypes, proofs of concept, and end-to-end reference architectures.

The AWS Analytics Reference Architecture library includes the following data mesh-specific constructs:

  • CentralGovernance – Creates an Amazon EventBridge event bus for the central governance account that is used to communicate with data domain accounts (producer and consumer). It also creates workflows to automate data product registration and sharing.
  • DataDomain – Creates an Amazon EventBridge event bus for a data domain account (producer or consumer) to communicate with the central governance account. It creates data lake storage (Amazon S3) and a workflow to automate data product registration. It also creates a workflow to populate AWS Glue Data Catalog metadata for newly registered data products.

You can find AWS CDK constructs for the AWS Analytics Reference Architecture on Construct Hub.

In addition to ARA constructs, we also use an open-source self-service data platform (user interface). It is built using AWS Amplify, Amazon DynamoDB, AWS Step Functions, AWS Lambda, Amazon API Gateway, Amazon EventBridge, Amazon Cognito, and Amazon OpenSearch Service. The frontend is built with React. Through the self-service data platform, you can: 1) manage data domains and data products, and 2) discover and request access to data products.

Central Governance and data sharing

For the governance of our data mesh, we use AWS Lake Formation. AWS Lake Formation is a fully managed service that simplifies data lake setup, supports centralized security management, and provides transactional access on top of your data lake. Moreover, it enables data sharing across accounts and organizations. This centralized approach has a number of key benefits, such as centralized auditing, centralized permission management, and centralized data discovery. More importantly, it allows organizations to gain the benefits of centralized governance while taking advantage of the inherent scaling characteristics of decentralized data product management.

There are two ways to share data resources in Lake Formation: 1) named resource access control (NRAC), and 2) tag-based access control (LF-TBAC). NRAC uses AWS Resource Access Manager (AWS RAM) to share data resources across accounts; these are consumed via resource links that are based on the created resource shares. LF-TBAC defines permissions based on attributes called LF-tags. You can read this blog to learn about LF-TBAC in the context of data mesh.
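
To make LF-TBAC more concrete, the following is a minimal boto3 sketch (the tag key, tag value, account ID, and granted permissions are placeholders; adjust them to your use case) that grants a consumer account access to all tables carrying a given LF-tag:

import boto3

lakeformation = boto3.client('lakeformation')

# Grant SELECT/DESCRIBE on every table tagged with LOB=N to the consumer account.
lakeformation.grant_permissions(
    Principal={'DataLakePrincipalIdentifier': '<CONSUMER_ACCOUNT_ID>'},
    Resource={
        'LFTagPolicy': {
            'ResourceType': 'TABLE',
            'Expression': [{'TagKey': 'LOB', 'TagValues': ['N']}]
        }
    },
    Permissions=['SELECT', 'DESCRIBE'])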

The following diagram shows how NRAC and LF-TBAC data sharing works. In this example, a data domain is registered as a node on the mesh, and therefore we create two databases in the central governance account. The NRAC database is shared with the data domain via AWS RAM, and access to the data products that we register in this database is handled through NRAC. The LF-TBAC database is tagged with the data domain N line of business (LOB) LF-tag: <LOB:N>. The LOB tag is automatically shared with the data domain N account, which makes the database available in that account. Access to data products in this database is handled through LF-TBAC.

In our solution, we demonstrate both the NRAC and LF-TBAC approaches. With the NRAC approach, we build an event-based workflow that automatically accepts the RAM share in the data domain accounts and automates the creation of the necessary metadata objects (for example, the local database and resource links). With the LF-TBAC approach, we rely on the permissions associated with the shared LF-tags to allow producer data domains to manage their data products and to give consumer data domains read access to the data products associated with the LF-tags they requested access to.

We use the CentralGovernance construct from the ARA library to build the central governance account. It creates an EventBridge event bus to enable communication with the data domain accounts that register as nodes on the mesh. For each registered data domain, specific event bus rules are created that route events toward that account. The central governance account has a central metadata catalog that allows data to be stored in different data domains, as opposed to a single central lake. For each registered data domain, we create two separate databases in the central governance catalog to demonstrate both NRAC and LF-TBAC data sharing. The CentralGovernance construct also creates workflows for data product registration and data product sharing. Finally, we deploy a self-service data platform UI to provide a good user experience for managing data domains and data products, and to simplify data discovery and sharing.

A data domain: producer and consumer

We use the DataDomain construct from the ARA library to build a data domain account that can be a producer, a consumer, or both. Producers manage the lifecycle of their respective data products in their own AWS accounts. Typically, this data is stored in Amazon Simple Storage Service (Amazon S3). The DataDomain construct creates data lake storage with a cross-account bucket policy that enables the central governance account to access the data. Data is encrypted using AWS KMS, and the central governance account has permission to use the key. A config secret in AWS Secrets Manager contains all the necessary information to register the data domain as a node on the mesh in central governance. It includes: 1) the data domain name, 2) the S3 location that holds the data products, and 3) the encryption key ARN. The DataDomain construct also creates the Data Domain and Crawler workflows to automate data product registration.

Creating an event-driven data mesh

Data mesh architectures typically require some level of communication and trust policy management to maintain least privilege for the relevant principals between the different accounts (for example, central governance to producer, central governance to consumer). We use an event-driven approach via EventBridge to securely forward events from an event bus in one account to an event bus in another account while maintaining least privilege access. When we register a data domain with the central governance account through the self-service data platform UI, we establish bi-directional communication between the accounts via EventBridge. The domain registration process also creates a database in the central governance catalog to hold data products for that particular domain. The registered data domain is now a node on the mesh, and we can register new data products.
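
The following AWS CDK sketch illustrates the general pattern of forwarding events from a central bus to a bus in another account. It is an illustration of the mechanism, not the exact rules that the ARA constructs create; the bus names, ARN, event source, and detail type are placeholders.

from aws_cdk import aws_events as events, aws_events_targets as targets

# Inside a Stack's __init__ (self is the stack scope).
# Central governance event bus (in the central account).
central_bus = events.EventBus(self, 'CentralMeshBus',
    event_bus_name='central-mesh-bus')

# Reference to the event bus in a registered data domain account (placeholder ARN).
domain_bus = events.EventBus.from_event_bus_arn(self, 'DataDomainBus',
    'arn:aws:events:<REGION>:<DATA_DOMAIN_ACCOUNT_ID>:event-bus/<DATA_DOMAIN_BUS_NAME>')

# Rule created per registered data domain: route matching events to that account.
events.Rule(self, 'ForwardToDataDomain',
    event_bus=central_bus,
    event_pattern=events.EventPattern(
        source=['com.central.governance'],
        detail_type=['registerDataProduct']),
    targets=[targets.EventBus(domain_bus)])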

The following diagram shows the data product registration process:

  1. Registering a data product starts the Register Data Product workflow, which creates an empty table (the schema is managed by the producers in their respective producer account). This workflow also grants a cross-account permission to the producer account that allows the producer to manage the schema of the table.
  2. When complete, this workflow emits an event into the central event bus.
  3. The central event bus contains a rule that forwards the event to the producer’s event bus. This rule was created during the data domain registration process.
  4. When the producer’s event bus receives the event, it triggers the Data Domain workflow, which creates resource links and grants permissions.
  5. Still in the producer account, the Crawler workflow is triggered when the Data Domain workflow state changes to Successful. This creates the crawler, runs it, waits and checks if the crawler is done, and deletes the crawler when it’s complete. This workflow is responsible for populating the tables’ schemas.

Now other data domains can find newly registered data products using the self-service data platform UI and request access. The sharing process works the same way as product registration: events are sent from the central governance account to the consumer data domain, triggering specific workflows.

Solution Overview

The following high-level solution diagram shows how everything fits together and how event-driven architecture enables multiple accounts to form a data mesh. You can follow the workshop that we released to deploy the solution that we covered in this blog post. You can deploy multiple data domains and test both data registration and data sharing. You can also use self-service data platform UI to search through data products and request access using both LF-TBAC and NRAC approaches.

Conclusion

Implementing a data mesh on top of an event-driven architecture provides both flexibility and extensibility. A data mesh by itself has several moving parts to support various functionalities, such as onboarding, search, access management and sharing, and more. With an event-driven architecture, we can implement these functionalities in smaller components to make them easier to test, operate, and maintain. Future requirements and applications can use the event stream to provide their own functionality, making the entire mesh much more valuable to your organization.

To learn more how to design and build applications based on event-driven architecture, see the AWS Event-Driven Architecture page. To dive deeper into data mesh concepts, see the Design a Data Mesh Architecture using AWS Lake Formation and AWS Glue blog.

If you’d like our team to run a data mesh workshop with you, please reach out to your AWS team.


About the authors


Jan Michael Go Tan is a Principal Solutions Architect for Amazon Web Services. He helps customers design scalable and innovative solutions with the AWS Cloud.

Dzenan Softic is a Senior Solutions Architect at AWS. He works with startups to help them define and execute their ideas. His main focus is in data engineering and infrastructure.

David Greenshtein is a Specialist Solutions Architect for Analytics at AWS with a passion for ETL and automation. He works with AWS customers to design and build analytics solutions enabling business to make data-driven decisions. In his free time, he likes jogging and riding bikes with his son.
Vincent Gromakowski is an Analytics Specialist Solutions Architect at AWS where he enjoys solving customers’ analytics, NoSQL, and streaming challenges. He has strong expertise in distributed data processing engines and resource orchestration platforms.

Build a data sharing workflow with AWS Lake Formation for your data mesh

Post Syndicated from Jan Michael Go Tan original https://aws.amazon.com/blogs/big-data/build-a-data-sharing-workflow-with-aws-lake-formation-for-your-data-mesh/

A key benefit of a data mesh architecture is allowing different lines of business (LOBs) and organizational units to operate independently and offer their data as a product. This model not only allows organizations to scale, but also gives the end-to-end ownership of maintaining the product to data producers that are the domain experts of the data. This ownership entails maintaining the data pipelines, debugging ETL scripts, fixing data quality issues, and keeping the catalog entries up to date as the dataset evolves over time.

On the consumer side, teams can search the central catalog for relevant data products and request access. Access to the data is granted via the data sharing feature in AWS Lake Formation. As the number of data products grows and potentially more sensitive information is stored in an organization’s data lake, it’s important that the process and mechanism to request and grant access to specific data products are handled in a scalable and secure manner.

This post describes how to build a workflow engine that automates the data sharing process while including a separate approval mechanism for data products that are tagged as sensitive (for example, containing PII data). Both the workflow and approval mechanism are customizable and should be adapted to adhere to your company’s internal processes. In addition, we include an optional workflow UI to demonstrate how to integrate with the workflow engine. The UI is just one example of how the interaction works. In a typical large enterprise, you can also use ticketing systems to automatically trigger both the workflow and the approval process.

Solution overview

A typical data mesh architecture for analytics on AWS contains one central account that collates all the different data products from multiple producer accounts. Consumers can search the available data products in a single location. Sharing data products with consumers doesn’t actually make a separate copy; instead, it just creates a pointer to the catalog item. This means any updates that producers make to their products are automatically reflected in the central account as well as in all the consumer accounts.

Building on top of this foundation, the solution contains several components, as depicted in the following diagram:

The central account includes the following components:

  • AWS Glue – Used for Data Catalog purposes.
  • AWS Lake Formation – Used to secure access to the data as well as provide the data sharing capabilities that enable the data mesh architecture.
  • AWS Step Functions – The actual workflow is defined as a state machine. You can customize this to adhere to your organization’s approval requirements.
  • AWS Amplify – The workflow UI uses the Amplify framework to secure access. It also uses Amplify to host the React-based application. On the backend, the Amplify framework creates two Amazon Cognito components to support the security requirements:
    • User pools – Provide a user directory functionality.
    • Identity pools – Provide federated sign-in capabilities using Amazon Cognito user pools as the location of the user details. The identity pools vend temporary credentials so the workflow UI can access AWS Glue and Step Functions APIs.
  • AWS Lambda – Contains the application logic orchestrated by the Step Functions state machine. It also provides the necessary application logic when a producer approves or denies a request for access.
  • Amazon API Gateway – Provides the API for producers to accept and deny requests.

The producer account contains the following components (these mirror the resources created by the producer backend deployment later in this post):

  • Amazon SNS – An SNS topic where approval requests are published so the data product owner can approve or deny access
  • AWS IAM – The ProducerWorkflowRole role with a trust relationship to the central account, which allows publishing to the SNS topic

The consumer account contains the following components:

  • AWS Glue – Used for Data Catalog purposes.
  • AWS Lake Formation – After the data has been made available, consumers can grant access to its own users via Lake Formation.
  • AWS Resource Access Manager (AWS RAM) – If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, AWS RAM sends an invitation to the grantee account to accept or reject the resource grant. For more details about Lake Formation cross-account access, see Cross-Account Access: How It Works.

The solution is split into multiple steps:

  1. Deploy the central account backend, including the workflow engine and its associated components.
  2. Deploy the backend for the producer accounts. You can repeat this step multiple times depending on the number of producer accounts that you’re onboarding into the workflow engine.
  3. Deploy the optional workflow UI in the central account to interact with the central account backend.

Workflow overview

The following diagram illustrates the workflow. In this particular example, the state machine checks if the table or database (depending on what is being shared) has the pii_flag parameter and if it’s set to true. If both conditions are met, it sends an approval request to the producer’s SNS topic. Otherwise, it automatically shares the product with the requesting consumer.

This workflow is the core of the solution, and can be customized to fit your organization’s approval process. In addition, you can add custom parameters to databases, tables, or even columns to attach extra metadata to support the workflow logic.
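
As an illustration of how such a branch can be expressed with the AWS CDK (the state names and the JSONPath below are hypothetical, not the exact definition in the repository):

from aws_cdk import aws_stepfunctions as sfn

# Inside a Stack's __init__ (self is the stack). Pass states stand in for the real
# SNS publish (approval request) and Lake Formation sharing tasks.
send_approval_request = sfn.Pass(self, 'SendApprovalRequest')
share_data_product = sfn.Pass(self, 'ShareDataProduct')

# Branch on the pii_flag catalog parameter carried in the execution input.
definition = sfn.Choice(self, 'IsDataProductSensitive') \
    .when(sfn.Condition.string_equals('$.pii_flag', 'true'), send_approval_request) \
    .otherwise(share_data_product)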

Prerequisites

The following are the deployment requirements:

You can clone the workflow UI and AWS CDK scripts from the GitHub repository.

Deploy the central account backend

To deploy the backend for the central account, go to the root of the project after cloning the GitHub repository and enter the following code:

yarn deploy-central --profile <PROFILE_OF_CENTRAL_ACCOUNT>

This deploys the following:

  • IAM roles used by the Lambda functions and Step Functions state machine
  • Lambda functions
  • The Step Functions state machine (the workflow itself)
  • An API Gateway

When the deployment is complete, it generates a JSON file in the src/cfn-output.json location. This file is used by the UI deployment script to generate a scoped-down IAM policy and workflow UI application to locate the state machine that was created by the AWS CDK script.

The actual AWS CDK scripts for the central account deployment are in infra/central/. This also includes the Lambda functions (in the infra/central/functions/ folder) that are used by both the state machine and the API Gateway.

Lake Formation permissions

The following are the minimum required permissions that the central account data lake administrator needs to grant to the respective IAM roles for the backend to have access to the AWS Glue Data Catalog:

  • WorkflowLambdaTableDetails – Permissions: DESCRIBE on databases and tables. Grantable: N/A.
  • WorkflowLambdaShareCatalog – Permissions: SELECT and DESCRIBE on tables. Grantable: SELECT and DESCRIBE on tables.

Workflow catalog parameters

The workflow uses the following catalog parameters to provide its functionality.

  • Database / data_owner – (Required) The account ID of the producer account that owns the data products.
  • Database / data_owner_name – A readable friendly name that identifies the producer in the UI.
  • Database / pii_flag – A flag (true/false) that determines whether the data product requires approval (based on the example workflow).
  • Column / pii_flag – A flag (true/false) that determines whether the data product requires approval (based on the example workflow). This is only applicable if requesting table-level access.

You can use the UpdateDatabase and UpdateTable API operations to add parameters at the database and column level, respectively. Alternatively, you can use the AWS Glue commands in the AWS CLI to add the relevant parameters.

Use the AWS CLI to run the following command to check the current parameters in your database:

aws glue get-database --name <DATABASE_NAME> --profile <PROFILE_OF_CENTRAL_ACCOUNT>

You get the following response:

{
  "Database": {
    "Name": "<DATABASE_NAME>",
    "CreateTime": "<CREATION_TIME>",
    "CreateTableDefaultPermissions": [],
    "CatalogId": "<CATALOG_ID>"
  }
}

To update the database with the parameters indicated in the preceding table, we first create the input JSON file, which contains the parameters that we want to update the database with. For example, see the following code:

{
  "Name": "<DATABASE_NAME>",
  "Parameters": {
    "data_owner": "<AWS_ACCOUNT_ID_OF_OWNER>",
    "data_owner_name": "<AWS_ACCOUNT_NAME_OF_OWNER>",
    "pii_flag": "true"
  }
}

Run the following command to update the Data Catalog:

aws glue update-database --name <DATABASE_NAME> --database-input file://<FILE_NAME>.json --profile <PROFILE_OF_CENTRAL_ACCOUNT>

Deploy the producer account backend

To deploy the backend for your producer accounts, go to the root of the project and run the following command:

yarn deploy-producer --profile <PROFILE_OF_PRODUCER_ACCOUNT> --parameters centralMeshAccountId=<central_account_account_id>

This deploys the following:

  • An SNS topic where approval requests get published.
  • The ProducerWorkflowRole IAM role with a trust relationship to the central account. This role allows publishing to the previously created SNS topic.

You can run this deployment script multiple times, each time pointing to a different producer account that you want to participate in the workflow.

To receive notification emails, subscribe your email in the SNS topic that the deployment script created. For example, our topic is called DataLakeSharingApproval. To get the full ARN, you can either go to the Amazon Simple Notification Service console or run the following command to list all the topics and get the ARN for DataLakeSharingApproval:

aws sns list-topics --profile <PROFILE_OF_PRODUCER_ACCOUNT>

After you have the ARN, you can subscribe your email by running the following command:

aws sns subscribe --topic-arn <TOPIC_ARN> --protocol email --notification-endpoint <EMAIL_ADDRESS> --profile <PROFILE_OF_PRODUCER_ACCOUNT>

You then receive a confirmation email at the address that you subscribed. Choose Confirm subscription to receive notifications from this SNS topic.

Deploy the workflow UI

The workflow UI is designed to be deployed in the central account where the central data catalog is located.

To start the deployment, enter the following command:

yarn deploy-ui

This deploys the following:

  • Amazon Cognito user pool and identity pool
  • React-based application to interact with the catalog and request data access

The deployment command prompts you for the following information:

  • Project information – Use the default values.
  • AWS authentication – Use your profile for the central account. Amplify uses this profile to deploy the backend resources.
  • UI authentication – Use the default configuration and your username. Choose No, I am done when asked to configure advanced settings.
  • UI hosting – Use hosting with the Amplify console and choose manual deployment.

The script gives a summary of what is deployed. Entering Y triggers the resources to be deployed in the backend. The prompt looks similar to the following screenshot:

When the deployment is complete, the remaining prompt is for the initial user information such as user name and email. A temporary password is automatically generated and sent to the email provided. The user is required to change the password after the first login.

The deployment script grants IAM permissions to the user via an inline policy attached to the Amazon Cognito authenticated IAM role:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "glue:GetDatabase",
            "glue:GetTables",
            "glue:GetDatabases",
            "glue:GetTable"
         ],
         "Resource":"*"
      },
      {
         "Effect":"Allow",
         "Action":[
            "states:ListExecutions",
            "states:StartExecution"
         ],
         "Resource":[
            "arn:aws:states:<REGION>:<AWS_ACCOUNT_ID>:stateMachine:<STATE_MACHINE_NAME>"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "states:DescribeExecution"
         ],
         "Resource":[
            "arn:aws:states:<REGION>:<AWS_ACCOUNT_ID>:execution:<STATE_MACHINE_NAME>:*"
         ]
      }
   ]
}

The last remaining step is to grant Lake Formation permissions (DESCRIBE for both databases and tables) to the authenticated IAM role associated with the Amazon Cognito identity pool. You can find the IAM role by running the following command:

cat amplify/team-provider-info.json

The IAM role name is in the AuthRoleName property under the awscloudformation key. After you grant the required permissions, you can use the URL provided in your browser to open the workflow UI.
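
As an alternative to the Lake Formation console, the following boto3 sketch grants those DESCRIBE permissions (the role ARN and database name are placeholders; repeat or adjust for the databases and tables you want the UI to list):

import boto3

lakeformation = boto3.client('lakeformation')

# Placeholder ARN; the role name comes from the AuthRoleName property mentioned above.
auth_role_arn = 'arn:aws:iam::<AWS_ACCOUNT_ID>:role/<AUTH_ROLE_NAME>'

# DESCRIBE on a database the UI should be able to list.
lakeformation.grant_permissions(
    Principal={'DataLakePrincipalIdentifier': auth_role_arn},
    Resource={'Database': {'Name': '<DATABASE_NAME>'}},
    Permissions=['DESCRIBE'])

# DESCRIBE on all tables in that database.
lakeformation.grant_permissions(
    Principal={'DataLakePrincipalIdentifier': auth_role_arn},
    Resource={'Table': {'DatabaseName': '<DATABASE_NAME>', 'TableWildcard': {}}},
    Permissions=['DESCRIBE'])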

Your temporary password is emailed to you so you can complete the initial login, after which you’re asked to change your password.

The first page after logging in is the list of databases that consumers can access.

Choose Request Access to see the database details and the list of tables.

Choose Request Per Table Access and see more details at the table level.

Going back to the previous page, we request database-level access by entering the consumer account ID that receives the share request.

Because this database has been tagged with a pii_flag, the workflow needs to send an approval request to the product owner. To receive this approval request email, the product owner’s email needs to be subscribed to the DataLakeSharingApproval SNS topic in the producer account. The details should look similar to the following screenshot:

The email looks similar to the following screenshot:

The product owner chooses the Approve link to trigger the Step Functions state machine to continue running and share the catalog item to the consumer account.

For this example, the consumer account is not part of an organization, so the admin of the consumer account has to go to AWS RAM and accept the invitation.
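
As an alternative to the console, the following boto3 sketch accepts the pending invitation from the consumer account:

import boto3

ram = boto3.client('ram')

# List invitations shared with this (consumer) account and accept any pending ones.
invitations = ram.get_resource_share_invitations()['resourceShareInvitations']
for invitation in invitations:
    if invitation['status'] == 'PENDING':
        ram.accept_resource_share_invitation(
            resourceShareInvitationArn=invitation['resourceShareInvitationArn'])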

After the resource share is accepted, the shared database appears in the consumer account’s catalog.

Clean up

If you no longer need to use this solution, use the provided cleanup scripts to remove the deployed resources.

Producer account

To remove the deployed resources in producer accounts, run the following command for each producer account that you deployed in:

yarn clean-producer --profile <PROFILE_OF_PRODUCER_ACCOUNT>

Central account

Run the following command to remove the workflow backend in the central account:

yarn clean-central --profile <PROFILE_OF_CENTRAL_ACCOUNT>

Workflow UI

The cleanup script for the workflow UI relies on an Amplify CLI command to initiate the teardown of the deployed resources. Additionally, you can use a custom script to remove the inline policy in the authenticated IAM role used by Amazon Cognito so that Amplify can fully clean up all the deployed resources. Run the following command to trigger the cleanup:

yarn clean-ui

This command doesn’t require the profile parameter because it uses the existing Amplify configuration to infer where the resources are deployed and which profile was used.

Conclusion

This post demonstrated how to build a workflow engine to automate an organization’s approval process to gain access to data products with varying degrees of sensitivity. Using a workflow engine enables data sharing in a self-service manner while codifying your organization’s internal processes to be able to safely scale as more data products and teams get onboarded.

The provided workflow UI demonstrated one possible integration scenario. Other possible integration scenarios include integration with your organization’s ticketing system to trigger the workflow as well as receive and respond to approval requests, or integration with business chat applications to further shorten the approval cycle.

Lastly, a high degree of customization is possible with the demonstrated approach. Organizations have complete control over the workflow, how data product sensitivity levels are defined, what gets auto-approved and what needs further approvals, the hierarchy of approvals (such as a single approver or multiple approvers), and how the approvals get delivered and acted upon. You can take advantage of this flexibility to automate your company’s processes and help your organization safely accelerate toward becoming data-driven.


About the Author

Jan Michael Go Tan is a Principal Solutions Architect for Amazon Web Services. He helps customers design scalable and innovative solutions with the AWS Cloud.