All posts by Maneesh Sharma

Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-okta-with-amazon-redshift-query-editor-v2-using-aws-iam-identity-center-for-seamless-single-sign-on/

AWS IAM Identity Center (IdC) allows you to manage single sign-on (SSO) access to all your AWS accounts and applications from a single location. We are pleased to announce that Amazon Redshift now integrates with AWS IAM Identity Center and supports trusted identity propagation, allowing you to use third-party identity providers (IdPs) such as Microsoft Entra ID (Azure AD), Okta, Ping, and OneLogin. This integration simplifies the authentication and authorization process for Amazon Redshift users using Query Editor V2 or Amazon QuickSight, making it easier for them to securely access your data warehouse. Additionally, this integration positions Amazon Redshift as an IdC-managed application, enabling you to use database role-based access control on your data warehouse for enhanced security.

AWS IAM Identity Center offers automatic user and group provisioning from Okta into IdC by utilizing the System for Cross-domain Identity Management (SCIM) 2.0 protocol. This integration allows for seamless synchronization of information between the two services, ensuring accurate and up-to-date information in AWS IAM Identity Center.

In this post, we outline a comprehensive guide for setting up SSO to Amazon Redshift using the integration with IdC and Okta as the identity provider. This guide shows how you can use SSO to sign in to Amazon Redshift Query Editor V2 (QEV2).

Solution overview

Using IAM IdC with Amazon Redshift can benefit your organization in the following ways:

  • Users can connect to Amazon Redshift without requiring an administrator to set up AWS IAM roles with complex permissions.
  • IAM IdC integration allows mapping of IdC groups to Amazon Redshift database roles. Administrators can then assign different privileges to different roles and assign these roles to different users, giving organizations granular control over user access.
  • IdC provides a central location for your users in AWS. You can create users and groups directly in IdC or connect your existing users and groups that you manage in a standards-based identity provider like Okta, Ping Identity, or Microsoft Entra ID (i.e., Azure Active Directory [AD]).
  • IdC directs authentication to your chosen source of truth for users and groups, and it maintains a directory of users and groups for access by Amazon Redshift.
  • You can share one IdC instance with multiple Amazon Redshift data warehouses with a simple auto-discovery and connect capability. This makes it fast to add clusters without the extra effort of configuring the IdC connection for each, and it ensures that all clusters and workgroups have a consistent view of users, their attributes, and groups. Note: Your organization’s IdC instance must be in the same region as the Amazon Redshift data warehouse you’re connecting to.
  • Because user identities are known and logged along with data access, it's easier for you to meet compliance regulations by auditing user access in AWS CloudTrail.
    Architecture Diagram

Amazon Redshift Query Editor V2 workflow:

  1. The end user initiates the flow using the AWS access portal URL (this URL is available on the IdC dashboard). A browser pop-up opens the Okta login page, where you enter your Okta credentials. After successful authentication, you'll be logged in to the AWS Console as a federated user. Select your AWS account and choose the Amazon Redshift Query Editor V2 application. Once you federate to Query Editor V2, select the IdC authentication method.
  2. QEV2 invokes a browser flow where you re-authenticate, this time with your AWS IdC credentials. Because Okta is the IdP, you enter your Okta credentials, which are already cached in the browser. At this step, the federation flow with IdC initiates, and at the end of this flow the session token and access token are available to the QEV2 console in the browser as cookies.
  3. Amazon Redshift retrieves your authorization details based on the retrieved session token and fetches the user's group membership.
  4. Upon a successful authentication, you’ll be redirected back to QEV2, but logged in as an IdC authenticated user.

This solution covers the following steps:

  1. Integrate Okta with AWS IdC to sync users and groups.
  2. Set up IdC integration with Amazon Redshift.
  3. Assign users or groups from IdC to the Amazon Redshift application.
  4. Enable IdC integration for a new Amazon Redshift provisioned or Amazon Redshift Serverless endpoint.
  5. Associate an IdC application with an existing provisioned or serverless data warehouse.
  6. Configure Amazon Redshift role-based access.
  7. Create a permission set.
  8. Assign the permission set to AWS accounts.
  9. Federate to Redshift Query Editor V2 using IdC.
  10. Troubleshooting.

Prerequisites

Walkthrough

Integrate Okta with AWS IdC to sync user and groups

Enable group and user provisioning from Okta to AWS IdC by following this documentation.

If you see issues while syncing users and groups, refer to these considerations for using automatic provisioning.

Setting up IAM IdC integration with Amazon Redshift

An Amazon Redshift cluster administrator or Amazon Redshift Serverless administrator must perform the steps to configure Redshift as an IdC-enabled application. This enables Amazon Redshift to discover and connect to the IdC automatically to receive sign-in and user directory services.

After this, when the Amazon Redshift administrator creates a cluster or workgroup, they can enable the new data warehouse to use IdC and its identity-management capabilities. The point of enabling Amazon Redshift as an IdC-managed application is so you can control user and group permissions from within the IdC, or from a source third-party identity provider that’s integrated with it.

When a database user, such as an analyst or a data scientist, signs in to an Amazon Redshift database, Amazon Redshift checks their groups in IdC and maps them to roles in Amazon Redshift. In this manner, a group can map to an Amazon Redshift database role that allows read access to a set of tables.

The following steps show how to make Amazon Redshift an AWS-managed application with IdC:

  1. Select IAM Identity Center connection from Amazon Redshift console menu.
    Redshift Application Creation
  2. Choose Create application
    Redshift IdC Create Application
  3. The IAM Identity Center connection opens. Choose Next.
  4. In IAM Identity Center integration setup section, for:
    1. IAM Identity Center display name – Enter a unique name for Amazon Redshift’s IdC-managed application.
    2. Managed application name – You can enter the managed Amazon Redshift application name or use the assigned value as it is.
  5. In Connection with third-party identity providers section, for:
    1. Identity Provider Namespace – Specify the unique namespace for your organization. This is typically an abbreviated version of your organization’s name. It’s added as a prefix for your IdC-managed users and roles in the Amazon Redshift database.
  6. In IAM role for IAM Identity Center access – Select an IAM role to use. You can create a new IAM role if you don’t have an existing one. The specific policy permissions required are the following:
    • sso:DescribeApplication – Required to create an identity provider (IdP) entry in the catalog.
    • sso:DescribeInstance – Used to manually create IdP federated roles or users.
    • redshift:DescribeQev2IdcApplications – Used to detect the capability for IdC authentication from Redshift Query Editor V2.

The following screenshot is from the IAM role:
IAM IdC Role

IAM IdC Role Trust Relationship

  7. We won't enable Trusted identity propagation because we are not integrating with AWS Lake Formation in this post.
  8. Choose Next.
    Redshift IdC Connection
  9. In the Configure client connections that use third-party IdPs section, choose Yes if you want to connect Amazon Redshift with a third-party application. Otherwise, choose No. For this post, we chose No because we'll be integrating only with Amazon Redshift Query Editor V2.
  10. Choose Next.
    Redshift IdC No Third Party App
  11. In the Review and create application section, review all the details you entered and choose Create application.
    Redshift IdC Application - Review and Create

After the Amazon Redshift administrator finishes the steps and saves the configuration, the IdC properties appear in the Redshift Console. Completing these tasks makes Redshift an IdC-enabled application.
Redshift IdC Application Created

After you select the managed application name, the properties in the console include the integration status, which shows Success when the IdC configuration is complete.
Amazon Redshift IdC Application Status

Assigning users or groups from IdC to Amazon Redshift application

In this step, users or groups synced to your IdC directory are available to assign to your application, and the Amazon Redshift administrator can decide which users or groups from IdC to include in the Amazon Redshift application.

For example, if you have a total of 20 groups in your IdC and you don't want to include all of them in the Amazon Redshift application, you can choose which IdC groups to include in the Amazon Redshift-enabled IdC application. Later, you can create two Redshift database roles as part of the IdC integration in Amazon Redshift.

The following steps assign groups to Amazon Redshift-enabled IdC application:

  1. On IAM Identity Center properties in the Amazon Redshift Console, select Assign under Groups tab.
  2. If this is the first time you’re assigning groups, then you’ll see a notification. Select Get started.
  3. Enter the groups that you want to synchronize with the application. In this example, we chose the groups awssso-sales and awssso-finance.
  4. Choose Done.
    Redshift Application - Assigning User and Groups

Enabling IdC integration for a new Amazon Redshift provisioned cluster or Amazon Redshift Serverless

After completing the steps in the section Setting up IAM Identity Center integration with Amazon Redshift, the Amazon Redshift database administrator needs to configure new Redshift resources to work in alignment with IdC to make sign-in and data access easier. This is performed as part of the steps to create a provisioned cluster or a Serverless workgroup. Anyone with permissions to create Amazon Redshift resources can perform these IdC integration tasks. When you create a provisioned cluster, you start by choosing Create cluster in the Amazon Redshift console.

  1. Choose Enable for the cluster (recommended) in the section for IAM Identity Center connection in the create-cluster steps.
  2. From the drop-down menu, choose the Redshift application that you created in the previous steps.

Note that when a new data warehouse is created, the IAM role specified for IdC integration is automatically attached to the provisioned cluster or Serverless Namespace. After you finish entering the required cluster metadata and create the resource, you can check the status for IdC integration in the properties.
Amazon Redshift - Create Cluster

Associating an IdC application with an existing provisioned cluster or Serverless endpoint

If you have an existing provisioned cluster or serverless workgroup that you would like to enable for IdC integration, you can do so by running the following SQL command. A database administrator must run the query.

CREATE IDENTITY PROVIDER "<idc_application_name>" TYPE AWSIDC
NAMESPACE '<idc_namespace_name>'
APPLICATION_ARN '<idc_application_arn>'
IAM_ROLE '<IAM role for IAM Identity Center access>';

Example:

CREATE IDENTITY PROVIDER "redshift-idc-app" TYPE AWSIDC
NAMESPACE 'awsidc'
APPLICATION_ARN 'arn:aws:sso::123456789012:application/ssoins-12345f67fe123d4/apl-a0b0a12dc123b1a4'
IAM_ROLE 'arn:aws:iam::123456789012:role/EspressoRedshiftRole';

To alter the IdP, use the following command (this new set of parameter values completely replaces the current values):

ALTER IDENTITY PROVIDER "<idp_name>"
NAMESPACE '<idc_namespace_name>' |
IAM_ROLE default | 'arn:aws:iam::<AWS account-id-1>:role/<role-name>';

A few examples:

ALTER IDENTITY PROVIDER "redshift-idc-app"
NAMESPACE 'awsidctest';

ALTER IDENTITY PROVIDER "redshift-idc-app"
IAM_ROLE 'arn:aws:iam::123456789012:role/administratoraccess';

Note: If you update the IdC namespace value, then all new clusters created afterward will use the updated namespace.

For existing clusters or serverless workgroups, you need to update the namespace manually on each Amazon Redshift cluster using the previous command. All the database roles associated with the identity provider will also be updated with the new namespace value.
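To confirm the namespace currently registered on a cluster or workgroup, you can run the same check that the Troubleshooting section uses later in this post:

select * from identity_providers;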

You can disable or enable the identity provider using the following command:

ALTER IDENTITY PROVIDER <provider_name> disable|enable;

Example:

ALTER IDENTITY PROVIDER "redshift-idc-app" disable;

You can drop an existing identity provider. In the following syntax, specifying CASCADE also deletes the users and roles attached to the identity provider.

DROP IDENTITY PROVIDER <provider_name> [ CASCADE ]
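For example, the following statement removes the identity provider created earlier; because CASCADE is specified, the users and roles attached to it are dropped as well:

DROP IDENTITY PROVIDER "redshift-idc-app" CASCADE;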

Configure Amazon Redshift role-based access

In this step, we pre-create the database roles in Amazon Redshift based on the groups that you synced to IdC. Make sure the role name matches the IdC group name.

Amazon Redshift roles simplify managing privileges required for your end-users. In this post, we create two database roles, sales and finance, and grant them access to query tables with sales and finance data, respectively. You can download this sample SQL Notebook and import into Redshift Query Editor v2 to run all cells in the notebook used in this example. Alternatively, you can copy and enter the SQL into your SQL client.

The following is the syntax to create a role in Amazon Redshift:

create role "<IDC_Namespace>:<IdP groupname>;

For example:

create role "awsidc:awssso-sales";
create role "awsidc:awssso-finance";

Create the sales and finance database schema:

create schema sales_schema;
create schema finance_schema;

Create the tables:

CREATE TABLE IF NOT EXISTS finance_schema.revenue
(
account INTEGER   ENCODE az64
,customer VARCHAR(20)   ENCODE lzo
,salesamt NUMERIC(18,0)   ENCODE az64
)
DISTSTYLE AUTO
;

insert into finance_schema.revenue values (10001, 'ABC Company', 12000);
insert into finance_schema.revenue values (10002, 'Tech Logistics', 175400);
insert into finance_schema.revenue values (10003, 'XYZ Industry', 24355);
insert into finance_schema.revenue values (10004, 'The tax experts', 186577);

CREATE TABLE IF NOT EXISTS sales_schema.store_sales
(
ID INTEGER   ENCODE az64,
Product varchar(20),
Sales_Amount INTEGER   ENCODE az64
)
DISTSTYLE AUTO
;

Insert into sales_schema.store_sales values (1,'product1',1000);
Insert into sales_schema.store_sales values (2,'product2',2000);
Insert into sales_schema.store_sales values (3,'product3',3000);
Insert into sales_schema.store_sales values (4,'product4',4000);

The following is the syntax to grant permissions to an Amazon Redshift database role:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]| ALL [ PRIVILEGES ] } ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] } TO role <IdP groupname>;

Grant the relevant permissions to each role as per your requirements. In the following example, we grant usage and select on sales_schema to the role awsidc:awssso-sales, and usage and select on finance_schema to the role awsidc:awssso-finance.

For example:

grant usage on schema sales_schema to role "awsidc:awssso-sales";
grant select on all tables in schema sales_schema to role "awsidc:awssso-sales";

grant usage on schema finance_schema to role "awsidc:awssso-finance";
grant select on all tables in schema finance_schema to role "awsidc:awssso-finance";
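As a quick sanity check, you can confirm that the namespaced roles exist. The following is a minimal sketch using the SVV_ROLES system view (the exact columns returned may vary by Redshift version):

select role_name from svv_roles;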

Create a permission set

A permission set is a template that you create and maintain that defines a collection of one or more IAM policies. Permission sets simplify the assignment of AWS account access for users and groups in your organization. We'll create a permission set to allow federated users to access Query Editor V2.

Complete the following steps to create a permission set:

  1. Open the IAM Identity Center Console.
  2. In the navigation pane, under Multi-Account permissions, choose Permission sets.
  3. Choose Create permission set.
    IdC-Create Permission Set
  4. Choose Custom permission set and then choose Next.
  5. Under AWS managed policies, choose AmazonRedshiftQueryEditorV2ReadSharing.
  6. Under Customer managed policies, provide the policy name which you created in step 4 under section – Setting up IAM Identity Center integration with Amazon Redshift.
  7. Choose Next.
  8. Enter permission set name. For example, Amazon Redshift-Query-Editor-V2.
  9. Under Relay state – optional – set default relay state to the Query Editor V2 URL, using the format : https://<region>.console.aws.amazon.com/sqlworkbench/home.
    For this post, we use: https://us-east-1.console.aws.amazon.com/sqlworkbench/home.
  10. Choose Next.
  11. On the Review and create screen, choose Create. The console displays the following message: The permission set Redshift-Query-Editor-V2 was successfully created.
    IdC- Review Create Permission

Assign permission set to AWS accounts

  1. Open the IAM Identity Center Console.
  2. In the navigation pane, under Multi-account permissions, choose AWS accounts.
  3. On the AWS accounts page, select one or more AWS accounts that you want to assign single sign-on access to.
  4. Choose Assign users or groups.
  5. On the Assign users and groups to AWS-account-name page, choose the groups that you want to create the permission set for. Then, choose Next.
  6. On the Assign permission sets to AWS-account-name page, choose the permission set you created in the section Create a permission set. Then, choose Next.
  7. On the Review and submit assignments to AWS-account-name page, for Review and submit, choose Submit. The console displays the following message: We reprovisioned your AWS account successfully and applied the updated permission set to the account.
    Idc-Review and Submit AssignmentsIdC-Assignment Permission Created

Federate to Amazon Redshift Query Editor V2 using IdC

Now you're ready to connect to Amazon Redshift Query Editor V2 with federated login using IdC authentication:

  1. Open the IAM Identity Center Console.
  2. Go to dashboard and select the AWS access portal URL.
  3. A browser pop-up triggers and takes you to the Okta Login page where you enter your Okta credentials.
  4. After successful authentication, you’ll be logged into the AWS console as a federated user.
  5. Select your AWS Account and choose the Amazon Redshift Query Editor V2 application.
  6. Once you federate to Query Editor V2, choose (right-click) your Redshift instance and choose Create connection.
  7. To authenticate using IdC, choose the authentication method IAM Identity Center.
  8. A pop-up appears, and because your Okta credentials are already cached, it uses the same credentials and connects to Amazon Redshift Query Editor V2 using IdC authentication.

The following demonstration shows a federated user (Ethan) using the AWS access portal URL to access Amazon Redshift with IdC authentication. User Ethan can access the sales_schema tables. If user Ethan tries to access the tables in finance_schema, the user gets a permission denied error.
QEV2-IdC-Demo
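In SQL terms, the demonstration boils down to the following illustrative queries against the tables created earlier (the exact error text may vary):

select * from sales_schema.store_sales;   -- succeeds: Ethan's group maps to the awsidc:awssso-sales role

select * from finance_schema.revenue;     -- fails with a permission denied error: Ethan holds no role granted on finance_schema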

Troubleshooting

  • If you get the following error:
    ERROR: registered identity provider does not exist for "<idc-namespace:redshift_database_role_name"

This means that you are trying to create a role with a wrong namespace. Please check the current namespace using the command select * from identity_providers;

  • If you get the following error:
    ERROR: (arn:aws:iam::123456789012:role/<iam_role_name>) Insufficient privileges to access AWS IdC. [ErrorId: 1-1234cf25-1f23ea1234c447fb12aa123e]

This means that the IAM role doesn't have sufficient privileges to access the IdC. Your IAM role should contain a policy with the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "redshift:DescribeQev2IdcApplications",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sso:DescribeApplication",
                "sso:DescribeInstance"
            ],
            "Resource": "arn:aws:sso::726034267621:application/ssoins-722324fe82a3f0b8/*"
        }
    ]
}
  • If you get the following error:
    FATAL: Invalid scope. User's credentials are not authorized to connect to Redshift. [SQL State=28000]

Please make sure that the user and group are added to the Amazon Redshift IdC application.

Clean up

Complete the following steps to clean up your resources:

  1. Delete the Okta applications that you created to integrate with IdC.
  2. Delete the IAM Identity Center configuration.
  3. Delete the Redshift application and the Redshift provisioned cluster that you created for testing.
  4. Delete the IAM role that you created for the IdC and Redshift integration.

Conclusion

In this post, we showed you a detailed walkthrough of how you can integrate Okta with the IdC and Amazon Redshift Query Editor version 2 to simplify your SSO setup. This integration allows you to use role-based access control with Amazon Redshift. We encourage you to try out this integration.

To learn more about IdC with Amazon Redshift, visit the documentation.


About the Authors

Debu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Harshida Patel is a Principal Solutions Architect, Analytics with AWS.

Praveen Kumar Ramakrishnan is a Senior Software Engineer at AWS. He has nearly 20 years of experience spanning various domains including filesystems, storage virtualization and network security. At AWS, he focuses on enhancing the Redshift data security.

Karthik Ramanathan is a Sr. Software Engineer with AWS Redshift and is based in San Francisco. He brings close to two decades of development experience across the networking, data storage and IoT verticals prior to Redshift. When not at work, he is also a writer and loves to be in the water.

Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/simplify-external-object-access-in-amazon-redshift-using-automatic-mounting-of-the-aws-glue-data-catalog/

Amazon Redshift is a petabyte-scale, enterprise-grade cloud data warehouse service delivering the best price-performance. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift to cost-effectively and quickly analyze their data using standard SQL and existing business intelligence (BI) tools.

Amazon Redshift now makes it easier for you to run queries in AWS data lakes by automatically mounting the AWS Glue Data Catalog. You no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog. Now, you can use your AWS Identity and Access Management (IAM) credentials or IAM role to browse the Glue Data Catalog and query data lake tables directly from Amazon Redshift Query Editor v2 or your preferred SQL editors.

This feature is now available in all AWS commercial and US Gov Cloud Regions where Amazon Redshift RA3, Amazon Redshift Serverless, and AWS Glue are available. To learn more about auto-mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog.

Enabling easy analytics for everyone

Amazon Redshift is helping tens of thousands of customers manage analytics at scale. Amazon Redshift offers a powerful analytics solution that provides access to insights for users of all skill levels. You can take advantage of the following benefits:

  • It enables organizations to analyze diverse data sources, including structured, semi-structured, and unstructured data, facilitating comprehensive data exploration
  • With its high-performance processing capabilities, Amazon Redshift handles large and complex datasets, ensuring fast query response times and supporting real-time analytics
  • Amazon Redshift provides features like Multi-AZ (preview) and cross-Region snapshot copy for high availability and disaster recovery, and provides authentication and authorization mechanisms to make it reliable and secure
  • With features like Amazon Redshift ML, it democratizes ML capabilities across a variety of user personas
  • The flexibility to utilize different table formats such as Apache Hudi, Delta Lake, and Apache Iceberg (preview) optimizes query performance and storage efficiency
  • Integration with advanced analytical tools empowers you to apply sophisticated techniques and build predictive models
  • Scalability and elasticity allow for seamless expansion as data and workloads grow

Overall, Amazon Redshift empowers organizations to uncover valuable insights, enhance decision-making, and gain a competitive edge in today’s data-driven landscape.

Amazon Redshift Top Benefits

The new automatic mounting of the AWS Glue Data Catalog feature enables you to directly query AWS Glue objects in Amazon Redshift without the need to create an external schema for each AWS Glue database you want to query. With automatic mounting of the Data Catalog, Amazon Redshift automatically mounts the cluster account's default Data Catalog during boot or user opt-in as an external database, named awsdatacatalog.

Relevant use cases for automatic mounting of the AWS Glue Data Catalog feature

You can use tools like Amazon EMR to create new data lake schemas in various formats, such as Apache Hudi, Delta Lake, and Apache Iceberg (preview). However, when analysts want to run queries against these schemas, it requires administrators to create external schemas for each AWS Glue database in Amazon Redshift. You can now simplify this integration using automatic mounting of the AWS Glue Data Catalog.

The following diagram illustrates this architecture.

Solution overview

You can now use SQL clients like Amazon Redshift Query Editor v2 to browse and query awsdatacatalog; the steps below show how to connect to the awsdatacatalog database in Query Editor V2.

Complete the following high-level steps to integrate the automatic mounting of the Data Catalog using Query Editor V2 and a third-party SQL client:

  1. Provision resources with AWS CloudFormation to populate Data Catalog objects.
  2. Connect Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2.
  3. Connect with Redshift provisioned cluster and query the Data Catalog using Query Editor V2.
  4. Configure permissions on catalog resources using AWS Lake Formation.
  5. Federate with Redshift Serverless and query the Data Catalog using Query Editor V2 and a third-party SQL client.
  6. Discover the auto-mounted objects.
  7. Connect with Redshift provisioned cluster and query the Data Catalog as a federated user using a third-party client.
  8. Connect with Amazon Redshift and query the Data Catalog as an IAM user using third-party clients.

The following diagram illustrates the solution workflow.

Prerequisites

You should have the following prerequisites:

Provision resources with AWS CloudFormation to populate Data Catalog objects

In this post, we use an AWS Glue crawler to create the external table ny_pub stored in Apache Parquet format in the Amazon Simple Storage Service (Amazon S3) location s3://redshift-demos/data/NY-Pub/. In this step, we create the solution resources using AWS CloudFormation to create a stack named CrawlS3Source-NYTaxiData in either us-east-1 (use the yml download or launch stack) or us-west-2 (use the yml download or launch stack). Stack creation performs the following actions:

  • Creates the crawler NYTaxiCrawler along with the new IAM role AWSGlueServiceRole-RedshiftAutoMount
  • Creates automountdb as the AWS Glue database

When the stack is complete, perform the following steps:

  1. On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
  2. Open NYTaxiCrawler and choose Run crawler.

After the crawler is complete, you can see a new table called ny_pub in the Data Catalog under the automountdb database.


Alternatively, you can follow the manual instructions from the Amazon Redshift labs to create the ny_pub table.

Connect with Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2

In this section, we use an IAM role with principal tags to enable fine-grained federated authentication to Redshift Serverless to access the auto-mounted AWS Glue objects.

Complete the following steps:

  1. Create an IAM role and add the following permissions. For this post, we add full AWS Glue, Amazon Redshift, and Amazon S3 permissions for demo purposes. In an actual production scenario, it's recommended to apply more granular permissions.

  2. On the Tags tab, create a tag with Key as RedshiftDbRoles and Value as automount.
  3. In Query Editor V2, run the following SQL statement as an admin user to create a database role named automount:
    Create role automount;

  4. Grant usage privileges to the database role:
    GRANT USAGE ON DATABASE awsdatacatalog to role automount;

  5. Switch the role to automountrole by passing the account number and role name.
  6. In the Query Editor v2, choose your Redshift Serverless endpoint (right-click) and choose Create connection.
  7. For Authentication, select Federated user.
  8. For Database, enter the database name you want to connect to.
  9. Choose Create connection.

You're now ready to explore and query the auto-mounted Data Catalog in Redshift Serverless.
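For example, assuming you ran the CloudFormation stack and crawler from the earlier step, a query against the auto-mounted catalog using three-part notation might look like the following (the row limit is arbitrary):

SELECT * FROM awsdatacatalog.automountdb.ny_pub LIMIT 10;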

Connect with Redshift provisioned cluster and query the Data Catalog using Query Editor V2

To connect with Redshift provisioned cluster and access the Data Catalog, make sure you have completed the steps in the preceding section. Then complete the following steps:

  1. Connect to Redshift Query Editor V2 using the database user name and password authentication method. For example, connect to the dev database using the admin user and password.
  2. In an editor tab, assuming the user is present in Amazon Redshift, run the following SQL statement to grant an IAM user access to the Data Catalog:
    GRANT USAGE ON DATABASE awsdatacatalog to "IAMR:automountrole";

  3. As an admin user, choose the Settings icon, choose Account settings, and select Authenticate with IAM credentials.
  4. Choose Save.
  5. Switch roles to automountrole by passing the account number and role name.
  6. Create or edit the connection and use the authentication method Temporary credentials using your IAM identity.

For more information about this authentication method, see Connecting to an Amazon Redshift database.

You are ready to explore and query the auto-mounted Data Catalog in Amazon Redshift.

Discover the auto-mounted objects

This section illustrates the SHOW commands for discovery of auto-mounted objects. See the following code:

// Discovery of Glue databases at the schema level 
SHOW SCHEMAS FROM DATABASE awsdatacatalog;

// Discovery of Glue tables 
 Syntax: SHOW TABLES FROM SCHEMA awsdatacatalog.<glue_db_name>;
Example: SHOW TABLES FROM SCHEMA awsdatacatalog.automountdb;

// Discovery of Glue table columns 
 Syntax: SHOW COLUMNS FROM TABLE awsdatacatalog.<glue_db_name>.<glue_table_name>;
Example: SHOW COLUMNS FROM TABLE awsdatacatalog.automountdb.ny_pub;

Configure permissions on catalog resources using AWS Lake Formation

To maintain backward compatibility with AWS Glue, Lake Formation has the following initial security settings:

  • The Super permission is granted to the group IAMAllowedPrincipals on all existing Data Catalog resources
  • The Use only IAM access control setting is enabled for new Data Catalog resources

These settings effectively cause access to Data Catalog resources and Amazon S3 locations to be controlled solely by IAM policies. Individual Lake Formation permissions are not in effect.

In this step, we configure permissions on catalog resources using AWS Lake Formation. Before Lake Formation can manage access to your Data Catalog resources (databases and tables), you need to update its default settings:

  1. Change the default security settings for new resources. For instructions, see Change the default permission model.
  2. Change the settings for existing Data Catalog resources. For instructions, see Upgrading AWS Glue data permissions to the AWS Lake Formation model.

For more information, refer to Changing the default settings for your data lake.

Federate with Redshift Serverless and query the Data Catalog using Query Editor V2 and a third-party SQL client

With Redshift Serverless, you can connect to awsdatacatalog from a third-party client as a federated user from any identity provider (IdP). In this section, we configure permissions on catalog resources for the federated IAM role in AWS Lake Formation. When using AWS Lake Formation with Redshift, permissions can currently be applied at the IAM user or IAM role level.

To connect as a federated user, we will be using Redshift Serverless. For setup instructions, refer to Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

Additional changes are required on the following resources:

  1. In Amazon Redshift, as an admin user, grant usage to each federated user who needs access to awsdatacatalog:
    GRANT USAGE ON DATABASE awsdatacatalog to "IAMR:[email protected]";

If the user doesn’t exist in Amazon Redshift, you may need to create the IAM user with the password disabled as shown in the following code and then grant usage on awsdatacatalog:

Create User "IAMR:[email protected]" with password disable;
  2. On the Lake Formation console, assign permissions on the AWS Glue database to the IAM role that you created as part of the federated setup.
    1. Under Principals, select IAM users and roles.
    2. Choose IAM role oktarole.
    3. Apply catalog resource permissions, selecting the automountdb database and granting the appropriate table permissions.
  3. Update the IAM role used in the federation setup. In addition to the permissions already added to the IAM role, you need to add AWS Glue permissions and Amazon S3 permissions to access objects from Amazon S3. For this post, we add full AWS Glue and Amazon S3 permissions for demo purposes. In an actual production scenario, it's recommended to apply more granular permissions.

Now you’re ready to connect to Redshift Serverless using the Query Editor V2 and federated login.

  1. Use the SSO URL from Okta and log in to your Okta account with your user credentials. For this demo, we log in with user Ethan.
  2. In the Query Editor v2, choose your Redshift Serverless instance (right-click) and choose Create connection.
  3. For Authentication, select Federated user.
  4. For Database, enter the database name you want to connect to.
  5. Choose Create connection.
  6. Run the command select current_user to validate that you are logged in as a federated user.

User Ethan will be able to explore and access awsdatacatalog data.
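For example, the following statements (both used elsewhere in this post) confirm the federated identity and that the auto-mounted catalog is visible:

select current_user;

SHOW SCHEMAS FROM DATABASE awsdatacatalog;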

To connect Redshift Serverless with a third-party client, make sure you have followed all the previous steps.

For SQLWorkbench setup, refer to the section Configure the SQL client (SQL Workbench/J) in Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

The following screenshot shows that federated user ethan is able to query the awsdatacatalog tables using three-part notation:

Connect with Redshift provisioned cluster and query the Data Catalog as a federated user using third-party clients

With a Redshift provisioned cluster, you can connect to awsdatacatalog from a third-party client as a federated user from any IdP.

To connect as a federated user with the Redshift provisioned cluster, you need to follow the steps in the previous section that detailed how to connect with Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2 and a third-party SQL client.

There are additional changes required in IAM policy. Update the IAM policy with the following code to use the GetClusterCredentialsWithIAM API:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:ListGroups",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentialsWithIAM",
            "Resource": "arn:aws:redshift:us-east-2:01234567891:dbname:redshift-cluster-1/dev"
        }
    ]
}

Now you’re ready to connect to Redshift provisioned cluster using a third-party SQL client as a federated user.

For SQLWorkbench setup, refer to the section Configure the SQL client (SQL Workbench/J) in the post Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

Make the following changes:

  • Use the latest Redshift JDBC driver, because only the latest driver supports querying the auto-mounted Data Catalog tables for federated users
  • For URL, enter jdbc:redshift:iam://<cluster endpoint>:<port>:<databasename>?groupfederation=true. For example, jdbc:redshift:iam://redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com:5439/dev?groupfederation=true.

In the preceding URL, groupfederation is a mandatory parameter that allows you to authenticate with the IAM credentials.

The following screenshot shows that federated user ethan is able to query the awsdatacatalog tables using three-part notation.

Connect and query the Data Catalog as an IAM user using third-party clients

In this section, we provide instructions to set up a SQL client to query the auto-mounted awsdatacatalog.

Use three-part notation to reference the awsdatacatalog table in your SELECT statement. The first part is the database name, the second part is the AWS Glue database name, and the third part is the AWS Glue table name:

SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;

You can perform various scenarios that read the Data Catalog data and populate Redshift tables.
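For example, one common scenario is materializing a data lake table into a local Redshift table; the following is a minimal sketch using the ny_pub table from this post (the target table name is illustrative):

CREATE TABLE public.ny_pub_local AS
SELECT * FROM awsdatacatalog.automountdb.ny_pub;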

For this post, we use SQLWorkbench/J as the SQL client to query the Data Catalog. To set up SQL Workbench/J, complete the following steps:

  1. Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).

You must use the latest Redshift JDBC driver, because only the latest driver supports querying the auto-mounted Data Catalog tables.

  3. For URL, enter jdbc:redshift:iam://<cluster endpoint>:<port>:<databasename>?profile=<profilename>&groupfederation=true. For example, jdbc:redshift:iam://redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com:5439/dev?profile=user2&groupfederation=true.

We are using profile-based credentials as an example. You can use any AWS profile or IAM credential-based authentication as per your requirement. For more information on IAM credentials, refer to Options for providing IAM credentials.

The following screenshot shows that IAM user johndoe is able to list the awsdatacatalog tables using the SHOW command.

The following screenshot shows that IAM user johndoe is able to query the awsdatacatalog tables using three-part notation:

If you get the following error while using groupfederation=true, you need to use the latest Redshift driver:

Something unusual has occurred to cause the driver to fail. Please report this exception:Authentication with plugin is not supported for group federation [SQL State=99999]

Clean up

Complete the following steps to clean up your resources:

  1. Delete the IAM role automountrole.
  2. Delete the CloudFormation stack CrawlS3Source-NYTaxiData to clean up the crawler NYTaxiCrawler, the automountdb database from the Data Catalog, and the IAM role AWSGlueServiceRole-RedshiftAutoMount.
  3. Update the default settings of Lake Formation:
    1. In the navigation pane, under Data catalog, choose Settings.
    2. Select both access control options and choose Save.
    3. In the navigation pane, under Permissions, choose Administrative roles and tasks.
    4. In the Database creators section, choose Grant.
    5. Search for IAMAllowedPrincipals and select Create database permission.
    6. Choose Grant.

Considerations

Note the following considerations:

  • The Data Catalog auto-mount provides ease of use to analysts or database users. The security setup (setting up the permissions model or data governance) is owned by account and database administrators.
    • To achieve fine-grained access control, build a permissions model in AWS Lake Formation.
    • If the permissions have to be maintained at the Redshift database level, leave the AWS Lake Formation default settings as is and then run grant/revoke in Amazon Redshift.
  • If you are using a third-party SQL editor and your query tool does not support browsing multiple databases, you can use the SHOW commands to list your AWS Glue databases and tables. You can also query awsdatacatalog objects using three-part notation (SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;), provided you have access to the external objects based on the permission model.

Conclusion

In this post, we introduced the automatic mounting of AWS Glue Data Catalog, which makes it easier for customers to run queries in their data lakes. This feature streamlines data governance and access control, eliminating the need to create an external schema in Amazon Redshift to use the data lake tables cataloged in AWS Glue Data Catalog. We showed how you can manage permission on auto-mounted AWS Glue-based objects using Lake Formation. The permission model can be easily managed and organized by administrators, allowing database users to seamlessly access external objects they have been granted access to.

As we strive for enhanced usability in Amazon Redshift, we prioritize unified data governance and fine-grained access control. This feature minimizes manual effort while ensuring the necessary security measures for your organization are in place.

For more information about automatic mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog.


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu-PandaDebu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Rohit Vashishtha is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, Texas. He has 17 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.

Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/single-sign-on-with-amazon-redshift-serverless-with-okta-using-amazon-redshift-query-editor-v2-and-third-party-sql-clients/

Amazon Redshift Serverless makes it easy to run and scale analytics in seconds without the need to set up and manage data warehouse clusters. With Redshift Serverless, users such as data analysts, developers, business professionals, and data scientists can get insights from data by simply loading and querying data in the data warehouse.

Customers use their preferred SQL clients to analyze their data in Redshift Serverless. They want to use an identity provider (IdP) or single sign-on (SSO) credentials to connect to Redshift Serverless to reuse existing credentials and avoid additional user setup and configuration. When you use AWS Identity and Access Management (IAM) or IdP-based credentials to connect to a serverless data warehouse, Amazon Redshift automatically creates a database user for the end-user. You can simplify managing user privileges by using role-based access control. Admins can use a database-role mapping for SSO with the IAM roles that users are assigned to get their database privileges automatically. With this integration, organizations can simplify user management because they no longer need to create users and map them to database roles manually. You can define the mapped database roles as a principal tag for the IdP groups or IAM role, so users who are members of those IdP groups are granted the matching Amazon Redshift database roles automatically.

In this post, we focus on Okta as the IdP and provide step-by-step guidance to integrate Redshift Serverless with Okta using the Amazon Redshift Query Editor V2 and SQL clients like SQL Workbench/J. You can use this mechanism with other IdPs such as Azure Active Directory or Ping, with any application or tool that uses Amazon's JDBC, ODBC, or Python driver.

Solution overview

The following diagram illustrates the authentication flow of Okta with Redshift Serverless using federated IAM roles and automatic database-role mapping.

The workflow contains the following steps:

  1. Either the user chooses an IdP app in their browser, or the SQL client initiates a user authentication request to the IdP (Okta).
  2. Upon a successful authentication, Okta submits a request to the AWS federation endpoint with a SAML assertion containing the PrincipalTags.
  3. The AWS federation endpoint validates the SAML assertion and invokes the AWS Security Token Service (AWS STS) API AssumeRoleWithSAML. The SAML assertion contains the IdP user and group information that is stored in the RedshiftDbUser and RedshiftDbRoles principal tags, respectively. Temporary IAM credentials are returned to the SQL client or, if using the Query Editor v2, the user’s browser is redirected to the Query Editor v2 console using the temporary IAM credentials.
  4. The temporary IAM credentials are used by the SQL client or Query Editor v2 to call the Redshift Serverless GetCredentials API. The API uses the principal tags to determine the user and database roles that the user belongs to. An associated database user is created if the user is signing in for the first time and is granted the matching database roles automatically. A temporary password is returned to the SQL client.
  5. Using the database user and temporary password, the SQL client or Query Editor v2 connects to Redshift Serverless. Upon login, the user is authorized based on the Amazon Redshift database roles that were assigned in Step 4.

To set up the solution, we complete the following steps:

  1. Set up your Okta application:
    • Create Okta users.
    • Create groups and assign groups to users.
    • Create the Okta SAML application.
    • Collect Okta information.
  2. Set up AWS configuration:
    • Create the IAM IdP.
    • Create the IAM role and policy.
  3. Configure Redshift Serverless role-based access.
  4. Federate to Redshift Serverless using the Query Editor V2.
  5. Configure the SQL client (for this post, we use SQL Workbench/J).
  6. Optionally, implement MFA with SQL Client and Query Editor V2.

Prerequisites

You need the following prerequisites to set up this solution:

Set up Okta application

In this section, we provide the steps to configure your Okta application.

Create Okta users

To create your Okta users, complete the following steps:

  1. Sign in to your Okta organization as a user with administrative privileges.
  2. On the admin console, under Directory in the navigation pane, choose People.
  3. Choose Add person.
  4. For First Name, enter the user’s first name.
  5. For Last Name, enter the user’s last name.
  6. For Username, enter the user’s user name in email format.
  7. Select I will set password and enter a password.
  8. Optionally, deselect User must change password on first login if you don’t want the user to change their password when they first sign in. Choose Save.

Create groups and assign groups to users

To create your groups and assign them to users, complete the following steps:

  1. Sign in to your Okta organization as a user with administrative privileges.
  2. On the admin console, under Directory in the navigation pane, choose Groups.
  3. Choose Add group.
  4. Enter a group name and choose Save.
  5. Choose the recently created group and then choose Assign people.
  6. Choose the plus sign and then choose Done.
  7. Repeat Steps 1–6 to add more groups.

In this post, we create two groups: sales and finance.

Create an Okta SAML application

To create your Okta SAML application, complete the following steps:

  1. Sign in to your Okta organization as a user with administrative privileges.
  2. On the admin console, under Applications in the navigation pane, choose Applications.
  3. Choose Create App Integration.
  4. Select SAML 2.0 as the sign-in method and choose Next.
  5. Enter a name for your app integration (for example, redshift_app) and choose Next.
  6. Enter the following values in the app and leave the rest as is:
    • For Single Sign On URL, enter https://signin.aws.amazon.com/saml.
    • For Audience URI (SP Entity ID), enter urn:amazon:webservices.
    • For Name ID format, enter EmailAddress.
  7. Choose Next.
  8. Choose I’m an Okta customer adding an internal app followed by This is an internal app that we have created.
  9. Choose Finish.
  10. Choose Assignments and then choose Assign.
  11. Choose Assign to groups and then select Assign next to the groups that you want to add.
  12. Choose Done.

Set up Okta advanced configuration

After you create the custom SAML app, complete the following steps:

  1. On the admin console, navigate to General and choose Edit under SAML settings.
  2. Choose Next.
  3. Set Default Relay State to the Query Editor V2 URL, using the format https://<region>.console.aws.amazon.com/sqlworkbench/home. For this post, we use https://us-west-2.console.aws.amazon.com/sqlworkbench/home.
  4. Under Attribute Statements (optional), add the following properties:
    • Provide the IAM role and IdP in comma-separated format using the Role attribute. You’ll create this same IAM role and IdP in a later step when setting up AWS configuration.
    • Set user.login for RoleSessionName. This is used as an identifier for the temporary credentials that are issued when the role is assumed.
    • Set the DB roles using PrincipalTag:RedshiftDbRoles. This uses the Okta groups to fill the principal tags and map them automatically with the Amazon Redshift database roles. Its value must be a colon-separated list in the format role1:role2.
    • Set user.login for PrincipalTag:RedshiftDbUser. This uses the user name in the directory. This is a required tag and defines the database user that is used by Query Editor V2.
    • Set the transitive keys using TransitiveTagKeys. This prevents users from changing the session tags in case of role chaining.

The preceding tags are forwarded to the GetCredentials API to get temporary credentials for your Redshift Serverless instance and map them automatically to Amazon Redshift database roles. The following table summarizes the attribute statement configuration.

  • Name: https://aws.amazon.com/SAML/Attributes/Role
    Name format: Unspecified
    Value: arn:aws:iam::<yourAWSAccountID>:role/role-name,arn:aws:iam::<yourAWSAccountID>:saml-provider/provider-name
    Example: arn:aws:iam::112034567890:role/oktarole,arn:aws:iam::112034567890:saml-provider/oktaidp
  • Name: https://aws.amazon.com/SAML/Attributes/RoleSessionName
    Name format: Unspecified
    Value: user.login
    Example: user.login
  • Name: https://aws.amazon.com/SAML/Attributes/PrincipalTag:RedshiftDbRoles
    Name format: Unspecified
    Value: String.join(":", isMemberOfGroupName("group1") ? 'group1' : '', isMemberOfGroupName("group2") ? 'group2' : '')
    Example: String.join(":", isMemberOfGroupName("sales") ? 'sales' : '', isMemberOfGroupName("finance") ? 'finance' : '')
  • Name: https://aws.amazon.com/SAML/Attributes/PrincipalTag:RedshiftDbUser
    Name format: Unspecified
    Value: user.login
    Example: user.login
  • Name: https://aws.amazon.com/SAML/Attributes/TransitiveTagKeys
    Name format: Unspecified
    Value: Arrays.flatten("RedshiftDbUser", "RedshiftDbRoles")
    Example: Arrays.flatten("RedshiftDbUser", "RedshiftDbRoles")
  5. After you add the attribute claims, choose Next followed by Finish.

Your attributes should be in similar format as shown in the following screenshot.

Collect Okta information

To gather your Okta information, complete the following steps:

  1. On the Sign On tab, choose View SAML setup instructions.
  2. For Identity Provider Single Sign-on URL, note this URL; you use it when connecting with any third-party SQL client such as SQL Workbench/J.
  3. Use the IdP metadata in block 4 and save the metadata file in .xml format (for example, metadata.xml).

Set up AWS configuration

In this section, we provide the steps to configure your IAM resources.

Create the IAM IdP

To create your IAM IdP, complete the following steps:

  1. On the IAM console, under Access management in the navigation pane, choose Identity providers.
  2. Choose Add provider.
  3. For Provider type, select SAML.
  4. For Provider name, enter a name.
  5. Choose Choose file and upload the metadata file (.xml) you downloaded earlier.
  6. Choose Add provider.

Create the IAM Amazon Redshift access policy

To create your IAM policy, complete the following steps:

  1. On the IAM console, choose Policies.
  2. Choose Create policy.
  3. On the Create policy page, choose the JSON tab.
  4. For the policy, enter the JSON in the following format:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "redshift-serverless:GetCredentials",
                "Resource": "<Workgroup ARN>"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "redshift-serverless:ListWorkgroups",
                "Resource": "*"
            }
        ]
    }

The workgroup ARN is available on the Redshift Serverless workgroup configuration page.

The following example policy includes only a single Redshift Serverless workgroup; you can modify the policy to include multiple workgroups in the Resource section:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "redshift-serverless:GetCredentials",
            "Resource": "arn:aws:redshift-serverless:us-west-2:123456789012:workgroup/4a4f12vc-123b-2d99-fd34-a12345a1e87f"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "redshift-serverless:ListWorkgroups",
            "Resource": "*"
        }
    ]
}

  5. Choose Next: Tags.
  6. Choose Next: Review.
  7. In the Review policy section, for Name, enter the name of your policy; for example, OktaRedshiftPolicy.
  8. For Description, you can optionally enter a brief description of what the policy does.
  9. Choose Create policy.

Create the IAM role

To create your IAM role, complete the following steps:

  1. On the IAM console, choose Roles in the navigation pane.
  2. Choose Create role.
  3. For Trusted entity type, select SAML 2.0 federation.
  4. For SAML 2.0-based provider, choose the IdP you created earlier.
  5. Select Allow programmatic and AWS Management Console access.
  6. Choose Next.
  7. Choose the policy you created earlier.
  8. Also, add the policy AmazonRedshiftQueryEditorV2ReadSharing.
  9. Choose Next.
  10. In the Review section, for Role Name, enter the name of your role; for example, oktarole.
  11. For Description, you can optionally enter a brief description of what the role does.
  12. Choose Create role.
  13. Navigate to the role that you just created and choose Trust Relationships.
  14. Choose Edit trust policy and choose TagSession under Add actions for STS.

When using session tags, trust policies for all roles connected to the IdP passing tags must have the sts:TagSession permission. For roles without this permission in the trust policy, the AssumeRole operation fails.

  1. Choose Update policy.

The following screenshot shows the role permissions.

The following screenshot shows the trust relationships.

Update the advanced Okta Role Attribute

Complete the following steps:

  1. Switch back to Okta.com.
  2. Navigate to the application which you created earlier.
  3. Navigate to General and click Edit under SAML settings.
  4. Under Attribute Statements (optional), update the value for the attribute https://aws.amazon.com/SAML/Attributes/Role, using the actual role and identity provider ARN values from the preceding steps. For example, arn:aws:iam::123456789012:role/oktarole,arn:aws:iam::123456789012:saml-provider/oktaidp.

Configure Redshift Serverless role-based access

In this step, we create database roles in Amazon Redshift based on the groups that you created in Okta. Make sure the role name matches with the Okta Group name.

Amazon Redshift roles simplify managing privileges required for your end-users. In this post, we create two database roles, sales and finance, and grant them access to query tables with sales and finance data, respectively. You can download this sample SQL Notebook and import into Redshift Query Editor v2 to run all cells in the notebook used in this example. Alternatively, you can copy and enter the SQL into your SQL client.

The following is the syntax to create a role in Redshift Serverless:

create role <IdP groupname>;

For example:

create role sales;
create role finance;

Create the sales and finance database schema:

create schema sales_schema;
create schema finance_schema;

Create the tables:

CREATE TABLE IF NOT EXISTS finance_schema.revenue
(
account INTEGER   ENCODE az64
,customer VARCHAR(20)   ENCODE lzo
,salesamt NUMERIC(18,0)   ENCODE az64
)
DISTSTYLE AUTO
;

insert into finance_schema.revenue values (10001, 'ABC Company', 12000);
insert into finance_schema.revenue values (10002, 'Tech Logistics', 175400);
insert into finance_schema.revenue values (10003, 'XYZ Industry', 24355);
insert into finance_schema.revenue values (10004, 'The tax experts', 186577);

CREATE TABLE IF NOT EXISTS sales_schema.store_sales
(
ID INTEGER   ENCODE az64,
Product varchar(20),
Sales_Amount INTEGER   ENCODE az64
)
DISTSTYLE AUTO
;

Insert into sales_schema.store_sales values (1,'product1',1000);
Insert into sales_schema.store_sales values (2,'product2',2000);
Insert into sales_schema.store_sales values (3,'product3',3000);
Insert into sales_schema.store_sales values (4,'product4',4000);

The following is the syntax to grant permission to the Redshift Serverless role:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role <IdP groupname>;

Grant the relevant permissions to each role as per your requirements. In the following example, we grant the sales role usage on sales_schema and select on all of its tables, and the finance role usage on finance_schema and select on all of its tables:

grant usage on schema sales_schema to role sales;
grant select on all tables in schema sales_schema to role sales;

grant usage on schema finance_schema to role finance;
grant select on all tables in schema finance_schema to role finance;
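
To verify the grants, you can query the relation-level privileges. The following is a minimal sketch; it assumes the SVV_RELATION_PRIVILEGES system view is available in your Amazon Redshift version:

-- List the identities (users or roles) holding privileges on the example tables
select * from svv_relation_privileges
where relation_name in ('store_sales', 'revenue');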

Federate to Redshift Serverless using Query Editor V2

The RedshiftDbRoles principal tag and DBGroups are both mechanisms that can be used to integrate with an IdP. However, federating with the RedshiftDbRoles principal has some clear advantages when it comes to connecting with an IdP because it provides automatic mapping between IdP groups and Amazon Redshift database roles. Overall, RedshiftDbRoles is more flexible, easier to manage, and more secure, making it the better option for integrating Amazon Redshift with your IdP.

Now you’re ready to connect to Redshift Serverless using the Query Editor V2 and federated login:

  1. Use the SSO URL you collected earlier and log in to your Okta account with your user credentials. For this demo, we log in with user Ethan.
  2. In the Query Editor v2, choose your Redshift Serverless instance (right-click) and choose Create connection.
  3. For Authentication, select Federated user.
  4. For Database, enter the database name you want to connect to.
  5. Choose Create Connection.

User Ethan will be able to access sales_schema tables. If Ethan tries to access the tables in finance_schema, he will get a permission denied error.
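
For example, the following minimal check (using the schemas and tables created earlier in this post) illustrates the expected behavior when Ethan runs queries in Query Editor V2:

-- Succeeds: the sales role has usage on sales_schema and select on its tables
select * from sales_schema.store_sales;

-- Fails with a permission denied error: no privileges were granted on finance_schema to the sales role
select * from finance_schema.revenue;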

Configure the SQL client (SQL Workbench/J)

To set up SQL Workbench/J, complete the following steps:

  1. Create a new connection in SQL Workbench/J and choose Redshift Serverless as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).
  3. For Username and Password, enter the values that you set in Okta.
  4. Capture the values for app_id, app_name, and idp_host from the Okta app embed link, which can be found on the General tab of your application.
  5. Set the following extended properties:
    • For app_id, enter the value from app embed link (for example, 0oa8p1o1RptSabT9abd0/avc8k7abc32lL4izh3b8).
    • For app_name, enter the value from app embed link (for example, dev-123456_redshift_app_2).
    • For idp_host, enter the value from app embed link (for example, dev-123456.okta.com).
    • For plugin_name, enter com.amazon.redshift.plugin.OktaCredentialsProvider. The following screenshot shows the SQL Workbench/J extended properties.

      1. Choose OK.
      2. Choose Test from SQL Workbench/J to test the connection.
      3. When the connection is successful, choose OK.
      4. Choose OK to sign in with the users created.

User Ethan will be able to access the sales_schema tables. If Ethan tries to access the tables in the finance_schema, he will get a permission denied error.

Congratulations! You have federated with Redshift Serverless and Okta with SQL Workbench/J using RedshiftDbRoles.

[Optional] Implement MFA with SQL Client and Query Editor V2

Implementing MFA poses an additional challenge because multi-factor authentication is an asynchronous process between initiating the login (the first factor) and completing the login (the second factor). The SAML response is returned to the appropriate listener in each scenario: the SQL client, or the AWS console in the case of QEV2. Depending on which login options you give your users, you may need an additional Okta application. The different scenarios are as follows:

  1. If you are ONLY using QEV2 and not using any other SQL client, you can use MFA with Query Editor V2 with the above application. No changes are required in the custom SAML application created above.
  2. If you are NOT using QEV2 and only using a third-party SQL client (such as SQL Workbench/J), you need to modify the above custom SAML app as described below.
  3. If you want to use QEV2 and a third-party SQL client with MFA, you need to create an additional custom SAML app as described below.

Prerequisites for MFA

Each identity provider (IdP) has its own steps for enabling and managing MFA for your users. In the case of Okta, see the following guides on how to enable MFA using the Okta Verify application and by defining an authentication policy.

Steps to create/update SAML application which supports MFA for a SQL Client

  1. If creating a second app, follow all the steps which are described under section 1 (Create Okta SAML application).
  2. Open the custom SAML app and select General.
  3. Select Edit under SAML settings.
  4. Choose Next in General Settings.
  5. Under General, update the Single sign-on URL to http://localhost:7890/redshift/
  6. Select Next followed by Finish.

The following screenshot shows the MFA app after making the above changes:

Configure SQL Client for MFA

To set up SQL Workbench/J, complete the following steps:

  1. Follow all the steps described under Configure the SQL client (SQL Workbench/J).
  2. Modify your connection updating the extended properties:
    • login_url – Use the Identity Provider Single Sign-on URL collected in the Collect Okta information section (for example, https://dev-123456.okta.com/app/dev-123456_redshiftapp_2/abc8p6o5psS6xUhBJ517/sso/saml).
    • plugin_name – com.amazon.redshift.plugin.BrowserSamlCredentialsProvider
  3. Choose OK
  4. Choose OK from SQL Workbench/J. You’re redirected to the browser to sign in with your Okta credentials.
  5. After that, you're prompted for MFA. Choose either Enter a code or Get a push notification.
  6. Once authentication is successful, you're redirected to a page showing the connection as successful.
  7. With this connection profile, run the following query to return federated user name.
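
A simple way to do this (a minimal sketch; current_user returns the federated user name established by the SAML login):

select current_user;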

Troubleshooting

If your connection didn’t work, consider the following:

  • Enable logging in the driver. For instructions, see Configure logging.
  • Make sure to use the latest Amazon Redshift JDBC driver version.
  • If you’re getting errors while setting up the application on Okta, make sure you have admin access.
  • If you can authenticate via the SQL client but get a permission issue or can’t see objects, grant the relevant permission to the role, as detailed earlier in this post.

Clean up

When you’re done testing the solution, clean up the resources to avoid incurring future charges:

  1. Delete the Redshift Serverless instance by deleting both the workgroup and the namespace.
  2. Delete the IAM roles, IAM IdPs, and IAM policies.

Conclusion

In this post, we provided step-by-step instructions to integrate Redshift Serverless with Okta using the Amazon Redshift Query Editor V2 and SQL Workbench/J with the help of federated IAM roles and automatic database-role mapping. You can use a similar setup with any other SQL client (such as DBeaver or DataGrip) or business intelligence tool (such as Tableau Desktop). We also showed how Okta group membership is mapped automatically with Redshift Serverless roles to use role-based authentication seamlessly.

For more information about Redshift Serverless single sign-on using database roles, see Defining database roles to grant to federated users in Amazon Redshift Serverless.


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu-PandaDebu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Mohamed ShaabanMohamed Shaaban is a Senior Software Engineer in Amazon Redshift and is based in Berlin, Germany. He has over 12 years of experience in the software engineering. He is passionate about cloud services and building solutions that delight customers. Outside of work, he is an amateur photographer who loves to explore and capture unique moments.

Rajiv Gupta is Sr. Manager of Analytics Specialist Solutions Architects based out of Irvine, CA. He has 20+ years of experience building and managing teams who build data warehouse and business intelligence solutions.

Amol Mhatre is a Database Engineer in Amazon Redshift and works on Customer & Partner engagements. Prior to Amazon, he has worked on multiple projects involving Database & ERP implementations.

Ning Di is a Software Development Engineer at Amazon Redshift, driven by a genuine passion for exploring all aspects of technology.

Harsha Kesapragada is a Software Development Engineer for Amazon Redshift with a passion to build scalable and secure systems. In the past few years, he has been working on Redshift Datasharing, Security and Redshift Serverless.

Integrate Amazon Redshift row-level security with Amazon Redshift native IdP authentication

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-amazon-redshift-row-level-security-with-amazon-redshift-native-idp-authentication/

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.

As enterprise customers look to build their data warehouse on Amazon Redshift, they have business requirements to enforce fine-grained access controls that govern who can access which rows of their sensitive data based on the user profile. Furthermore, many organizations want to access Amazon Redshift using their existing identity provider (IdP) while maintaining these compliance and security requirements around their data. Without integrated features for row-level security and secure authentication, you may have to develop additional solutions such as views, or integrate third-party solutions around your data to enforce security.

With the introduction of row-level security in Amazon Redshift, you can restrict user access at the row level. Additionally, we have introduced a native IdP functionality to help you implement authentication and authorization with your choice of business intelligence (BI) tools in a seamless way.

Amazon Redshift row-level security (RLS) provides granular access control over your sensitive data. It does this by using RLS policies to determine which rows to return in the query result sets.

In this post, we walk you through an example on how you can implement row-level security in Amazon Redshift while using existing IdP credentials to simplify authentication and managing permissions. You can use this flexible solution to provide complete control over data access while maintaining authorization using your existing IdP.

Solution overview

For our use case, an organization requires row-level security to restrict access to sales performance data to specific states and their allocated salesperson. We have the following business rules and conditions:

  • Alice, the salesperson for NY, should have access to NY sales data only
  • Bob, the salesperson for CA, should get access to CA sales data only
  • Charlie, the sales manager for the North America region, should have access to sales data for all states
  • Jen, who belongs to HR department, shouldn’t have access to any sales data

The following diagram illustrates the solution architecture we implement to solve this problem statement using Amazon Redshift row-level security and Amazon Redshift native IdP authentication.

Solution Overview

The solution contains the following steps:

  1. Create RLS policies to provide fine-grained access control for row-level data on the Sales table.
  2. Create Amazon Redshift roles for each of the different Azure AD groups and assign relevant permissions to the table.

With native IdP, roles get created automatically based on Azure groups. However, as a best practice, we’re pre-creating the Amazon Redshift roles and assigning relevant permissions.

  1. Attach row-level security policies to the roles.
  2. Configure a JDBC or ODBC driver in your SQL client to use Azure AD federation and use Azure AD login credentials to sign in.
  3. Upon successful authentication, Azure AD issues an authentication token (OAuth token) back to the Amazon Redshift driver.
  4. The driver forwards the authentication token to the Amazon Redshift cluster to initiate a new database session. Amazon Redshift verifies and validates the authentication token.
  5. Amazon Redshift calls the Azure Graph API to obtain the user’s group membership.
  6. Amazon Redshift maps the logged-in Azure AD user to the Amazon Redshift user and maps the Azure AD groups to Amazon Redshift roles.
  7. The Amazon Redshift roles are pre-mapped with the RLS policies mentioned in step 3. This allows the respective users to query the fine-grained row-level access data from the client.

Prerequisites

To implement this solution, you must have the following prerequisites:

Implement your Amazon Redshift native IdP

To set up your Amazon Redshift native IdP, refer to Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. Follow the steps to set up your Azure application and collect Azure AD information for the Amazon Redshift IdP.

For this post, we have created the following four groups in Azure AD:

  • sales_ny
  • sales_ca
  • sales_manager
  • hr_group

Then we created the following four users in Azure AD:

  • Alice – The salesperson in NY state
  • Bob – The salesperson in CA state
  • Charlie – The manager for the North America region
  • Jen – A member of the HR group

Add the respective users to their appropriate group:

  • Alice – sales_ny
  • Bob – sales_ca
  • Charlie – sales_manager
  • Jen – hr_group

Next, we need to register the IdP in Amazon Redshift using the following command:

CREATE IDENTITY PROVIDER rls_idp TYPE azure
NAMESPACE 'aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/87f4aa26-78b7-410e-bf29-57b39929ef9a/",
"audience":["https://analysis.windows.net/powerbi/connector/AmazonRedshift",
"api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
"client_id":"123ab555-a321-666d-7890-11a123a44890",
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB" }'
;

In the preceding statement, the type azure indicates that the provider specifically facilitates communication with Microsoft Azure AD. The following parameters are collected from Azure AD (for more information, refer to Collect Azure AD Information in Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client):

  • issuer – The issuer ID to trust when a token is received. The unique tenant ID (tenant_id) is appended to the issuer.
  • client_id – The unique public identifier of the application registered with the IdP. This can be referred to as the application ID.
  • client_secret – A secret identifier, or password, known only to the IdP and the registered application.
  • audience – The application ID that is assigned to the application in Azure. For this post, we connect with Amazon Redshift using Power BI Desktop and SQL Workbench/J. The audience value is hardcoded for Power BI Desktop, for example https://analysis.windows.net/powerbi/connector/AmazonRedshift. The second audience value is for the SQL client, which you get from the application ID URI in the OAuth application. For example, api://991abc78-78ab-4ad8-a123-zf123ab03612p.

Use the following command to view the registered IdP on Amazon Redshift:

DESC IDENTITY PROVIDER rls_idp;

Native IdP - data

Use the following command to view all the IdPs registered:

select * from svv_identity_providers;

The following Sales table contains information about each salesperson, the respective state they cover, and their total sales amount:

CREATE TABLE SALES (sales_person VARCHAR(30), state CHAR(2), "total_sales" INT);
INSERT INTO SALES VALUES ('Alice', 'NY', 5000);
INSERT INTO SALES VALUES ('Bob', 'CA', 6000);
INSERT INTO SALES VALUES ('Sally', 'IL', 7000);

Sales data
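
At this point, no RLS policy is attached yet, so any user with SELECT on the table sees every row. As a quick sanity check (a minimal sketch, run as the table owner or a superuser):

select * from sales;
-- Expected: all three rows (Alice/NY, Bob/CA, Sally/IL)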

Now we create four roles in the Amazon Redshift cluster based on the groups that we created on the Azure AD portal and assign relevant permissions to them. This simplifies administration by assigning different permissions to different roles and assigning them to different users.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command (aad) and the group name is the Azure AD group. See the following code:

CREATE ROLE "aad:sales_ny";
CREATE ROLE "aad:sales_ca";
CREATE ROLE "aad:sales_manager";
CREATE ROLE "aad:hr";

Now we grant permission to the Amazon Redshift role on the appropriate tables. For this post, we assign SELECT permission on the Sales table for all four roles:

GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_ny";
GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_ca";
GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_manager";
GRANT SELECT ON TABLE SALES TO ROLE "aad:hr";

Use the following command to view all the roles in the cluster:

select * from svv_roles;

roles information

Create a row-level security policy

Let’s enforce an RLS policy on the Sales table to restrict access to sales performance information for a salesperson specific to a particular state. We create the following policy:

CREATE RLS POLICY policy_sales_ny
WITH (state char(2))
USING (state = 'NY');
CREATE RLS POLICY policy_sales_ca
WITH (state char(2))
USING (state = 'CA');

The sales manager is also required to view sales across the North American region. For this, we create the following policy:

CREATE RLS POLICY policy_sales_all
USING (true);

The policy_sales_all policy allows the sales manager to view all the information in the sales table.
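
To review the policies defined so far, you can query the RLS policy system view. This is a minimal sketch; it assumes the SVV_RLS_POLICY system view is available in your Amazon Redshift version:

select * from svv_rls_policy;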

Attach the row-level security policy to roles

Now we have to attach the row-level security policies to their respective Amazon Redshift roles so that when the user logs in using their Amazon Redshift native IdP, they can get fine-grained access to the records.

ATTACH RLS POLICY policy_sales_ny ON public.sales TO ROLE "aad:sales_ny";
ATTACH RLS POLICY policy_sales_ca ON public.sales TO ROLE "aad:sales_ca";
ATTACH RLS POLICY policy_sales_all ON public.sales TO ROLE "aad:sales_manager";

For the HR role, we haven’t created or attached any RLS policy because we don’t want any user from the HR group to get access to sales records.

Enable row-level security on the table

Now let’s enable row-level security on the respective tables. In this demo, we enable the RLS policy on the Sales table using the following command:

ALTER TABLE public.sales ROW LEVEL SECURITY ON;
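
If you later need to disable row-level security on the table (for example, during cleanup or troubleshooting), you can reverse the setting with the same ALTER TABLE syntax; a minimal sketch:

ALTER TABLE public.sales ROW LEVEL SECURITY OFF;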

Use the following command to view the RLS policies:

SELECT * FROM svv_rls_attached_policy;

RLS policy - Data

Test row-level security using Power BI Desktop

In this example, we use Microsoft Power BI Desktop to connect with Amazon Redshift using a native IdP. For this solution, use Microsoft Power BI Desktop version 2.102.683.0 (64-bit) or above.

  1. In your Microsoft Power BI Desktop, choose Get data.

Native IdP- PowerBI Desktop-Login

  1. Search for the Amazon Redshift connector, choose it, and choose Connect.

Native IdP- PowerBI Desktop-Login

  1. For Server, enter your Amazon Redshift cluster’s endpoint. For example: test-cluster.ct4abcufthff.us-east-1.redshift.amazonaws.com.
  2. For Database, enter your database name (for this post, we enter dev).
  3. Choose OK.

Native IdP- PowerBI Desktop-connection

  1. Choose Microsoft Account.

Native IdP- PowerBI Desktop-Login

  1. Choose Sign in.

RLS-Native IdP- PowerBI Desktop-Login

  1. Enter your Microsoft Account credentials in the authorization dialog. For this example, we sign in with user Alice.
  2. Choose Next.

RLS-Native IdP- PowerBI Desktop-Login

Once connected, you will see the message “You are currently signed in.”

  1. Choose Connect.

As shown in the following screenshot, Azure AD user Alice is able to authenticate using an Amazon Redshift native IdP, and the RLS policies were applied automatically, allowing Alice to access sales performance information for only NY state.

RLS-Native IdP- PowerBI Desktop-Authorized to view respective data

Similarly, we can try signing in as user Bob and see only CA state information.

RLS-Native IdP- PowerBI Desktop-UnAuthorized to view respective data

Charlie belongs to the manager role where the view all policy has been applied, so when he signs in, he is able to view all the rows in the sales table.

RLS-Native IdP- PowerBI Desktop-Authorized to view data

Finally, when Jen signs in, she can access the table, but isn’t able to view any sales records because no RLS policy has been attached to the HR role.

RLS-Native IdP- PowerBI Desktop-UnAuthorized to view data

If we hadn't granted SELECT on the sales table to the role aad:hr, which Jen belongs to, she wouldn't be able to access the sales table at all.
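
For example, revoking the earlier grant (a sketch to illustrate the point, run by an administrator) would put the aad:hr role in exactly that state:

revoke select on table sales from role "aad:hr";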

RLS-Native IdP- PowerBI Desktop-UnAuthorized to access table

Test row-level security using SQL Workbench/J

Now we test row-level security with an Amazon Redshift native IdP using SQL Workbench/J.

  1. Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file. (Remember to unzip the file.)

Make sure to use Amazon Redshift driver version 2.1.0.4 or later, because previous Amazon Redshift driver versions don't support the Amazon Redshift native IdP feature.

Native IdP- Workbench/J drivers

  1. For URL, enter jdbc:redshift://<cluster endpoint>:<port>/<databasename>. For example: jdbc:redshift://test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com:5439/dev.

Native IdP- Workbench/J Connection

  1. On the Driver properties tab, add the following properties:
    1. plugin_name – com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
    2. listen_port – 7890
    3. idp_response_timeout – 50
    4. scope – Enter the scope value from the OAuth application. For example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login.
    5. client_id – Enter the client_id value from the OAuth application. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
    6. idp_tenant – Enter the tenant ID value from the OAuth application. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.

Native IdP- Workbench/J Parameters

  1. Choose OK from SQL Workbench/J.

You’re redirected to the browser to sign in with your Azure AD credentials.

As shown in the following screenshot, Azure AD user Alice is able to authenticate using an Amazon Redshift native IdP and view only sales performance information for NY state.
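
For example, with the grants and policies attached earlier, a query like the following, run as Alice, would be expected to return only the NY row (a sketch of the expected behavior, not output captured from this environment):

select * from public.sales;
-- Expected for Alice: Alice | NY | 5000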

Similarly, we can re-authenticate and sign in as user Bob, who is able to view sales information specific to CA state.

When Charlie signs in, he is able to view all the rows from every state.

Finally, when Jen signs in, she is able to access the table, but can’t view any sales records because no RLS policy has been attached to the HR role.

If we hadn't granted SELECT on the sales table to the role aad:hr, which Jen belongs to, Jen wouldn't be able to access the sales table at all.

Summary

In this post, we covered how you can achieve a secure end-to-end experience using Amazon Redshift native IdP authentication, which simplifies administration and row-level security to enable fine-grained row-level access in Amazon Redshift.

For more information about Amazon Redshift row-level security and native IdP federation, refer to:


About the authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Harshida Patel is a Specialist Sr. Solutions Architect, Analytics, with AWS.

Kiran Chinta is a Software Development Manager at Amazon Redshift. He leads a strong team in query processing, SQL language, data security, and performance. Kiran is passionate about delivering products that seamlessly integrate with customers’ business applications with the right ease of use and performance. In his spare time, he enjoys reading and playing tennis.

Debu-PandaDebu Panda is a Senior Manager, Product Management, with AWS. He is an industry leader in analytics, application platforms, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases, and has presented at multiple conferences such as AWS re:Invent, Oracle Open World, and Java One. He is lead author of the EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt).

Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-amazon-redshift-native-idp-federation-with-microsoft-azure-ad-using-a-sql-client/

Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries.

The new Amazon Redshift native identity provider authentication simplifies administration by sharing identity and group membership information to Amazon Redshift from a third-party identity provider (IdP) service, such as Microsoft Azure Active Directory (Azure AD), and enabling Amazon Redshift to natively process third-party tokens, identities, and group permissions. This process is very easy to set up, provides a secure and smoother customer experience for managing identities and groups at a centralized external IdP, and integrates natively with Amazon Redshift.

In this post, we focus on Microsoft Azure AD as the IdP and provide step-by-step guidance to connect SQL clients like SQL Workbench/J and DBeaver with Amazon Redshift using a native IdP process. Azure AD manages the users and provides federated access to Amazon Redshift. You don’t need to create separate Amazon Redshift database users, AWS Identity and Access Management (IAM) roles, or IAM policies with this setup.

Solution overview

Using an Amazon Redshift native IdP has the following benefits:

  • Enables your users to be automatically signed in to Amazon Redshift with their Azure AD accounts
  • You can manage users and groups from a centralized IdP
  • External users can securely access Amazon Redshift without manually creating new user names or roles using their existing corporate directory credentials
  • External user group memberships are natively mirrored with Amazon Redshift roles and users

The following diagram illustrates the architecture of a native IdP for Amazon Redshift:

The workflow contains the following steps:

  1. You configure a JDBC or ODBC driver in your SQL client to use Azure AD federation and use Azure AD login credentials to sign in.
  2. Upon a successful authentication, Azure AD issues an authentication token (OAuth token) back to the Amazon Redshift driver.
  3. The driver forwards the authentication token to the Amazon Redshift cluster to initiate a new database session.
  4. Amazon Redshift verifies and validates the authentication token.
  5. Amazon Redshift calls the Azure Graph API to obtain the user’s group membership.
  6. Amazon Redshift maps the logged-in Azure AD user to the Amazon Redshift user and maps the Azure AD groups to Amazon Redshift roles. If the user and groups don’t exist, Amazon Redshift automatically creates those identities within the IdP namespace.

To implement the solution, you complete the following high-level steps:

  1. Set up your Azure application.
    1. Create OAuth Application
    2. Create Redshift Client application
    3. Create Azure AD Group
  2. Collect Azure AD information for the Amazon Redshift IdP.
  3. Set up the IdP on Amazon Redshift.
  4. Set up Amazon Redshift permissions to external identities.
  5. Configure the SQL client (for this post, we use SQL Workbench/J and DBeaver).

Prerequisites

You need the following prerequisites to set up this solution:

Set up your Azure application

To integrate with any SQL client or BI tool except Microsoft Power BI, we create two applications. The first application is used to authenticate the user and provide a login token. The second application is used by Amazon Redshift to retrieve user and group information.

Step 1: Create OAuth Application

  1. Sign in to the Azure portal with your Microsoft account.
  2. Navigate to the Azure Active Directory application.
  3. Under Manage in the navigation pane, choose App registrations and then choose New registration.
  4. For Name, enter a name (for example, oauth_application).
  5. For Redirect URI, choose Public client/native (mobile and desktop) and enter the redirect URL http://localhost:7890/redshift/. For this post, we are keeping the default settings for the rest of the fields.
  6. Choose Register.
  7. In the navigation pane, under Manage, choose Expose an API.

If you’re setting up for the first time, you can see Set to the right of Application ID URI.

  1. Choose Set and then choose Save.
  2. After the application ID URI is set up, choose Add a scope.
  3. For Scope name, enter a name (for example, jdbc_login).
  4. For Admin consent display name, enter a display name (for example, JDBC login).
  5. For Admin consent description, enter a description of the scope.
  6. Choose Add scope.

  7. After the scope is added, note down the application ID URI (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p) and API scope (api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login) in order to register the IdP in Amazon Redshift later.

The application ID URI is known as <Microsoft_Azure_Application_ID_URI> in the following section.

The API scope is known as <Microsoft_Azure_API_scope_value> when setting up the SQL client such as DBeaver and SQL Workbench/J.

Step 2. Create Redshift Client Application

  1. Navigate to the Azure Active Directory application.
  2. Under Manage in the navigation pane, choose App registrations and then choose New registration.
  3. For Name, enter a name (for example, redshift_client). For this post, we are keeping the default settings for the rest of the fields.
  4. Choose Register.
  5. On the newly created application Overview page, locate the client ID and tenant ID and note down these IDs in order to register the IdP in Amazon Redshift later.
  6. In the navigation pane, choose Certificates & secrets.
  7. Choose New client secret.
  8. Enter a description, and select an expiration for the secret or specify a custom lifetime. We keep the Microsoft-recommended default expiration value of 6 months. Choose Add.
  9. Copy the secret value.

The secret value is displayed only once; after that, you can't read it.

  1. In the navigation pane, choose API permissions.
  2. Choose Add a permission and choose Microsoft Graph.
  3. Choose Application permissions.
  4. Search the directory and select the Directory.Read.All permission.
  5. Choose Add permissions.
  6. After the permission is created, choose Grant admin consent.
  7. In the pop-up box, choose Yes to grant the admin consent.

The status for the permission shows as Granted for with a green check mark.

Step 3. Create Azure AD Group

  1. On the Azure AD home page, under Manage, choose Groups.
  2. Choose New group.
  3. In the New Group section, provide the required information.
  4. Choose No members selected and then search for the members.
  5. Select the members and choose Select. For this example, you can search for your username and choose Select.

You can see the number of members in the Members section.

  1. Choose Create.

Collect Azure AD information

Before we collect the Azure AD information, we need to identify the access token version from the application you created earlier on the Azure portal under Step 1: Create OAuth Application. In the navigation pane, under Manage, choose Manifest, then view the accessTokenAcceptedVersion parameter: null and 1 indicate v1.0 tokens, and 2 indicates v2.0 tokens.

To configure your IdP in Amazon Redshift, collect the following parameters from Azure AD. If you don’t have these parameters, contact your Azure admin.

  1. issuer – This is known as <Microsoft_Azure_issuer_value> in the following sections. If you’re using the v1.0 token, use https://sts.windows.net/<Microsoft_Azure_tenantid_value>/. If you’re using the v2.0 token, use https://login.microsoftonline.com/<Microsoft_Azure_tenantid_value>/v2.0. To find your Azure tenant ID, complete the following steps:
    • Sign in to the Azure portal with your Microsoft account.
    • Under Manage, choose App registrations.
    • Choose any application which you have created in previous sections.
    • Click on the Overview (left panel) page and it’s listed in the Essentials section as Directory (tenant) ID.

  2. client_id – This is known as <Microsoft_Azure_clientid_value> in the following sections. An example of a client ID is 123ab555-a321-666d-7890-11a123a44890. To get your client ID value, locate the application you created earlier on the Azure portal under Step 2. Create Redshift Client Application. Click on the Overview (left panel) page and it's listed in the Essentials section.
  3. client_secret – This is known as <Microsoft_Azure_client_secret_value> in the following sections. An example of a client secret value is KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB. To create your client secret value, refer to the section under Step 2. Create Redshift Client Application.
  4. audience – This is known as <Microsoft_Azure_token_audience_value> in the following sections. If you’re using a v1.0 token, the audience value is the application ID URI (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p). If you’re using a v2.0 token, the audience value is the client ID value (for example, 991abc78-78ab-4ad8-a123-zf123ab03612p). To get these values, please refer to the application which you have created in Step 1: Create OAuth Application. Click on the Overview (left panel) page and it’s listed in the Essentials section.

Set up the IdP on Amazon Redshift

To set up the IdP on Amazon Redshift, complete the following steps:

  1. Log in to Amazon Redshift with a superuser user name and password using query editor v2 or any SQL client.
  2. Run the following SQL:
    CREATE IDENTITY PROVIDER <idp_name> TYPE azure 
    NAMESPACE '<namespace_name>' 
    PARAMETERS '{ 
    "issuer":"<Microsoft_Azure_issuer_value>", 
    "audience":["<Microsoft_Azure_token_audience_value>"],
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

For example, the following code uses a v1.0 access token:

CREATE IDENTITY PROVIDER test_idp TYPE azure
NAMESPACE 'oauth_aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/87f4aa26-78b7-410e-bf29-57b39929ef9a/", 
"audience":["api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
"client_id":"123ab555-a321-666d-7890-11a123a44890", 
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB"
}';

The following code uses a v2.0 access token:

CREATE IDENTITY PROVIDER test_idp TYPE azure
NAMESPACE 'oauth_aad'
PARAMETERS '{
"issuer":
"https://login.microsoftonline.com/87f4aa26-78b7-410e-bf29-57b39929ef9a/v2.0",
"audience":["991abc78-78ab-4ad8-a123-zf123ab03612p"], 
"client_id":"123ab555-a321-666d-7890-11a123a44890", 
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB" 
}';
  1. To alter the IdP, use the following command (this new set of parameter values completely replaces the current values):
    ALTER IDENTITY PROVIDER <idp_name> PARAMETERS 
    '{
    "issuer":"<Microsoft_Azure_issuer_value>/",
    "audience":["<Microsoft_Azure_token_audience_value>"], 
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

  2. To view a single registered IdP in the cluster, use the following code:
    DESC IDENTITY PROVIDER <idp_name>;

  3. To view all registered IdPs in the cluster, use the following code:
    select * from svv_identity_providers;

  4. To drop the IdP, use the following command:
    DROP IDENTITY PROVIDER <idp_name> [CASCADE];

Set up Amazon Redshift permissions to external identities

The users, roles, and role assignments are automatically created in your Amazon Redshift cluster during the first login using your native IdP unless they were manually created earlier.

Create and assign permission to Amazon Redshift roles

In this step, we create a role in the Amazon Redshift cluster based on the groups that you created on the Azure AD portal. This helps us avoid creating multiple user names manually on the Amazon Redshift side and assign permissions for multiple users individually.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command and the group name is the one we specified when we were setting up the Azure application. In our example, it’s oauth_aad:rsgroup.

Run the following command in the Amazon Redshift cluster to create a role:

create role "<namespace_name>:<Azure AD groupname>";

For example:

create role "oauth_aad:rsgroup";

To grant permission to the Amazon Redshift role, enter the following command:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role "<namespace_name>:<Azure AD groupname>";

Then grant relevant permission to the role as per your requirement. For example:

grant select on all tables in schema public to role "oauth_aad:rsgroup";
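
After the first federated login, you can confirm the automatically created identities and their role assignments. The following is a minimal sketch; it assumes the SVV_ROLES and SVV_USER_GRANTS system views are available in your Amazon Redshift version:

select * from svv_roles;        -- lists roles, including namespace-qualified ones such as oauth_aad:rsgroup
select * from svv_user_grants;  -- shows which roles have been granted to which users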

Create and assign permission to an Amazon Redshift user

This step is only required if you want to grant permission to an Amazon Redshift user instead of roles. We create an Amazon Redshift user that maps to an Azure AD user and then grant permission to it. If you don't want to explicitly assign permission to an Amazon Redshift user, you can skip this step.

To create the user, use the following syntax:

CREATE USER "<namespace_name>:<Azure AD username>" PASSWORD DISABLE;

For example:

CREATE USER "oauth_aad:[email protected]" PASSWORD DISABLE;

We use the following syntax to grant permission to the Amazon Redshift user:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO "<namespace_name>:<Azure AD username>";

For example:

grant select on all tables in schema public to "oauth_aad:[email protected]";

Configure the SQL client

In this section, we provide instructions to set up a SQL client using either DBeaver or SQL Workbench/J.

Set up DBeaver

To set up DBeaver, complete the following steps:

  1. Go to Database and choose Driver Manager.
  2. Search for Redshift, then choose it and choose Copy.
  3. On the Settings tab, for Driver name, enter a name, such as Redshift Native IDP.
  4. Update the URL template to jdbc:redshift://{host}:{port}/{database}?plugin_name=com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
    Note: In this URL template, do not replace the template parameters with actual values. Keep the value as shown in the screenshot below.
  5. On the Libraries tab, choose Add files. Keep only one set of the latest driver version (2.1.0.4 and upwards) and if you see any older versions, delete those files.
  6. Add all the files from the downloaded AWS JDBC driver pack .zip file and choose OK (remember to unzip the .zip file).

Note: Use the Amazon Redshift driver 2.1.0.4 onwards, because all previous Amazon Redshift driver versions don’t support the Amazon Redshift native IdP feature.

  1. Close the Driver Manager.
  2. Go to Database and choose New Database Connection.
  3. Search for Redshift Native IDP, then choose it and choose Next.
  4. For Host/Instance, enter your Amazon Redshift endpoint. For example, test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com.
  5. For Database, enter the database name (for this post, we use dev).
  6. For Port, enter 5439.
  7. Get the following parameter values (scope, client_id, and idp_tenant) from the application you created in Step 1: Create OAuth Application. On the Driver properties tab, add the following properties:
    1. listen_port – 7890
    2. idp_response_timeout – 50
    3. scope – Enter the value for <Microsoft_Azure_API_scope_value>.
      • If you’re using a v1.0 token, then use the scope value (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login).
      • If you’re using a v2.0 token, the scope value is the client ID value (for example, 991abc78-78ab-4ad8-a123-zf123ab03612p).
    4. client_id – Enter the value for <Microsoft_Azure_clientid_value>. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
    5. idp_tenant – Enter the value for <Microsoft_Azure_tenantid_value>. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.
  8. You can verify the connection by choosing Test Connection.

You're redirected to the browser to sign in with your Azure AD credentials. If you get an SSL-related error, go to the SSL tab and select Use SSL.

  1. Log in to be redirected to a page showing the connection as successful.
  2. Choose Ok.

Congratulations! You have completed the Amazon Redshift native IdP setup with DBeaver.

Set up SQL Workbench/J

To set up SQL Workbench/J, complete the following steps:

  1. Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).

Use the Amazon Redshift driver 2.1.0.4 onwards, because all previous Amazon Redshift driver versions don’t support the Amazon Redshift native IdP feature.

  1. For URL, enter jdbc:redshift://<cluster endpoint>:<port>/<databasename>. For example, jdbc:redshift://test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com:5439/dev.
  2. Get the following parameter values (scope, client_id, and idp_tenant) from the application you created in Step 1: Create OAuth Application. On the Driver properties tab, add the following properties:
    1. plugin_name – com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
    2. listen_port – 7890
    3. idp_response_timeout – 50
    4. scope – Enter the value for <Microsoft_Azure_API_scope_value>.
      • If you’re using a v1.0 token, then use the scope value (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login).
      • If you’re using a v2.0 token, the scope value is the client ID value (for example, 991abc78-78ab-4ad8-a123-zf123ab03612p).
    5. client_id – Enter the value for <Microsoft_Azure_clientid_value>. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
    6. idp_tenant – Enter the value for <Microsoft_Azure_tenantid_value>. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.
  3. Choose OK.
  4. Choose Test from SQL Workbench/J.

You’re redirected to the browser to sign in with your Azure AD credentials.

  1. Log in to be redirected to a page showing the connection as successful.
  2. Choose Finish.

    sqlworkbenchj-test-successful
  3. With this connection profile, run the following query to test Amazon Redshift native IdP authentication.
    sqlworkbenchj-current-user
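
A query you could use for this check (a minimal sketch; the federated user name is returned in the <namespace>:<Azure AD username> form) is:

select current_user;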

Congratulations! You have completed the Amazon Redshift native IdP setup with SQL Workbench/J.

Best Practices with Redshift native IdP:

  • Pre-create the Amazon Redshift roles based upon the groups which you have created on the Azure AD portal.
  • Assign permissions to Redshift roles instead of to each individual external user. This provides a smoother end-user experience because users have all the required permissions when they log in using the native IdP.

Troubleshooting

If your connection didn’t work, consider the following:

  • Enable logging in the driver. For instructions, see Configure logging.
  • Make sure to use the latest Amazon Redshift JDBC driver version 2.1.0.4 onwards, which supports Amazon Redshift native IdP authentication.
  • If you’re getting errors while setting up the application on Azure AD, make sure you have admin access.
  • If you can authenticate via the SQL client but get a permission issue or can’t see objects, grant the relevant permission to the role, as detailed earlier in this post.
  • If you get the error “claim value does not match expected value,” make sure you provided the correct parameters during Amazon Redshift IdP registration.
  • Check stl_error or stl_connection_log views on the Amazon Redshift cluster for authentication failures.

Conclusion

In this post, we provided step-by-step instructions to integrate Amazon Redshift with Azure AD and SQL clients (SQL Workbench/J and DBeaver) using Amazon Redshift native IdP authentication. We also showed how Azure group membership is mapped automatically with Amazon Redshift roles and how to set up Amazon Redshift permissions.

For more information about Amazon Redshift native IdP federation, see:


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu-PandaDebu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Ilesh Garish is a Software Development Engineer at AWS. His role is to develop connectors for Amazon Redshift. Prior to AWS, he built database drivers for the Oracle RDBMS, TigerLogic XDMS, and OpenAccess SDK. He also worked in the database internal technologies at San Francisco Bay Area startups.

Dengfeng(Davis) Li is a Software Development Engineer at AWS. His passion is creating ease-of-use, secure and scalable applications. In the past few years, he worked on Redshift security, data sharing and catalog optimization.

Integrate Amazon Redshift native IdP federation with Microsoft Azure AD and Power BI

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-amazon-redshift-native-idp-federation-with-microsoft-azure-ad-and-power-bi/

Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries.

As enterprise customers look to build their data warehouse on Amazon Redshift, they have many integration needs with the business intelligence (BI) tools they’re using. For customers who want to integrate Amazon Redshift with their existing identity provider (IdP) such as Microsoft Azure Active Directory (Azure AD) using BI tools and services such as Power BI Desktop and Power BI service, we have introduced a native IdP for Amazon Redshift to help you implement authentication and authorization for these tools in a seamless way.

Amazon Redshift native IdP simplifies the administration process of managing identities and permission. This feature provides native integration with Microsoft Azure AD, which you can use for authentication and authorization with tools like Power BI. It uses your existing IdP to simplify authentication and managing permissions. It does this by making it possible to share identity metadata to Amazon Redshift from your IdP. In this approach, an external IdP (such as Azure AD) issues an access token, which is passed to Amazon Redshift via a client, and then Amazon Redshift performs the token validation and claim extraction natively.

This post shows a step-by-step implementation of the Amazon Redshift native IdP setup with Azure AD, which demonstrates how to manage users and groups with an organizational directory, and how to federate into Amazon Redshift. You don’t need to create AWS Identity and Access Management (IAM) roles, policies, separate database users, or groups in Amazon Redshift with this setup.

Solution overview

Using an Amazon Redshift native IdP has the following benefits:

  • You can manage users and groups from a centralized IdP
  • Enables your users to be automatically signed in to Amazon Redshift with their Azure AD accounts
  • You can automatically create Amazon Redshift roles with a namespace that represents external groups (such as Azure AD groups)
  • External user group membership is natively mirrored with Amazon Redshift roles and users

The general configuration steps of the Amazon Redshift native IdP approach are as follows:

  1. Register an application in Azure AD and set up groups.
  2. Collect Azure AD information for the Amazon Redshift IdP.
  3. Set up the IdP on Amazon Redshift.
  4. Set up Amazon Redshift permissions to external identities.
  5. Configure the client connection.

The following diagram illustrates the resulting solution.

To get authorized, the Power BI client sends an authentication request to the Azure enterprise application using Azure AD credentials. After verification, Azure sends a JSON web token (OAuth token) to the Power BI application. The Power BI application forwards the connection string with the OAuth token to Amazon Redshift. Amazon Redshift parses and validates the token, and requests group information from Azure AD. Upon reception, Amazon Redshift automatically creates the user and roles, and does the respective mapping.

Prerequisites

You need the following prerequisites to set up this solution:

  • A Microsoft Azure account that has an active subscription. You need an admin role to set up the application on Azure AD.
  • Power BI Desktop version 2.102.683.0 (64-bit) or above downloaded and installed. In this example, we use a Windows environment.
  • The latest version of the Microsoft Enterprise/Standard Gateway installed.
  • An AWS account with an Amazon Redshift cluster. In this post, we connect Power BI Desktop and service with a publicly accessible Amazon Redshift cluster.

Register an application in Azure AD and set up groups

To set up the Azure application and group permission, complete the following steps:

  1. Sign in to the Azure portal with your Microsoft account.
  2. Navigate to the Azure Active Directory application.
  3. Under Manage, choose App registrations and New registration.
  4. For Name, enter an application name (for example, nativeoauthsetup).
  5. Keep the default settings for the rest of the fields.
  6. Choose Register to complete the initial application registration.
  7. On the newly created application Overview page, locate the client ID and tenant ID and note down these IDs in order to register the IdP in Amazon Redshift later.
  8. Under Manage in the navigation pane, choose API permissions.
  9. Choose Add a permission.
  10. Choose Microsoft Graph and then choose Application permissions.
  11. Search for directory and select the Directory.Read.All permission.
  12. Choose Add permissions.
  13. Choose Grant admin consent.
  14. In the popup box, choose Yes to grant the admin consent.

The status of the permission shows Granted for with a green check mark.

  1. Under Manage in the navigation pane, choose Certificates & secrets.
  2. Choose Client secrets and choose New client secret.
  3. Enter a description, and select an expiration for the secret or specify a custom lifetime. We keep the Microsoft-recommended default expiration value of 6 months. Choose Add.
  4. Copy the secret value.

The secret value is displayed only once; after that, you can't read it.

  1. On the Azure AD home page, under Manage in the navigation pane, choose Groups.
  2. Choose New group.
  3. In the New Group section, provide the required information.
  4. Choose No members selected and then search for the members.
  5. Select your members and choose Select. For this example, you can search for your username and choose Select.

You can see the number of members in the Members section.

  1. Choose Create.

Collect Azure AD Information for Amazon Redshift IdP

Before we collect the Azure AD information, we need to identify the access token version from the application you created earlier. In the navigation pane, under Manage, choose Manifest, then view the accessTokenAcceptedVersion parameter: null and 1 indicate v1.0 tokens, and 2 indicates v2.0 tokens.

To configure your IdP in Amazon Redshift, collect the following parameters from Azure AD. If you don't have these parameters, contact your Azure admin.

  • issuer – This is known as <Microsoft_Azure_issuer_value>. If you’re using the v1.0 token, use https://sts.windows.net/<Microsoft_Azure_tenantid_value>/. Currently, Power BI only uses the v1.0 token. If you’re using the v2.0 token, use https://login.microsoftonline.com/<Microsoft_Azure_tenantid_value>/v2.0. To find your Microsoft Azure tenant ID, complete the following steps:
    • Sign in to the Azure portal with your Microsoft account.
    • Under Manage, choose App registrations.
    • Choose the Amazon Redshift application you created earlier.
    • Click on the Overview (left panel) page and under Essentials, note down the values.
  • client_id – This is known as <Microsoft_Azure_clientid_value> in the following sections. An example of a client ID is 5ab12345-1234-1a12-123a-11abc1a12ab1. To get your client ID value, locate the Amazon Redshift application you created earlier on the Azure portal; it’s listed in the Essentials section.
  • client_secret – This is known as <Microsoft_Azure_client_secret_value> in the following sections. An example of a client secret value is KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB. To create your client secret value, refer to the steps in the previous section.
  • audience – This is known as <Microsoft_Azure_token_audience_value> in the following sections. With Power BI Desktop, you need to set the audience value as https://analysis.windows.net/powerbi/connector/AmazonRedshift.

Set up the IdP on Amazon Redshift

To set up the IdP on Amazon Redshift, complete the following steps:

  1. Log in to Amazon Redshift with a superuser user name and password using query editor v2 or any SQL client.
  2. Run the following SQL:
    CREATE IDENTITY PROVIDER <idp_name> TYPE azure 
    NAMESPACE '<namespace_name>' 
    PARAMETERS '{ 
    "issuer":"<Microsoft_Azure_issuer_value>", 
    "audience":["<Microsoft_Azure_token_audience_value>"],
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

In our example, we use the v1.0 token issuer because as of this writing, Power BI only uses the v1.0 token:

CREATE IDENTITY PROVIDER oauth_standard TYPE azure
NAMESPACE 'aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/e12b1bb1-1234-12ab-abc1-1ab012345a12/",
"audience":["https://analysis.windows.net/powerbi/connector/AmazonRedshift"],
"client_id":"5ab12345-1234-1a12-123a-11abc1a12ab1",
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB"
}';
  3. To alter the IdP, use the following command (this new set of parameter values completely replaces the current values):
    ALTER IDENTITY PROVIDER <idp_name> PARAMETERS 
    '{
    "issuer":"<Microsoft_Azure_issuer_value>",
    "audience":["<Microsoft_Azure_token_audience_value>"], 
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

  4. To view a single registered IdP in the cluster, use the following code:
    DESC IDENTITY PROVIDER <idp_name>;

  5. To view all registered IdPs in the cluster, use the following code:
    select * from svv_identity_providers;

  6. To drop the IdP, use the following command:
    DROP IDENTITY PROVIDER <idp_name> [CASCADE];

Set up Amazon Redshift permissions to external identities

The users, roles, and role assignments are automatically created in your Amazon Redshift cluster during the first login using your native IdP unless they were manually created earlier.

Create and assign permission to Amazon Redshift roles

In this step, we create a role in the Amazon Redshift cluster based on the groups that you created on the Azure AD portal.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command and the group name is the one we created earlier on the Azure AD portal. In our example, it’s aad:rsgroup.

Run the following command in the Amazon Redshift cluster:

create role "<namespace_name>:<Azure AD groupname>";

For example:

create role "aad:rsgroup";

To grant permission to the Amazon Redshift role, enter the following command:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role "<namespace_name>:<Azure AD groupname>";

Then grant relevant permission to the role as per your requirement. For example:

grant select on all tables in schema public to role "aad:rsgroup";
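To verify that the roles were created and to review how they are granted, you can query the role-based access control system views. The following is a quick check using the standard SVV_ROLES and SVV_USER_GRANTS views:

-- List the roles defined in the cluster (IdP-mapped roles carry the namespace prefix)
select role_name, role_owner from svv_roles;

-- Show which users have been granted which roles
select user_name, role_name from svv_user_grants;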

Create and assign permission to an Amazon Redshift user

This step is only required if you want to grant permission to an Amazon Redshift user instead of roles. We create an Amazon Redshift user that maps to an Azure AD user and then grant permission to it. If you don’t want to explicitly assign permission to an Amazon Redshift user, you can skip this step.

To create the user, use the following syntax:

CREATE USER "<namespace_name>:<Azure AD username>" PASSWORD DISABLE;

For example:

CREATE USER "aad:[email protected]" PASSWORD DISABLE;

We use the following syntax to grant permission to the Amazon Redshift user:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO "<namespace_name>:<Azure AD username>";

For example:

grant select on all tables in schema public to "aad:[email protected]";
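After the first federated login (or after creating users manually as shown above), you can confirm that the namespaced users exist by querying pg_user. The aad:% filter below assumes the namespace used in this example:

-- List federated users created under the aad namespace
select usename from pg_user where usename like 'aad:%';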

Configure your client connection using an Amazon Redshift native IdP

In this section, we provide instructions to set up your client connection for either Power BI Desktop or the Power BI service.

Connect Power BI Desktop

In this example, we use Power BI Desktop to connect with Amazon Redshift using a native IdP. Use Power BI Desktop version 2.102.683.0 64-bit or above.

  1. In your Power BI Desktop, choose Get data.
  2. Search for the Amazon Redshift connector, then choose it and choose Connect.
  3. For Server, enter your Amazon Redshift cluster’s endpoint. For example, test-cluster.ct4abcufthff.us-east-1.redshift.amazonaws.com.
  4. For Database, enter your database name. In this example, we use dev.
  5. Choose OK.
  6. Choose Microsoft Account.
  7. Choose Sign in.
  8. Enter your Microsoft Account credentials.

When you’re connected, you can see the message You are currently signed in.

  9. Choose Connect.

Congratulations! You are signed in using the Amazon Redshift native IdP with Power BI Desktop. Now you can browse your data.

After that, you can create your own Power BI report on the desktop version and publish it to your Microsoft account. For this example, we created and published a report named RedshiftOAuthReport, which we refer to later in this post.

Connect Power BI service

Now, let’s connect a Power BI gateway with Amazon Redshift using a native IdP. Before proceeding with the setup below, make sure you have downloaded and installed the latest version of the Microsoft Enterprise/Standard Gateway.

  1. Open the Power BI web application and sign in if necessary.

You can see the RedshiftOAuthReport report that we created earlier.

  2. In the navigation pane, under Datasets, choose the menu icon (three dots) next to the report name and then choose Settings.
  3. Enable Gateway connection on the settings page.
  4. Click the arrow on the right side and select Manually add to gateway.

  5. In the Data Source Settings section, enter the appropriate values:
    1. For Data Source Name, enter a name.
    2. For Data Source Type, choose Amazon Redshift.
    3. For Server, enter your Amazon Redshift cluster’s endpoint.
    4. For Database, enter your database name (for this post, we use dev).
    5. For Authentication Method, choose OAuth2.
  6. Choose Edit credentials.
  7. In the pop-up box, choose Sign in.
  8. Enter your Microsoft account credentials and follow the authentication process.
  9. After the authentication, choose Add on the Data Source Settings page.
  10. Make sure that Gateway connection is enabled. If not, enable it.
  11. Select your gateway from the gateway list.
  12. On the Maps to menu, choose your data source.
  13. Choose Apply.

Congratulations! You have completed the Amazon Redshift native IdP setup with Power BI web service.

Best practices with Amazon Redshift native IdP

  • Pre-create the Amazon Redshift roles based on the groups that you created on the Azure AD portal.
  • Assign permissions to Amazon Redshift roles instead of to each individual external user. This provides a smoother end-user experience, because users have all the required permissions when they log in using the native IdP. A minimal sketch of this approach follows the list.
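The following is a minimal sketch of this pattern, assuming the aad namespace from this post and illustrative group and schema names (rsgroup, finance_readers, and finance); replace them with your own namespace, Azure AD group names, and schemas:

-- Pre-create one role per Azure AD group so permissions are in place before the first login
create role "aad:rsgroup";
create role "aad:finance_readers";

-- Grant privileges once at the role level instead of per federated user
grant select on all tables in schema public to role "aad:rsgroup";
grant select on all tables in schema finance to role "aad:finance_readers";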

Troubleshooting

If your connection didn’t work, consider the following:

  • Enable logging in the driver. For instructions, see Configure logging.
  • Make sure to use Amazon Redshift JDBC driver version 2.1.0.4 or later, which supports Amazon Redshift native IdP authentication.
  • If you’re getting errors while setting up the application on Azure AD, make sure you have admin access.
  • If you can authenticate via the SQL client but get a permission issue or can’t see objects, grant the relevant permission to the role, as detailed earlier in this post.
  • If you get the error “claim value does not match expected value,” make sure you provided the correct parameters during Amazon Redshift IdP registration.
  • Check the stl_error or stl_connection_log views on the Amazon Redshift cluster for authentication failures; sample queries follow this list.
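The following queries are a starting point for reviewing recent connection attempts and errors using the standard STL_CONNECTION_LOG and STL_ERROR system views:

-- Recent connection attempts, including the authentication method used
select recordtime, event, username, authmethod
from stl_connection_log
order by recordtime desc
limit 50;

-- Recent internal errors logged by the cluster
select recordtime, process, error
from stl_error
order by recordtime desc
limit 50;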

Summary

In this post, we covered the step-by-step process of integrating Amazon Redshift with Azure AD and Power BI Desktop and web service using Amazon Redshift native IdP federation. The process consisted of registering an Azure application, creating Azure AD groups, setting up the Amazon Redshift IdP, creating and assigning permission to Amazon Redshift roles, and finally configuring client connections.

For more information about Amazon Redshift native IdP federation, refer to the Amazon Redshift documentation.

If you have questions or suggestions, please leave a comment.


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Ilesh Garish is a Software Development Engineer at AWS. His role is to develop connectors for Amazon Redshift. Prior to AWS, he built database drivers for the Oracle RDBMS, TigerLogic XDMS, and OpenAccess SDK. He also worked on database internal technologies at San Francisco Bay Area startups.

Debu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Sergey Konoplev is a Senior Database Engineer on the Amazon Redshift team at AWS. Sergey has been focusing on automation and improvement of database and data operations for more than a decade.