All posts by Maneesh Sharma

Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-okta-with-amazon-redshift-query-editor-v2-using-aws-iam-identity-center-for-seamless-single-sign-on/

AWS IAM Identity Center (IdC) allows you to manage single sign-on (SSO) access to all your AWS accounts and applications from a single location. We are pleased to announce that Amazon Redshift now integrates with AWS IAM Identity Center and supports trusted identity propagation, allowing you to use third-party identity providers (IdPs) such as Microsoft Entra ID (Azure AD), Okta, Ping, and OneLogin. This integration simplifies the authentication and authorization process for Amazon Redshift users using Query Editor V2 or Amazon QuickSight, making it easier for them to securely access your data warehouse. Additionally, this integration positions Amazon Redshift as an IdC-managed application, enabling you to use database role-based access control on your data warehouse for enhanced security.

AWS IAM Identity Center offers automatic user and group provisioning from Okta into IdC by utilizing the System for Cross-domain Identity Management (SCIM) 2.0 protocol. This integration allows for seamless synchronization of information between the two services, ensuring accurate and up-to-date information in AWS IAM Identity Center.

In this post, we outline a comprehensive guide for setting up SSO to Amazon Redshift using the integration with IdC and Okta as the identity provider. This guide shows how you can use SSO to sign in to Amazon Redshift Query Editor V2 (QEV2).

Solution overview

Using IAM IdC with Amazon Redshift can benefit your organization in the following ways:

  • Users can connect to Amazon Redshift without requiring an administrator to set up AWS IAM roles with complex permissions.
  • IAM IdC integration allows mapping of IdC groups to Amazon Redshift database roles. Administrators can then assign different privileges to different roles and assign these roles to different users, giving organizations granular control over user access.
  • IdC provides a central location for your users in AWS. You can create users and groups directly in IdC or connect your existing users and groups that you manage in a standards-based identity provider like Okta, Ping Identity, or Microsoft Entra ID (i.e., Azure Active Directory [AD]).
  • IdC directs authentication to your chosen source of truth for users and groups, and it maintains a directory of users and groups for access by Amazon Redshift.
  • You can share one IdC instance with multiple Amazon Redshift data warehouses with a simple auto-discovery and connect capability. This makes it fast to add clusters without the extra effort of configuring the IdC connection for each, and it ensures that all clusters and workgroups have a consistent view of users, their attributes, and groups. Note: Your organization’s IdC instance must be in the same region as the Amazon Redshift data warehouse you’re connecting to.
  • Because user identities are known and logged along with data access, it's easier for you to meet compliance regulations by auditing user access in AWS CloudTrail.
    Architecture Diagram

Amazon Redshift Query Editor V2 workflow:

  1. The end user initiates the flow using the AWS access portal URL (this URL is available on the IdC dashboard). A browser pop-up opens the Okta login page, where you enter your Okta credentials. After successful authentication, you'll be logged in to the AWS Console as a federated user. Select your AWS account and choose the Amazon Redshift Query Editor V2 application. Once you federate to Query Editor V2, select the IdC authentication method.
  2. QEV2 invokes a browser flow where you re-authenticate, this time with your AWS IdC credentials. Because Okta is the IdP, you enter your Okta credentials, which are already cached in the browser. At this step, the federation flow with IdC initiates, and at the end of this flow the session token and access token are available to the QEV2 console in the browser as cookies.
  3. Amazon Redshift retrieves your authorization details based on the retrieved session token and fetches the user's group membership.
  4. Upon a successful authentication, you’ll be redirected back to QEV2, but logged in as an IdC authenticated user.

This solution covers the following steps:

  1. Integrate Okta with AWS IdC to sync users and groups.
  2. Set up IdC integration with Amazon Redshift.
  3. Assign users or groups from IdC to the Amazon Redshift application.
  4. Enable IdC integration for a new Amazon Redshift provisioned or Amazon Redshift Serverless endpoint.
  5. Associate an IdC application with an existing provisioned or serverless data warehouse.
  6. Configure Amazon Redshift role-based access.
  7. Create a permission set.
  8. Assign the permission set to AWS accounts.
  9. Federate to Redshift Query Editor V2 using IdC.
  10. Troubleshooting.

Prerequisites

Walkthrough

Integrate Okta with AWS IdC to sync user and groups

Enable group and user provisioning from Okta to AWS IdC by following this documentation.

If you see issues while syncing users and groups, refer to these considerations for using automatic provisioning.

Setting up IAM IdC integration with Amazon Redshift

An Amazon Redshift cluster administrator or Amazon Redshift Serverless administrator must perform the steps to configure Redshift as an IdC-enabled application. This enables Amazon Redshift to discover and connect to the IdC automatically to receive sign-in and user directory services.

After this, when the Amazon Redshift administrator creates a cluster or workgroup, they can enable the new data warehouse to use IdC and its identity-management capabilities. The point of enabling Amazon Redshift as an IdC-managed application is so you can control user and group permissions from within the IdC, or from a source third-party identity provider that’s integrated with it.

When a database user, such as an analyst or a data scientist, signs in to an Amazon Redshift database, Amazon Redshift checks their groups in IdC and maps them to roles in Amazon Redshift. In this manner, a group can map to an Amazon Redshift database role that allows read access to a set of tables.

The following steps show how to make Amazon Redshift an AWS-managed application with IdC:

  1. Select IAM Identity Center connection from Amazon Redshift console menu.
    Redshift Application Creation
  2. Choose Create application
    Redshift IdC Create Application
  3. The IAM Identity Center connection opens. Choose Next.
  4. In IAM Identity Center integration setup section, for:
    1. IAM Identity Center display name – Enter a unique name for Amazon Redshift’s IdC-managed application.
    2. Managed application name – You can enter the managed Amazon Redshift application name or use the assigned value as it is.
  5. In Connection with third-party identity providers section, for:
    1. Identity Provider Namespace – Specify the unique namespace for your organization. This is typically an abbreviated version of your organization’s name. It’s added as a prefix for your IdC-managed users and roles in the Amazon Redshift database.
  6. In IAM role for IAM Identity Center access – Select an IAM role to use. You can create a new IAM role if you don’t have an existing one. The specific policy permissions required are the following:
    • sso:DescribeApplication – Required to create an identity provider (IdP) entry in the catalog.
    • sso:DescribeInstance – Used to manually create IdP federated roles or users.
    • redshift:DescribeQev2IdcApplications – Used to detect the capability for IdC authentication from Redshift Query Editor V2.

The following screenshot is from the IAM role:
IAM IdC Role

IAM IdC Role Trust Relationship

  7. We won't enable Trusted identity propagation because we are not integrating with AWS Lake Formation in this post.
  8. Choose Next.
    Redshift IdC Connection
  9. In the Configure client connections that use third-party IdPs section, choose Yes if you want to connect Amazon Redshift with a third-party application. Otherwise, choose No. For this post, we chose No because we'll be integrating only with Amazon Redshift Query Editor V2.
  10. Choose Next.
    Redshift IdC No Third Party App
  11. In the Review and create application section, review all the details you entered and choose Create application.
    Redshift IdC Application - Review and Create

After the Amazon Redshift administrator finishes the steps and saves the configuration, the IdC properties appear in the Redshift Console. Completing these tasks makes Redshift an IdC-enabled application.
Redshift IdC Application Created

After you select the managed application name, the properties in the console include the integration status, which shows Success when the IdC configuration is complete.
Amazon Redshift IdC Application Status

Assigning users or groups from IdC to Amazon Redshift application

In this step, users or groups synced to your IdC directory are available to assign to your application, and the Amazon Redshift administrator can decide which users or groups from IdC to include in the Amazon Redshift application.

For example, if you have a total of 20 groups in your IdC and you don't want to include all of them in the Amazon Redshift application, you can choose which IdC groups to include in the Amazon Redshift-enabled IdC application. Later, you can create two Redshift database roles as part of the IdC integration in Amazon Redshift.

The following steps assign groups to Amazon Redshift-enabled IdC application:

  1. On IAM Identity Center properties in the Amazon Redshift Console, select Assign under Groups tab.
  2. If this is the first time you’re assigning groups, then you’ll see a notification. Select Get started.
  3. Enter the groups that you want to synchronize with the application. In this example, we chose the groups awssso-sales and awssso-finance.
  4. Choose Done.
    Redshift Application - Assigning User and Groups

Enabling IdC integration for a new Amazon Redshift provisioned cluster or Amazon Redshift Serverless

After completing the steps in the section Setting up IAM Identity Center integration with Amazon Redshift, the Amazon Redshift database administrator needs to configure new Redshift resources to work in alignment with IdC to make sign-in and data access easier. This is performed as part of the steps to create a provisioned cluster or a Serverless workgroup. Anyone with permissions to create Amazon Redshift resources can perform these IdC integration tasks. When you create a provisioned cluster, you start by choosing Create cluster in the Amazon Redshift console.

  1. Choose Enable for the cluster (recommended) in the section for IAM Identity Center connection in the create-cluster steps.
  2. From the drop-down menu, choose the Redshift application that you created in the previous steps.

Note that when a new data warehouse is created, the IAM role specified for IdC integration is automatically attached to the provisioned cluster or Serverless Namespace. After you finish entering the required cluster metadata and create the resource, you can check the status for IdC integration in the properties.
Amazon Redshift - Create Cluster

Associating an IdC application with an existing provisioned cluster or Serverless endpoint

If you have an existing provisioned cluster or serverless workgroup that you would like to enable for IdC integration, you can do so by running the following SQL command. A database administrator must run the query.

CREATE IDENTITY PROVIDER "<idc_application_name>" TYPE AWSIDC
NAMESPACE '<idc_namespace_name>'
APPLICATION_ARN '<idc_application_arn>'
IAM_ROLE '<IAM role for IAM Identity Center access>';

Example:

CREATE IDENTITY PROVIDER "redshift-idc-app" TYPE AWSIDC
NAMESPACE 'awsidc'
APPLICATION_ARN 'arn:aws:sso::123456789012:application/ssoins-12345f67fe123d4/apl-a0b0a12dc123b1a4'
IAM_ROLE 'arn:aws:iam::123456789012:role/EspressoRedshiftRole';

To alter the IdP, use the following command (this new set of parameter values completely replaces the current values):

ALTER IDENTITY PROVIDER "<idp_name>"
NAMESPACE '<idc_namespace_name>' |
IAM_ROLE default | 'arn:aws:iam::<AWS account-id-1>:role/<role-name>';

A few examples:

ALTER IDENTITY PROVIDER "redshift-idc-app"
NAMESPACE 'awsidctest';

ALTER IDENTITY PROVIDER "redshift-idc-app"
IAM_ROLE 'arn:aws:iam::123456789012:role/administratoraccess';

Note: If you update the IdC namespace value, then all new clusters created afterward will use the updated namespace.

For existing clusters or serverless workgroups, you need to update the namespace manually on each Amazon Redshift cluster using the previous command. All the database roles associated with the identity provider will also be updated with the new namespace value.
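To confirm the namespace currently registered on a cluster or workgroup, you can run the same check that the Troubleshooting section uses later in this post:

select * from identity_providers;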

You can disable or enable the identity provider using the following command:

ALTER IDENTITY PROVIDER <provider_name> disable|enable;

Example:

ALTER IDENTITY PROVIDER "redshift-idc-app" disable;

You can drop an existing identity provider. In the following syntax, specifying CASCADE also deletes the users and roles attached to the identity provider.

DROP IDENTITY PROVIDER <provider_name> [ CASCADE ]
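For example, the following statement removes the identity provider created earlier; because CASCADE is specified, the users and roles attached to it are dropped as well:

DROP IDENTITY PROVIDER "redshift-idc-app" CASCADE;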

Configure Amazon Redshift role-based access

In this step, we pre-create the database roles in Amazon Redshift based on the groups that you synced to IdC. Make sure the role name matches the IdC group name.

Amazon Redshift roles simplify managing privileges required for your end-users. In this post, we create two database roles, sales and finance, and grant them access to query tables with sales and finance data, respectively. You can download this sample SQL Notebook and import into Redshift Query Editor v2 to run all cells in the notebook used in this example. Alternatively, you can copy and enter the SQL into your SQL client.

The following is the syntax to create a role in Amazon Redshift:

create role "<IDC_Namespace>:<IdP groupname>;

For example:

create role "awsidc:awssso-sales";
create role "awsidc:awssso-finance";

Create the sales and finance database schema:

create schema sales_schema;
create schema finance_schema;

Create the tables:

CREATE TABLE IF NOT EXISTS finance_schema.revenue
(
account INTEGER   ENCODE az64
,customer VARCHAR(20)   ENCODE lzo
,salesamt NUMERIC(18,0)   ENCODE az64
)
DISTSTYLE AUTO
;

insert into finance_schema.revenue values (10001, 'ABC Company', 12000);
insert into finance_schema.revenue values (10002, 'Tech Logistics', 175400);
insert into finance_schema.revenue values (10003, 'XYZ Industry', 24355);
insert into finance_schema.revenue values (10004, 'The tax experts', 186577);

CREATE TABLE IF NOT EXISTS sales_schema.store_sales
(
ID INTEGER   ENCODE az64,
Product varchar(20),
Sales_Amount INTEGER   ENCODE az64
)
DISTSTYLE AUTO
;

Insert into sales_schema.store_sales values (1,'product1',1000);
Insert into sales_schema.store_sales values (2,'product2',2000);
Insert into sales_schema.store_sales values (3,'product3',3000);
Insert into sales_schema.store_sales values (4,'product4',4000);

The following is the syntax to grant permissions to an Amazon Redshift database role:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]| ALL [ PRIVILEGES ] } ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] } TO role <IdP groupname>;

Grant the relevant permissions to each role as per your requirements. In the following example, we grant usage and select on sales_schema to the role awsidc:awssso-sales, and usage and select on finance_schema to the role awsidc:awssso-finance.

For example:

grant usage on schema sales_schema to role "awsidc:awssso-sales";
grant select on all tables in schema sales_schema to role "awsidc:awssso-sales";

grant usage on schema finance_schema to role "awsidc:awssso-finance";
grant select on all tables in schema finance_schema to role "awsidc:awssso-finance";
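As a quick sanity check, you can confirm that the namespaced roles exist. The following is a minimal sketch using the SVV_ROLES system view (the exact columns returned may vary by Redshift version):

select role_name from svv_roles;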

Create a permission set

A permission set is a template that you create and maintain that defines a collection of one or more IAM policies. Permission sets simplify the assignment of AWS account access for users and groups in your organization. We'll create a permission set to allow federated users to access Query Editor V2.

Complete the following steps to create a permission set:

  1. Open the IAM Identity Center Console.
  2. In the navigation pane, under Multi-Account permissions, choose Permission sets.
  3. Choose Create permission set.
    IdC-Create Permission Set
  4. Choose Custom permission set and then choose Next.
  5. Under AWS managed policies, choose AmazonRedshiftQueryEditorV2ReadSharing.
  6. Under Customer managed policies, provide the policy name which you created in step 4 under section – Setting up IAM Identity Center integration with Amazon Redshift.
  7. Choose Next.
  8. Enter permission set name. For example, Amazon Redshift-Query-Editor-V2.
  9. Under Relay state – optional – set default relay state to the Query Editor V2 URL, using the format : https://<region>.console.aws.amazon.com/sqlworkbench/home.
    For this post, we use: https://us-east-1.console.aws.amazon.com/sqlworkbench/home.
  10. Choose Next.
  11. On the Review and create screen, choose Create. The console displays the following message: The permission set Redshift-Query-Editor-V2 was successfully created.
    IdC- Review Create Permission

Assign permission set to AWS accounts

  1. Open the IAM Identity Center Console.
  2. In the navigation pane, under Multi-account permissions, choose AWS accounts.
  3. On the AWS accounts page, select one or more AWS accounts that you want to assign single sign-on access to.
  4. Choose Assign users or groups.
  5. On the Assign users and groups to AWS-account-name page, choose the groups that you want to create the permission set for. Then, choose Next.
  6. On the Assign permission sets to AWS-account-name page, choose the permission set you created in the section Create a permission set. Then, choose Next.
  7. On the Review and submit assignments to AWS-account-name page, for Review and submit, choose Submit. The console displays the following message: We reprovisioned your AWS account successfully and applied the updated permission set to the account.
    Idc-Review and Submit AssignmentsIdC-Assignment Permission Created

Federate to Amazon Redshift Query Editor V2 using IdC

Now you're ready to connect to Amazon Redshift Query Editor V2 with federated login using IdC authentication:

  1. Open the IAM Identity Center Console.
  2. Go to dashboard and select the AWS access portal URL.
  3. A browser pop-up triggers and takes you to the Okta Login page where you enter your Okta credentials.
  4. After successful authentication, you’ll be logged into the AWS console as a federated user.
  5. Select your AWS Account and choose the Amazon Redshift Query Editor V2 application.
  6. Once you federate to Query Editor V2, choose (right-click) your Redshift instance and choose Create connection.
  7. To authenticate using IdC, choose the authentication method IAM Identity Center.
  8. A pop-up appears, and because your Okta credentials are already cached, it uses the same credentials and connects to Amazon Redshift Query Editor V2 using IdC authentication.

The following demonstration shows a federated user (Ethan) using the AWS access portal URL to access Amazon Redshift with IdC authentication. User Ethan can access the sales_schema tables. If user Ethan tries to access the tables in finance_schema, the user gets a permission denied error.
QEV2-IdC-Demo
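In SQL terms, the demonstration boils down to the following illustrative queries against the tables created earlier (the exact error text may vary):

select * from sales_schema.store_sales;   -- succeeds: Ethan's group maps to the awsidc:awssso-sales role

select * from finance_schema.revenue;     -- fails with a permission denied error: Ethan holds no role granted on finance_schema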

Troubleshooting

  • If you get the following error:
    ERROR: registered identity provider does not exist for "<idc-namespace:redshift_database_role_name"

This means that you are trying to create a role with a wrong namespace. Please check the current namespace using the command select * from identity_providers;

  • If you get the following error:
    ERROR: (arn:aws:iam::123456789012:role/<iam_role_name>) Insufficient privileges to access AWS IdC. [ErrorId: 1-1234cf25-1f23ea1234c447fb12aa123e]

This means that the IAM role doesn't have sufficient privileges to access the IdC. Your IAM role should contain a policy with the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "redshift:DescribeQev2IdcApplications",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sso:DescribeApplication",
                "sso:DescribeInstance"
            ],
            "Resource": "arn:aws:sso::726034267621:application/ssoins-722324fe82a3f0b8/*"
        }
    ]
}
  • If you get the following error:
    FATAL: Invalid scope. User's credentials are not authorized to connect to Redshift. [SQL State=28000]

Please make sure that the user and group are added to the Amazon Redshift IdC application.

Clean up

Complete the following steps to clean up your resources:

  1. Delete the Okta applications that you created to integrate with IdC.
  2. Delete the IAM Identity Center configuration.
  3. Delete the Redshift application and the Redshift provisioned cluster that you created for testing.
  4. Delete the IAM role that you created for the IdC and Redshift integration.

Conclusion

In this post, we showed you a detailed walkthrough of how you can integrate Okta with the IdC and Amazon Redshift Query Editor version 2 to simplify your SSO setup. This integration allows you to use role-based access control with Amazon Redshift. We encourage you to try out this integration.

To learn more about IdC with Amazon Redshift, visit the documentation.


About the Authors

Debu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Harshida Patel is a Principal Solutions Architect, Analytics with AWS.

Praveen Kumar Ramakrishnan is a Senior Software Engineer at AWS. He has nearly 20 years of experience spanning various domains including filesystems, storage virtualization and network security. At AWS, he focuses on enhancing the Redshift data security.

Karthik Ramanathan is a Sr. Software Engineer with AWS Redshift and is based in San Francisco. He brings close to two decades of development experience across the networking, data storage and IoT verticals prior to Redshift. When not at work, he is also a writer and loves to be in the water.

Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/simplify-external-object-access-in-amazon-redshift-using-automatic-mounting-of-the-aws-glue-data-catalog/

Amazon Redshift is a petabyte-scale, enterprise-grade cloud data warehouse service delivering the best price-performance. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift to cost-effectively and quickly analyze their data using standard SQL and existing business intelligence (BI) tools.

Amazon Redshift now makes it easier for you to run queries in AWS data lakes by automatically mounting the AWS Glue Data Catalog. You no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog. Now, you can use your AWS Identity and Access Management (IAM) credentials or IAM role to browse the Glue Data Catalog and query data lake tables directly from Amazon Redshift Query Editor v2 or your preferred SQL editors.

This feature is now available in all AWS commercial and US Gov Cloud Regions where Amazon Redshift RA3, Amazon Redshift Serverless, and AWS Glue are available. To learn more about auto-mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog.

Enabling easy analytics for everyone

Amazon Redshift is helping tens of thousands of customers manage analytics at scale. Amazon Redshift offers a powerful analytics solution that provides access to insights for users of all skill levels. You can take advantage of the following benefits:

  • It enables organizations to analyze diverse data sources, including structured, semi-structured, and unstructured data, facilitating comprehensive data exploration
  • With its high-performance processing capabilities, Amazon Redshift handles large and complex datasets, ensuring fast query response times and supporting real-time analytics
  • Amazon Redshift provides features like Multi-AZ (preview) and cross-Region snapshot copy for high availability and disaster recovery, and provides authentication and authorization mechanisms to make it reliable and secure
  • With features like Amazon Redshift ML, it democratizes ML capabilities across a variety of user personas
  • The flexibility to utilize different table formats such as Apache Hudi, Delta Lake, and Apache Iceberg (preview) optimizes query performance and storage efficiency
  • Integration with advanced analytical tools empowers you to apply sophisticated techniques and build predictive models
  • Scalability and elasticity allow for seamless expansion as data and workloads grow

Overall, Amazon Redshift empowers organizations to uncover valuable insights, enhance decision-making, and gain a competitive edge in today’s data-driven landscape.

Amazon Redshift Top Benefits

The new automatic mounting of the AWS Glue Data Catalog feature enables you to directly query AWS Glue objects in Amazon Redshift without the need to create an external schema for each AWS Glue database you want to query. With automatic mounting of the Data Catalog, Amazon Redshift automatically mounts the cluster account's default Data Catalog during boot or user opt-in as an external database, named awsdatacatalog.

Relevant use cases for automatic mounting of the AWS Glue Data Catalog feature

You can use tools like Amazon EMR to create new data lake schemas in various formats, such as Apache Hudi, Delta Lake, and Apache Iceberg (preview). However, when analysts want to run queries against these schemas, it requires administrators to create external schemas for each AWS Glue database in Amazon Redshift. You can now simplify this integration using automatic mounting of the AWS Glue Data Catalog.

The following diagram illustrates this architecture.

Solution overview

You can now use SQL clients like Amazon Redshift Query Editor v2 to browse and query awsdatacatalog; the steps below show how to connect to the awsdatacatalog database in Query Editor V2.

Complete the following high-level steps to integrate the automatic mounting of the Data Catalog using Query Editor V2 and a third-party SQL client:

  1. Provision resources with AWS CloudFormation to populate Data Catalog objects.
  2. Connect Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2.
  3. Connect with Redshift provisioned cluster and query the Data Catalog using Query Editor V2.
  4. Configure permissions on catalog resources using AWS Lake Formation.
  5. Federate with Redshift Serverless and query the Data Catalog using Query Editor V2 and a third-party SQL client.
  6. Discover the auto-mounted objects.
  7. Connect with Redshift provisioned cluster and query the Data Catalog as a federated user using a third-party client.
  8. Connect with Amazon Redshift and query the Data Catalog as an IAM user using third-party clients.

The following diagram illustrates the solution workflow.

Prerequisites

You should have the following prerequisites:

Provision resources with AWS CloudFormation to populate Data Catalog objects

In this post, we use an AWS Glue crawler to create the external table ny_pub stored in Apache Parquet format in the Amazon Simple Storage Service (Amazon S3) location s3://redshift-demos/data/NY-Pub/. In this step, we create the solution resources using AWS CloudFormation to create a stack named CrawlS3Source-NYTaxiData in either us-east-1 (use the yml download or launch stack) or us-west-2 (use the yml download or launch stack). Stack creation performs the following actions:

  • Creates the crawler NYTaxiCrawler along with the new IAM role AWSGlueServiceRole-RedshiftAutoMount
  • Creates automountdb as the AWS Glue database

When the stack is complete, perform the following steps:

  1. On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
  2. Open NYTaxiCrawler and choose Run crawler.

After the crawler is complete, you can see a new table called ny_pub in the Data Catalog under the automountdb database.


Alternatively, you can follow the manual instructions from the Amazon Redshift labs to create the ny_pub table.

Connect with Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2

In this section, we use an IAM role with principal tags to enable fine-grained federated authentication to Redshift Serverless to access the auto-mounted AWS Glue objects.

Complete the following steps:

  1. Create an IAM role and add the following permissions. For this post, we add full AWS Glue, Amazon Redshift, and Amazon S3 permissions for demo purposes. In an actual production scenario, it's recommended to apply more granular permissions.

  2. On the Tags tab, create a tag with Key as RedshiftDbRoles and Value as automount.
  3. In Query Editor V2, run the following SQL statement as an admin user to create a database role named automount:
    Create role automount;

  4. Grant usage privileges to the database role:
    GRANT USAGE ON DATABASE awsdatacatalog to role automount;

  5. Switch the role to automountrole by passing the account number and role name.
  6. In the Query Editor v2, choose your Redshift Serverless endpoint (right-click) and choose Create connection.
  7. For Authentication, select Federated user.
  8. For Database, enter the database name you want to connect to.
  9. Choose Create connection.

You're now ready to explore and query the auto-mounted Data Catalog in Redshift Serverless.
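For example, assuming you ran the CloudFormation stack and crawler from the earlier step, a query against the auto-mounted catalog using three-part notation might look like the following (the row limit is arbitrary):

SELECT * FROM awsdatacatalog.automountdb.ny_pub LIMIT 10;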

Connect with Redshift provisioned cluster and query the Data Catalog using Query Editor V2

To connect with Redshift provisioned cluster and access the Data Catalog, make sure you have completed the steps in the preceding section. Then complete the following steps:

  1. Connect to Redshift Query Editor V2 using the database user name and password authentication method. For example, connect to the dev database using the admin user and password.
  2. In an editor tab, assuming the user is present in Amazon Redshift, run the following SQL statement to grant an IAM user access to the Data Catalog:
    GRANT USAGE ON DATABASE awsdatacatalog to "IAMR:automountrole";

  3. As an admin user, choose the Settings icon, choose Account settings, and select Authenticate with IAM credentials.
  4. Choose Save.
  5. Switch roles to automountrole by passing the account number and role name.
  6. Create or edit the connection and use the authentication method Temporary credentials using your IAM identity.

For more information about this authentication method, see Connecting to an Amazon Redshift database.

You are ready to explore and query the auto-mounted Data Catalog in Amazon Redshift.

Discover the auto-mounted objects

This section illustrates the SHOW commands for discovery of auto-mounted objects. See the following code:

// Discovery of Glue databases at the schema level 
SHOW SCHEMAS FROM DATABASE awsdatacatalog;

// Discovery of Glue tables 
 Syntax: SHOW TABLES FROM SCHEMA awsdatacatalog.<glue_db_name>;
Example: SHOW TABLES FROM SCHEMA awsdatacatalog.automountdb;

// Discovery of Glue table columns 
 Syntax: SHOW COLUMNS FROM TABLE awsdatacatalog.<glue_db_name>.<glue_table_name>;
Example: SHOW COLUMNS FROM TABLE awsdatacatalog.automountdb.ny_pub;

Configure permissions on catalog resources using AWS Lake Formation

To maintain backward compatibility with AWS Glue, Lake Formation has the following initial security settings:

  • The Super permission is granted to the group IAMAllowedPrincipals on all existing Data Catalog resources
  • The Use only IAM access control setting is enabled for new Data Catalog resources

These settings effectively cause access to Data Catalog resources and Amazon S3 locations to be controlled solely by IAM policies. Individual Lake Formation permissions are not in effect.

In this step, we configure permissions on catalog resources using AWS Lake Formation. Before Lake Formation can manage access to your Data Catalog resources (databases and tables), you need to update its default settings:

  1. Change the default security settings for new resources. For instructions, see Change the default permission model.
  2. Change the settings for existing Data Catalog resources. For instructions, see Upgrading AWS Glue data permissions to the AWS Lake Formation model.

For more information, refer to Changing the default settings for your data lake.

Federate with Redshift Serverless and query the Data Catalog using Query Editor V2 and a third-party SQL client

With Redshift Serverless, you can connect to awsdatacatalog from a third-party client as a federated user from any identity provider (IdP). In this section, we configure permissions on catalog resources for the federated IAM role in AWS Lake Formation. When using AWS Lake Formation with Redshift, permissions can currently be applied at the IAM user or IAM role level.

To connect as a federated user, we will be using Redshift Serverless. For setup instructions, refer to Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

Additional changes are required on the following resources:

  1. In Amazon Redshift, as an admin user, grant usage to each federated user who needs access to awsdatacatalog:
    GRANT USAGE ON DATABASE awsdatacatalog to "IAMR:[email protected]";

If the user doesn’t exist in Amazon Redshift, you may need to create the IAM user with the password disabled as shown in the following code and then grant usage on awsdatacatalog:

Create User "IAMR:[email protected]" with password disable;
  2. On the Lake Formation console, assign permissions on the AWS Glue database to the IAM role that you created as part of the federated setup.
    1. Under Principals, select IAM users and roles.
    2. Choose IAM role oktarole.
    3. Apply catalog resource permissions, selecting the automountdb database and granting the appropriate table permissions.
  3. Update the IAM role used in the federation setup. In addition to the permissions already added to the IAM role, you need to add AWS Glue permissions and Amazon S3 permissions to access objects from Amazon S3. For this post, we add full AWS Glue and Amazon S3 permissions for demo purposes. In an actual production scenario, it's recommended to apply more granular permissions.

Now you’re ready to connect to Redshift Serverless using the Query Editor V2 and federated login.

  1. Use the SSO URL from Okta and log in to your Okta account with your user credentials. For this demo, we log in with user Ethan.
  2. In the Query Editor v2, choose your Redshift Serverless instance (right-click) and choose Create connection.
  3. For Authentication, select Federated user.
  4. For Database, enter the database name you want to connect to.
  5. Choose Create connection.
  6. Run the command select current_user to validate that you are logged in as a federated user.

User Ethan will be able to explore and access awsdatacatalog data.
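For example, the following statements (both used elsewhere in this post) confirm the federated identity and that the auto-mounted catalog is visible:

select current_user;

SHOW SCHEMAS FROM DATABASE awsdatacatalog;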

To connect Redshift Serverless with a third-party client, make sure you have followed all the previous steps.

For SQLWorkbench setup, refer to the section Configure the SQL client (SQL Workbench/J) in Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

The following screenshot shows that federated user ethan is able to query the awsdatacatalog tables using three-part notation:

Connect with Redshift provisioned cluster and query the Data Catalog as a federated user using third-party clients

With a Redshift provisioned cluster, you can connect to awsdatacatalog from a third-party client as a federated user from any IdP.

To connect as a federated user with the Redshift provisioned cluster, you need to follow the steps in the previous section that detailed how to connect with Redshift Serverless and query the Data Catalog as a federated user using Query Editor V2 and a third-party SQL client.

There are additional changes required in IAM policy. Update the IAM policy with the following code to use the GetClusterCredentialsWithIAM API:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:ListGroups",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "redshift:GetClusterCredentialsWithIAM",
            "Resource": "arn:aws:redshift:us-east-2:01234567891:dbname:redshift-cluster-1/dev"
        }
    ]
}

Now you’re ready to connect to Redshift provisioned cluster using a third-party SQL client as a federated user.

For SQLWorkbench setup, refer to the section Configure the SQL client (SQL Workbench/J) in the post Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients.

Make the following changes:

  • Use the latest Redshift JDBC driver, because only the latest driver supports querying the auto-mounted Data Catalog tables for federated users
  • For URL, enter jdbc:redshift:iam://<cluster endpoint>:<port>:<databasename>?groupfederation=true. For example, jdbc:redshift:iam://redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com:5439/dev?groupfederation=true.

In the preceding URL, groupfederation is a mandatory parameter that allows you to authenticate with the IAM credentials.

The following screenshot shows that federated user ethan is able to query the awsdatacatalog tables using three-part notation.

Connect and query the Data Catalog as an IAM user using third-party clients

In this section, we provide instructions to set up a SQL client to query the auto-mounted awsdatacatalog.

Use three-part notation to reference the awsdatacatalog table in your SELECT statement. The first part is the database name, the second part is the AWS Glue database name, and the third part is the AWS Glue table name:

SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;

You can perform various scenarios that read the Data Catalog data and populate Redshift tables.
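For example, one common scenario is materializing a data lake table into a local Redshift table; the following is a minimal sketch using the ny_pub table from this post (the target table name is illustrative):

CREATE TABLE public.ny_pub_local AS
SELECT * FROM awsdatacatalog.automountdb.ny_pub;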

For this post, we use SQLWorkbench/J as the SQL client to query the Data Catalog. To set up SQL Workbench/J, complete the following steps:

  1. Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).

You must use the latest Redshift JDBC driver, because only the latest driver supports querying the auto-mounted Data Catalog tables.

  3. For URL, enter jdbc:redshift:iam://<cluster endpoint>:<port>:<databasename>?profile=<profilename>&groupfederation=true. For example, jdbc:redshift:iam://redshift-cluster-1.abdef0abc0ab.us-east-2.redshift.amazonaws.com:5439/dev?profile=user2&groupfederation=true.

We are using profile-based credentials as an example. You can use any AWS profile or IAM credential-based authentication as per your requirement. For more information on IAM credentials, refer to Options for providing IAM credentials.

The following screenshot shows that IAM user johndoe is able to list the awsdatacatalog tables using the SHOW command.

The following screenshot shows that IAM user johndoe is able to query the awsdatacatalog tables using three-part notation:

If you get the following error while using groupfederation=true, you need to use the latest Redshift driver:

Something unusual has occurred to cause the driver to fail. Please report this exception:Authentication with plugin is not supported for group federation [SQL State=99999]

Clean up

Complete the following steps to clean up your resources:

  1. Delete the IAM role automountrole.
  2. Delete the CloudFormation stack CrawlS3Source-NYTaxiData to clean up the crawler NYTaxiCrawler, the automountdb database from the Data Catalog, and the IAM role AWSGlueServiceRole-RedshiftAutoMount.
  3. Update the default settings of Lake Formation:
    1. In the navigation pane, under Data catalog, choose Settings.
    2. Select both access control options and choose Save.
    3. In the navigation pane, under Permissions, choose Administrative roles and tasks.
    4. In the Database creators section, choose Grant.
    5. Search for IAMAllowedPrincipals and select Create database permission.
    6. Choose Grant.

Considerations

Note the following considerations:

  • The Data Catalog auto-mount provides ease of use to analysts or database users. The security setup (setting up the permissions model or data governance) is owned by account and database administrators.
    • To achieve fine-grained access control, build a permissions model in AWS Lake Formation.
    • If the permissions have to be maintained at the Redshift database level, leave the AWS Lake Formation default settings as is and then run grant/revoke in Amazon Redshift.
  • If you are using a third-party SQL editor and your query tool does not support browsing multiple databases, you can use the SHOW commands to list your AWS Glue databases and tables. You can also query awsdatacatalog objects using three-part notation (SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;), provided you have access to the external objects based on the permission model.

Conclusion

In this post, we introduced the automatic mounting of AWS Glue Data Catalog, which makes it easier for customers to run queries in their data lakes. This feature streamlines data governance and access control, eliminating the need to create an external schema in Amazon Redshift to use the data lake tables cataloged in AWS Glue Data Catalog. We showed how you can manage permission on auto-mounted AWS Glue-based objects using Lake Formation. The permission model can be easily managed and organized by administrators, allowing database users to seamlessly access external objects they have been granted access to.

As we strive for enhanced usability in Amazon Redshift, we prioritize unified data governance and fine-grained access control. This feature minimizes manual effort while ensuring the necessary security measures for your organization are in place.

For more information about automatic mounting of the Data Catalog in Amazon Redshift, refer to Querying the AWS Glue Data Catalog.


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu-PandaDebu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Rohit Vashishtha is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, Texas. He has 17 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with utmost security and data governance.

Single sign-on with Amazon Redshift Serverless with Okta using Amazon Redshift Query Editor v2 and third-party SQL clients

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/single-sign-on-with-amazon-redshift-serverless-with-okta-using-amazon-redshift-query-editor-v2-and-third-party-sql-clients/

Amazon Redshift Serverless makes it easy to run and scale analytics in seconds without the need to set up and manage data warehouse clusters. With Redshift Serverless, users such as data analysts, developers, business professionals, and data scientists can get insights from data by simply loading and querying data in the data warehouse.

Customers use their preferred SQL clients to analyze their data in Redshift Serverless. They want to use an identity provider (IdP) or single sign-on (SSO) credentials to connect to Redshift Serverless to reuse existing credentials and avoid additional user setup and configuration. When you use AWS Identity and Access Management (IAM) or IdP-based credentials to connect to a serverless data warehouse, Amazon Redshift automatically creates a database user for the end-user. You can simplify managing user privileges by using role-based access control. Admins can use a database-role mapping for SSO with the IAM roles that users are assigned to get their database privileges automatically. With this integration, organizations can simplify user management because they no longer need to create users and map them to database roles manually. You can define the mapped database roles as a principal tag for the IdP groups or IAM role, so users who are members of those IdP groups are granted the matching Amazon Redshift database roles automatically.

In this post, we focus on Okta as the IdP and provide step-by-step guidance to integrate Redshift Serverless with Okta using the Amazon Redshift Query Editor V2 and SQL clients like SQL Workbench/J. You can use this mechanism with other IdPs such as Azure Active Directory or Ping, with any application or tool that uses Amazon's JDBC, ODBC, or Python driver.

Solution overview

The following diagram illustrates the authentication flow of Okta with Redshift Serverless using federated IAM roles and automatic database-role mapping.

The workflow contains the following steps:

  1. Either the user chooses an IdP app in their browser, or the SQL client initiates a user authentication request to the IdP (Okta).
  2. Upon a successful authentication, Okta submits a request to the AWS federation endpoint with a SAML assertion containing the PrincipalTags.
  3. The AWS federation endpoint validates the SAML assertion and invokes the AWS Security Token Service (AWS STS) API AssumeRoleWithSAML. The SAML assertion contains the IdP user and group information that is stored in the RedshiftDbUser and RedshiftDbRoles principal tags, respectively. Temporary IAM credentials are returned to the SQL client or, if using the Query Editor v2, the user’s browser is redirected to the Query Editor v2 console using the temporary IAM credentials.
  4. The temporary IAM credentials are used by the SQL client or Query Editor v2 to call the Redshift Serverless GetCredentials API. The API uses the principal tags to determine the user and database roles that the user belongs to. An associated database user is created if the user is signing in for the first time and is granted the matching database roles automatically. A temporary password is returned to the SQL client.
  5. Using the database user and temporary password, the SQL client or Query Editor v2 connects to Redshift Serverless. Upon login, the user is authorized based on the Amazon Redshift database roles that were assigned in Step 4.

To set up the solution, we complete the following steps:

  1. Set up your Okta application:
    • Create Okta users.
    • Create groups and assign groups to users.
    • Create the Okta SAML application.
    • Collect Okta information.
  2. Set up AWS configuration:
    • Create the IAM IdP.
    • Create the IAM role and policy.
  3. Configure Redshift Serverless role-based access.
  4. Federate to Redshift Serverless using the Query Editor V2.
  5. Configure the SQL client (for this post, we use SQL Workbench/J).
  6. Optionally, implement MFA with SQL Client and Query Editor V2.

Prerequisites

You need the following prerequisites to set up this solution:

Set up Okta application

In this section, we provide the steps to configure your Okta application.

Create Okta users

To create your Okta users, complete the following steps:

  1. Sign in to your Okta organization as a user with administrative privileges.
  2. On the admin console, under Directory in the navigation pane, choose People.
  3. Choose Add person.
  4. For First Name, enter the user’s first name.
  5. For Last Name, enter the user’s last name.
  6. For Username, enter the user’s user name in email format.
  7. Select I will set password and enter a password.
  8. Optionally, deselect User must change password on first login if you don’t want the user to change their password when they first sign in. Choose Save.

Create groups and assign groups to users

To create your groups and assign them to users, complete the following steps:

  1. Sign in to your Okta organization as a user with administrative privileges.
  2. On the admin console, under Directory in the navigation pane, choose Groups.
  3. Choose Add group.
  4. Enter a group name and choose Save.
  5. Choose the recently created group and then choose Assign people.
  6. Choose the plus sign and then choose Done.
  7. Repeat Steps 1–6 to add more groups.

In this post, we create two groups: sales and finance.

Create an Okta SAML application

To create your Okta SAML application, complete the following steps:

  1. Sign in to your Okta organization as a user with administrative privileges.
  2. On the admin console, under Applications in the navigation pane, choose Applications.
  3. Choose Create App Integration.
  4. Select SAML 2.0 as the sign-in method and choose Next.
  5. Enter a name for your app integration (for example, redshift_app) and choose Next.
  6. Enter the following values in the app and leave the rest as is:
    • For Single Sign On URL, enter https://signin.aws.amazon.com/saml.
    • For Audience URI (SP Entity ID), enter urn:amazon:webservices.
    • For Name ID format, enter EmailAddress.
  7. Choose Next.
  8. Choose I’m an Okta customer adding an internal app followed by This is an internal app that we have created.
  9. Choose Finish.
  10. Choose Assignments and then choose Assign.
  11. Choose Assign to groups and then select Assign next to the groups that you want to add.
  12. Choose Done.

Set up Okta advanced configuration

After you create the custom SAML app, complete the following steps:

  1. On the admin console, navigate to General and choose Edit under SAML settings.
  2. Choose Next.
  3. Set Default Relay State to the Query Editor V2 URL, using the format https://<region>.console.aws.amazon.com/sqlworkbench/home. For this post, we use https://us-west-2.console.aws.amazon.com/sqlworkbench/home.
  4. Under Attribute Statements (optional), add the following properties:
    • Provide the IAM role and IdP in comma-separated format using the Role attribute. You’ll create this same IAM role and IdP in a later step when setting up AWS configuration.
    • Set user.login for RoleSessionName. This is used as an identifier for the temporary credentials that are issued when the role is assumed.
    • Set the DB roles using PrincipalTag:RedshiftDbRoles. This uses the Okta groups to fill the principal tags and map them automatically with the Amazon Redshift database roles. Its value must be a colon-separated list in the format role1:role2.
    • Set user.login for PrincipalTag:RedshiftDbUser. This uses the user name in the directory. This is a required tag and defines the database user that is used by Query Editor V2.
    • Set the transitive keys using TransitiveTagKeys. This prevents users from changing the session tags in case of role chaining.

The preceding tags are forwarded to the GetCredentials API to get temporary credentials for your Redshift Serverless instance and map them automatically to Amazon Redshift database roles. The following table summarizes the attribute statement configuration.

  • Name: https://aws.amazon.com/SAML/Attributes/Role
    Name format: Unspecified
    Value: arn:aws:iam::<yourAWSAccountID>:role/role-name,arn:aws:iam::<yourAWSAccountID>:saml-provider/provider-name
    Example: arn:aws:iam::112034567890:role/oktarole,arn:aws:iam::112034567890:saml-provider/oktaidp
  • Name: https://aws.amazon.com/SAML/Attributes/RoleSessionName
    Name format: Unspecified
    Value: user.login
    Example: user.login
  • Name: https://aws.amazon.com/SAML/Attributes/PrincipalTag:RedshiftDbRoles
    Name format: Unspecified
    Value: String.join(":", isMemberOfGroupName("group1") ? 'group1' : '', isMemberOfGroupName("group2") ? 'group2' : '')
    Example: String.join(":", isMemberOfGroupName("sales") ? 'sales' : '', isMemberOfGroupName("finance") ? 'finance' : '')
  • Name: https://aws.amazon.com/SAML/Attributes/PrincipalTag:RedshiftDbUser
    Name format: Unspecified
    Value: user.login
    Example: user.login
  • Name: https://aws.amazon.com/SAML/Attributes/TransitiveTagKeys
    Name format: Unspecified
    Value: Arrays.flatten("RedshiftDbUser", "RedshiftDbRoles")
    Example: Arrays.flatten("RedshiftDbUser", "RedshiftDbRoles")
  5. After you add the attribute claims, choose Next followed by Finish.

Your attributes should be in similar format as shown in the following screenshot.

Collect Okta information

To gather your Okta information, complete the following steps:

  1. On the Sign On tab, choose View SAML setup instructions.
  2. For Identity Provider Single Sign-on URL, note this URL; you use it when connecting with any third-party SQL client such as SQL Workbench/J.
  3. Use the IdP metadata in block 4 and save the metadata file in .xml format (for example, metadata.xml).

Set up AWS configuration

In this section, we provide the steps to configure your IAM resources.

Create the IAM IdP

To create your IAM IdP, complete the following steps:

  1. On the IAM console, under Access management in the navigation pane, choose Identity providers.
  2. Choose Add provider.
  3. For Provider type, select SAML.
  4. For Provider name, enter a name.
  5. Choose Choose file and upload the metadata file (.xml) you downloaded earlier.
  6. Choose Add provider.

Create the IAM Amazon Redshift access policy

To create your IAM policy, complete the following steps:

  1. On the IAM console, choose Policies.
  2. Choose Create policy.
  3. On the Create policy page, choose the JSON tab.
  4. For the policy, enter the JSON in the following format:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "redshift-serverless:GetCredentials",
                "Resource": "<Workgroup ARN>"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "redshift-serverless:ListWorkgroups",
                "Resource": "*"
            }
        ]
    }

The workgroup ARN is available on the Redshift Serverless workgroup configuration page.

The following example policy includes only a single Redshift Serverless workgroup; you can modify the policy to include multiple workgroups in the Resource section:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "redshift-serverless:GetCredentials",
            "Resource": "arn:aws:redshift-serverless:us-west-2:123456789012:workgroup/4a4f12vc-123b-2d99-fd34-a12345a1e87f"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "redshift-serverless:ListWorkgroups",
            "Resource": "*"
        }
    ]
}

  5. Choose Next: Tags.
  6. Choose Next: Review.
  7. In the Review policy section, for Name, enter the name of your policy; for example, OktaRedshiftPolicy.
  8. For Description, you can optionally enter a brief description of what the policy does.
  9. Choose Create policy.

Create the IAM role

To create your IAM role, complete the following steps:

  1. On the IAM console, choose Roles in the navigation pane.
  2. Choose Create role.
  3. For Trusted entity type, select SAML 2.0 federation.
  4. For SAML 2.0-based provider, choose the IdP you created earlier.
  5. Select Allow programmatic and AWS Management Console access.
  6. Choose Next.
  7. Choose the policy you created earlier.
  8. Also, add the policy AmazonRedshiftQueryEditorV2ReadSharing.
  9. Choose Next.
  10. In the Review section, for Role Name, enter the name of your role; for example, oktarole.
  11. For Description, you can optionally enter a brief description of what the role does.
  12. Choose Create role.
  13. Navigate to the role that you just created and choose Trust Relationships.
  14. Choose Edit trust policy and choose TagSession under Add actions for STS.

When using session tags, trust policies for all roles connected to the IdP passing tags must have the sts:TagSession permission. For roles without this permission in the trust policy, the AssumeRole operation fails.

  1. Choose Update policy.

The following screenshot shows the role permissions.

The following screenshot shows the trust relationships.

Update the advanced Okta Role Attribute

Complete the following steps:

  1. Switch back to Okta.com.
  2. Navigate to the application which you created earlier.
  3. Navigate to General and click Edit under SAML settings.
  4. Under Attribute Statements (optional), update the value for the attribute https://aws.amazon.com/SAML/Attributes/Role, using the actual role and identity provider ARN values from the preceding steps. For example, arn:aws:iam::123456789012:role/oktarole,arn:aws:iam::123456789012:saml-provider/oktaidp.

Configure Redshift Serverless role-based access

In this step, we create database roles in Amazon Redshift based on the groups that you created in Okta. Make sure the role name matches with the Okta Group name.

Amazon Redshift roles simplify managing privileges required for your end-users. In this post, we create two database roles, sales and finance, and grant them access to query tables with sales and finance data, respectively. You can download this sample SQL Notebook and import into Redshift Query Editor v2 to run all cells in the notebook used in this example. Alternatively, you can copy and enter the SQL into your SQL client.

The following is the syntax to create a role in Redshift Serverless:

create role <IdP groupname>;

For example:

create role sales;
create role finance;

Create the sales and finance database schema:

create schema sales_schema;
create schema finance_schema;

Create the tables:

CREATE TABLE IF NOT EXISTS finance_schema.revenue
(
account INTEGER   ENCODE az64
,customer VARCHAR(20)   ENCODE lzo
,salesamt NUMERIC(18,0)   ENCODE az64
)
DISTSTYLE AUTO
;

insert into finance_schema.revenue values (10001, 'ABC Company', 12000);
insert into finance_schema.revenue values (10002, 'Tech Logistics', 175400);
insert into finance_schema.revenue values (10003, 'XYZ Industry', 24355);
insert into finance_schema.revenue values (10004, 'The tax experts', 186577);

CREATE TABLE IF NOT EXISTS sales_schema.store_sales
(
ID INTEGER   ENCODE az64,
Product varchar(20),
Sales_Amount INTEGER   ENCODE az64
)
DISTSTYLE AUTO
;

Insert into sales_schema.store_sales values (1,'product1',1000);
Insert into sales_schema.store_sales values (2,'product2',2000);
Insert into sales_schema.store_sales values (3,'product3',3000);
Insert into sales_schema.store_sales values (4,'product4',4000);

The following is the syntax to grant permission to the Redshift Serverless role:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role <IdP groupname>;

Grant the relevant permissions to each role as per your requirements. In the following example, we grant the sales role usage on sales_schema and select on all of its tables, and the finance role usage on finance_schema and select on all of its tables:

grant usage on schema sales_schema to role sales;
grant select on all tables in schema sales_schema to role sales;

grant usage on schema finance_schema to role finance;
grant select on all tables in schema finance_schema to role finance;
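
To verify the grants, you can query the relation-level privileges. The following is a minimal sketch; it assumes the SVV_RELATION_PRIVILEGES system view is available in your Amazon Redshift version:

-- List the identities (users or roles) holding privileges on the example tables
select * from svv_relation_privileges
where relation_name in ('store_sales', 'revenue');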

Federate to Redshift Serverless using Query Editor V2

The RedshiftDbRoles principal tag and DBGroups are both mechanisms that can be used to integrate with an IdP. However, federating with the RedshiftDbRoles principal has some clear advantages when it comes to connecting with an IdP because it provides automatic mapping between IdP groups and Amazon Redshift database roles. Overall, RedshiftDbRoles is more flexible, easier to manage, and more secure, making it the better option for integrating Amazon Redshift with your IdP.

Now you’re ready to connect to Redshift Serverless using the Query Editor V2 and federated login:

  1. Use the SSO URL you collected earlier and log in to your Okta account with your user credentials. For this demo, we log in with user Ethan.
  2. In the Query Editor v2, choose your Redshift Serverless instance (right-click) and choose Create connection.
  3. For Authentication, select Federated user.
  4. For Database, enter the database name you want to connect to.
  5. Choose Create Connection.

User Ethan will be able to access sales_schema tables. If Ethan tries to access the tables in finance_schema, he will get a permission denied error.
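
For example, the following minimal check (using the schemas and tables created earlier in this post) illustrates the expected behavior when Ethan runs queries in Query Editor V2:

-- Succeeds: the sales role has usage on sales_schema and select on its tables
select * from sales_schema.store_sales;

-- Fails with a permission denied error: no privileges were granted on finance_schema to the sales role
select * from finance_schema.revenue;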

Configure the SQL client (SQL Workbench/J)

To set up SQL Workbench/J, complete the following steps:

  1. Create a new connection in SQL Workbench/J and choose Redshift Serverless as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).
  3. For Username and Password, enter the values that you set in Okta.
  4. Capture the values for app_id, app_name, and idp_host from the Okta app embed link, which can be found on the General tab of your application.
  5. Set the following extended properties:
    • For app_id, enter the value from app embed link (for example, 0oa8p1o1RptSabT9abd0/avc8k7abc32lL4izh3b8).
    • For app_name, enter the value from app embed link (for example, dev-123456_redshift_app_2).
    • For idp_host, enter the value from app embed link (for example, dev-123456.okta.com).
    • For plugin_name, enter com.amazon.redshift.plugin.OktaCredentialsProvider. The following screenshot shows the SQL Workbench/J extended properties.

      1. Choose OK.
      2. Choose Test from SQL Workbench/J to test the connection.
      3. When the connection is successful, choose OK.
      4. Choose OK to sign in with the users created.

User Ethan will be able to access the sales_schema tables. If Ethan tries to access the tables in the finance_schema, he will get a permission denied error.

Congratulations! You have federated with Redshift Serverless and Okta with SQL Workbench/J using RedshiftDbRoles.

[Optional] Implement MFA with SQL Client and Query Editor V2

Implementing MFA poses an additional challenge because multi-factor authentication is an asynchronous process between initiating the login (the first factor) and completing the login (the second factor). The SAML response is returned to the appropriate listener in each scenario: the SQL client, or the AWS console in the case of QEV2. Depending on which login options you give your users, you may need an additional Okta application. The different scenarios are as follows:

  1. If you are ONLY using QEV2 and not using any other SQL client, you can use MFA with Query Editor V2 with the above application. No changes are required in the custom SAML application created above.
  2. If you are NOT using QEV2 and only using a third-party SQL client (such as SQL Workbench/J), you need to modify the above custom SAML app as described below.
  3. If you want to use QEV2 and a third-party SQL client with MFA, you need to create an additional custom SAML app as described below.

Prerequisites for MFA

Each identity provider (IdP) has its own steps for enabling and managing MFA for your users. In the case of Okta, see the following guides on how to enable MFA using the Okta Verify application and by defining an authentication policy.

Steps to create/update SAML application which supports MFA for a SQL Client

  1. If creating a second app, follow all the steps which are described under section 1 (Create Okta SAML application).
  2. Open the custom SAML app and select General.
  3. Select Edit under SAML settings.
  4. Choose Next in General Settings.
  5. Under General, update the Single sign-on URL to http://localhost:7890/redshift/
  6. Select Next followed by Finish.

The following screenshot shows the MFA app after making the above changes:

Configure SQL Client for MFA

To set up SQL Workbench/J, complete the following steps:

  1. Follow all the steps described under Configure the SQL client (SQL Workbench/J).
  2. Modify your connection updating the extended properties:
    • login_url – Use the Identity Provider Single Sign-on URL collected in the Collect Okta information section (for example, https://dev-123456.okta.com/app/dev-123456_redshiftapp_2/abc8p6o5psS6xUhBJ517/sso/saml).
    • plugin_name – com.amazon.redshift.plugin.BrowserSamlCredentialsProvider
  3. Choose OK
  4. Choose OK from SQL Workbench/J. You’re redirected to the browser to sign in with your Okta credentials.
  5. After that, you're prompted for MFA. Choose either Enter a code or Get a push notification.
  6. Once authentication is successful, you're redirected to a page showing the connection as successful.
  7. With this connection profile, run the following query to return federated user name.
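
A simple way to do this (a minimal sketch; current_user returns the federated user name established by the SAML login):

select current_user;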

Troubleshooting

If your connection didn’t work, consider the following:

  • Enable logging in the driver. For instructions, see Configure logging.
  • Make sure to use the latest Amazon Redshift JDBC driver version.
  • If you’re getting errors while setting up the application on Okta, make sure you have admin access.
  • If you can authenticate via the SQL client but get a permission issue or can’t see objects, grant the relevant permission to the role, as detailed earlier in this post.

Clean up

When you’re done testing the solution, clean up the resources to avoid incurring future charges:

  1. Delete the Redshift Serverless instance by deleting both the workgroup and the namespace.
  2. Delete the IAM roles, IAM IdPs, and IAM policies.

Conclusion

In this post, we provided step-by-step instructions to integrate Redshift Serverless with Okta using the Amazon Redshift Query Editor V2 and SQL Workbench/J with the help of federated IAM roles and automatic database-role mapping. You can use a similar setup with any other SQL client (such as DBeaver or DataGrip) or business intelligence tool (such as Tableau Desktop). We also showed how Okta group membership is mapped automatically with Redshift Serverless roles to use role-based authentication seamlessly.

For more information about Redshift Serverless single sign-on using database roles, see Defining database roles to grant to federated users in Amazon Redshift Serverless.


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu-PandaDebu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Mohamed ShaabanMohamed Shaaban is a Senior Software Engineer in Amazon Redshift and is based in Berlin, Germany. He has over 12 years of experience in the software engineering. He is passionate about cloud services and building solutions that delight customers. Outside of work, he is an amateur photographer who loves to explore and capture unique moments.

Rajiv Gupta is Sr. Manager of Analytics Specialist Solutions Architects based out of Irvine, CA. He has 20+ years of experience building and managing teams who build data warehouse and business intelligence solutions.

Amol Mhatre is a Database Engineer in Amazon Redshift and works on Customer & Partner engagements. Prior to Amazon, he has worked on multiple projects involving Database & ERP implementations.

Ning Di is a Software Development Engineer at Amazon Redshift, driven by a genuine passion for exploring all aspects of technology.

Harsha Kesapragada is a Software Development Engineer for Amazon Redshift with a passion to build scalable and secure systems. In the past few years, he has been working on Redshift Datasharing, Security and Redshift Serverless.

Integrate Amazon Redshift row-level security with Amazon Redshift native IdP authentication

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-amazon-redshift-row-level-security-with-amazon-redshift-native-idp-authentication/

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.

As enterprise customers look to build their data warehouse on Amazon Redshift, they have business requirements to enforce fine-grained access controls that govern who can access which rows of their sensitive data based on the user profile. Furthermore, many organizations want to access Amazon Redshift using their existing identity provider (IdP) while maintaining these compliance and security requirements around their data. Without integrated features for row-level security and secure authentication, you may have to develop additional solutions such as views, or integrate third-party solutions around your data to enforce security.

With the introduction of row-level security in Amazon Redshift, you can restrict user access at the row level. Additionally, we have introduced a native IdP functionality to help you implement authentication and authorization with your choice of business intelligence (BI) tools in a seamless way.

Amazon Redshift row-level security (RLS) provides granular access control over your sensitive data. It does this by using RLS policies to determine which rows to return in the query result sets.

In this post, we walk you through an example on how you can implement row-level security in Amazon Redshift while using existing IdP credentials to simplify authentication and managing permissions. You can use this flexible solution to provide complete control over data access while maintaining authorization using your existing IdP.

Solution overview

For our use case, an organization requires row-level security to restrict access to sales performance data to specific states and their allocated salesperson. We have the following business rules and conditions:

  • Alice, the salesperson for NY, should have access to NY sales data only
  • Bob, the salesperson for CA, should get access to CA sales data only
  • Charlie, the sales manager for the North America region, should have access to sales data for all states
  • Jen, who belongs to HR department, shouldn’t have access to any sales data

The following diagram illustrates the solution architecture we implement to solve this problem statement using Amazon Redshift row-level security and Amazon Redshift native IdP authentication.

Solution Overview

The solution contains the following steps:

  1. Create RLS policies to provide fine-grained access control for row-level data on the Sales table.
  2. Create Amazon Redshift roles for each of the different Azure AD groups and assign relevant permissions to the table.

With native IdP, roles get created automatically based on Azure groups. However, as a best practice, we’re pre-creating the Amazon Redshift roles and assigning relevant permissions.

  1. Attach row-level security policies to the roles.
  2. Configure a JDBC or ODBC driver in your SQL client to use Azure AD federation and use Azure AD login credentials to sign in.
  3. Upon successful authentication, Azure AD issues an authentication token (OAuth token) back to the Amazon Redshift driver.
  4. The driver forwards the authentication token to the Amazon Redshift cluster to initiate a new database session. Amazon Redshift verifies and validates the authentication token.
  5. Amazon Redshift calls the Azure Graph API to obtain the user’s group membership.
  6. Amazon Redshift maps the logged-in Azure AD user to the Amazon Redshift user and maps the Azure AD groups to Amazon Redshift roles.
  7. The Amazon Redshift roles are pre-mapped with the RLS policies mentioned in step 3. This allows the respective users to query the fine-grained row-level access data from the client.

Prerequisites

To implement this solution, you must have the following prerequisites:

Implement your Amazon Redshift native IdP

To set up your Amazon Redshift native IdP, refer to Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. Follow the steps to set up your Azure application and collect Azure AD information for the Amazon Redshift IdP.

For this post, we have created the following four groups in Azure AD:

  • sales_ny
  • sales_ca
  • sales_manager
  • hr_group

Then we created the following four users in Azure AD:

  • Alice – The salesperson in NY state
  • Bob – The salesperson in CA state
  • Charlie – The manager for the North America region
  • Jen – A member of the HR group

Add the respective users to their appropriate group:

  • Alice – sales_ny
  • Bob – sales_ca
  • Charlie – sales_manager
  • Jen – hr_group

Next, we need to register the IdP in Amazon Redshift using the following command:

CREATE IDENTITY PROVIDER rls_idp TYPE azure
NAMESPACE 'aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/87f4aa26-78b7-410e-bf29-57b39929ef9a/",
"audience":["https://analysis.windows.net/powerbi/connector/AmazonRedshift",
"api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
"client_id":"123ab555-a321-666d-7890-11a123a44890",
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB" }'
;

In the preceding statement, the type azure indicates that the provider specifically facilitates communication with Microsoft Azure AD. The following parameters are collected from Azure AD (for more information, refer to Collect Azure AD Information in Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client):

  • issuer – The issuer ID to trust when a token is received. The unique tenant ID (tenant_id) is appended to the issuer.
  • client_id – The unique public identifier of the application registered with the IdP. This can be referred to as the application ID.
  • client_secret – A secret identifier, or password, known only to the IdP and the registered application.
  • audience – The application ID that is assigned to the application in Azure. For this post, we connect with Amazon Redshift using Power BI Desktop and SQL Workbench/J. The audience value is hardcoded for Power BI Desktop, for example https://analysis.windows.net/powerbi/connector/AmazonRedshift. The second audience value is for the SQL client, which you get from the application ID URI in the OAuth application. For example, api://991abc78-78ab-4ad8-a123-zf123ab03612p.

Use the following command to view the registered IdP on Amazon Redshift:

DESC IDENTITY PROVIDER rls_idp;

Native IdP - data

Use the following command to view all the IdPs registered:

select * from svv_identity_providers;

The following Sales table contains information about each salesperson, the respective state they cover, and their total sales amount:

CREATE TABLE SALES (sales_person VARCHAR(30), state CHAR(2), "total_sales" INT);
INSERT INTO SALES VALUES ('Alice', 'NY', 5000);
INSERT INTO SALES VALUES ('Bob', 'CA', 6000);
INSERT INTO SALES VALUES ('Sally', 'IL', 7000);

Sales data
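
At this point, no RLS policy is attached yet, so any user with SELECT on the table sees every row. As a quick sanity check (a minimal sketch, run as the table owner or a superuser):

select * from sales;
-- Expected: all three rows (Alice/NY, Bob/CA, Sally/IL)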

Now we create four roles in the Amazon Redshift cluster based on the groups that we created on the Azure AD portal and assign relevant permissions to them. This simplifies administration by assigning different permissions to different roles and assigning them to different users.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command (aad) and the group name is the Azure AD group. See the following code:

CREATE ROLE "aad:sales_ny";
CREATE ROLE "aad:sales_ca";
CREATE ROLE "aad:sales_manager";
CREATE ROLE "aad:hr";

Now we grant permission to the Amazon Redshift role on the appropriate tables. For this post, we assign SELECT permission on the Sales table for all four roles:

GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_ny";
GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_ca";
GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_manager";
GRANT SELECT ON TABLE SALES TO ROLE "aad:hr";

Use the following command to view all the roles in the cluster:

select * from svv_roles;

roles information

Create a row-level security policy

Let’s enforce an RLS policy on the Sales table to restrict access to sales performance information for a salesperson specific to a particular state. We create the following policy:

CREATE RLS POLICY policy_sales_ny
WITH (state char(2))
USING (state = 'NY');
CREATE RLS POLICY policy_sales_ca
WITH (state char(2))
USING (state = 'CA');

The sales manager is also required to view sales across the North American region. For this, we create the following policy:

CREATE RLS POLICY policy_sales_all
USING (true);

The policy_sales_all policy allows the sales manager to view all the information in the sales table.
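
To review the policies defined so far, you can query the RLS policy system view. This is a minimal sketch; it assumes the SVV_RLS_POLICY system view is available in your Amazon Redshift version:

select * from svv_rls_policy;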

Attach the row-level security policy to roles

Now we have to attach the row-level security policies to their respective Amazon Redshift roles so that when the user logs in using their Amazon Redshift native IdP, they can get fine-grained access to the records.

ATTACH RLS POLICY policy_sales_ny ON public.sales TO ROLE "aad:sales_ny";
ATTACH RLS POLICY policy_sales_ca ON public.sales TO ROLE "aad:sales_ca";
ATTACH RLS POLICY policy_sales_all ON public.sales TO ROLE "aad:sales_manager";

For the HR role, we haven’t created or attached any RLS policy because we don’t want any user from the HR group to get access to sales records.

Enable row-level security on the table

Now let’s enable row-level security on the respective tables. In this demo, we enable the RLS policy on the Sales table using the following command:

ALTER TABLE public.sales ROW LEVEL SECURITY ON;
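
If you later need to disable row-level security on the table (for example, during cleanup or troubleshooting), you can reverse the setting with the same ALTER TABLE syntax; a minimal sketch:

ALTER TABLE public.sales ROW LEVEL SECURITY OFF;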

Use the following command to view the RLS policies:

SELECT * FROM svv_rls_attached_policy;

RLS policy - Data

Test row-level security using Power BI Desktop

In this example, we use Microsoft Power BI Desktop to connect with Amazon Redshift using a native IdP. For this solution, use Microsoft Power BI Desktop version 2.102.683.0 (64-bit) or above.

  1. In your Microsoft Power BI Desktop, choose Get data.

Native IdP- PowerBI Desktop-Login

  1. Search for the Amazon Redshift connector, choose it, and choose Connect.

Native IdP- PowerBI Desktop-Login

  1. For Server, enter your Amazon Redshift cluster’s endpoint. For example: test-cluster.ct4abcufthff.us-east-1.redshift.amazonaws.com.
  2. For Database, enter your database name (for this post, we enter dev).
  3. Choose OK.

Native IdP- PowerBI Desktop-connection

  1. Choose Microsoft Account.

Native IdP- PowerBI Desktop-Login

  1. Choose Sign in.

RLS-Native IdP- PowerBI Desktop-Login

  1. Enter your Microsoft Account credentials in the authorization dialog. For this example, we sign in with user Alice.
  2. Choose Next.

RLS-Native IdP- PowerBI Desktop-Login

Once connected, you will see the message “You are currently signed in.”

  1. Choose Connect.

As shown in the following screenshot, Azure AD user Alice is able to authenticate using an Amazon Redshift native IdP, and the RLS policies were applied automatically, allowing Alice to access sales performance information for only NY state.

RLS-Native IdP- PowerBI Desktop-Authorized to view respective data

Similarly, we can try signing in as user Bob and see only CA state information.

RLS-Native IdP- PowerBI Desktop-UnAuthorized to view respective data

Charlie belongs to the manager role where the view all policy has been applied, so when he signs in, he is able to view all the rows in the sales table.

RLS-Native IdP- PowerBI Desktop-Authorized to view data

Finally, when Jen signs in, she can access the table, but isn’t able to view any sales records because no RLS policy has been attached to the HR role.

RLS-Native IdP- PowerBI Desktop-UnAuthorized to view data

If we hadn't granted SELECT on the sales table to the role aad:hr, which Jen belongs to, she wouldn't be able to access the sales table at all.
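
For example, revoking the earlier grant (a sketch to illustrate the point, run by an administrator) would put the aad:hr role in exactly that state:

revoke select on table sales from role "aad:hr";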

RLS-Native IdP- PowerBI Desktop-UnAuthorized to access table

Test row-level security using SQL Workbench/J

Now we test row-level security with an Amazon Redshift native IdP using SQL Workbench/J.

  1. Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file. (Remember to unzip the file.)

Make sure to use Amazon Redshift driver version 2.1.0.4 or later, because previous Amazon Redshift driver versions don't support the Amazon Redshift native IdP feature.

Native IdP- Workbench/J drivers

  1. For URL, enter jdbc:redshift://<cluster endpoint>:<port>/<databasename>. For example: jdbc:redshift://test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com:5439/dev.

Native IdP- Workbench/J Connection

  1. On the Driver properties tab, add the following properties:
    1. plugin_name – com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
    2. listen_port – 7890
    3. idp_response_timeout – 50
    4. scope – Enter the scope value from the OAuth application. For example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login.
    5. client_id – Enter the client_id value from the OAuth application. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
    6. idp_tenant – Enter the tenant ID value from the OAuth application. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.

Native IdP- Workbench/J Parameters

  1. Choose OK from SQL Workbench/J.

You’re redirected to the browser to sign in with your Azure AD credentials.

As shown in the following screenshot, Azure AD user Alice is able to authenticate using an Amazon Redshift native IdP and view only sales performance information for NY state.
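
For example, with the grants and policies attached earlier, a query like the following, run as Alice, would be expected to return only the NY row (a sketch of the expected behavior, not output captured from this environment):

select * from public.sales;
-- Expected for Alice: Alice | NY | 5000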

Similarly, we can re-authenticate and sign in as user Bob, who is able to view sales information specific to CA state.

When Charlie signs in, he is able to view all the rows from every state.

Finally, when Jen signs in, she is able to access the table, but can’t view any sales records because no RLS policy has been attached to the HR role.

If we hadn't granted SELECT on the sales table to the role aad:hr, which Jen belongs to, Jen wouldn't be able to access the sales table at all.

Summary

In this post, we covered how you can achieve a secure end-to-end experience using Amazon Redshift native IdP authentication, which simplifies administration and row-level security to enable fine-grained row-level access in Amazon Redshift.

For more information about Amazon Redshift row-level security and native IdP federation, refer to:


About the authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Harshida Patel is a Specialist Sr. Solutions Architect, Analytics, with AWS.

Kiran Chinta is a Software Development Manager at Amazon Redshift. He leads a strong team in query processing, SQL language, data security, and performance. Kiran is passionate about delivering products that seamlessly integrate with customers’ business applications with the right ease of use and performance. In his spare time, he enjoys reading and playing tennis.

Debu-PandaDebu Panda is a Senior Manager, Product Management, with AWS. He is an industry leader in analytics, application platforms, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases, and has presented at multiple conferences such as AWS re:Invent, Oracle Open World, and Java One. He is lead author of the EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt).

Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-amazon-redshift-native-idp-federation-with-microsoft-azure-ad-using-a-sql-client/

Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries.

The new Amazon Redshift native identity provider authentication simplifies administration by sharing identity and group membership information to Amazon Redshift from a third-party identity provider (IdP) service, such as Microsoft Azure Active Directory (Azure AD), and enabling Amazon Redshift to natively process third-party tokens, identities, and group permissions. This process is very easy to set up, provides a secure and smoother customer experience for managing identities and groups at a centralized external IdP, and integrates natively with Amazon Redshift.

In this post, we focus on Microsoft Azure AD as the IdP and provide step-by-step guidance to connect SQL clients like SQL Workbench/J and DBeaver with Amazon Redshift using a native IdP process. Azure AD manages the users and provides federated access to Amazon Redshift. You don’t need to create separate Amazon Redshift database users, AWS Identity and Access Management (IAM) roles, or IAM policies with this setup.

Solution overview

Using an Amazon Redshift native IdP has the following benefits:

  • Enables your users to be automatically signed in to Amazon Redshift with their Azure AD accounts
  • You can manage users and groups from a centralized IdP
  • External users can securely access Amazon Redshift without manually creating new user names or roles using their existing corporate directory credentials
  • External user group memberships are natively mirrored with Amazon Redshift roles and users

The following diagram illustrates the architecture of a native IdP for Amazon Redshift:

The workflow contains the following steps:

  1. You configure a JDBC or ODBC driver in your SQL client to use Azure AD federation and use Azure AD login credentials to sign in.
  2. Upon a successful authentication, Azure AD issues an authentication token (OAuth token) back to the Amazon Redshift driver.
  3. The driver forwards the authentication token to the Amazon Redshift cluster to initiate a new database session.
  4. Amazon Redshift verifies and validates the authentication token.
  5. Amazon Redshift calls the Azure Graph API to obtain the user’s group membership.
  6. Amazon Redshift maps the logged-in Azure AD user to the Amazon Redshift user and maps the Azure AD groups to Amazon Redshift roles. If the user and groups don’t exist, Amazon Redshift automatically creates those identities within the IdP namespace.

To implement the solution, you complete the following high-level steps:

  1. Set up your Azure application.
    1. Create OAuth Application
    2. Create Redshift Client application
    3. Create Azure AD Group
  2. Collect Azure AD information for the Amazon Redshift IdP.
  3. Set up the IdP on Amazon Redshift.
  4. Set up Amazon Redshift permissions to external identities.
  5. Configure the SQL client (for this post, we use SQL Workbench/J and DBeaver).

Prerequisites

You need the following prerequisites to set up this solution:

Set up your Azure application

To integrate with any SQL client or BI tool except Microsoft Power BI, we create two applications. The first application is used to authenticate the user and provide a login token. The second application is used by Amazon Redshift to retrieve user and group information.

Step 1: Create OAuth Application

  1. Sign in to the Azure portal with your Microsoft account.
  2. Navigate to the Azure Active Directory application.
  3. Under Manage in the navigation pane, choose App registrations and then choose New registration.
  4. For Name, enter a name (for example, oauth_application).
  5. For Redirect URI, choose Public client/native (mobile and desktop) and enter the redirect URL http://localhost:7890/redshift/. For this post, we are keeping the default settings for the rest of the fields.
  6. Choose Register.
  7. In the navigation pane, under Manage, choose Expose an API.

If you’re setting up for the first time, you can see Set to the right of Application ID URI.

  1. Choose Set and then choose Save.
  2. After the application ID URI is set up, choose Add a scope.
  3. For Scope name, enter a name (for example, jdbc_login).
  4. For Admin consent display name, enter a display name (for example, JDBC login).
  5. For Admin consent description, enter a description of the scope.
  6. Choose Add scope.

  7. After the scope is added, note down the application ID URI (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p) and API scope (api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login) in order to register the IdP in Amazon Redshift later.

The application ID URI is known as <Microsoft_Azure_Application_ID_URI> in the following section.

The API scope is known as <Microsoft_Azure_API_scope_value> when setting up the SQL client such as DBeaver and SQL Workbench/J.

Step 2. Create Redshift Client Application

  1. Navigate to the Azure Active Directory application.
  2. Under Manage in the navigation pane, choose App registrations and then choose New registration.
  3. For Name, enter a name (for example, redshift_client). For this post, we are keeping the default settings for the rest of the fields.
  4. Choose Register.
  5. On the newly created application Overview page, locate the client ID and tenant ID and note down these IDs in order to register the IdP in Amazon Redshift later.
  6. In the navigation pane, choose Certificates & secrets.
  7. Choose New client secret.
  8. Enter a description, and select an expiration for the secret or specify a custom lifetime. We keep the Microsoft-recommended default expiration value of 6 months. Choose Add.
  9. Copy the secret value.

The secret value is displayed only once; after that, you can't read it.

  1. In the navigation pane, choose API permissions.
  2. Choose Add a permission and choose Microsoft Graph.
  3. Choose Application permissions.
  4. Search the directory and select the Directory.Read.All permission.
  5. Choose Add permissions.
  6. After the permission is created, choose Grant admin consent.
  7. In the pop-up box, choose Yes to grant the admin consent.

The status for the permission shows as Granted for with a green check mark.

Step 3. Create Azure AD Group

  1. On the Azure AD home page, under Manage, choose Groups.
  2. Choose New group.
  3. In the New Group section, provide the required information.
  4. Choose No members selected and then search for the members.
  5. Select the members and choose Select. For this example, you can search for your username and choose Select.

You can see the number of members in the Members section.

  1. Choose Create.

Collect Azure AD information

Before we collect the Azure AD information, we need to identify the access token version from the application you created earlier on the Azure portal under Step 1: Create OAuth Application. In the navigation pane, under Manage, choose Manifest, then view the accessTokenAcceptedVersion parameter: null and 1 indicate v1.0 tokens, and 2 indicates v2.0 tokens.

To configure your IdP in Amazon Redshift, collect the following parameters from Azure AD. If you don’t have these parameters, contact your Azure admin.

  1. issuer – This is known as <Microsoft_Azure_issuer_value> in the following sections. If you’re using the v1.0 token, use https://sts.windows.net/<Microsoft_Azure_tenantid_value>/. If you’re using the v2.0 token, use https://login.microsoftonline.com/<Microsoft_Azure_tenantid_value>/v2.0. To find your Azure tenant ID, complete the following steps:
    • Sign in to the Azure portal with your Microsoft account.
    • Under Manage, choose App registrations.
    • Choose any application which you have created in previous sections.
    • Click on the Overview (left panel) page and it’s listed in the Essentials section as Directory (tenant) ID.

  2. client_id – This is known as <Microsoft_Azure_clientid_value> in the following sections. An example of a client ID is 123ab555-a321-666d-7890-11a123a44890. To get your client ID value, locate the application you created earlier on the Azure portal under Step 2. Create Redshift Client Application. Click on the Overview (left panel) page and it's listed in the Essentials section.
  3. client_secret – This is known as <Microsoft_Azure_client_secret_value> in the following sections. An example of a client secret value is KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB. To create your client secret value, refer to the section under Step 2. Create Redshift Client Application.
  4. audience – This is known as <Microsoft_Azure_token_audience_value> in the following sections. If you’re using a v1.0 token, the audience value is the application ID URI (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p). If you’re using a v2.0 token, the audience value is the client ID value (for example, 991abc78-78ab-4ad8-a123-zf123ab03612p). To get these values, please refer to the application which you have created in Step 1: Create OAuth Application. Click on the Overview (left panel) page and it’s listed in the Essentials section.

Set up the IdP on Amazon Redshift

To set up the IdP on Amazon Redshift, complete the following steps:

  1. Log in to Amazon Redshift with a superuser user name and password using query editor v2 or any SQL client.
  2. Run the following SQL:
    CREATE IDENTITY PROVIDER <idp_name> TYPE azure 
    NAMESPACE '<namespace_name>' 
    PARAMETERS '{ 
    "issuer":"<Microsoft_Azure_issuer_value>", 
    "audience":["<Microsoft_Azure_token_audience_value>"],
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

For example, the following code uses a v1.0 access token:

CREATE IDENTITY PROVIDER test_idp TYPE azure
NAMESPACE 'oauth_aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/87f4aa26-78b7-410e-bf29-57b39929ef9a/", 
"audience":["api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
"client_id":"123ab555-a321-666d-7890-11a123a44890", 
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB"
}';

The following code uses a v2.0 access token:

CREATE IDENTITY PROVIDER test_idp TYPE azure
NAMESPACE 'oauth_aad'
PARAMETERS '{
"issuer":
"https://login.microsoftonline.com/87f4aa26-78b7-410e-bf29-57b39929ef9a/v2.0",
"audience":["991abc78-78ab-4ad8-a123-zf123ab03612p"], 
"client_id":"123ab555-a321-666d-7890-11a123a44890", 
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB" 
}';
  1. To alter the IdP, use the following command (this new set of parameter values completely replaces the current values):
    ALTER IDENTITY PROVIDER <idp_name> PARAMETERS 
    '{
    "issuer":"<Microsoft_Azure_issuer_value>/",
    "audience":["<Microsoft_Azure_token_audience_value>"], 
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

  2. To view a single registered IdP in the cluster, use the following code:
    DESC IDENTITY PROVIDER <idp_name>;

  3. To view all registered IdPs in the cluster, use the following code:
    select * from svv_identity_providers;

  4. To drop the IdP, use the following command:
    DROP IDENTITY PROVIDER <idp_name> [CASCADE];

Set up Amazon Redshift permissions to external identities

The users, roles, and role assignments are automatically created in your Amazon Redshift cluster during the first login using your native IdP unless they were manually created earlier.

Create and assign permission to Amazon Redshift roles

In this step, we create a role in the Amazon Redshift cluster based on the groups that you created on the Azure AD portal. This helps us avoid creating multiple user names manually on the Amazon Redshift side and assign permissions for multiple users individually.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command and the group name is the one we specified when we were setting up the Azure application. In our example, it’s oauth_aad:rsgroup.

Run the following command in the Amazon Redshift cluster to create a role:

create role "<namespace_name>:<Azure AD groupname>";

For example:

create role "oauth_aad:rsgroup";

To grant permission to the Amazon Redshift role, enter the following command:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role "<namespace_name>:<Azure AD groupname>";

Then grant relevant permission to the role as per your requirement. For example:

grant select on all tables in schema public to role "oauth_aad:rsgroup";
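
After the first federated login, you can confirm the automatically created identities and their role assignments. The following is a minimal sketch; it assumes the SVV_ROLES and SVV_USER_GRANTS system views are available in your Amazon Redshift version:

select * from svv_roles;        -- lists roles, including namespace-qualified ones such as oauth_aad:rsgroup
select * from svv_user_grants;  -- shows which roles have been granted to which users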

Create and assign permission to an Amazon Redshift user

This step is only required if you want to grant permission to an Amazon Redshift user instead of roles. We create an Amazon Redshift user that maps to an Azure AD user and then grant permission to it. If you don't want to explicitly assign permission to an Amazon Redshift user, you can skip this step.

To create the user, use the following syntax:

CREATE USER "<namespace_name>:<Azure AD username>" PASSWORD DISABLE;

For example:

CREATE USER "oauth_aad:[email protected]" PASSWORD DISABLE;

We use the following syntax to grant permission to the Amazon Redshift user:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO "<namespace_name>:<Azure AD username>";

For example:

grant select on all tables in schema public to "oauth_aad:[email protected]";

Configure the SQL client

In this section, we provide instructions to set up a SQL client using either DBeaver or SQL Workbench/J.

Set up DBeaver

To set up DBeaver, complete the following steps:

  1. Go to Database and choose Driver Manager.
  2. Search for Redshift, then choose it and choose Copy.
  3. On the Settings tab, for Driver name, enter a name, such as Redshift Native IDP.
  4. Update the URL template to jdbc:redshift://{host}:{port}/{database}?plugin_name=com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
    Note: In this URL template, do not replace the template parameters with actual values. Keep the value as shown in the screenshot below.
  5. On the Libraries tab, choose Add files. Keep only one set of the latest driver version (2.1.0.4 and upwards) and if you see any older versions, delete those files.
  6. Add all the files from the downloaded AWS JDBC driver pack .zip file and choose OK (remember to unzip the .zip file).

Note: Use the Amazon Redshift driver 2.1.0.4 onwards, because all previous Amazon Redshift driver versions don’t support the Amazon Redshift native IdP feature.

  1. Close the Driver Manager.
  2. Go to Database and choose New Database Connection.
  3. Search for Redshift Native IDP, then choose it and choose Next.
  4. For Host/Instance, enter your Amazon Redshift endpoint. For example, test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com.
  5. For Database, enter the database name (for this post, we use dev).
  6. For Port, enter 5439.
  7. Get the following parameter values (scope, client_id, and idp_tenant) from the application you created in Step 1: Create OAuth Application. On the Driver properties tab, add the following properties:
    1. listen_port – 7890
    2. idp_response_timeout – 50
    3. scope – Enter the value for <Microsoft_Azure_API_scope_value>.
      • If you’re using a v1.0 token, then use the scope value (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login).
      • If you’re using a v2.0 token, the scope value is the client ID value (for example, 991abc78-78ab-4ad8-a123-zf123ab03612p).
    4. client_id – Enter the value for <Microsoft_Azure_clientid_value>. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
    5. idp_tenant – Enter the value for <Microsoft_Azure_tenantid_value>. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.
  8. You can verify the connection by choosing Test Connection.

You're redirected to the browser to sign in with your Azure AD credentials. If you get an SSL-related error, go to the SSL tab and select Use SSL.

  1. Log in to be redirected to a page showing the connection as successful.
  2. Choose Ok.

Congratulations! You have completed the Amazon Redshift native IdP setup with DBeaver.

Set up SQL Workbench/J

To set up SQL Workbench/J, complete the following steps:

  1. Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
  2. Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file (remember to unzip the .zip file).

Use the Amazon Redshift driver 2.1.0.4 onwards, because all previous Amazon Redshift driver versions don’t support the Amazon Redshift native IdP feature.

  1. For URL, enter jdbc:redshift://<cluster endpoint>:<port>/<databasename>. For example, jdbc:redshift://test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com:5439/dev.
  2. Get the following parameter values (scope, client_id, and idp_tenant) from the application you created in Step 1: Create OAuth Application. On the Driver properties tab, add the following properties:
    1. plugin_name – com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
    2. listen_port – 7890
    3. idp_response_timeout – 50
    4. scope – Enter the value for <Microsoft_Azure_API_scope_value>.
      • If you’re using a v1.0 token, then use the scope value (for example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login).
      • If you’re using a v2.0 token, the scope value is the client ID value (for example, 991abc78-78ab-4ad8-a123-zf123ab03612p).
    5. client_id – Enter the value for <Microsoft_Azure_clientid_value>. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
    6. idp_tenant – Enter the value for <Microsoft_Azure_tenantid_value>. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.
  3. Choose OK.
  4. Choose Test from SQL Workbench/J.

You’re redirected to the browser to sign in with your Azure AD credentials.

  1. Log in to be redirected to a page showing the connection as successful.
  2. Choose Finish.

    sqlworkbenchj-test-successful
  3. With this connection profile, run the following query to test Amazon Redshift native IdP authentication.
    sqlworkbenchj-current-user
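
A query you could use for this check (a minimal sketch; the federated user name is returned in the <namespace>:<Azure AD username> form) is:

select current_user;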

Congratulations! You have completed the Amazon Redshift native IdP setup with SQL Workbench/J.

Best Practices with Redshift native IdP:

  • Pre-create the Amazon Redshift roles based upon the groups which you have created on the Azure AD portal.
  • Assign permissions to Redshift roles instead of to each individual external user. This provides a smoother end-user experience because users have all the required permissions when they log in using the native IdP.

Troubleshooting

If your connection didn’t work, consider the following:

  • Enable logging in the driver. For instructions, see Configure logging.
  • Make sure to use the latest Amazon Redshift JDBC driver version 2.1.0.4 onwards, which supports Amazon Redshift native IdP authentication.
  • If you’re getting errors while setting up the application on Azure AD, make sure you have admin access.
  • If you can authenticate via the SQL client but get a permission issue or can’t see objects, grant the relevant permission to the role, as detailed earlier in this post.
  • If you get the error “claim value does not match expected value,” make sure you provided the correct parameters during Amazon Redshift IdP registration.
  • Check stl_error or stl_connection_log views on the Amazon Redshift cluster for authentication failures.

Conclusion

In this post, we provided step-by-step instructions to integrate Amazon Redshift with Azure AD and SQL clients (SQL Workbench/J and DBeaver) using Amazon Redshift native IdP authentication. We also showed how Azure group membership is mapped automatically with Amazon Redshift roles and how to set up Amazon Redshift permissions.

For more information about Amazon Redshift native IdP federation, see:


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Debu-PandaDebu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Ilesh Garish is a Software Development Engineer at AWS. His role is to develop connectors for Amazon Redshift. Prior to AWS, he built database drivers for the Oracle RDBMS, TigerLogic XDMS, and OpenAccess SDK. He also worked in the database internal technologies at San Francisco Bay Area startups.

Dengfeng(Davis) Li is a Software Development Engineer at AWS. His passion is creating ease-of-use, secure and scalable applications. In the past few years, he worked on Redshift security, data sharing and catalog optimization.

Integrate Amazon Redshift native IdP federation with Microsoft Azure AD and Power BI

Post Syndicated from Maneesh Sharma original https://aws.amazon.com/blogs/big-data/integrate-amazon-redshift-native-idp-federation-with-microsoft-azure-ad-and-power-bi/

Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries.

As enterprise customers look to build their data warehouse on Amazon Redshift, they have many integration needs with the business intelligence (BI) tools they’re using. For customers who want to integrate Amazon Redshift with their existing identity provider (IdP) such as Microsoft Azure Active Directory (Azure AD) using BI tools and services such as Power BI Desktop and Power BI service, we have introduced a native IdP for Amazon Redshift to help you implement authentication and authorization for these tools in a seamless way.

Amazon Redshift native IdP simplifies the administration process of managing identities and permission. This feature provides native integration with Microsoft Azure AD, which you can use for authentication and authorization with tools like Power BI. It uses your existing IdP to simplify authentication and managing permissions. It does this by making it possible to share identity metadata to Amazon Redshift from your IdP. In this approach, an external IdP (such as Azure AD) issues an access token, which is passed to Amazon Redshift via a client, and then Amazon Redshift performs the token validation and claim extraction natively.

This post shows a step-by-step implementation of the Amazon Redshift native IdP setup with Azure AD, which demonstrates how to manage users and groups with an organizational directory, and how to federate into Amazon Redshift. You don’t need to create AWS Identity and Access Management (IAM) roles, policies, separate database users, or groups in Amazon Redshift with this setup.

Solution overview

Using an Amazon Redshift native IdP has the following benefits:

  • You can manage users and groups from a centralized IdP
  • Enables your users to be automatically signed in to Amazon Redshift with their Azure AD accounts
  • You can automatically create Amazon Redshift roles with a namespace that represents external groups (such as Azure AD groups)
  • External user group membership is natively mirrored with Amazon Redshift roles and users

The general configuration steps of the Amazon Redshift native IdP approach are as follows:

  1. Register an application in Azure AD and set up groups.
  2. Collect Azure AD information for the Amazon Redshift IdP.
  3. Set up the IdP on Amazon Redshift.
  4. Set up Amazon Redshift permissions to external identities.
  5. Configure the client connection.

The following diagram illustrates the resulting solution.

To get authorized, the Power BI client sends an authentication request to the Azure enterprise application using Azure AD credentials. After verification, Azure sends a JSON web token (OAuth token) to the Power BI application. The Power BI application forwards the connection string with the OAuth token to Amazon Redshift. Amazon Redshift parses and validates the token, and requests group information from Azure AD. Upon reception, Amazon Redshift automatically creates the user and roles, and does the respective mapping.

Prerequisites

You need the following prerequisites to set up this solution:

  • A Microsoft Azure account that has an active subscription. You need an admin role to set up the application on Azure AD.
  • Power BI Desktop version 2.102.683.0 (64-bit) or above downloaded and installed. In this example, we use a Windows environment.
  • The latest version of the Microsoft Enterprise/Standard Gateway installed.
  • An AWS account with an Amazon Redshift cluster. In this post, we connect Power BI Desktop and service with a publicly accessible Amazon Redshift cluster.

Register an application in Azure AD and set up groups

To set up the Azure application and group permission, complete the following steps:

  1. Sign in to the Azure portal with your Microsoft account.
  2. Navigate to the Azure Active Directory application.
  3. Under Manage, choose App registrations and New registration.
  4. For Name, enter an application name (for example, nativeoauthsetup).
  5. Keep the default settings for the rest of the fields.
  6. Choose Register to complete the initial application registration.
  7. On the newly created application Overview page, locate the client ID and tenant ID and note down these IDs in order to register the IdP in Amazon Redshift later.
  8. Under Manage in the navigation pane, choose API permissions.
  9. Choose Add a permission.
  10. Choose Microsoft Graph and then choose Application permissions.
  11. Search for directory and select the Directory.Read.All permission.
  12. Choose Add permissions.
  13. Choose Grant admin consent.
  14. In the popup box, choose Yes to grant the admin consent.

The status of the permission shows Granted for with a green check mark.

  1. Under Manage in the navigation pane, choose Certificates & secrets.
  2. Choose Client secrets and choose New client secret.
  3. Enter a description, and select an expiration for the secret or specify a custom lifetime. We keep the Microsoft-recommended default expiration value of 6 months. Choose Add.
  4. Copy the secret value.

The secret value is displayed only once; after that, you can't read it.

  1. On the Azure AD home page, under Manage in the navigation pane, choose Groups.
  2. Choose New group.
  3. In the New Group section, provide the required information.
  4. Choose No members selected and then search for the members.
  5. Select your members and choose Select. For this example, you can search for your username and choose Select.

You can see the number of members in the Members section.

  1. Choose Create.

Collect Azure AD Information for Amazon Redshift IdP

Before we collect the Azure AD information, we need to identify the access token version from the application you created earlier. In the navigation pane, under Manage, choose Manifest, then view the accessTokenAcceptedVersion parameter: null and 1 indicate v1.0 tokens, and 2 indicates v2.0 tokens.

To configure your IdP in Amazon Redshift, collect the following parameters from Azure AD. If you don't have these parameters, contact your Azure admin.

  • issuer – This is known as <Microsoft_Azure_issuer_value>. If you’re using the v1.0 token, use https://sts.windows.net/<Microsoft_Azure_tenantid_value>/. Currently, Power BI only uses the v1.0 token. If you’re using the v2.0 token, use https://login.microsoftonline.com/<Microsoft_Azure_tenantid_value>/v2.0. To find your Microsoft Azure tenant ID, complete the following steps:
    • Sign in to the Azure portal with your Microsoft account.
    • Under Manage, choose App registrations.
    • Choose the Amazon Redshift application you created earlier.
    • Click on the Overview (left panel) page and under Essentials, note down the values.
  • client_id – This is known as <Microsoft_Azure_clientid_value> in the following sections. An example of a client ID is 5ab12345-1234-1a12-123a-11abc1a12ab1. To get your client ID value, locate the Amazon Redshift application you created earlier on the Azure portal; it’s listed in the Essentials section.
  • client_secret – This is known as <Microsoft_Azure_client_secret_value> in the following sections. An example of a client secret value is KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB. To create your client secret value, refer to the steps in the previous section.
  • audience – This is known as <Microsoft_Azure_token_audience_value> in the following sections. With Power BI Desktop, you need to set the audience value as https://analysis.windows.net/powerbi/connector/AmazonRedshift.

Set up the IdP on Amazon Redshift

To set up the IdP on Amazon Redshift, complete the following steps:

  1. Log in to Amazon Redshift with a superuser user name and password using query editor v2 or any SQL client.
  2. Run the following SQL:
    CREATE IDENTITY PROVIDER <idp_name> TYPE azure 
    NAMESPACE '<namespace_name>' 
    PARAMETERS '{ 
    "issuer":"<Microsoft_Azure_issuer_value>", 
    "audience":["<Microsoft_Azure_token_audience_value>"],
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

In our example, we use the v1.0 token issuer because as of this writing, Power BI only uses the v1.0 token:

CREATE IDENTITY PROVIDER oauth_standard TYPE azure
NAMESPACE 'aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/e12b1bb1-1234-12ab-abc1-1ab012345a12/",
"audience":["https://analysis.windows.net/powerbi/connector/AmazonRedshift"],
"client_id":"5ab12345-1234-1a12-123a-11abc1a12ab1",
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB"
}';
  3. To alter the IdP, use the following command (this new set of parameter values completely replaces the current values):
    ALTER IDENTITY PROVIDER <idp_name> PARAMETERS 
    '{
    "issuer":"<Microsoft_Azure_issuer_value>",
    "audience":["<Microsoft_Azure_token_audience_value>"], 
    "client_id":"<Microsoft_Azure_clientid_value>", 
    "client_secret":"<Microsoft_Azure_client_secret_value>"
    }';

  4. To view a single registered IdP in the cluster, use the following code:
    DESC IDENTITY PROVIDER <idp_name>;

  5. To view all registered IdPs in the cluster, use the following code:
    select * from svv_identity_providers;

  6. To drop the IdP, use the following command:
    DROP IDENTITY PROVIDER <idp_name> [CASCADE];

Set up Amazon Redshift permissions to external identities

The users, roles, and role assignments are automatically created in your Amazon Redshift cluster during the first login using your native IdP unless they were manually created earlier.

Create and assign permission to Amazon Redshift roles

In this step, we create a role in the Amazon Redshift cluster based on the groups that you created on the Azure AD portal.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command and the group name is the one we created earlier on the Azure AD portal. In our example, it’s aad:rsgroup.

Run the following command in the Amazon Redshift cluster:

create role "<namespace_name>:<Azure AD groupname>";

For example:

create role "aad:rsgroup";

To grant permission to the Amazon Redshift role, enter the following command:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO role "<namespace_name>:<Azure AD groupname>";

Then grant relevant permission to the role as per your requirement. For example:

grant select on all tables in schema public to role "aad:rsgroup";
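To verify that the roles were created and to review how they are granted, you can query the role-based access control system views. The following is a quick check using the standard SVV_ROLES and SVV_USER_GRANTS views:

-- List the roles defined in the cluster (IdP-mapped roles carry the namespace prefix)
select role_name, role_owner from svv_roles;

-- Show which users have been granted which roles
select user_name, role_name from svv_user_grants;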

Create and assign permission to an Amazon Redshift user

This step is only required if you want to grant permission to an Amazon Redshift user instead of roles. We create an Amazon Redshift user that maps to an Azure AD user and then grant permission to it. If you don’t want to explicitly assign permission to an Amazon Redshift user, you can skip this step.

To create the user, use the following syntax:

CREATE USER "<namespace_name>:<Azure AD username>" PASSWORD DISABLE;

For example:

CREATE USER "aad:[email protected]" PASSWORD DISABLE;

We use the following syntax to grant permission to the Amazon Redshift user:

GRANT { { SELECT | INSERT | UPDATE | DELETE | DROP | REFERENCES } [,...]
 | ALL [ PRIVILEGES ] }
ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
TO "<namespace_name>:<Azure AD username>";

For example:

grant select on all tables in schema public to "aad:[email protected]";
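After the first federated login (or after creating users manually as shown above), you can confirm that the namespaced users exist by querying pg_user. The aad:% filter below assumes the namespace used in this example:

-- List federated users created under the aad namespace
select usename from pg_user where usename like 'aad:%';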

Configure your client connection using an Amazon Redshift native IdP

In this section, we provide instructions to set up your client connection for either Power BI Desktop or the Power BI service.

Connect Power BI Desktop

In this example, we use Power BI Desktop to connect with Amazon Redshift using a native IdP. Use Power BI Desktop version 2.102.683.0 64-bit or above.

  1. In your Power BI Desktop, choose Get data.
  2. Search for the Amazon Redshift connector, then choose it and choose Connect.
  3. For Server, enter your Amazon Redshift cluster’s endpoint. For example, test-cluster.ct4abcufthff.us-east-1.redshift.amazonaws.com.
  4. For Database, enter your database name. In this example, we use dev.
  5. Choose OK.
  6. Choose Microsoft Account.
  7. Choose Sign in.
  8. Enter your Microsoft Account credentials.

When you’re connected, you can see the message You are currently signed in.

  9. Choose Connect.

Congratulations! You are signed in using the Amazon Redshift native IdP with Power BI Desktop. Now you can browse your data.

After that, you can create your own Power BI report on the desktop version and publish it to your Microsoft account. For this example, we created and published a report named RedshiftOAuthReport, which we refer to later in this post.

Connect Power BI service

Now, let’s connect a Power BI gateway with Amazon Redshift using a native IdP. Before proceeding with the setup below, make sure you have downloaded and installed the latest version of the Microsoft Enterprise/Standard Gateway.

  1. Open the Power BI web application and sign in if necessary.

You can see the RedshiftOAuthReport report that we created earlier.

  2. In the navigation pane, under Datasets, choose the menu icon (three dots) next to the report name and then choose Settings.
  3. Enable Gateway connection on the settings page.
  4. Click the arrow on the right side and select Manually add to gateway.

  5. In the Data Source Settings section, enter the appropriate values:
    1. For Data Source Name, enter a name.
    2. For Data Source Type, choose Amazon Redshift.
    3. For Server, enter your Amazon Redshift cluster’s endpoint.
    4. For Database, enter your database name (for this post, we use dev).
    5. For Authentication Method, choose OAuth2.
  6. Choose Edit credentials.
  7. In the pop-up box, choose Sign in.
  8. Enter your Microsoft account credentials and follow the authentication process.
  9. After the authentication, choose Add on the Data Source Settings page.
  10. Make sure that Gateway connection is enabled. If not, enable it.
  11. Select your gateway from the gateway list.
  12. On the Maps to menu, choose your data source.
  13. Choose Apply.

Congratulations! You have completed the Amazon Redshift native IdP setup with Power BI web service.

Best practices with Amazon Redshift native IdP

  • Pre-create the Amazon Redshift roles based on the groups that you created on the Azure AD portal.
  • Assign permissions to Amazon Redshift roles instead of to each individual external user. This provides a smoother end-user experience, because users have all the required permissions when they log in using the native IdP. A minimal sketch of this approach follows the list.
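The following is a minimal sketch of this pattern, assuming the aad namespace from this post and illustrative group and schema names (rsgroup, finance_readers, and finance); replace them with your own namespace, Azure AD group names, and schemas:

-- Pre-create one role per Azure AD group so permissions are in place before the first login
create role "aad:rsgroup";
create role "aad:finance_readers";

-- Grant privileges once at the role level instead of per federated user
grant select on all tables in schema public to role "aad:rsgroup";
grant select on all tables in schema finance to role "aad:finance_readers";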

Troubleshooting

If your connection didn’t work, consider the following:

  • Enable logging in the driver. For instructions, see Configure logging.
  • Make sure to use Amazon Redshift JDBC driver version 2.1.0.4 or later, which supports Amazon Redshift native IdP authentication.
  • If you’re getting errors while setting up the application on Azure AD, make sure you have admin access.
  • If you can authenticate via the SQL client but get a permission issue or can’t see objects, grant the relevant permission to the role, as detailed earlier in this post.
  • If you get the error “claim value does not match expected value,” make sure you provided the correct parameters during Amazon Redshift IdP registration.
  • Check the stl_error or stl_connection_log views on the Amazon Redshift cluster for authentication failures; sample queries follow this list.
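The following queries are a starting point for reviewing recent connection attempts and errors using the standard STL_CONNECTION_LOG and STL_ERROR system views:

-- Recent connection attempts, including the authentication method used
select recordtime, event, username, authmethod
from stl_connection_log
order by recordtime desc
limit 50;

-- Recent internal errors logged by the cluster
select recordtime, process, error
from stl_error
order by recordtime desc
limit 50;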

Summary

In this post, we covered the step-by-step process of integrating Amazon Redshift with Azure AD and Power BI Desktop and web service using Amazon Redshift native IdP federation. The process consisted of registering an Azure application, creating Azure AD groups, setting up the Amazon Redshift IdP, creating and assigning permission to Amazon Redshift roles, and finally configuring client connections.

For more information about Amazon Redshift native IdP federation, refer to the Amazon Redshift documentation.

If you have questions or suggestions, please leave a comment.


About the Authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Ilesh Garish is a Software Development Engineer at AWS. His role is to develop connectors for Amazon Redshift. Prior to AWS, he built database drivers for the Oracle RDBMS, TigerLogic XDMS, and OpenAccess SDK. He also worked on database internal technologies at San Francisco Bay Area startups.

Debu Panda is a Senior Manager, Product Management at AWS. He is an industry leader in analytics, application platform, and database technologies, and has more than 25 years of experience in the IT world.

Sergey Konoplev is a Senior Database Engineer on the Amazon Redshift team at AWS. Sergey has been focusing on automation and improvement of database and data operations for more than a decade.