Federate Database User Authentication Easily with IAM and Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/federate-database-user-authentication-easily-with-iam-and-amazon-redshift/

Managing database users though federation allows you to manage authentication and authorization procedures centrally. Amazon Redshift now supports database authentication with IAM, enabling user authentication though enterprise federation. No need to manage separate database users and passwords to further ease the database administration. You can now manage users outside of AWS and authenticate them for access to an Amazon Redshift data warehouse. Do this by integrating IAM authentication and a third-party SAML-2.0 identity provider (IdP), such as AD FS, PingFederate, or Okta. In addition, database users can also be automatically created at their first login based on corporate permissions.

In this post, I demonstrate how you can extend the federation to enable single sign-on (SSO) to the Amazon Redshift data warehouse.

SAML and Amazon Redshift

AWS supports Security Assertion Markup Language (SAML) 2.0, which is an open standard for identity federation used by many IdPs. SAML enables federated SSO, which enables your users to sign in to the AWS Management Console. Users can also make programmatic calls to AWS API actions by using assertions from a SAML-compliant IdP. For example, if you use Microsoft Active Directory for corporate directories, you may be familiar with how Active Directory and AD FS work together to enable federation. For more information, see the Enabling Federation to AWS Using Windows Active Directory, AD FS, and SAML 2.0 AWS Security Blog post.

Amazon Redshift now provides the GetClusterCredentials API operation that allows you to generate temporary database user credentials for authentication. You can set up an IAM permissions policy that generates these credentials for connecting to Amazon Redshift. Extending the IAM authentication, you can configure the federation of AWS access though a SAML 2.0–compliant IdP. An IAM role can be configured to permit the federated users call the GetClusterCredentials action and generate temporary credentials to log in to Amazon Redshift databases. You can also set up policies to restrict access to Amazon Redshift clusters, databases, database user names, and user group.

Amazon Redshift federation workflow

In this post, I demonstrate how you can use a JDBC– or ODBC-based SQL client to log in to the Amazon Redshift cluster using this feature. The SQL clients used with Amazon Redshift JDBC or ODBC drivers automatically manage the process of calling the GetClusterCredentials action, retrieving the database user credentials, and establishing a connection to your Amazon Redshift database. You can also use your database application to programmatically call the GetClusterCredentials action, retrieve database user credentials, and connect to the database. I demonstrate these features using an example company to show how different database users accounts can be managed easily using federation.

The following diagram shows how the SSO process works:

  1. JDBC/ODBC
  2. Authenticate using Corp Username/Password
  3. IdP sends SAML assertion
  4. Call STS to assume role with SAML
  5. STS Returns Temp Credentials
  6. Use Temp Credentials to get Temp cluster credentials
  7. Connect to Amazon Redshift using temp credentials

Walkthrough

Example Corp. is using Active Directory (idp host:demo.examplecorp.com) to manage federated access for users in its organization. It has an AWS account: 123456789012 and currently manages an Amazon Redshift cluster with the cluster ID “examplecorp-dw”, database “analytics” in us-west-2 region for its Sales and Data Science teams. It wants the following access:

  • Sales users can access the examplecorp-dw cluster using the sales_grp database group
  • Sales users access examplecorp-dw through a JDBC-based SQL client
  • Sales users access examplecorp-dw through an ODBC connection, for their reporting tools
  • Data Science users access the examplecorp-dw cluster using the data_science_grp database group.
  • Partners access the examplecorp-dw cluster and query using the partner_grp database group.
  • Partners are not federated through Active Directory and are provided with separate IAM user credentials (with IAM user name examplecorpsalespartner).
  • Partners can connect to the examplecorp-dw cluster programmatically, using language such as Python.
  • All users are automatically created in Amazon Redshift when they log in for the first time.
  • (Optional) Internal users do not specify database user or group information in their connection string. It is automatically assigned.
  • Data warehouse users can use SSO for the Amazon Redshift data warehouse using the preceding permissions.

Step 1:  Set up IdPs and federation

The Enabling Federation to AWS Using Windows Active Directory post demonstrated how to prepare Active Directory and enable federation to AWS. Using those instructions, you can establish trust between your AWS account and the IdP and enable user access to AWS using SSO.  For more information, see Identity Providers and Federation.

For this walkthrough, assume that this company has already configured SSO to their AWS account: 123456789012 for their Active Directory domain demo.examplecorp.com. The Sales and Data Science teams are not required to specify database user and group information in the connection string. The connection string can be configured by adding SAML Attribute elements to your IdP. Configuring these optional attributes enables internal users to conveniently avoid providing the DbUser and DbGroup parameters when they log in to Amazon Redshift.

The user-name attribute can be set up as follows, with a user ID (for example, nancy) or an email address (for example. [email protected]):

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbUser">  
  <AttributeValue>user-name</AttributeValue>
</Attribute>

The AutoCreate attribute can be defined as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/AutoCreate">
    <AttributeValue>true</AttributeValue>
</Attribute>

The sales_grp database group can be included as follows:

<Attribute Name="https://redshift.amazon.com/SAML/Attributes/DbGroups">
    <AttributeValue>sales_grp</AttributeValue>
</Attribute>

For more information about attribute element configuration, see Configure SAML Assertions for Your IdP.

Step 2: Create IAM roles for access to the Amazon Redshift cluster

The next step is to create IAM policies with permissions to call GetClusterCredentials and provide authorization for Amazon Redshift resources. To grant a SQL client the ability to retrieve the cluster endpoint, region, and port automatically, include the redshift:DescribeClusters action with the Amazon Redshift cluster resource in the IAM role.  For example, users can connect to the Amazon Redshift cluster using a JDBC URL without the need to hardcode the Amazon Redshift endpoint:

Previous:  jdbc:redshift://endpoint:port/database

Current:  jdbc:redshift:iam://clustername:region/dbname

Use IAM to create the following policies. You can also use an existing user or role and assign these policies. For example, if you already created an IAM role for IdP access, you can attach the necessary policies to that role. Here is the policy created for sales users for this example:

Sales_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:userid": "AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
            ]
        }
    ]
}

The policy uses the following parameter values:

  • Region: us-west-2
  • AWS Account: 123456789012
  • Cluster name: examplecorp-dw
  • Database group: sales_grp
  • IAM role: AIDIODR4TAW7CSEXAMPLE
Policy Statement Description
{
"Effect":"Allow",
"Action":[
"redshift:DescribeClusters"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
]
}

Allow users to retrieve the cluster endpoint, region, and port automatically for the Amazon Redshift cluster examplecorp-dw. This specification uses the resource format arn:aws:redshift:region:account-id:cluster:clustername. For example, the SQL client JDBC can be specified in the format jdbc:redshift:iam://clustername:region/dbname.

For more information, see Amazon Resource Names.

{
"Effect":"Allow",
"Action":[
"redshift:GetClusterCredentials"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
"arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
],
"Condition":{
"StringEquals":{
"aws:userid":"AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com"
}
}
}

Generates a temporary token to authenticate into the examplecorp-dw cluster. “arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}” restricts the corporate user name to the database user name for that user. This resource is specified using the format: arn:aws:redshift:region:account-id:dbuser:clustername/dbusername.

The Condition block enforces that the AWS user ID should match “AIDIODR4TAW7CSEXAMPLE:${redshift:DbUser}@examplecorp.com”, so that individual users can authenticate only as themselves. The AIDIODR4TAW7CSEXAMPLE role has the Sales_DW_IAM_Policy policy attached.

{
"Effect":"Allow",
"Action":[
"redshift:CreateClusterUser"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
]
}
Automatically creates database users in examplecorp-dw, when they log in for the first time. Subsequent logins reuse the existing database user.
{
"Effect":"Allow",
"Action":[
"redshift:JoinGroup"
],
"Resource":[
"arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp"
]
}
Allows sales users to join the sales_grp database group through the resource “arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/sales_grp” that is specified in the format arn:aws:redshift:region:account-id:dbgroup:clustername/dbgroupname.

Similar policies can be created for Data Science users with access to join the data_science_grp group in examplecorp-dw. You can now attach the Sales_DW_IAM_Policy policy to the role that is mapped to IdP application for SSO.
 For more information about how to define the claim rules, see Configuring SAML Assertions for the Authentication Response.

Because partners are not authorized using Active Directory, they are provided with IAM credentials and added to the partner_grp database group. The Partner_DW_IAM_Policy is attached to the IAM users for partners. The following policy allows partners to log in using the IAM user name as the database user name.

Partner_DW_IAM_Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:cluster:examplecorp-dw",
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ],
            "Condition": {
                "StringEquals": {
                    "redshift:DbUser": "${aws:username}"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:CreateClusterUser"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbuser:examplecorp-dw/${redshift:DbUser}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-west-2:123456789012:dbgroup:examplecorp-dw/partner_grp"
            ]
        }
    ]
}

redshift:DbUser“: “${aws:username}” forces an IAM user to use the IAM user name as the database user name.

With the previous steps configured, you can now establish the connection to Amazon Redshift through JDBC– or ODBC-supported clients.

Step 3: Set up database user access

Before you start connecting to Amazon Redshift using the SQL client, set up the database groups for appropriate data access. Log in to your Amazon Redshift database as superuser to create a database group, using CREATE GROUP.

Log in to examplecorp-dw/analytics as superuser and create the following groups and users:

CREATE GROUP sales_grp;
CREATE GROUP datascience_grp;
CREATE GROUP partner_grp;

Use the GRANT command to define access permissions to database objects (tables/views) for the preceding groups.

Step 4: Connect to Amazon Redshift using the JDBC SQL client

Assume that sales user “nancy” is using the SQL Workbench client and JDBC driver to log in to the Amazon Redshift data warehouse. The following steps help set up the client and establish the connection:

  1. Download the latest Amazon Redshift JDBC driver from the Configure a JDBC Connection page
  2. Build the JDBC URL with the IAM option in the following format:
    jdbc:redshift:iam://examplecorp-dw:us-west-2/sales_db

Because the redshift:DescribeClusters action is assigned to the preceding IAM roles, it automatically resolves the cluster endpoints and the port. Otherwise, you can specify the endpoint and port information in the JDBC URL, as described in Configure a JDBC Connection.

Identify the following JDBC options for providing the IAM credentials (see the “Prepare your environment” section) and configure in the SQL Workbench Connection Profile:

plugin_name=com.amazon.redshift.plugin.AdfsCredentialsProvider 
idp_host=demo.examplecorp.com (The name of the corporate identity provider host)
idp_port=443  (The port of the corporate identity provider host)
user=examplecorp\nancy(corporate user name)
password=***(corporate user password)

The SQL workbench configuration looks similar to the following screenshot:

Now, “nancy” can connect to examplecorp-dw by authenticating using the corporate Active Directory. Because the SAML attributes elements are already configured for nancy, she logs in as database user nancy and is assigned the sales_grp. Similarly, other Sales and Data Science users can connect to the examplecorp-dw cluster. A custom Amazon Redshift ODBC driver can also be used to connect using a SQL client. For more information, see Configure an ODBC Connection.

Step 5: Connecting to Amazon Redshift using JDBC SQL Client and IAM Credentials

This optional step is necessary only when you want to enable users that are not authenticated with Active Directory. Partners are provided with IAM credentials that they can use to connect to the examplecorp-dw Amazon Redshift clusters. These IAM users are attached to Partner_DW_IAM_Policy that assigns them to be assigned to the public database group in Amazon Redshift. The following JDBC URLs enable them to connect to the Amazon Redshift cluster:

jdbc:redshift:iam//examplecorp-dw/analytics?AccessKeyID=XXX&SecretAccessKey=YYY&DbUser=examplecorpsalespartner&DbGroup= partner_grp&AutoCreate=true

The AutoCreate option automatically creates a new database user the first time the partner logs in. There are several other options available to conveniently specify the IAM user credentials. For more information, see Options for providing IAM credentials.

Step 6: Connecting to Amazon Redshift using an ODBC client for Microsoft Windows

Assume that another sales user “uma” is using an ODBC-based client to log in to the Amazon Redshift data warehouse using Example Corp Active Directory. The following steps help set up the ODBC client and establish the Amazon Redshift connection in a Microsoft Windows operating system connected to your corporate network:

  1. Download and install the latest Amazon Redshift ODBC driver.
  2. Create a system DSN entry.
    1. In the Start menu, locate the driver folder or folders:
      • Amazon Redshift ODBC Driver (32-bit)
      • Amazon Redshift ODBC Driver (64-bit)
      • If you installed both drivers, you have a folder for each driver.
    2. Choose ODBC Administrator, and then type your administrator credentials.
    3. To configure the driver for all users on the computer, choose System DSN. To configure the driver for your user account only, choose User DSN.
    4. Choose Add.
  3. Select the Amazon Redshift ODBC driver, and choose Finish. Configure the following attributes:
    Data Source Name =any friendly name to identify the ODBC connection 
    Database=analytics
    user=uma(corporate user name)
    Auth Type-Identity Provider: AD FS
    password=leave blank (Windows automatically authenticates)
    Cluster ID: examplecorp-dw
    idp_host=demo.examplecorp.com (The name of the corporate IdP host)

This configuration looks like the following:

  1. Choose OK to save the ODBC connection.
  2. Verify that uma is set up with the SAML attributes, as described in the “Set up IdPs and federation” section.

The user uma can now use this ODBC connection to establish the connection to the Amazon Redshift cluster using any ODBC-based tools or reporting tools such as Tableau. Internally, uma authenticates using the Sales_DW_IAM_Policy  IAM role and is assigned the sales_grp database group.

Step 7: Connecting to Amazon Redshift using Python and IAM credentials

To enable partners, connect to the examplecorp-dw cluster programmatically, using Python on a computer such as Amazon EC2 instance. Reuse the IAM users that are attached to the Partner_DW_IAM_Policy policy defined in Step 2.

The following steps show this set up on an EC2 instance:

  1. Launch a new EC2 instance with the Partner_DW_IAM_Policy role, as described in Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances. Alternatively, you can attach an existing IAM role to an EC2 instance.
  2. This example uses Python PostgreSQL Driver (PyGreSQL) to connect to your Amazon Redshift clusters. To install PyGreSQL on Amazon Linux, use the following command as the ec2-user:
    sudo easy_install pip
    sudo yum install postgresql postgresql-devel gcc python-devel
    sudo pip install PyGreSQL

  1. The following code snippet demonstrates programmatic access to Amazon Redshift for partner users:
    #!/usr/bin/env python
    """
    Usage:
    python redshift-unload-copy.py <config file> <region>
    
    * Copyright 2014, Amazon.com, Inc. or its affiliates. All Rights Reserved.
    *
    * Licensed under the Amazon Software License (the "License").
    * You may not use this file except in compliance with the License.
    * A copy of the License is located at
    *
    * http://aws.amazon.com/asl/
    *
    * or in the "license" file accompanying this file. This file is distributed
    * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
    * express or implied. See the License for the specific language governing
    * permissions and limitations under the License.
    """
    
    import sys
    import pg
    import boto3
    
    REGION = 'us-west-2'
    CLUSTER_IDENTIFIER = 'examplecorp-dw'
    DB_NAME = 'sales_db'
    DB_USER = 'examplecorpsalespartner'
    
    options = """keepalives=1 keepalives_idle=200 keepalives_interval=200
                 keepalives_count=6"""
    
    set_timeout_stmt = "set statement_timeout = 1200000"
    
    def conn_to_rs(host, port, db, usr, pwd, opt=options, timeout=set_timeout_stmt):
        rs_conn_string = """host=%s port=%s dbname=%s user=%s password=%s
                             %s""" % (host, port, db, usr, pwd, opt)
        print "Connecting to %s:%s:%s as %s" % (host, port, db, usr)
        rs_conn = pg.connect(dbname=rs_conn_string)
        rs_conn.query(timeout)
        return rs_conn
    
    def main():
        # describe the cluster and fetch the IAM temporary credentials
        global redshift_client
        redshift_client = boto3.client('redshift', region_name=REGION)
        response_cluster_details = redshift_client.describe_clusters(ClusterIdentifier=CLUSTER_IDENTIFIER)
        response_credentials = redshift_client.get_cluster_credentials(DbUser=DB_USER,DbName=DB_NAME,ClusterIdentifier=CLUSTER_IDENTIFIER,DurationSeconds=3600)
        rs_host = response_cluster_details['Clusters'][0]['Endpoint']['Address']
        rs_port = response_cluster_details['Clusters'][0]['Endpoint']['Port']
        rs_db = DB_NAME
        rs_iam_user = response_credentials['DbUser']
        rs_iam_pwd = response_credentials['DbPassword']
        # connect to the Amazon Redshift cluster
        conn = conn_to_rs(rs_host, rs_port, rs_db, rs_iam_user,rs_iam_pwd)
        # execute a query
        result = conn.query("SELECT sysdate as dt")
        # fetch results from the query
        for dt_val in result.getresult() :
            print dt_val
        # close the Amazon Redshift connection
        conn.close()
    
    if __name__ == "__main__":
        main()

You can save this Python program in a file (redshiftscript.py) and execute it at the command line as ec2-user:

python redshiftscript.py

Now partners can connect to the Amazon Redshift cluster using the Python script, and authentication is federated through the IAM user.

Summary

In this post, I demonstrated how to use federated access using Active Directory and IAM roles to enable single sign-on to an Amazon Redshift cluster. I also showed how partners outside an organization can be managed easily using IAM credentials.  Using the GetClusterCredentials API action, now supported by Amazon Redshift, lets you manage a large number of database users and have them use corporate credentials to log in. You don’t have to maintain separate database user accounts.

Although this post demonstrated the integration of IAM with AD FS and Active Directory, you can replicate this solution across with your choice of SAML 2.0 third-party identity providers (IdP), such as PingFederate or Okta. For the different supported federation options, see Configure SAML Assertions for Your IdP.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to establish federated access to your AWS resources by using Active Directory user attributes.


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.