All posts by Nitin Wagh

Connect to Amazon Athena with federated identities using temporary credentials

Post Syndicated from Nitin Wagh original https://aws.amazon.com/blogs/big-data/connect-to-amazon-athena-with-federated-identities-using-temporary-credentials/

Many organizations have standardized on centralized user management, most commonly Microsoft Active Directory or LDAP.  Access to AWS resources is no exception.  Amazon Athena is a serverless query engine for data on Amazon S3 that is popular for quick and cost-effective queries of data in a data lake.  To allow users or applications to access Athena, organizations are required to use an AWS access key and an access secret key from which appropriate policies are enforced. To maintain a consistent authorization model across, organizations must enable authentication and authorization for Athena by using federated users.

This blog post shows the process of enabling federated user access with the AWS Security Token Service (AWS STS). This approach lets you create temporary security credentials and provides them to trusted users for running queries in Athena.

Temporary security credentials in AWS STS

Temporary security credentials ensures that access keys to protected AWS resources are properly rotated. Therefore, potential security leaks can be caught and remedied. AWS STS generates these per-use temporary access keys.

Temporary security credentials work similar to the long-term access key credentials that your IAM users can use. However, temporary security credentials have the following differences:

  • They are intended for short-term use only. You can configure these credentials to last from a few minutes to several hours, with a maximum of 12 hours. After they expire, AWS no longer recognizes them, or allows any kind of access from API requests made with them.
  • They are not stored with the user. They are generated dynamically and provided to the user when requested. When or before they expire, the user can request new credentials, if they still have permissions to do so.

Common scenarios for federated access

The following common scenarios describe when your organization may require federated access to Athena:

  1. Running queries in Athena while using federation. Your group is required to run queries in Athena while federating into AWS using SAML with permissions stored in Active Directory.
  2. Enabling access across accounts to Athena for users in your organization. Users in your organization with access to one AWS account needs to run Athena queries in a different account.
  3. Enabling access to Athena for a data application. A data application deployed on an Amazon EC2 instance needs to run Athena queries via JDBC.

Athena as the query engine for your data lake on S3

Athena is an interactive query service that lets you analyze data directly in Amazon S3 by using standard SQL. You can access Athena by using JDBC and ODBC drivers, AWS SDK, or the Athena console.

Athena enables schema-on-read analytics to gain insights from structured or semi-structured datasets found in the data lake. A data lake is ubiquitous, scalable, and reliable storage that lets you consume all of your structured and unstructured data. Customers increasingly prefer a serverless approach to querying data in their data lake.  Some benefits of using an Amazon S3 for a data lake include:

  • The ability to efficiently consume and store any data, at any scale, at low cost.
  • A single destination for searching and finding the relevant data.
  • Analysis of the data in S3 through a unified set of tools.

Solution overview

The following sections describe how to enable the common scenarios introduced previously in this post. They show how to download, install, and configure SQL Workbench to run queries in Athena. Next, they show how to use AWS STS with a custom JDBC credentials provider to obtain temporary credentials for an authorized user.  These credentials are passed to Athena’s JDBC driver, which enables SQL Workbench to run authorized queries.

As a reminder, the scenarios are:

  1. Running queries in Athena while using federation.
  2. Enabling access across accounts to Athena for users in your organization.
  3. Enabling access to Athena for a data application.

Walkthrough

This walkthrough uses a sample table created to query Amazon CloudFront logs.  It demonstrates that proper access has been granted to Athena after each scenario. This walkthrough also assumes that you have a table for testing.  For information about creating a table, see Getting Started with Amazon Athena.

Prerequisites for scenarios 1 and 2

  1. Download and install SQL Workbench.
  2. Download the Athena custom credentials provider .jar file to the same computer where SQL Workbench is installed.

Note: You can also compile the .jar file by using this Athena JDBC source code.

  1. Download the Amazon Athena JDBC driver to the same computer where SQL Workbench is installed.
  2. In SQL Workbench, open File, Connect window, Manage Drivers. Choose the Athena driver and add two libraries, Athena JDBC driver and the custom credentials provider by specifying the location where you downloaded them.

Scenario 1: SAML Federation with Active Directory

In this scenario, we use a SAML command line tool. It performs a SAML handshake with an identity provider, and then retrieves temporary security credentials from AWS STS. We then pass the obtained security credentials through a custom credentials provider to run queries in Athena.

Before You Begin

  1. Make sure you have followed the prerequisites for scenarios 1 and 2.
  2. Set up a SAML integration with ADFS. For information, see Enabling Federation to AWS Using Windows Active Directory, ADFS, and SAML 2.0.
  3. Add the following IAM policies to the ADFS-Production role:

  1. Set up federated AWS CLI For more information, see How to Implement a General Solution for Federated API/CLI Access Using SAML 2.0

Executing queries in Athena with a SAML federated user identity

  1. Run the federated AWS CLI script configured as part of the prerequisites. Log in using a valid user name and password, then choose the ADFS-Production role. As a result, your temporary access_key, secret_access_key and session_token are generated. They are stored in a “credentials” file under the [saml] profile that looks similar to the following:
 [saml]
output = json
region = us-east-1
aws_access_key_id = XXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXX
aws_session_token = XXXXXXXXXXXXXXXXX
  1. To enable running queries in Athena through SQL Workbench, configure an Athena connection as follows:

  1. Choose Extended Properties, and enter the properties as follows:
"AWSCredentialsProviderClass"="com.amazonaws.athena.jdbc.CustomIAMRoleAssumptionSAMLCredentialsProvider"
"AWSCredentialsProviderArguments"="<access_key_id>, <secret_access_key>, <session token>"
"S3OutputLocation"="s3://<bucket where query results are stored>"
"LogPath"= "<local path on laptop, or pc where logs are stored>"
"LogLevel"= "<Log Level from 0 to 6>"

  1. Choose Test to verify that you can successfully connect to Athena.
  2. Run a query in SQL Workbench to verify that credentials are correctly applied.

Scenario 2: Cross-account access

In this scenario, you allow users in one AWS account, referred to as Account A to run Athena queries in a different account, called Account B.

To enable this configuration, use the CustomIAMRoleAssumptionCredentialsProvider custom credentials provider to retrieve the necessary credentials. By doing so, you can run Athena queries by using credentials from Account A in Account B. This is made possible by the cross-account roles, as shown in the following diagram:

Before You Begin

1. If you haven’t already done so, follow these prerequisites for scenarios 1 and 2.

  1. Use AWS CLI to define a role named AthenaCrossAccountRole in Account B, as follows:
aws iam create-role --role-name AthenaCrossAccountRole --assume-role-policy-document file://cross-account-trust.json

Content of cross-account-trust.json file

{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<ACCOUNT-A-ID>:root"
      },
      "Action": "sts:AssumeRole"
      }
    }
  ]
}
  1. Attach the IAM policies, AmazonAthenaFullAccess and AmazonS3FullAccess, to the AthenaCrossAccountRole IAM role, as follows:
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonAthenaFullAccess --role-name AthenaCrossAccountRole

aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name --role-name AthenaCrossAccountRole

Executing Queries in Athena with a SAML Federated User Identity

  1. Set up a new Athena database connection in SQLWorkbench, as shown in the following example:

  1. Choose Extended Properties, and enter the properties as shown in the following example. This example configures the AWS credentials of an IAM user from Account A (AccessKey, SecretKey, and Cross Account Role ARN).
"AwsCredentialsProviderClass"="com.amazonaws.custom.athena.jdbc.CustomIAMRoleAssumptionCredentialsProvider"
"AwsCredentialsProviderArguments"="<aws_key_id>,<secret_key_id>,<cross Account Role ARN>"
"S3OutputLocation"="s3://<bucket where Athena results are stored>"
"LogPath"= "<local directory path on laptop, or pc where logs are stored>"
“LogLevel” = “Log Level from 1 to 6”

  1. Choose Test to verify that you can successfully connect to Athena.
  2. Run a query in SQL Workbench to verify that credentials are correctly applied.

Scenario 3: Using EC2 Instance Profile Role

This scenario uses the Amazon EC2 instance profile role to retrieve temporary credentials.  First, create the EC2AthenaInstanceProfileRole IAM role via AWS CLI, as shown in the following example:

aws iam create-role --role-name EC2AthenaInstanceProfileRole --assume-role-policy-document file://policy.json

Content of policy.json file

{
  "Statement": [
      {
        "Action": "sts:AssumeRole",
        "Effect": "Allow",
        "Principal": {
           "Service": "ec2.amazonaws.com"
         }
      }
  ]
}

Attach the IAM policies, AmazonAthenaFullAccess and AmazonS3FullAccess, to the EC2AthenaInstanceProfileRole IAM role, as follows:

aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonAthenaFullAccess --role-name EC2AthenaInstanceProfileRole

aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name --role-name EC2AthenaInstanceProfileRole

Before You Begin

  1. Launch the Amazon EC2 instance for Windows, then attach the InstanceProfile role created in the previous step:
aws ec2 run-instances --image-id <use Amazon EC2 on Windows 2012 AMI Id for your region> --iam-instance-profile 'Arn=arn:aws:iam::<your_aws_account_id>:instance-profile/EC2AthenaInstanceProfileRole' --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=AthenaTest}]' --count 1 --instance-type t2.micro --key-name <your key> --security-group-ids <your_windows_security_group_id>

2.    Log in to your Windows instance using RDP.

3.    Install Java 8 and download and install SQL Workbench.

4.    Download the Athena JDBC driver.

Executing queries in Athena using an EC2 instance profile role

1.      In SQL Workbench, open File, Connect window, Manage Drivers. Specify a path to the Athena JDBC driver that was previously downloaded, as shown in the following example.

  1. The instance credentials provider, InstanceProfileCredentialsProvider, is included with the Amazon Athena JDBC driver. In SQL Workbench, set up an Athena connection as follows:

  1. Choose Extended Properties, and enter the properties as follows:
"AwsCredentialsProviderClass"="com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider"
"S3OutputLocation"="s3://<bucket where Athena results are stored>"
"LogPath"= "<local directory path on laptop, or pc where logs are stored>"
“LogLevel” = “Log Level from 0 to 6”

  1. Choose Test to verify that you can successfully connect to Athena.
  2. Run a query in SQL Workbench to verify that credentials are correctly applied.

Conclusion

This post walked through three scenarios to enable trusted users to access Athena using temporary security credentials. First, we used SAML federation where user credentials were stored in Active Directory. Second, we used a custom credentials provider library to enable cross-account access. And third, we used an EC2 Instance Profile role to provide temporary credentials for users in our organization to access Athena.

These approaches ensure that access keys protecting AWS resources are not directly hardcoded in applications and can be easily revoked as needed. To achieve this, we used AWS STS to generate temporary per-use credentials.

The scenarios demonstrated in this post used SQL Workbench.  However, they apply for all other uses of the JDBC driver with Amazon Athena. Additionally, when using AWS SDK with Athena, similar approaches also apply.

Happy querying!

 


Additional Reading

If you found this post useful, be sure to check out Top 10 Performance Tuning Tips for Amazon Athena, and Analyze and visualize your VPC network traffic using Amazon Kinesis and Amazon Athena.

 


About the Author

Nitin Wagh is a Solutions Architect with Amazon Web Services specializing in Big Data Analytics. He helps AWS customers develop distributed big data solutions using AWS technologies.