Post Syndicated from Aarthi Srinivasan original https://aws.amazon.com/blogs/big-data/enable-cross-account-sharing-with-direct-iam-principals-using-aws-lake-formation-tags/
With AWS Lake Formation, you can build data lakes with multiple AWS accounts in a variety of ways. For example, you could build a data mesh, implementing a centralized data governance model and decoupling data producers from the central governance. Such data lakes enable the data-as-an-asset paradigm and unleash new possibilities for data discovery and exploration across organization-wide consumers. While enabling the power of data in decision-making across your organization, it's also crucial to secure the data. With Lake Formation, sharing datasets across accounts only requires a few simple steps, and you can control what you share.
Lake Formation has launched Version 3 capabilities for sharing AWS Glue Data Catalog resources across accounts. When moving to Lake Formation cross-account sharing V3, you get several benefits. When moving from V1, you get more optimized usage of AWS Resource Access Manager (AWS RAM) to scale sharing of resources. When moving from V2, you get a few enhancements. First, you don’t have to maintain AWS Glue resource policies to share using LF-tags because Version 3 uses AWS RAM. Second, you can share with AWS Organizations using LF-tags. Third, you can share to individual AWS Identity and Access Management (IAM) users and roles in other accounts, thereby providing data owners control over which individuals can access their data.
Lake Formation tag-based access control (LF-TBAC) is an authorization strategy that defines permissions based on attributes called LF-tags. LF-tags are different from IAM resource tags and are associated only with Lake Formation databases, tables, and columns. LF-TBAC allows you to define the grant and revoke permissions policy by grouping Data Catalog resources, and therefore helps in scaling permissions across a large number of databases and tables. LF-tags are inherited from a database to all its tables and all the columns of each table.
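The following toy model in plain Python makes the inheritance rule concrete. It is a hypothetical illustration, not a Lake Formation API. It assumes two things. First, a tag assigned at a lower level (table or column) replaces the value inherited for the same key, which is the behavior this post relies on later for the customers columns. Second, a tag expression matches when every key in it has one of its listed values. The column name `c_customer_id` is illustrative only.

```python
from typing import Dict, List

Tags = Dict[str, str]              # LF-tag key -> assigned value
Expression = Dict[str, List[str]]  # LF-tag key -> allowed values


def effective_tags(db_tags: Tags, table_tags: Tags, column_tags: Tags) -> Tags:
    """Merge tags top-down; a lower level wins when it assigns the same key."""
    merged: Tags = dict(db_tags)
    merged.update(table_tags)
    merged.update(column_tags)
    return merged


def matches(tags: Tags, expression: Expression) -> bool:
    """True if, for every key in the expression, the resource's value is allowed.

    Keys are combined with AND; the values listed for one key are combined with OR.
    A resource that lacks a key in the expression does not match.
    """
    return all(tags.get(key) in values for key, values in expression.items())


def visible_columns(db_tags: Tags, table_tags: Tags, columns: List[str],
                    column_tags: Dict[str, Tags], expression: Expression) -> List[str]:
    """Return the columns whose effective tags satisfy the expression."""
    return [
        col for col in columns
        if matches(effective_tags(db_tags, table_tags, column_tags.get(col, {})),
                   expression)
    ]


if __name__ == "__main__":
    db = {"Sensitivity": "Public"}
    columns = ["c_customer_id", "c_first_name", "c_last_name", "c_email_address"]
    overrides = {c: {"Sensitivity": "Confidential"}
                 for c in ("c_first_name", "c_last_name", "c_email_address")}
    print(visible_columns(db, {}, columns, overrides, {"Sensitivity": ["Public"]}))
    # prints ['c_customer_id']
```

The model shows why a database-level tag is enough to cover every table and column, and why one column-level tag is enough to carve a column out of that coverage.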
Version 3 offers the following benefits:
- True central governance with cross-account sharing to specific IAM principals in the target account
- Ease of use in not having to maintain an AWS Glue resource policy for LF-TBAC
- Efficient reuse of AWS RAM shares
- Ease of use in scaling to hundreds of accounts with LF-TBAC
In this post, we illustrate the new features of cross-account sharing Version 3 in a producer-consumer scenario using TPC datasets. We walk through the setup of using LF-TBAC to share Data Catalog resources from the data producer account directly with IAM users in the consumer account. We also go through the steps in the receiving account to accept the shares and query the data.
Solution overview
To demonstrate the Lake Formation cross-account Version 3 features, we use the TPC datasets available at s3://aws-data-analytics-workshops/shared_datasets/tpcparquet/. The solution consists of steps in both accounts.
In account A, complete the following steps:
- As a data producer, register the dataset with Lake Formation and create AWS Glue Data Catalog tables.
- Create LF-tags and associate them with the database and tables.
- Grant LF-tag based permissions on resources directly to personas in consumer account B.
The following steps take place in account B:
- The consumer account data lake admin reviews and accepts the AWS RAM invitations.
- The data lake admin gives CREATE DATABASE access to the IAM user lf-business-analysts.
- The data lake admin creates a database for the marketing team and grants CREATE TABLE access to lf-campaign-manager.
- The IAM users create resource links on the shared database and tables and query them in Amazon Athena.
The producer account A has the following personas:
- Data lake admin – Manages the data lake in the producer account
- lf-producersteward – Manages the data and user access
The consumer account B has the following personas:
- Data lake admin – Manages the data lake in the consumer account
- lf-business-analysts – The business analysts in the sales team need access to non-PII data
- lf-campaign-manager – The manager in the marketing team needs access to data related to products and promotions
Prerequisites
You need the following prerequisites:
- Two AWS accounts. For this demonstration of how AWS RAM invites are created and accepted, you should use two accounts that are not part of the same organization.
- An admin IAM user in both accounts to launch the AWS CloudFormation stacks.
- Lake Formation mode enabled in both the producer and consumer accounts with cross-account Version 3. For instructions, refer to Change the default permission model.
Lake Formation and AWS CloudFormation setup in account A
To keep the setup simple, we have an IAM admin registered as the data lake admin.
- Sign in to the AWS Management Console in the us-east-1 Region.
- On the Lake Formation console, under Permissions in the navigation pane, choose Administrative roles and tasks.
- Under Data lake administrators, choose Choose administrators.
- In the Manage data lake administrators pop-up window, under IAM users and roles, choose your IAM admin user and choose Save.
- Choose Launch Stack to deploy the CloudFormation template:
- Choose Next.
- Provide a name for the stack and choose Next.
- On the next page, choose Next.
- Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create.
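The data lake admin registration above, along with the cross-account Version 3 setting from the prerequisites, can also be scripted. The following is a minimal sketch under several assumptions. You pass in a boto3 Lake Formation client, for example `boto3.client("lakeformation", region_name="us-east-1")`. The admin ARN is a placeholder. Version 3 is selected through the `CROSS_ACCOUNT_VERSION` key of the settings `Parameters` map. Because `put_data_lake_settings` replaces the whole settings object, the helper starts from the current settings instead of building them from scratch.

```python
import copy
from typing import Any, Dict


def updated_settings(current: Dict[str, Any], admin_arn: str) -> Dict[str, Any]:
    """Return a copy of the settings with the admin added and Version 3 selected."""
    settings = copy.deepcopy(current)  # never mutate the caller's dict
    admins = settings.setdefault("DataLakeAdmins", [])
    if not any(a.get("DataLakePrincipalIdentifier") == admin_arn for a in admins):
        admins.append({"DataLakePrincipalIdentifier": admin_arn})
    settings.setdefault("Parameters", {})["CROSS_ACCOUNT_VERSION"] = "3"
    return settings


def apply_settings(lakeformation_client, admin_arn: str) -> Dict[str, Any]:
    """Read the current settings, merge the changes, and write them back."""
    current = lakeformation_client.get_data_lake_settings()["DataLakeSettings"]
    new_settings = updated_settings(current, admin_arn)
    lakeformation_client.put_data_lake_settings(DataLakeSettings=new_settings)
    return new_settings
```

Reading before writing keeps any existing admins and default-permission settings intact.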
Stack creation should take about 2–3 minutes. The stack establishes the producer setup as follows:
- Creates an Amazon Simple Storage Service (Amazon S3) data lake bucket
- Registers the data lake bucket with Lake Formation
- Creates an AWS Glue database and tables
- Creates an IAM user (lf-producersteward) who will act as the producer steward
- Creates LF-tags and assigns them to the created catalog resources as specified in the following table
Database | Table | LF-Tag Key | LF-Tag Value | Resource Tagged |
lftpcdb | - | Sensitivity | Public | DATABASE |
lftpcdb | items | HasCampaign | true | TABLE |
lftpcdb | promotions | HasCampaign | true | TABLE |
lftpcdb | customers (columns c_last_name, c_first_name, c_email_address) | Sensitivity | Confidential | TABLECOLUMNS |
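The CloudFormation stack already performs this tagging. The following sketch shows how the table rows translate into Lake Formation `create_lf_tag` and `add_lf_tags_to_resource` requests. It assumes a boto3 Lake Formation client in account A. The value lists passed when creating each tag key are an assumption: a value has to be defined on the tag before it can be assigned, so `Sensitivity` is created with both `Public` and `Confidential`.

```python
from typing import Any, Dict, List

DATABASE = "lftpcdb"

# Tag keys and the values each key may take (assumed value lists).
LF_TAGS: Dict[str, List[str]] = {
    "Sensitivity": ["Public", "Confidential"],
    "HasCampaign": ["true"],
}


def _pair(key: str, value: str) -> List[Dict[str, Any]]:
    """A single LF-tag key-value pair in the shape the API expects."""
    return [{"TagKey": key, "TagValues": [value]}]


def create_tag_requests() -> List[Dict[str, Any]]:
    """Keyword arguments for one create_lf_tag call per tag key."""
    return [{"TagKey": k, "TagValues": v} for k, v in LF_TAGS.items()]


def assignment_requests() -> List[Dict[str, Any]]:
    """Keyword arguments for one add_lf_tags_to_resource call per table row."""
    return [
        {"Resource": {"Database": {"Name": DATABASE}},
         "LFTags": _pair("Sensitivity", "Public")},
        {"Resource": {"Table": {"DatabaseName": DATABASE, "Name": "items"}},
         "LFTags": _pair("HasCampaign", "true")},
        {"Resource": {"Table": {"DatabaseName": DATABASE, "Name": "promotions"}},
         "LFTags": _pair("HasCampaign", "true")},
        {"Resource": {"TableWithColumns": {
            "DatabaseName": DATABASE, "Name": "customers",
            "ColumnNames": ["c_last_name", "c_first_name", "c_email_address"]}},
         "LFTags": _pair("Sensitivity", "Confidential")},
    ]


def apply_tags(lakeformation_client) -> None:
    """Create the tag keys first, then attach them to the catalog resources."""
    for request in create_tag_requests():
        lakeformation_client.create_lf_tag(**request)
    for request in assignment_requests():
        lakeformation_client.add_lf_tags_to_resource(**request)
```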
Verify permissions in account A
After the CloudFormation stack launches, complete the following steps in account A:
- On the AWS CloudFormation console, navigate to the Outputs tab of the stack.
- Choose the LFProducerStewardCredentials value to navigate to the AWS Secrets Manager console.
- In the Secret value section, choose Retrieve secret value.
- Note down the password for the IAM user lf-producersteward.
You need it later to log in to the console as the user lf-producersteward.
- On the Lake Formation console, choose Databases in the navigation pane.
- Open the database lftpcdb.
- Verify that the LF-tags on the database are created.
- Choose View tables and choose the items table to verify the LF-tags.
- Repeat the steps for the promotions and customers tables to verify the LF-tags assigned.
- On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
- Select the database lftpcdb and on the Actions menu, choose View Permissions.
- Verify that there are no default permissions granted on the database lftpcdb for IAMAllowedPrincipals.
. - If you find any, select the permission and choose Revoke to revoke the permission.
- On the AWS Management Console, choose the AWS CloudShell icon on the top menu.
This opens AWS CloudShell in another tab of the browser. Allow a few minutes for the CloudShell environment to set up.
- Run the following AWS Command Line Interface (AWS CLI) command after replacing {BUCKET_NAME} with the DataLakeBucket value from the stack output.
If CloudShell isn’t available in your chosen Region, run the following AWS CLI command to copy the required dataset from your preferred AWS CLI environment as the IAM admin user.
- Verify that your S3 bucket has the dataset copied in it.
- Log out as the IAM admin user.
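Two steps in the list above can also be done with the AWS SDK for Python: revoking the default IAMAllowedPrincipals grant and copying the dataset. The following is a minimal sketch with these assumptions:

- You pass boto3 S3 and Lake Formation clients created as the IAM admin user.
- `IAM_ALLOWED_PRINCIPALS` is the principal identifier Lake Formation uses for the IAMAllowedPrincipals group.
- The destination prefix `tpcparquet/` is hypothetical. Use the location the stack's AWS Glue tables point to.

Call the revoke only if the grant actually exists; otherwise the API call fails.

```python
from typing import Iterable, List, Tuple

SRC_BUCKET = "aws-data-analytics-workshops"
SRC_PREFIX = "shared_datasets/tpcparquet/"
DST_PREFIX = "tpcparquet/"  # hypothetical destination prefix


def revoke_iam_allowed_principals_request(database: str) -> dict:
    """revoke_permissions arguments that remove the default ALL grant on a database."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        "Resource": {"Database": {"Name": database}},
        "Permissions": ["ALL"],
    }


def revoke_iam_allowed_principals(lakeformation_client, database: str = "lftpcdb") -> None:
    lakeformation_client.revoke_permissions(**revoke_iam_allowed_principals_request(database))


def copy_plan(keys: Iterable[str], dst_bucket: str) -> List[Tuple[dict, str, str]]:
    """Map each source key to (CopySource, destination bucket, destination key)."""
    plan = []
    for key in keys:
        if not key.startswith(SRC_PREFIX) or key.endswith("/"):
            continue  # skip keys outside the prefix and folder placeholder objects
        dst_key = DST_PREFIX + key[len(SRC_PREFIX):]
        plan.append(({"Bucket": SRC_BUCKET, "Key": key}, dst_bucket, dst_key))
    return plan


def copy_dataset(s3_client, dst_bucket: str) -> int:
    """List every object under the source prefix and copy it; returns the count."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = [obj["Key"]
            for page in paginator.paginate(Bucket=SRC_BUCKET, Prefix=SRC_PREFIX)
            for obj in page.get("Contents", [])]
    plan = copy_plan(keys, dst_bucket)
    for copy_source, bucket, key in plan:
        s3_client.copy(copy_source, bucket, key)  # managed copy handles large objects
    return len(plan)
```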
Grant permissions in account A
Next, acting as the data steward in the producer account, we grant Lake Formation permissions on the dataset. The data steward grants the following LF-tag-based permissions to the consumer personas.
Consumer Persona | LF-tag Policy |
lf-business-analysts | Sensitivity=Public |
lf-campaign-manager | HasCampaign=true |
- Log in to account A as user lf-producersteward, using the password you noted from Secrets Manager earlier.
- On the Lake Formation console, under Permissions in the navigation pane, choose Data Lake permissions.
- Choose Grant.
- Under Principals, select External accounts.
- Provide the ARN of the IAM user in the consumer account (arn:aws:iam::<accountB_id>:user/lf-business-analysts) and press Enter.
- Under LF-Tags or catalog resources, select Resources matched by LF-Tags.
- Choose Add LF-Tag to add a new key-value pair.
- For the key, choose Sensitivity, and for the value, choose Public.
- Under Database permissions, select Describe, and under Table permissions, select Select and Describe.
- Choose Grant to apply the permissions.
- On the Lake Formation console, under Permissions in the navigation pane, choose Data Lake permissions.
- Choose Grant.
- Under Principals, select External accounts.
- Provide the ARN of the IAM user in the consumer account (arn:aws:iam::<accountB_id>:user/lf-campaign-manager) and press Enter.
- Under LF-Tags or catalog resources, select Resources matched by LF-Tags.
- Choose Add LF-Tag to add a new key-value pair.
- For the key, choose HasCampaign, and for the value, choose true.
- Under Database permissions, select Describe, and under Table permissions, select Select and Describe.
- Choose Grant to apply the permissions.
- Verify on the Data lake permissions tab that the permissions you have granted show up correctly.
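The two console grants above can also be expressed as Lake Formation `grant_permissions` requests scoped by an `LFTagPolicy`. The following sketch makes an assumption about how the console maps to the API: each console grant becomes one request for the database permission (Describe) and one for the table permissions (Select and Describe). The consumer account ID is a placeholder, and the client is assumed to be a boto3 Lake Formation client used as lf-producersteward.

```python
from typing import Dict, List

ACCOUNT_B = "111122223333"  # placeholder consumer account ID

PERSONA_POLICIES: Dict[str, Dict[str, List[str]]] = {
    "lf-business-analysts": {"Sensitivity": ["Public"]},
    "lf-campaign-manager": {"HasCampaign": ["true"]},
}


def grant_requests(user: str, expression: Dict[str, List[str]]) -> List[dict]:
    """Database-level and table-level grants to an IAM user in account B."""
    principal = {"DataLakePrincipalIdentifier": f"arn:aws:iam::{ACCOUNT_B}:user/{user}"}
    expr = [{"TagKey": k, "TagValues": v} for k, v in expression.items()]
    return [
        {"Principal": principal,
         "Resource": {"LFTagPolicy": {"ResourceType": "DATABASE", "Expression": expr}},
         "Permissions": ["DESCRIBE"]},
        {"Principal": principal,
         "Resource": {"LFTagPolicy": {"ResourceType": "TABLE", "Expression": expr}},
         "Permissions": ["SELECT", "DESCRIBE"]},
    ]


def apply_grants(lakeformation_client) -> int:
    """Issue every grant for every persona; returns the number of calls made."""
    count = 0
    for user, expression in PERSONA_POLICIES.items():
        for request in grant_requests(user, expression):
            lakeformation_client.grant_permissions(**request)
            count += 1
    return count
```

Because the principal is an individual IAM user rather than the account ID, only that user receives the share. This is the Version 3 capability described at the start of this post.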
AWS CloudFormation setup in account B
Complete the following steps in the consumer account:
- Log in as an IAM admin user in account B and launch the CloudFormation stack:
- Choose Next.
- Provide a name for the stack, then choose Next.
- On the next page, choose Next.
- Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create.
Stack creation should take about 2–3 minutes. The stack sets up the following resources in account B:
- IAM users datalakeadmin1, lf-business-analysts, and lf-campaign-manager, with relevant IAM and Lake Formation permissions
- A database called db_for_shared_tables with Create_Table permissions to the lf-campaign-manager user
- An S3 bucket named lfblog-athenaresults-<your-accountB-id>-us-east-1 with ListBucket and write permissions to lf-business-analysts and lf-campaign-manager
Note down the stack output details.
Accept resource shares in account B
After you launch the CloudFormation stack, complete the following steps in account B:
- On the CloudFormation stack Outputs tab, choose the link for DataLakeAdminCredentials.
This takes you to the Secrets Manager console.
- On the Secrets Manager console, choose Retrieve secret value and copy the password for the DataLakeAdmin user.
- Use the ConsoleIAMLoginURL value from the CloudFormation stack output to log in to account B with the data lake admin user name datalakeadmin1 and the password you copied from Secrets Manager.
- Open the AWS RAM console in another browser tab.
- In the navigation pane, under Shared with me, choose Resource shares to view the pending invitations.
You should see two resource share invitations from the producer account A: one for the database-level share and one for the table-level share.
- Choose each resource share link, review the details, and choose Accept.
After you accept the invitations, the status of the resource shares changes from Pending to Active.
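For many consumer accounts, accepting invitations can be scripted. The following sketch assumes a boto3 AWS RAM client in account B (`boto3.client("ram")`) and a placeholder producer account ID. It accepts only invitations that are still `PENDING` and that were sent by the producer, and it follows `nextToken` so that every page of invitations is considered. Review the shares first, as in the console steps above.

```python
from typing import Iterable, List

PRODUCER_ACCOUNT = "444455556666"  # placeholder account A ID


def pending_from(invitations: Iterable[dict], sender: str) -> List[str]:
    """ARNs of invitations from `sender` that have not been answered yet."""
    return [inv["resourceShareInvitationArn"] for inv in invitations
            if inv.get("status") == "PENDING" and inv.get("senderAccountId") == sender]


def accept_pending(ram_client, sender: str = PRODUCER_ACCOUNT) -> List[str]:
    """Collect all invitation pages, then accept the matching ones."""
    invitations: List[dict] = []
    kwargs: dict = {}
    while True:
        page = ram_client.get_resource_share_invitations(**kwargs)
        invitations.extend(page.get("resourceShareInvitations", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs = {"nextToken": token}
    arns = pending_from(invitations, sender)
    for arn in arns:
        ram_client.accept_resource_share_invitation(resourceShareInvitationArn=arn)
    return arns
```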
Grant permissions in account B
To grant permissions in account B, complete the following steps:
- On the Lake Formation console, under Permissions in the navigation pane, choose Administrative roles and tasks.
- Under Database creators, choose Grant.
- Under IAM users and roles, choose lf-business-analysts.
- For Catalog permissions, select Create database.
- Choose Grant.
- Log out of the console as the data lake admin user.
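The database creator grant differs from the earlier grants in one way: it targets the Data Catalog itself rather than a database or table. The following sketch shows that request. It assumes a boto3 Lake Formation client used as the account B data lake admin and a placeholder account ID.

```python
CONSUMER_ACCOUNT = "111122223333"  # placeholder account B ID


def database_creator_request(user: str, account_id: str = CONSUMER_ACCOUNT) -> dict:
    """grant_permissions arguments for CREATE_DATABASE on the Data Catalog."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": f"arn:aws:iam::{account_id}:user/{user}"},
        "Resource": {"Catalog": {}},  # an empty Catalog resource means the catalog itself
        "Permissions": ["CREATE_DATABASE"],
    }


def grant_database_creator(lakeformation_client, user: str) -> None:
    lakeformation_client.grant_permissions(**database_creator_request(user))
```

The same arguments passed to `revoke_permissions` undo the grant during cleanup.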
Query the shared datasets as consumer users
To validate the lf-business-analysts user's data access, perform the following steps:
- Log in to the console as lf-business-analysts, using the credentials noted from the CloudFormation stack output.
- On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
- Select the database lftpcdb and on the Actions menu, choose Create resource link.
- Under Resource link name, enter rl_lftpcdb.
- Choose Create.
- After the resource link is created, select the resource link and choose View tables.
You can now see the four tables in the shared database.
- Open the Athena console in another browser tab and choose the lfblog-athenaresults-<your-accountB-id>-us-east-1 bucket as the query results location.
- Verify data access using the following query (for more information, refer to Running SQL queries using Amazon Athena):
The following screenshot shows the query output.
Notice that account A shared the database lftpcdb with account B using the LF-tag expression Sensitivity=Public. The columns c_first_name, c_last_name, and c_email_address in the customers table are tagged Sensitivity=Confidential, which overrides the Sensitivity=Public value they would otherwise inherit from the database. Therefore, these three columns are not visible to user lf-business-analysts.
You can preview the other tables from the database similarly to see the available columns and data.
- Log out of the console as lf-business-analysts.
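The lf-business-analysts steps (create a database resource link, then query through it in Athena) can be sketched with the AWS Glue and Athena APIs as follows. The sketch assumes:

- boto3 Glue and Athena clients for lf-business-analysts in account B.
- Placeholder account IDs.
- A results location that follows the stack's bucket naming pattern.
- An illustrative SQL statement, because the post's own query is not reproduced in this text.

A resource link is an ordinary Data Catalog database whose `TargetDatabase` points to the shared database in account A.

```python
PRODUCER_ACCOUNT = "444455556666"  # placeholder account A ID
CONSUMER_ACCOUNT = "111122223333"  # placeholder account B ID
RESULTS_LOCATION = f"s3://lfblog-athenaresults-{CONSUMER_ACCOUNT}-us-east-1/"


def database_link_input(link_name: str, shared_db: str, owner_account: str) -> dict:
    """DatabaseInput for a resource link that points at a database in another account."""
    return {
        "Name": link_name,
        "TargetDatabase": {"CatalogId": owner_account, "DatabaseName": shared_db},
    }


def query_request(sql: str, database: str) -> dict:
    """start_query_execution arguments that write results to the stack's bucket."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": RESULTS_LOCATION},
    }


def link_and_query(glue_client, athena_client) -> str:
    """Create rl_lftpcdb, start an illustrative query, and return its execution ID."""
    glue_client.create_database(
        DatabaseInput=database_link_input("rl_lftpcdb", "lftpcdb", PRODUCER_ACCOUNT))
    # Illustrative query; Lake Formation omits the Confidential columns for this user.
    response = athena_client.start_query_execution(
        **query_request("SELECT * FROM customers LIMIT 10", "rl_lftpcdb"))
    return response["QueryExecutionId"]
```

`start_query_execution` returns immediately. Poll `get_query_execution` until the state is `SUCCEEDED` before you call `get_query_results`.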
Now we can validate the lf-campaign-manager user's data access.
- Log in to the console as lf-campaign-manager using the credentials noted from the CloudFormation stack output.
- On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
- Verify that you can see the database db_for_shared_tables shared by the data lake admin.
- Under Data catalog in the navigation pane, choose Tables.
You should be able to see the two tables shared from account A using the LF-tag expression HasCampaign=true. The two tables show the Owner account ID as account A.
Because lf-campaign-manager received table-level shares, this user creates table-level resource links for querying in Athena.
- Select the promotions table, and on the Actions menu, choose Create resource link.
- For Resource link name, enter rl_promotions.
- Under Database, choose db_for_shared_tables for the database to contain the resource link.
- Choose Create.
- Repeat the resource link creation for the other table, items.
Notice that the resource links show account B as owner, whereas the actual tables show account A as the owner.
- Open the Athena console in another browser tab and choose the lfblog-athenaresults-<your-accountB-id>-us-east-1 bucket as the query results location.
- Query the tables using the resource links.
As shown in the following screenshot, all columns of both tables are accessible to lf-campaign-manager.
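Table-level resource links use the same idea as the database link, but through AWS Glue `create_table` with a `TargetTable`. The following sketch assumes a boto3 Glue client for lf-campaign-manager in account B and a placeholder producer account ID. The post names only `rl_promotions`, so the `rl_<table>` naming used for items is a hypothetical convention.

```python
from typing import Iterable, List

PRODUCER_ACCOUNT = "444455556666"  # placeholder account A ID
SHARED_DATABASE = "lftpcdb"
LINK_DATABASE = "db_for_shared_tables"


def table_link_input(link_name: str, table: str, owner_account: str) -> dict:
    """TableInput for a resource link that points at a shared table in another account."""
    return {
        "Name": link_name,
        "TargetTable": {"CatalogId": owner_account,
                        "DatabaseName": SHARED_DATABASE,
                        "Name": table},
    }


def create_table_links(glue_client, tables: Iterable[str] = ("promotions", "items")) -> List[str]:
    """Create one link per shared table inside db_for_shared_tables."""
    created = []
    for table in tables:
        link_name = f"rl_{table}"  # hypothetical naming; the post names only rl_promotions
        glue_client.create_table(
            DatabaseName=LINK_DATABASE,
            TableInput=table_link_input(link_name, table, PRODUCER_ACCOUNT))
        created.append(link_name)
    return created
```

The link lives in account B, which is why the console shows account B as its owner while the target table still shows account A.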
In summary, you have seen how LF-tags are used to share a database and select tables from one account to another account’s IAM users.
Clean up
To avoid incurring charges on the AWS resources created in this post, you can perform the following steps.
First, clean up resources in account A:
- Empty the S3 bucket created for this post by deleting the copied objects from your S3 bucket.
- Delete the CloudFormation stack.
This deletes the S3 bucket, custom IAM roles, policies, and the LF database, tables, and permissions.
- Optionally, revert the Lake Formation settings and restore IAM access control from the Settings page on the Lake Formation console.
Now complete the following steps in account B:
- Empty the S3 bucket lfblog-athenaresults-<your-accountB-id>-us-east-1 used as the Athena query results location.
- Revoke the database creator permission from lf-business-analysts.
- Delete the CloudFormation stack.
This deletes the IAM users, S3 bucket, Lake Formation database db_for_shared_tables, resource links, and all the permissions from Lake Formation.
If there are any resource links and permissions left, delete them manually in Lake Formation from both accounts.
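If anything remains in account B, a cleanup sketch could look like the following. It assumes boto3 Glue and Lake Formation clients used as the account B data lake admin, and a placeholder account ID. `rl_items` is a hypothetical link name (see the earlier table-link step). Deleting a resource link removes only the link in account B, not the table or database shared by account A. Each call raises an error if its target is already gone, so run only the calls you need.

```python
CONSUMER_ACCOUNT = "111122223333"  # placeholder account B ID


def cleanup_consumer(glue_client, lakeformation_client) -> None:
    """Delete leftover resource links and revoke the database creator grant."""
    for table_link in ("rl_promotions", "rl_items"):  # rl_items is a hypothetical name
        glue_client.delete_table(DatabaseName="db_for_shared_tables", Name=table_link)
    glue_client.delete_database(Name="rl_lftpcdb")  # removes the link, not lftpcdb itself
    lakeformation_client.revoke_permissions(
        Principal={"DataLakePrincipalIdentifier":
                   f"arn:aws:iam::{CONSUMER_ACCOUNT}:user/lf-business-analysts"},
        Resource={"Catalog": {}},
        Permissions=["CREATE_DATABASE"],
    )
```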
Conclusion
In this post, we illustrated the benefits of using Lake Formation cross-account sharing Version 3 with LF-tags to share directly with IAM principals, and how to receive the shared tables in the consumer account. We used a two-account scenario in which a data producer account shares a database and specific tables with individual IAM users in another account using LF-tags. In the receiving account, we showed the roles played by the data lake admin and by the receiving IAM users. We also illustrated how to override LF-tags at the column level so that PII columns stay hidden while the rest of the table is shared.
With Version 3 of cross-account sharing, Lake Formation enables more flexible data mesh models, in which a producer can share directly with an IAM principal in another account instead of with the entire account. Data mesh implementation also becomes easier for data administrators and data platform owners, because LF-tag-based sharing to organizations and organizational units lets them scale to hundreds of consumer accounts.
We encourage you to upgrade your Lake Formation cross-account sharing to Version 3 and benefit from the enhancements. For more details, see Updating cross-account data sharing version settings.
About the authors
Aarthi Srinivasan is a Senior Big Data Architect with AWS Lake Formation. She likes building data lake solutions for AWS customers and partners. When not on the keyboard, she explores the latest science and technology trends and spends time with her family.
Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building analytics and data mesh solutions on AWS and sharing them with the community.