Implement row-level access control in a multi-tenant environment with Amazon Redshift

Post Syndicated from Siva Bangaru original https://aws.amazon.com/blogs/big-data/implement-row-level-access-control-in-a-multi-tenant-environment-with-amazon-redshift/

This is a guest post co-written with Siva Bangaru and Leon Liu from ADP.

ADP helps organizations of all types and sizes by providing human capital management (HCM) solutions that unite HR, payroll, talent, time, tax, and benefits administration. ADP is a leader in business outsourcing services, analytics, and compliance expertise. ADP’s unmatched experience, deep insights, and cutting-edge technology have transformed human resources from a back-office administrative function to a strategic business advantage.

People Analytics powered by ADP DataCloud is an application that provides analytics and enhanced insights to ADP’s clients. It delivers a guided analytics experience that make it easy for you to create, use, and distribute tailored analytics for your organization. ADP People Analytics’s streamlined, configurable dashboards can help you identify potential issues in key areas, like overtime, turnover, compensation, and much more.

ADP provides this analytics experience to thousands of clients today. Securing customers’ data is a top priority for ADP. The company requires the highest security standards when implementing a multi-tenant analytics platform on Amazon Redshift.

ADP DataCloud integrates with Amazon Redshift row-level security (RLS) to implement granular data entitlements and enforce the access restrictions on their tables in Amazon Redshift.

In this post, we discuss how the ADP DataCloud team implemented Amazon Redshift RLS on the foundation of role-based access control (RBAC) to simplify managing privileges required in a multi-tenant environment, and also enabled and enforced access to granular data entitlements in business terms.

The ADP DataCloud team had the following key requirements and challenges:

Support a multi-tenant application to enforce a logical separation of each tenant’s data rows
Support dynamic provisioning of new tenants
Minimal impact on performance

Row-level security in Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. One of the challenges with security is that enterprises want to provide fine-grained access control at the row level for sensitive data. This can be done by creating views or using different databases and schemas for different users. However, this approach isn’t scalable and becomes complex to maintain over time, especially when supporting multi-tenant environments.

In early 2022, Amazon Redshift released row-level security, which is built on the foundation of role-based access control. RLS allows you to control which users or roles can access specific records of data within tables, based on security policies that are defined at the database object level. This new RLS capability in Amazon Redshift enables you to dynamically filter existing rows of data in a table along with session context variable setting capabilities to dynamically assign the appropriate tenant configuration. This is in addition to column-level access control, where you can grant users permissions to a subset of columns. Now you can combine column-level access control with RLS policies to further restrict access to particular rows of visible columns. Refer to Achieve fine-grained data security with row-level access control in Amazon Redshift for additional details.

Solution overview

As part of ADP’s key requirements to support a multi-tenant data store wherein a single table holds data of multiple tenants, enforcement of security policies to ensure no cross-tenant data access is of paramount importance. One obvious way to ensure this is by creating database users for each tenant and implementing RLS policies to filter a single tenant’s data as per the logged-in user. But this can be tedious and become cumbersome to maintain as the number of tenants grow by the thousands.

This post presents another way to handle this use case by combining session context variables and RLS policies on tables to filter a single tenant’s data, thereby easing the burden of creating and maintaining thousands of database users. In fact, a single database user is all that is needed to connect and query different tenant’s data in different sessions from a multi-tenant table by setting different values to a session context variable in each session, as shown in the following diagram.

Let’s start by covering the high-level implementation steps. Consider there is a database user in Amazon Redshift app_user (which is neither a super user, nor has the sys:secadmin role granted, nor has the IGNORE RLS system privilege granted via another role). The user app_user owns a schema with the same name and all objects in it. The following is a typical multi-tenant table employee in the app_user schema with some sample records shown in the table:

CREATE TABLE app_user.employee (
    tenant_id varchar(50) not null,
    id varchar(50) not null,
    name varchar(200),
    email varchar(200),
    ssn char(9),
    constraint employee_pkey primary key (tenant_id,id)	
);

TENANT_ID	ID	NAME	EMAIL	SSN
T0001	E4646	Andy	.	XXXXXXXXX
T0001	E4689	Bob	.	XXXXXXXXX
T0001	E4691	Christina	.	XXXXXXXXX
T0002	E4733	Peter	.	XXXXXXXXX
T0002	E4788	Quan	.	XXXXXXXXX
T0002	E4701	Rose	.	XXXXXXXXX
T0003	E5699	Diana	.	XXXXXXXXX
T0003	E5608	Emily	.	XXXXXXXXX
T0003	E5645	Florence	.	XXXXXXXXX

To implement that, the following steps are required:

Create a RLS policy on a column using a predicate that is set using a session context variable.
Enable RLS at the table level and attach the RLS policy on the table.
Create a stored procedure that sets the session context variable used in the RLS policy predicate.
Connect and call the stored procedure to set the session context variable and query the table.

Now RLS can be enabled on this table in such a way that whenever app_user queries the employee table, the user will either see no rows or retrieve only rows specific to single tenant despite being the owner of the table.

An administrator, such as app_admin, either a super user or a user that has the sys:secadmin role, can enforce this as follows:

Create a RLS policy that attaches a tenant_id predicate using a session context variable:

create rls policy tenant_policy
with (tenant_id varchar(50))
using (tenant_id = current_setting('app_context.tenant_id'));

Enable RLS and attach the policy on the employee table:

alter table app_user.employee row level security on;

attach rls policy tenant_policy on app_user.employee to public;

Create a stored procedure to set the tenant_id in a session variable and grant access to app_user:

create or replace procedure app_admin.set_app_context
(p_tenant_id in varchar)
	language plpgsql
as $$ 	
declare
		v_tenant_id  varchar(50);
begin
    reset all;   
	v_tenant_id := set_config('app_context.tenant_id',p_tenant_id,false);	
end;
 $$
;

grant execute on app_admin. set_app_context(varchar) to app_user;

Connect to app_user and call the stored procedure to set the session context variable:
```
call app_admin.set_app_context('T0001');
```

When this setup is complete, whenever tenants are connecting to ADP Analytics dashboards, it connects as app_user and runs stored procedures by passing tenant_id, which sets the session context variable using the tenant ID. In this case, when requests come to connect and query the employee table, the user will experience the following scenarios:

No data is retrieved if current_setting('app_context.tenant_id') is not set or is null
Data is retrieved if current_setting('app_context.tenant_id') is set by calling the app_admin.set_app_context(varchar) procedure to a value that exists in the employee table (for example, app_admin.set_app_context(‘T0001’))

No data is retrieved if current_setting('app_context.tenant_id') is set to a value that doesn’t exist in the employee table (for example, app_admin.set_app_context(‘T9999’))

Validate RLS by examining query plans

Now let’s review the preceding scenarios by running an explain plan and observing how RLS works for the test setup. If a query contains a table that is subject to RLS policies, EXPLAIN displays a special RLS SecureScan node. Amazon Redshift also logs the same node type to the STL_EXPLAIN system table. EXPLAIN doesn’t reveal the RLS predicate that applies to the employee table. To view an explain plan with RLS predicate details, the EXPLAIN RLS system privilege is granted to app_user via a role.

In this first scenario, tenant_id wasn’t set by the stored procedure and was passed as a null value, therefore below select statement returns no rows .

=> select count(1),tenant_id from employee group by 2;

count | tenant_id

-------+-------------

(0 rows)

Explain plan output shows the filter as NULL:

=> explain select count(1),tenant_id from employee group by 2;

QUERY PLAN

--------------------------------------------------------------------------------

XN HashAggregate (cost=0.10..0.11 rows=4 width=20)

-> XN RLS SecureScan employee (cost=0.00..0.08 rows=4 width=20)

-> XN Result (cost=0.00..0.04 rows=4 width=20)

One-Time Filter: NULL::boolean

-> XN Seq Scan on employee (cost=0.00..0.04 rows=4 width=20)

(5 rows)

In the second scenario, tenant_id was set by the stored procedure and passed as a value of T0001, therefore returning only corresponding rows as shown in the explain plan output:

Call stored procedure to set the session context variable as ‘T0001’ and then run the select :

=> call app_admin.set_app_context('T0001');

=> select count(1),tenant_id from employee group by 2;
 count |   tenant_id
-------+------------------
     3 | T0001
(1 row)

Explain plan output shows the filter on tenant_id as ‘T0001’

=> explain select count(1),tenant_id from employee group by 2;
                                QUERY PLAN
--------------------------------------------------------------------------
 XN HashAggregate  (cost=0.07..0.07 rows=1 width=20)
   ->  XN RLS SecureScan employee  (cost=0.00..0.06 rows=1 width=20)
         ->  XN Seq Scan on employee  (cost=0.00..0.05 rows=1 width=20)
               Filter: ((tenant_id)::text = 'T0001'::text)
(4 rows)

In the third scenario, a non-existing tenant_id was set by the stored procedure, therefore returning no rows:

=> call app_admin.set_app_context('T9999');

=> select count(1),tenant_id from employee group by 2;
 count | tenant_id
-------+-------------
(0 rows)


=> explain select count(1),tenant_id from employee group by 2;
                                QUERY PLAN
--------------------------------------------------------------------------
 XN HashAggregate  (cost=0.07..0.07 rows=1 width=20)
   ->  XN RLS SecureScan employee  (cost=0.00..0.06 rows=1 width=20)
         ->  XN Seq Scan on employee  (cost=0.00..0.05 rows=1 width=20)
               Filter: ((tenant_id)::text = 'T9999'::text)
(4 rows)

Another key point is that you can apply the same policy to multiple tables as long as they have the column (tenant_id varchar(50)) defined with the same data type, because RLS polices are strongly typed in Amazon Redshift. Similarly, you can combine multiple RLS policies defined using different session context variables or other relevant column predicates and attach them to a single table.

Also, this RLS implementation doesn’t need any changes when a new tenant’s data is added to the table, because it can be queried by simply setting the new tenant’s identifier in the session context variable that is used to define the filter predicate inside the RLS policy. A tenant to its corresponding identifier mapping is typically done during an application’s tenant onboarding process and is generally maintained in a separate metastore, which is also referred to during each tenant’s login to get the tenant’s identifier. With that, thousands of tenants could be provisioned without needing to change any policy in Amazon Redshift. In our testing, we found no performance impact by tenants after RLS was implemented.

Conclusion

In this post, we demonstrated how the ADP DataCloud team implemented row-level security in a multi-tenant environment for thousands of customers using Amazon Redshift RLS and session context variables. For more information about RLS best practices, refer to Amazon Redshift security overview.

Try out RLS for your future Amazon Redshift implementations, and feel free to leave a comment about your use cases and experience.

About the authors

Siva Bangaru is a Database Architect at ADP. He has more than 13 years of experience with technical expertise on design, development, administration, and performance tuning of database solutions for a variety of OLAP and OLTP use cases on multiple database engines like Oracle, Amazon Aurora PostgreSQL, and Amazon Redshift.

Leon Liu is a Chief Architect at ADP. He has over 20 years of experience with enterprise application framework, architecture, data warehouses, and big data real-time processing.

Neha Daudani is a Solutions Architect at AWS. She has 15 years of experience in the data and analytics space. She has enabled clients on various projects on enterprise data warehouses, data governance, data visualization, master data management, data modeling, and data migration for clients to use business intelligence and analytics in business growth and operational efficiency.

Rohit Bansal is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and works with customers to build next-generation analytics solutions using other AWS Analytics services.

Noise