All posts by Anand Sakhare

Automate Amazon QuickSight user and group management using LDAP data for row-level security

Post Syndicated from Anand Sakhare original https://aws.amazon.com/blogs/big-data/automate-amazon-quicksight-user-and-group-management-using-ldap-data-for-row-level-security/

In any business intelligence system, securing and restricting access to the data is important. For example, you might want a particular dashboard to only be viewed by the users with whom the dashboard has been shared, yet customize the data displayed on that dashboard per user by implementing row-level security. With row-level security, you can grant and restrict access to only specific rows, all rows, or no rows of any dataset.

Organizations typically want to display different data to users within different business units. To do this, we need to traverse the organizational hierarchy to find the ultimate business owner for each employee. This post goes into the details of how we can use Lightweight Directory Access Protocol (LDAP) data to find what business unit each employee belongs to, and how we can apply data restriction using row-level security at an Amazon QuickSight group level rather than at an individual user level.

Additionally, this post explains how to automate user and group management within QuickSight to add or remove a QuickSight group or a user to a group.

Architectural overview

The following diagram illustrates the solution architecture.

The solution has two parts:

  • We first move data from an LDAP data store to Amazon Redshift or any other QuickSight supported data source, flatten it and integrate it with existing analytical data, and create a joined result set matching the organizational hierarchy
  • Then we automate the user and group management, which we apply to the permission file in QuickSight to dictate access to a particular dataset

We demonstrate this with a fictitious IT support ticketing system. This system tracks who is creating a particular support ticket and who is working on that ticket. We use QuickSight to visualize trends like number of tickets worked on by a particular assignment group, number of open vs. closed tickets, and so on for a particular business unit. We use row-level security to demonstrate different levels of access and how the same dataset changes for a super user like a president of a company overseeing multiple business units vs. a manager of a particular business or a direct report working on the actual tickets.

Solution overview

We extract and dump the LDAP dataset into Amazon Simple Storage Service (Amazon S3). This dataset can be a direct database dump. The LDAP dataset is usually not flattened and can’t be directly queried to find the reporting hierarchy of a particular employee. To flatten the directory structure, we load the data in an Amazon Relational Database Service (Amazon RDS) instance that supports recursive SQLs. Optionally, we can also flatten the LDAP data using AWS Glue with Apache Spark. The flattened LDAP data is written back into Amazon S3 and then loaded into Amazon Redshift or a QuickSight supported data store. This portion of loading to Amazon S3 and then to Amazon Redshift is optional if the QuickSight data source you’re using supports running recursive SQLs.

The flattened LDAP structure table should now consist of a user column, their manager, and who they report to, up to a vice president in the organization hierarchy. This is then joined with the ticketing system dataset and assignment group or business unit ownership table, which tells which manager or vice president is responsible for a particular business unit. The final joined and aggregated table provides ticketing metadata, ownership, and the business unit hierarchy for each of the tickets. This can be directly loaded into SPICE either as a query or a dataset. You can apply a permission file to this new dataset that dictates which group has access to which datasets.

The second component of the solution goes into the details of how to automate user management, which is done by uploading separate CSV files to Amazon S3 for adding new groups, adding new users, and removing users. When a particular object is uploaded, an event triggers an AWS Lambda function, which makes API calls for the QuickSight client to add or remove the users or add a group.

When you combine these two components, you get an automated way of incorporating your LDAP structure in QuickSight and managing the users in a similar fashion to how you manage users in your corporate active directory.

Prerequisites

To follow along with this post, clone the GitHub repo, deploy the infrastructure, and download the scripts.

Prepare the data

To get started, create a new S3 bucket or use an existing bucket to upload the provided scripts. On the Amazon S3 console, the prefix structure of the bucket should look like the following screenshot.

Use the redshift-cluster.yml AWS CloudFormation template file under the folder infrastructure deploy to deploy the Amazon Redshift cluster and an AWS Identity and Access Management (IAM) role that is attached to Amazon Redshift to access Amazon S3.

The CloudFormation template requires several parameters, including the following:

  • InboundTraffic – We recommend restricting access to IP addresses within your network or to the IP address of the client. Entering 0.0.0.0/0 allows incoming traffic from all IP addresses.
  • S3BucketForRedshiftIAMRole – The name of the S3 bucket where the scripts are uploaded. This is used to create an IAM role that is assumed by the cluster.

Alternatively, you can create the cluster via the Amazon Redshift console and attach the IAM role to access the S3 bucket to the cluster.

To perform the next steps, you can use the Amazon Redshift query editor. The scripts that need to be run are under the scripts folder. In the provided queries, replace bucket-name with the bucket where the scripts and data are uploaded, replace the iam_role ARN (arn:aws:iam::111122223333:role/quicksight-row-level-demo-redshift-role) with your actual account number, and replace the Region us-east-1 with the Region where the S3 bucket is created. When the queries are complete, several tables are now in the cluster (see the following screenshot).

Ticketing data

For this post, we assume that the IT support ticketing data is already loaded into Amazon Redshift. The important columns of interest in this dataset are ticket_no, which can be a primary key for this table, and assigned_user_email_id, which we use as a foreign key to join with the flattened LDAP dataset. We refer to this table as ticket_data. You can query this table in your Amazon Redshift cluster, as shown in the following screenshot.

In this post, we take the organizational structure where different business units are managed by different vice presidents.

We want the ability to categorize our ticketing data based on user assignment into business units. Our ticket data has assigned users for each ticket. We use assigned_user_email_id to identify owners for each of these tickets. Based on the LDAP dataset, we identify the vice president in the hierarchy for each assigned user. We also have an assignment group and assignment_group_manager_email mapping in a separate table, which gives us relationships between business units (or assignment group) and the respective vice presidents.

LDAP data

Our primary key for this data in this example dataset is user_email_id, and reports_to is the foreign key that refers to user_email_id in the same dataset. The following screenshot shows the employees table in an RDS instance. This table can also exist in any database that supports recursive queries.

We’re interested in taking a particular user, finding their manager, and moving up the hierarchy until we find the vice president of a particular business unit. For example, in the preceding LDAP table, Todd V reports to Jimmy P, Jimmy P reports to Steve W, and Steve W reports to President Thomas J. This hierarchy is difficult to query in the current table structure because Amazon Redshift doesn’t support recursive queries. As an alternative, we decided to load the data in an RDS instance that supports recursive queries first and then load it into Amazon Redshift.

After we load the data into the RDS instance, we can use a SQL query like the following to find the hierarchy order we discussed:

SELECT USER_EMAIL_ID, NAME, DESIGNATION, REPORTS_TO, 
SYS_CONNECT_BY_PATH(USER_EMAIL_ID, '/') "Path", 
SUBSTR(SYS_CONNECT_BY_PATH(USER_EMAIL_ID, '/'), 
Instr(SYS_CONNECT_BY_PATH(USER_EMAIL_ID, '/'), '/', 2)-2) vp_roll_up
FROM employees 
WHERE designation != 'Vice President'
START WITH designation = 'Vice President' 
CONNECT BY PRIOR USER_EMAIL_ID = REPORTS_TO
ORDER BY REPORTS_TO;

The following screenshot shows the flattened sample LDAP dataset for the preceding example. Let’s call this LDAP_flattened_data. You can query this table in your Amazon Redshift cluster.

Note that for this post, the dataset is already flattened.

Assignment groups and manager mapping

The next step is to identify the business unit or assignment group each vice president belongs to.

For simplicity, we assume the following mapping between assignment groups (business units) and their respective vice presidents (or assignment group manager). This can be simply stored in a table. Let’s call the table assignment_group_manager_mapping. After we join LDAP_flattened_data with the assignment_group_manager_mapping table, we can identify which assignment group each user belongs to. Carrying forward the previous example, Todd V rolls up to vice president Steve W and therefore belongs to the IT Repairs assignment group. You can query this table in your Amazon Redshift cluster.

Now that we have the three base tables ready, all we need to do is join them to form a flattened table that is suitable for visualizations in QuickSight. We store this data into SPICE, which is an in-memory optimized calculation engine designed specifically to support low-latency, ad hoc data visualization. You can create this table by performing a join between ticket_data and LDAP_flattened_data on assigned_user_email_id and USER_EMAIL_ID, which gives us VP_ROLL_UP, which we can then use to join with the assignment_group_manager_mapping table on assignment_group_manager_email. The final dataset created from the sample data is shown in the following table.

For this post, we use the following sample SQL statement, but you can achieve this in multiple ways.

SELECT * 
FROM ticket_data td, LDAP_flattened_data ld, assignment_group_manager_mapping ag
WHERE td.assigned_user_email_id = ld.USER_EMAIL_ID and
ld.VP_ROLL_UP = ag.assignment_group_manager_email;

You can directly load this table and query into SPICE from Amazon S3, Amazon Athena, Amazon Redshift, or any other supported data sources. For more information, see Working with Datasets.

Add row-level security

To add row-level security, we follow the same steps as in the documentation Using Row-Level Security (RLS) to Restrict Access to a Dataset. We create a dataset in QuickSight by querying the table in Amazon Redshift. Alternatively, you can upload the permissions file from the GitHub repository. You can query this table in your Amazon Redshift cluster with select * from row_level_security. The following screenshot shows our results.

With the preceding examples, the users that are added to a particular assignment group get the same permissions as the group, and the users that belong to the superuser group have elevated access. In QuickSight, we perform the following steps.

  1. Create a new dataset with the row_level_security table.

Both datasets are visible in QuickSight.

  1. In the details for the ticketing-data-final dataset, and choose Row-level security.
  2. Select the row_level_security dataset that we created.
  3. Select Apply dataset and confirm.

When the row-level security has been successfully applied, the lock icon is visible next to the ticketing dataset.

Automate user management

To tie it all together, we automate user and group management. When a particular user is added to a group, they get access as defined by the preceding permissions file. For this setup, we use the following Amazon S3 prefix structure, essentially a separate path for adding groups, adding users, and removing users.

We upload CSV files under each of these Amazon S3 paths.

To add or remove users from a group, refer to the user.csv file under the quicksight-user-group-management folder and copy it to the add-user-to-group folder.

To add groups, refer to the groups.csv file under the quicksight-user-group-management folder and copy it to the add-group folder.

When a file is added in any of these S3 folder paths, a Lambda function is triggered, which makes a Boto3 call to add groups or add or remove users. You can use the quicksight-row-level-lambda.yml CloudFormation template under the Infrastructure deploy folder of the GitHub repo to create the Lambda function and execution role.

Now we create an Amazon S3 event trigger for this bucket to trigger the Lambda function. On the Properties tab of the bucket, choose Events.

Add a notification and provide the ARN of the Lambda function that you created.

Visualize the data

To demonstrate the solution, we show how Steve W. can view all the tickets under IT Repairs, and Todd V can view only the tickets assigned to him.

We create a sample analysis using the dataset created earlier. Then we can publish it as a dashboard and share it with the target user group.

For this post, we set up the following visualizations:

  • Tickets assigned this month – The count of the tickets opened in the current month
    • Visual type – KPI
    • Value – ticket_no
    • Filters – created_date should be current month
  • Ticket Details – The details of the tickets, such as status and assigned owner
    • Visual type – Table
    • Groupby columns – ticket_no, created_by, problem_issue, assigned_user
    • Filters – created_date should be current month
  • Tickets by Status – The status of all the tickets by assignment group
    • Visual type – Pie chart
    • Group/Color – Status
    • Value – ticket_no
    • Filters – created_date should be current month
  • Tickets Assignment by Groups – The status of all the tickets by assignment group
    • Visual type – Bar chart
    • X-axis – assignment_group
    • Y-axis – count of ticket_no
    • Filters – created_date should be current month
  • Tickets Resolved by day – The number of tickets closed each day
    • Visual type – Line chart
    • X-axis – resolved_date
    • Y-axis – count of ticket_no
    • Filters – created_date should be current month and status is closed

When user Todd V (the QuickSight user name is the same as the users email ID) logs in, he sees a dashboard like the following screenshot.

When Steve W logs in, his dashboard shows more information.

When a superuser logs in, they see the following dashboard.

Conclusion

We demonstrated one of the many ways we can use LDAP data for organizational hierarchy-based visualizations in QuickSight. In this post, we talked about how we can find organizational ownership for ticket data. You can further generalize this solution to fit any data within an organization that needs business unit-based grouping. Another use case for this can be for visualizing security threats or sales data across different business units.

This post covers only one level of organizational hierarchy, but in many enterprises, the organizational structure can be much more complicated. You can use a similar approach to deal with these nested organizational hierarchies, where we can report on different levels of business units. If you are interested in implementing row-level security using organizational LDAP hierarchical structure refer to Implement row-level security using a complete LDAP hierarchical organization structure in Amazon QuickSight.


About the Author

Anand Sakhare is a Big Data Architect with AWS. He helps customers build big data, analytics, and machine learning capabilities using a mix of technologies. He is passionate about innovation and solving complex problems.

 

 

 

Rohan Jamadagni is a Sr. Data Architect, working with AWS for the past 5 years. He works closely with customers to implement data and analytics solutions on AWS. He enjoys understanding the meaning behind data and helping customers visualize their data to provide meaningful insights.

 

 

 

Umair Nawaz is a DevOps Engineer at Amazon Web Services in New York City. He works on building secure architectures and advises enterprises on agile software delivery. He is motivated to solve problems strategically by utilizing modern technologies.

Implement row-level security using a complete LDAP hierarchical organization structure in Amazon QuickSight

Post Syndicated from Anand Sakhare original https://aws.amazon.com/blogs/big-data/implement-row-level-security-using-a-complete-ldap-hierarchical-organization-structure-in-amazon-quicksight/

In a world where data security is a crucial concern, it’s very important to secure data even within an organization. Amazon QuickSight provides a sophisticated way of implementing data security by applying row-level security so you can restrict data access for visualizations.

An entire organization may need access to the same dashboard, but may also want to restrict access to the data within the dashboard per the organization’s hierarchical structure. For instance, vice presidents need visibility into all data within their organization, team managers need to see the data related to all their direct reports, and an individual contributor just needs to see the their own data. Creating and maintaining these data security rules can be laborious if managed manually.

In this post, we go into the details of how to extract the organizational hierarchical structure from Lightweight Directory Access Protocol (LDAP) data, flatten it, and create a row-level security permissions file to mimic the same level of hierarchical access controls to a QuickSight dataset. We show this with mock datasets of the LDAP data and ticketing data, and use that data to implement user-level access on QuickSight visualizations and datasets.

Overview of solution

In this post, we demonstrate processing LDAP data and implementing row-level security to control user-level access according to organizational hierarchies. We demonstrate with a sample dataset how to dynamically change the data behind visualizations. We also talk about automatically creating the permissions file required for implementing security on QuickSight visualizations using the LDAP data. Additionally, we create a sample dashboard and apply the generated permissions file to manage data access for the users.

LDAP hierarchical data (employee dataset)

We use sample LDAP data, which is a mock organizational hierarchical structure. Sample code to generate the hierarchical data is available on the GitHub repo.

The basic idea is we have a hierarchical dataset where an employee-manager or employee-supervisor relationship exists. In our sample dataset, we have employee_id and manager_id columns, which represent the employee-manager relationship. The following screenshot is the hierarchical representation of the first few employees in the dataset and the first few rows in the table. The data shows “Mies, Crin” (employee_id 0) is the root user. In our mock data, employee IDs range from 0–249, with 10 levels of hierarchies.

The following screenshot shows the hierarchical structure of the sample data.

Ticketing dataset

Our randomly generated sample ticketing dataset has about 250 tickets (see the following screenshot). Each of these tickets is assigned to an employee. The column assigned_to_emp_id represents the employee that the ticket is assigned to. The code to generate a random sample of the data is available on the GitHub repo.

We can replace emp_id with any unique identifier such as UID, RACF ID, email ID, and so on.

Sample organizational structure

The row-level security that we talk about in the next sections ties back to these two datasets. For example, tickets assigned to any employee ID present in the column assigned_to_emp_id should only be visible to the employees that are higher in that employee’s hierarchy. Tickets assigned to employee ID 18 can only be viewed by employee IDs 18, 7, 5, 1, and 0, because employee ID 18 directly or indirectly reports to them. The following screenshot shows an example of the hierarchy.

Preprocess ticketing data by flattening hierarchical relationships

We first need to flatten the LDAP data to get the employee IDs in the hierarchy in a row. This flattened data needs to be refreshed to account for any organizational changes, such as new employees onboarding. In this example, we use the SYS_CONNECT_BY_PATH function on an Amazon Relational Database Service (Amazon RDS) database to achieve that. We can achieve the same result programmatically or by using common table expressions (CTEs) in Amazon Redshift. The goal is to create 10 columns containing the complete path from that employee to the highest-level manager. Not every employee has a value in every column, assuming the employee isn’t at the lowest level of the hierarchy. A given employee ID appears on their own row, as well as rows for employees they’re a direct or indirect manager of. We query the data with the following code:

SELECT EMPLOYEE_ID, MANAGER_ID, "name", DOB, DEPT, SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/') "Path", 
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 1) "CEO",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 2) "CEO_1",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 3) "CEO_2",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 4) "CEO_3",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 5) "CEO_4",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 6) "CEO_5",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 7) "CEO_6",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 8) "CEO_7",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 9) "CEO_8",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 10) "CEO_9",
regexp_substr(SYS_CONNECT_BY_PATH(EMPLOYEE_ID,'/'), '[^\/]+', 1, 11) "CEO_10"
FROM employees
START WITH MANAGER_ID is NULL
CONNECT BY PRIOR EMPLOYEE_ID = MANAGER_ID
ORDER BY EMPLOYEE_ID;

The following screenshot shows the query output.

The following screenshot shows an example of the hierarchy for employee ID 147.

For employee ID 147, we can see the employees in the hierarchy are organized in columns with their levels. Employee ID 147 reports to 142, 142 reports to 128, and so on. Similarly, for employee ID 142, we can see that the employees above 142 are present in their respective columns.

Join the ticketing data with the flattened LDAP data

To get to the final dataset that we need for visualizations, we need to join the ticketing data with the LDAP data with flattened hierarchies. For our demo, we created two tables, Tickets and Employees, and copied the data we showed earlier to these tables using an Amazon Redshift copy command from Amazon Simple Storage Service (Amazon S3). The following is a sample output of the join query between these tables. This dataset is what we import into SPICE in QuickSight. SPICE is QuickSight’s Super-fast, Parallel, In-memory Calculation Engine, and it’s engineered to rapidly perform advanced calculations and serve data.

select ticket_num,assigned_to_emp_id,name,category,manager_id,ceo,ceo_1,ceo_2,ceo_3,ceo_4,ceo_5,ceo_6,ceo_7,ceo_8,ceo_9,ceo_10  
from blogpostdb.Tickets a JOIN blogpostdb.Employees b
ON a.assigned_to_emp_id = b.EMPLOYEE_ID;

The following screenshot shows our flattened data.

Create the permissions file

You can use the following code snippet to create the permissions file needed to apply row-level security on your dataset in QuickSight:

import csv
def create_permissions_file(list_of_emp_ids, number_of_levels):
    output_header=["ceo_" + str(i) if i!=0 else 'ceo' for i in range(number_of_levels)]
    output_header.insert(0,'UserName')
    f=open("./sample_permissions_file.csv", 'w')
    writer = csv.writer(f)
    writer.writerow(output_header)
    for i in list_of_emp_ids:
        for j in range(1,len(output_header)):
            l = [None] * (len(output_header))
            l[j]=i
            l[0]=i
            print(l)
            writer.writerow(l)
    f.close()

The input to this function is a list of your employee IDs. These employee IDs appear as owners in the ticketing data as well. There are multiple ways to get the data. If the ticketing data ownership is related to any other user-specific information such as email ID or any unique identifier, then a list of that information is the input to this function. The second input is the number of levels your organization has (integer value). The goal is to create a CSV file to use as a permissions file for your dataset in QuickSight.

Assume there are 10 hierarchical levels in your organization. The output permissions file looks something like the following screenshot.

Create QuickSight analyses

We now apply the permission file to the QuickSight dataset. For instructions, see Using Row-Level Security (RLS) to Restrict Access to a Dataset.

Now we create a sample visualization to show the specific tickets owned by an employee or their reportees.

After we import the permissions file and apply it to the final dataset (created by joining the ticketing data with the flattened LDAP data) in SPICE, we’ve created the following visualization. The goal is to verify that when different users log in and see the visualization, they see the same visualization with different data, in this case the only tickets that concern them.

The following screenshot shows the visualization without any row-level security.

For our hierarchy, we’ve created QuickSight users with usernames that are the same as their employee IDs (the employee with ID 142 has the QuickSight username 142; this can easily be replaced by any unique identifiers your organization uses). We log in with employee IDs 232, 147, 61, 84, and 28, and verify that they only see the tickets that concern them. In the visualization “You are viewing tickets concerning these employees,” we can see whose tickets the logged-in user is authorized to see. Because the mocked data only consists of around 250 tickets randomly assigned to 250 employees, some visualizations may show no data.

The following screenshot shows the example hierarchy. Employee ID 232 is a leaf node (nobody reports to them).

Employee 232 is only authorized to view their own tickets, as shown in the following visualization.

Similarly, because employee ID 147 is also a leaf node, they can only view their assigned tickets.

In our example hierarchy, employee IDs 72, 75, 174, 229, and 134 report to employee 61. In our dataset, only four tickets are assigned to those employees. The following screenshot shows the tickets of concern to employee ID 61.

The following screenshot shows the visualizations visible to employee ID 61.

Similarly, when we log in with employee IDs 84 and 28, we can verify that they only see the tickets concerning them.

Publish the dashboard

You can use the share function to publish the analysis to a dashboard and share the data with stakeholders.

Clean up

To avoid incurring future charges, make sure to remove resources you created when you’re done using them.

Conclusion

Data security is an important concern for many organizations. This solution is an easy way to use organizational LDAP data to implement data security with row-level security in QuickSight. With organizational restructuring, hierarchies are bound to change with time. Therefore, the LDAP data can be dumped on a periodic basis and be updated in the respective Amazon Redshift table. This enables users to have better visibility in the data within their organizational hierarchy.


About the Author

Anand Sakhare is a Big Data Architect with AWS. He helps customers build big data, analytics, and machine learning capabilities using a mix of technologies. He is passionate about innovation and solving complex problems.

 

 

 

Rohan Jamadagni is a Sr. Data Architect, working with AWS for the past 5 years. He works closely with customers to implement data and analytics solutions on AWS. He enjoys understanding the meaning behind data and helping customers visualize their data to provide meaningful insights.

 

 

Umair Nawaz is a DevOps Engineer at Amazon Web Services in New York City. He works on building secure architectures and advises enterprises on agile software delivery. He is motivated to solve problems strategically by utilizing modern technologies.