Centrally tracking dashboard lineage, permissions, and more with Amazon QuickSight administrative dashboards

Post Syndicated from Jesse Gebhardt original https://aws.amazon.com/blogs/big-data/centrally-tracking-dashboard-lineage-permissions-and-more-with-amazon-quicksight-administrative-dashboards/

This post is co-written with Shawn Koupal, an Enterprise Analytics IT Architect at Best Western International, Inc.

A common ask from Amazon QuickSight administrators is to understand the lineage of a given dashboard (what analysis is it built from, what datasets are used in the analysis, and what data sources do those datasets use). QuickSight APIs allow us to capture the metadata from each object and build a complete picture of the linkages between each object. As a QuickSight administrator, you can build a dashboard that displays the lineage from dashboard to data source, along with the permissions for each asset type. It can be helpful to see all permissions assigned to each of your assets as well as the relationships between them, all in one place.

Solution overview

In this solution, you build an end-to-end data pipeline using QuickSight to ingest data from an AWS Glue table.

The following diagram illustrates the architecture of the solution.

You can invoke the QuickSight APIs via the AWS Software Development Kit (AWS SDK) or the AWS Command Line Interface (AWS CLI). For this post, we use the AWS SDK.

The solution starts with an AWS Lambda function that calls the QuickSight list APIs (list_data_sources, list_data_sets, list_analyses, list_templates, and list_dashboards) depending on the event message to build lists of assets in chunks of 100, which are iterated through by a second Lambda function. The reason for splitting the work into two functions is to work around the 15-minute time limit in Lambda. You can schedule the Lambda function to run on each asset type based on an event rule trigger. See the following code:

import boto3
import os
import time
import datetime
import json
​
AWS_ACCOUNT_ID=os.environ['AWS_ACCOUNT_ID']
AWS_REGION=os.environ['AWS_REGION']
QS_S3_BUCKET=os.environ['QS_S3_BUCKET']
DownloaderFunctionName=os.environ['DownloaderFunctionName']
​
client = boto3.client('quicksight', region_name=AWS_REGION)
lambda_client = boto3.client('lambda')
​
def invoke_downloader(iteration, apicall, list_results):
​
    apicall=apicall.replace("list_data_sources","datasource").replace("list_data_sets","dataset").replace("list_analyses","analysis").replace("list_dashboards","dashboard").replace("list_templates","template")
    msg = {"Iteration": iteration, "api": apicall, "Objects":  list_results }
    invoke_response = lambda_client.invoke(FunctionName=DownloaderFunctionName,
                                           InvocationType='Event',
                                           Payload=json.dumps(msg, default=datetime_handler))
​
​
def datetime_handler(x):
    if isinstance(x, datetime.datetime):
        return x.isoformat()
    raise TypeError("Unknown type")
​
def file_cleanup(apicall):
    #Replace the apicall with the S3 folder name
    object_type=apicall.replace("list_data_sources","datasource").replace("list_data_sets","dataset").replace("list_analyses","analysis").replace("list_dashboards","dashboard").replace("list_templates","template")
    
    s3_path='quicksight_lineage/'+object_type+'/'
    s3_path2='quicksight_lineage/'+object_type+'_permissions/'
    fileprefix="QuickSight_"+object_type
    botoSession = boto3.Session (region_name = 'us-west-2')
    s3_session = botoSession.resource('s3')
    bucket = s3_session.Bucket(QS_S3_BUCKET)
    #Delete Any files with prefix in s3_path and s3_path2
    bucket.objects.filter(Prefix=s3_path+fileprefix).delete()
    bucket.objects.filter(Prefix=s3_path2+fileprefix).delete()
​
def lambda_handler(event, context):
​
​
    if event == {}:
        #Call All APIs assests 
        apicall_list=['list_data_sources','list_data_sets','list_analyses','list_dashboards','list_templates']
    elif  event["api"] == 'datasource':
        apicall_list=['list_data_sources']
    elif event["api"] == 'dataset':
        apicall_list=['list_data_sets']
    elif event["api"] == 'analysis':
        apicall_list=['list_analyses']
    elif event["api"] == 'dashboard':
        apicall_list=['list_dashboards']
    elif event["api"] == 'template':
        apicall_list=['list_templates']
    else:
        print("[WARN] Exception: Invalid Event Type.")
        return
    for apicall in apicall_list: 
        try:
            #Clean up files from previous run
            file_cleanup(apicall)
            #Reset variables for each apicall
            iteration=0
            user_token = None
            list_results={}
​
            while True:
                iteration+=1
                print("Calling ",apicall, iteration)
                
                if user_token is None:
                    exec("""results=client."""+apicall+"""(AwsAccountId='"""+AWS_ACCOUNT_ID+"""', MaxResults=100);""",globals(), list_results)
                else:
                    exec("""results=client."""+apicall+"""(AwsAccountId='"""+AWS_ACCOUNT_ID+"""', MaxResults=100,NextToken='"""+user_token+"""');""",globals(), list_results)
​
                invoke_downloader(iteration, apicall, list_results["results"])
                user_token=list_results["results"]["NextToken"]
                print(user_token)
        except KeyError:
            print("NextToken not found.")

The second Lambda function consumes the list of assets from the event parameter from the first function and uses the QuickSight describe APIs (describe_datasource, describe_dataset, describe_analysis, describe_template, and describe_dashboard). The details of each QuickSight asset are written to CSV files in an Amazon Simple Storage Service (Amazon S3) bucket in groups of 100. Because the first function calls the second function in parallel, it’s recommended to set the reserved concurrency to 2 in the second Lambda function to avoid throttling errors (if you use the AWS CloudFormation template provided later in this post, this is automatically configured for you). See the following code:

import boto3
import os
import time
import datetime
import json

AWS_ACCOUNT_ID=os.environ['AWS_ACCOUNT_ID']
AWS_REGION=os.environ['AWS_REGION']
QS_S3_BUCKET=os.environ['QS_S3_BUCKET']

client = boto3.client('quicksight', region_name=AWS_REGION)
lambda_client = boto3.client('lambda')
s3 = boto3.client('s3')

def process_dashboards(list_dashboard,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DashboardId,Name,SourceEntityArn,VersionCreatedTime,VersionNumber,CreatedTime,DataSetArns,LastPublishedTime,LastUpdatedTime" + '\n')

    for dashboard in list_dashboard["DashboardSummaryList"]:
        dashboard_desc= client.describe_dashboard(AwsAccountId=AWS_ACCOUNT_ID,DashboardId=dashboard["DashboardId"])
        
        source_entity_arn = dashboard_desc["Dashboard"]["Version"]["SourceEntityArn"]
        version_created_time = dashboard_desc["Dashboard"]["Version"]["CreatedTime"].isoformat()
        version_number = str(dashboard_desc["Dashboard"]["Version"]["VersionNumber"])
        created_time = dashboard_desc["Dashboard"]["CreatedTime"].isoformat()

        last_published_time = dashboard_desc["Dashboard"]["LastPublishedTime"].isoformat()
        last_updated_time = dashboard_desc["Dashboard"]["LastUpdatedTime"].isoformat()
        try:
            for arn in dashboard_desc["Dashboard"]["Version"]["DataSetArns"]:
                f.write(dashboard["DashboardId"]+',"'+ dashboard["Name"] + '",' + source_entity_arn + ',' + version_created_time + ',' + version_number + ','   + created_time + ','+ arn + ',' + last_published_time + ',' + last_updated_time +'\n')
        except Exception as e:
            print(e)
            dataset_arn=''
            f.write(dashboard["DashboardId"]+',"'+ dashboard["Name"] + '",' + source_entity_arn + ',' + version_created_time + ',' + version_number + ','   + created_time + ','+ dataset_arn + ',' + last_published_time + ',' + last_updated_time +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            
        

def process_dashboards_permissions(list_dashboard,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DashboardId,Name,Principal,Permission,Iteration" + '\n')
	
    for dashboard in list_dashboard["DashboardSummaryList"]:

        try:
            list_permissions = client.describe_dashboard_permissions(AwsAccountId=AWS_ACCOUNT_ID,DashboardId=dashboard["DashboardId"])
        except:
            print("Error Listing Permissions for:"+dashboard["DashboardId"])
            continue

        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteDashboard" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"

            f.write(dashboard["DashboardId"]+',"'+ dashboard["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)      


def process_analysis(list_analyses,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("AnalysisId,Name,AnalysisArn,CreatedTime,LastUpdatedTime,DataSetArn,Iteration" + '\n')

    for analysis in list_analyses["AnalysisSummaryList"]:
        #Call describe_analysis
        analysis_desc= client.describe_analysis(AwsAccountId=AWS_ACCOUNT_ID,AnalysisId=analysis["AnalysisId"])

        analysis_arn = analysis_desc["Analysis"]["Arn"]
        created_time = analysis_desc["Analysis"]["CreatedTime"].isoformat()
        last_updated_time = analysis_desc["Analysis"]["LastUpdatedTime"].isoformat()

        try:
            for arn in analysis_desc["Analysis"]["DataSetArns"]:
                f.write(analysis["AnalysisId"]+',"'+ analysis["Name"] + '",' + analysis_arn + ',' + created_time + ','  + last_updated_time + ',' + arn + ',' + iteration  +'\n')
        except Exception as e:
            print(e)
            dataset_arn=''
            f.write(analysis["AnalysisId"]+',"'+ analysis["Name"] + '",' + analysis_arn + ',' + created_time + ','  + last_updated_time + ',' + dataset_arn  + ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            

        
def process_analysis_permissions(list_analyses,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("AnalysisId,Name,Principal,Permission,Iteration" + '\n')
	
    for analysis in list_analyses["AnalysisSummaryList"]:

        try:
            list_permissions = client.describe_analysis_permissions(AwsAccountId=AWS_ACCOUNT_ID,AnalysisId=analysis["AnalysisId"])
        except:
            print("Error Listing Permissions for:"+analysis["AnalysisId"])
            continue
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteAnalysis" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"

            f.write(analysis["AnalysisId"]+',"'+ analysis["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')
    
    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)      


def process_templates(list_templates,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("TemplateId,Name,TemplateArn,CreatedTime,LastUpdatedTime,SourceEntityArn,VersionNumber,Iteration" + '\n')

    for template in list_templates["TemplateSummaryList"]:
        #Call describe_template
        template_desc= client.describe_template(AwsAccountId=AWS_ACCOUNT_ID,TemplateId=template["TemplateId"])

        template_arn = template_desc["Template"]["Arn"]
        created_time = template_desc["Template"]["CreatedTime"].isoformat()
        last_updated_time = template_desc["Template"]["LastUpdatedTime"].isoformat()
        source_entity_arn = template_desc["Template"]["Version"]["SourceEntityArn"]
        version_number = str(template_desc["Template"]["Version"]["VersionNumber"])
        f.write(template["TemplateId"]+',"'+ template["Name"] + '",' + template_arn + ',' + created_time + ','  + last_updated_time + ',' + source_entity_arn + ',' + version_number +  ',' + iteration  +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            

        
def process_templates_permissions(list_templates,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("TemplateId,Name,Principal,Permission,Iteration" + '\n')
	
    for template in list_templates["TemplateSummaryList"]:

        try:
            list_permissions = client.describe_template_permissions(AwsAccountId=AWS_ACCOUNT_ID,TemplateId=template["TemplateId"])
        except:
            print("Error Listing Permissions for:"+template["TemplateId"])
            continue
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteTemplate" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"

            f.write(template["TemplateId"]+',"'+ template["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')
    
    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)      


def process_datasources(list_data_sources,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DataSourceId,DataSourceArn,Name,Type,LastUpdatedTime,CreatedTime,Status,ErrorInfo,Iteration" + '\n')

    global datasource_list
    datasource_list=[]
    for datasource in list_data_sources["DataSources"]:
        datasource_id=datasource["DataSourceId"]
        name=datasource["Name"]
        datasource_type=datasource["Type"]
        try:
            status=datasource["Status"]
        except:
            status=''
        CreatedTime=str(datasource["CreatedTime"])
        LastUpdatedTime=str(datasource["LastUpdatedTime"])
        try:
            ErrorInfo="Type: "+datasource["ErrorInfo"]["Type"]+" Message: "+datasource["ErrorInfo"]["Message"]
        except:
            ErrorInfo="Null"

        f.write( datasource_id + ',' + datasource["Arn"] + ',"' + name + '",'  + datasource_type + ',' + LastUpdatedTime+ ',' + CreatedTime + ',' + status + ',' + ErrorInfo+ ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            


def process_datasources_permissions(list_data_sources,iteration,object_type):
    
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DataSourceID,Name,Principal,Permission,Iteration" + '\n')

    for datasource in list_data_sources["DataSources"]:
        try:
            list_permissions = client.describe_data_source_permissions(AwsAccountId=AWS_ACCOUNT_ID,DataSourceId=datasource["DataSourceId"])
        except:
            print("Error Listing Permissions for:"+datasource["DataSourceId"])
            continue
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteDataSource" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"
                
            f.write(datasource["DataSourceId"]+',"'+ datasource["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')
    
    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            
    

def process_datasets(list_datasets,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write('DatasetId,DataSetArn,Name,SpiceSize,ImportMode,LastUpdatedTime,CreatedTime,DataSourceArn,DataSourceName,DataSourceType,Source,Columns,Iteration' + '\n')
    
    for dataset in list_datasets["DataSetSummaries"]:
        
        try:
            response= client.describe_data_set(AwsAccountId=AWS_ACCOUNT_ID,DataSetId=dataset["DataSetId"])
        except Exception as e:
            print("Dataset ID: ", dataset["DataSetId"], e)
            f.write( dataset["DataSetId"] + ',' + dataset["Arn"] + ',"' + dataset["Name"] + '",' + '0' + ',' + dataset["ImportMode"] + ',' + str(dataset["LastUpdatedTime"])+ ','+ str(dataset["CreatedTime"])+ ',' + 'n/a' + ',"' + 'n/a' + '",' +  'n/a'  + ',' +  'n/a' + ',"'  + 'n/a'+ '",' + iteration +'\n')
            continue

        dataset_id=response["DataSet"]["DataSetId"]
        dataset_name=response["DataSet"]["Name"]
        dataset_size=response["DataSet"]["ConsumedSpiceCapacityInBytes"]
        ImportMode=response["DataSet"]["ImportMode"]
        LastUpdatedTime=response["DataSet"]["LastUpdatedTime"].isoformat()
        CreatedTime=response["DataSet"]["CreatedTime"].isoformat()

        try:
            for key in response["DataSet"]["PhysicalTableMap"].keys():
                
                if key == 's3PhysicalTable':
                    
                    source='S3Source'
                    DataSourceArn=response["DataSet"]["PhysicalTableMap"]["s3PhysicalTable"]["S3Source"]["DataSourceArn"]
                    Columns=response["DataSet"]["PhysicalTableMap"]["s3PhysicalTable"]["S3Source"]["InputColumns"]
                    #SqlQuery="Null"

                else:

                    try:
                        DataSourceArn=response["DataSet"]["PhysicalTableMap"][key]["RelationalTable"]["DataSourceArn"]
                        Columns=""
                        source="VisualEditor"
                    except:
                        DataSourceArn=response["DataSet"]["PhysicalTableMap"][key]["CustomSql"]["DataSourceArn"]
                        Columns=response["DataSet"]["PhysicalTableMap"][key]["CustomSql"]["Columns"]
                        source="CustomSql"

                DataSourceName=""
                DataSourceType=""
                
                f.write( dataset_id + ',' + dataset["Arn"] + ',"' + dataset_name + '",' + str(dataset_size) + ',' + ImportMode + ',' + LastUpdatedTime+ ','+ CreatedTime+ ',' + DataSourceArn + ',"' + DataSourceName + '",' +  DataSourceType  + ',' +  source + ',"'  + str(Columns) + '",' + iteration +'\n')
                
        except:
            print("[DEBUG]: Exception in main write for: " + str(dataset))
            f.write( dataset_id  + ',' + dataset["Arn"] +',"' + dataset_name + '",' + str(dataset_size) + ',' + ImportMode + ',' + LastUpdatedTime+ ',' + CreatedTime + ',,,,Unknown,"'  + str(Columns) + '",' + iteration +'\n')

    f.close()
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)


def process_datasets_permissions(list_datasets,iteration,object_type):
    
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    f.write('DataSetID,Name,Principal,Permission,Iteration'+'\n')

    for dataset in list_datasets["DataSetSummaries"]:
        try:
            list_permissions = client.describe_data_set_permissions(AwsAccountId=AWS_ACCOUNT_ID,DataSetId=dataset["DataSetId"])
        except:
            print("Error Listing Permissions for:"+dataset["DataSetId"])
            continue
        
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteDataSet" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"
                
            f.write(dataset["DataSetId"]+',"'+ dataset["Name"] + '",' + permission["Principal"] +  ',' + action+  ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            


def lambda_handler(event, context):

    list_objects=event["Objects"]
    iteration=str(event["Iteration"])
    
    print("Iteration: ", iteration)
    print("[INFO]Processing QuickSight:", event["api"] )
    
    if  event["api"] == 'datasource':
        process_datasources(list_objects, iteration, event["api"])
        process_datasources_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'dataset':
        process_datasets(list_objects, iteration, event["api"])
        process_datasets_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'analysis':
        process_analysis(list_objects, iteration, event["api"])
        process_analysis_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'dashboard':
        process_dashboards(list_objects, iteration, event["api"])
        process_dashboards_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'template':
        process_templates(list_objects, iteration, event["api"])
        process_templates_permissions(list_objects, iteration, event["api"]+'_permissions')
    else:
        print("[WARN] Exception: Invalid Event Type.")
        return

Afterwards, the S3 bucket has the directory structure under the quicksight_lineage folder as shown in the following screenshot.

You then use AWS Glue to store the metadata of each file in an AWS Glue table, which allows you to query the information from QuickSight using an Amazon Athena or Amazon Redshift Spectrum data source (if you run the CloudFormation stack, the tables are set up for you).

The following diagram shows the tables and relationships.

Walkthrough overview

The workflow is comprised of the following high-level steps:

  1. Deploy the CloudFormation template to build the Lambda functions, AWS Identity and Access Management (IAM) roles, S3 bucket, AWS Glue database, and AWS Glue tables.
  2. Run the Python Lambda functions to build CSV files that contain the QuickSight object details.
  3. Visualize the data in QuickSight. To do so, you must create your data source, dataset, and then analysis.

For this post, we use Athena as the query engine. To use Redshift Spectrum, you must modify the provided queries.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • An IAM user with access to AWS resources used in this solution (CloudFormation, IAM, Amazon S3, AWS Glue, Athena, QuickSight)
  • Athena configured with a query result location
  • QuickSight Enterprise Edition

Creating resources

Create your resources by launching the following CloudFormation stack:

During the stack creation process, you must provide an S3 bucket name in the S3BucketName parameter (AWSAccountNumber is appended to the bucket name provided to make it unique).

After the stack creation is successful, you have two Lambda functions, two S3 buckets, an AWS Glue database and tables, and the corresponding IAM roles and policies.

Running the Lambda function

To run your Lambda function, complete the following steps:

  1. On the Lambda console, navigate to the QuickSight-Lineage-Dispatcher function.
  2. From the Select a test event menu, choose Configure test events.

  1. Select Create new test event.

You create one test event for all QuickSight assets.

  1. For Event name, enter all.
  2. Enter an empty JSON object ({}).

  1. Choose Test to run the Lambda function and generate CSV files of the assets.

Alternatively, you can create test events for each QuickSight object (Data Source, DataSet, Analysis, Dashboard, and Template) for larger QuickSight environments:

  • Test event DataSource code:
    {
      "api": "datasource"
    }

  • Test event DataSet code:
    {
      "api": "dataset"
    }

  • Test event Analysis code:
    {
      "api": "analysis"
    }

  • Test event Dashboard code:
    {
      "api": "dashboard"
    }

  • Test event Template code:
    {
      "api": "template"
    }

The following screenshot shows the configuration of a test event for Analysis.

Creating your data source and lineage data set

In this step, you use QuickSight to access the tables in your AWS Glue database.

  1. Log in to QuickSight.
  2. Choose Manage QuickSight.
  3. Choose Security & permissions.
  4. Ensure that access to the S3 bucket (that was created through CloudFormation) is enabled.
  5. Choose New analysis.
  6. Choose New dataset.
  7. For the data source, choose Athena.

  1. For your data source name, enter QuickSight-Lineage.
  2. Choose Create data source.

QuickSight prompts you to select your schema or database.

  1. Choose Use custom SQL.

  1. Update the query name from New custom SQL to QuickSight Lineage.
  2. Enter the following code into the query box:
    select 
       a.analysisid      as analysis_id,
       a.name            as analysis_name,
       a.analysisarn     as analysis_arn,
       date_parse(substr(a.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')     as analysis_createdtime,
       date_parse(substr(a.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as analysis_lastupdatedtime,
       a.datasetarn      as analysis_datasetarn,
       r.dashboardid        as dashboard_id,
       r.name               as dashboard_name,
       r.name||' - ID: '||r.dashboardid as dashboard_name_w_id,
       date_parse(substr(r.versioncreatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as dashboard_versioncreatedtime,
       r.versionnumber      as dashboard_versionnumber     ,
       date_parse(substr(r.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')  as dashboard_createdtime,
       date_parse(substr(r.lastpublishedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as dashboard_lastpublishedtime ,
       date_parse(substr(r.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as dashboard_lastupdatedtime,
       d.datasetid        as dataset_id,
       d.datasetarn       as dataset_arn,
       d.name             as dataset_name,
       d.spicesize        as dataset_spicesize,
       d.importmode       as dataset_importmode,
       date_parse(substr(d.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')  as dataset_lastupdatedtime,
       date_parse(substr(d.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')      as dataset_createdtime,
       d.source           as dataset_source,
       d.columns          as dataset_columns,
       s.datasourceid     as datasource_id,
       s.datasourcearn    as datasource_arn,
       s.name             as datasource_name,
       s.type             as datasource_type,
       date_parse(substr(s.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as datasource_lastupdatedtime,
       date_parse(substr(s.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as datasource_createdtime,
       s.status           as datasource_status,
       s.errorinfo        as datasource_errorinfo,
       t.templateid       as template_id,
       t.name             as template_name,
       t.templatearn      as template_arn,
       date_parse(substr(t.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')      as template_createtime,
       date_parse(substr(t.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')  as template_lastupdatedtime
    from "quicksight-lineage"."dashboard" r
    left join  "quicksight-lineage"."analysis" a
    on a.analysisarn = r.sourceentityarn and a.datasetarn=r.datasetarns
    left join "quicksight-lineage"."template" t
    on t.templatearn = r.sourceentityarn
    left join  "quicksight-lineage"."dataset" d
    on d.datasetarn = r.datasetarns
    left join  "quicksight-lineage"."datasource" s
    on s.datasourcearn = d.datasourcearn

  1. Choose Confirm query.

  1. Select Import to SPICE for quicker analytics.
  2. Choose Visualize.

In the new analysis, one empty visual is loaded by default.

  1. Change the visual type to pivot table.
  2. Choose (single-click) dashboard_name, analysis_name, template_name, dataset_name, and datasource_name in the Fields list.

You can search for name in field list to make this step easier

  1. Confirm that all fields were also added to the Rows

If you have assets with duplicates names, it can helpful to add the corresponding ID columns to the visual; for example, dashboard_id, analysis_id, template_id, dataset_id, datasource_id.

Visualizing your assets and lineage

You now create five new visuals, one for each asset type (Dashboard, Analysis, Template, Dataset, Data Source), to display the additional columns pulled from the APIs.

  1. From the Add menu, choose Add visual.

  1. For the first new visual, choose the table visual type.
  2. Search for dashboard_ in Field List.
  3. Choose (single-click) all matching columns.

  1. For the second visual, choose the table visual type.
  2. Search for analysis_ in the Field List.
  3. Choose (single-click) all matching columns.
  4. Move the second visual underneath the first visual.
  5. Repeat same steps for template_, dataset_, and datasource_.

Creating your permissions data set

You now create your new data set.

  1. Leave the analysis by choosing the QuickSight logo on the top left.
  2. In the navigation pane, choose Datasets.
  3. Choose New dataset.
  4. Locate and choose the QuickSight-Lineage data source created earlier in the FROM EXISTING DATA SOURCES
  5. In the QuickSight Lineage data source window, choose Create data set.
  6. Choose Use custom SQL.

  1. Update the name from New custom SQL to QuickSight Lineage Permissions.
  2. Enter the following code into the query box:
    select distinct 'datasource' as QuickSightObjectType, sp.datasourceid as "QuickSightID",sp.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id
     from "quicksight-lineage"."datasource_permissions"  sp
     inner join  "quicksight-lineage"."datasource" s
      on s.datasourceid = sp.datasourceid
     left join  "quicksight-lineage"."dataset" d
       on s.datasourcearn = d.datasourcearn
     left join  "quicksight-lineage"."dashboard" r
       on d.datasetarn = r.datasetarns
    union
    select distinct 'dataset' as QuickSightObjectType, dp.datasetid as "QuickSightID",dp.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id 
     from "quicksight-lineage"."dataset_permissions" dp
     inner join  "quicksight-lineage"."dataset" d
       on d.datasetid = dp.datasetid
     left join  "quicksight-lineage"."dashboard" r
       on d.datasetarn = r.datasetarns
    union
    select distinct 'analysis' as QuickSightObjectType, ap.analysisid as "QuickSightID",ap.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id  
     from "quicksight-lineage"."analysis_permissions" ap
      inner join  "quicksight-lineage"."analysis" a
       on a.analysisid = ap.analysisid
      left join  "quicksight-lineage"."dashboard" r
       on a.analysisarn = r.sourceentityarn  
    union
    select distinct 'template' as QuickSightObjectType, tp.templateid as "QuickSightID",tp.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id  
     from "quicksight-lineage"."template_permissions" tp
      inner join  "quicksight-lineage"."template" t
       on t.templateid = tp.templateid
      left join  "quicksight-lineage"."dashboard" r
       on t.templatearn = r.sourceentityarn     
    union
    select distinct 'dashboard' as QuickSightObjectType, dashboardid as "QuickSightID",name,
    split_part(principal,':',6) as principal,permission, name||' - ID: '||dashboardid as dashboard_name_w_id
     from "quicksight-lineage"."dashboard_permissions"

  1. Choose Edit / Preview data.
  2. Choose Apply.
  3. For Query mode, select SPICE.

  1. Choose Save.
  2. Navigate to the Analyses page and open the analysis created earlier.
  3. Choose the pencil icon to add the new dataset.

  1. Choose Add data set.

  1. Select QuickSight Lineage Permissions.
  2. Choose Select.

  1. Make sure the new QuickSight Lineage Permissions dataset is active in the Data set drop-down menu.

Visualizing your permissions

You now add a new visual to display permissions. 

  1. Choose the table visual type.
  2. Choose (single-click) name, principal, and permission in the Field List.
  3. In the navigation pane, choose Filter.
  4. Choose +.
  5. Choose quicksightobjecttype.

  1. Choose the new filter.
  2. Deselect Select all.
  3. Select dashboard.
  4. Choose Apply.

  1. Choose Close.
  2. Move the new permissions visual so it’s to the right of the dashboard visual.

 

  1. On the new permissions visual, choose the menu options (…).
  2. Choose Duplicate visual.
  3. Repeat this step four times.
  4. Modify the quicksightobjectype filter on each new permission visual so you have one visual for each asset type.
  5. Move the visual to the right of the corresponding asset type visual.

  

Creating parameters for filtering

At this point all the visuals are created; next you need to create a parameter. You can simplify the following steps by using the new simplified filter control creation process. For more information, see Amazon QuickSight adds support for on-sheet filter controls. The following steps still work fine, but to add filter controls to an analysis, you don’t need to create parameters anymore. 

  1. Navigate to the Parameters menu.
  2. Choose Create one
  3. For Name, enter DashboardNameWithID.
  4. Choose Create.

 

  1. Choose Create a new control for a filter or a calculated field.
  2. For Display name, enter Dashboard Name with ID.
  3. For Style, choose Single select drop down.
  4. For Values, select Link to a data set field.
  5. For Select a data set, choose QuickSight Lineage Permissions.
  6. For Select a column, choose dashboard_name_w_id.
  7. Choose Add.

  1. Choose the first visual (Count of Records by Dashboard_name, Template_name, Dataset_name, Datasource_name, and Analysis_name).
  2. Add a filter in the dashboard_name_w_id field.
  3. Choose the newly added filter.
  4. Set the filter scope to All visuals.
  5. For Filter type, choose Custom filter.
  6. Select Use parameters.
  7. From the drop-down menu, choose DashboardNameWithId.
  8. Choose Apply.
  9. Choose Close.

  1. Choose the first permissions visual (Permission, Principal, and Name).
  2. Add a filter in the dashboard_name_w_id field.
  3. Set the filter scope to All visuals.
  4. For Filter type, choose Custom filter.
  5. Select Use parameters.
  6. From the drop-down menu, choose DashboardNameWithID.
  7. Choose Apply.
  8. Choose Close.

The analysis build is complete and can be published as a dashboard.

Creating additional visuals

You can also create additional visuals for different use cases.

Visualizing SPICE usage across all your SPICE datasets

To visualize Spice usage across your SPICE datasets, complete the following steps.

  1. Use the QuickSight Lineage dataset and choose the donut chart visual.
  2. For Group/Color, add dataset_name.
  3. For Value, add dataset_spicesize.
  4. Change the aggregation of dataset_spicesize to Average because a dataset can be listed multiple times in the dataset if it is reused across multiple dashboards.

This visual can be useful to track down what is consuming SPICE storage.

Visualizing SPICE refreshes by hour

To visualize SPICE refreshes by hour, complete the following steps:

  1. Use the QuickSight Lineage dataset to create a vertical stacked bar chart.
  2. For X axis, add dataset_lastupdatetime aggregated by HOUR.
  3. For Value, add dataset_id aggregated by Count district.
  4. For Group/Color, add dataset_name.
  5. Create a filter on dataset_importmode equal to SPICE.

This visual can be useful to see when all the SPICE dataset refreshes last occurred. The source data is a snapshot in time, so you need to update the source data by running the Lambda function on a regular basis.

Cleaning up

To avoid incurring future charges, delete the resources you created in this walkthrough by deleting the CloudFormation stack. Also, be sure to delete the analysis and dataset (to free up SPICE usage).

Conclusion

We also created some visuals to display SPICE usage by data set as well as the last refresh time per data set, allowing you to view the health of your SPICE refreshes and to free up SPICE capacity by cleaning up older data sets.

Give this technique of building administrative dashboards from data collected via the QuickSight APIs a try, and share you feedback and questions in the comments.


About the Authors

Jesse Gebhardt is a senior global business development manager focused on analytics. He has spent over 10 years in the Business Intelligence industry. At AWS, he aids customers around the globe gain insight and value from the data they have stored in their data lakes and data warehouses. Jesse lives in sunny Phoenix, and is an amateur electronic music producer.

 

 

Arun Santhosh is a Specialized World Wide Solution Architect for Amazon QuickSight. Arun started his career at IBM as a developer and progressed on to be an Application Architect. Later, he worked as a Technical Architect at Cognizant. Business Intelligence has been his core focus in these prior roles as well.

 

 

Shawn Koupal is an Enterprise Analytics IT Architect at Best Western International, Inc.