Tag Archives: Amazon QuickSight

Improving Retail Forecast Accuracy with Machine Learning

Post Syndicated from Soonam Jose original https://aws.amazon.com/blogs/architecture/improving-retail-forecast-accuracy-with-machine-learning/

The global retail market continues to grow larger and the influx of consumer data increases daily. The rise in volume, variety, and velocity of data poses challenges with demand forecasting and inventory planning. Outdated systems generate inaccurate demand forecasts, which leaves retailers facing overstocking or lost sales and often forces them to carry increased levels of safety stock to compensate.

A recent McKinsey study indicates that AI-based forecasting improves forecasting accuracy by 10–20 percent. This translates to revenue increases of 2–3 percent. An accurate forecasting system can also help determine ideal inventory levels and better predict the impact of sales promotions. It provides a single view of demand across all channels and a better customer experience overall.

In this blog post, we will show you how to build a reliable retail forecasting system. We will use Amazon Forecast, and an AWS-vetted solution called Improving Forecast Accuracy with Machine Learning. This is an AWS Solutions Implementation that automatically produces forecasts and generates visualization dashboards. This solution can be extended to use cases across a variety of industries.

Improving Forecast Accuracy solution architecture

This post will illustrate a retail environment that has an SAP S/4 HANA system for overall enterprise resource planning (ERP). We will show a forecasting solution based on Amazon Forecast to predict demand across product categories. The environment also has a unified platform for customer experience provided by SAP Customer Activity Repository (CAR). Replenishment processes are driven by SAP Forecasting and Replenishment (F&R), and SAP Fiori apps are used to manage forecasts.

The solution is divided into four parts: Data extraction and preparation, Forecasting and monitoring, Data visualization, and Forecast import and utilization in SAP.

Figure 1. Notional architecture for improving forecasting accuracy solution and SAP integration

Data extraction and preparation

Historical demand data such as sales, web traffic, inventory numbers, and resource demand are extracted from SAP and uploaded to Amazon Simple Storage Service (S3). There are multiple ways to extract data from an SAP system into AWS. As part of this architecture, we will use operational data provisioning (ODP) extraction. ODP acts as a data source for OData services, enabling REST-based integrations with external applications. The ODP-Based Data Extraction via OData document details this approach. The steps involved are:

  1. Create a data source using transaction RSO2, and allow Change Data Capture for the specific data to be extracted.
  2. Create an OData service using transaction SEGW.
  3. Create a data model for ODP extraction that refers to the defined data source, then register the service.
  4. Initiate the service from the SAP gateway client.
  5. In the AWS Management Console, create an AWS Lambda function to extract the data and upload it to Amazon S3. Check out the sample extractor code using Python, referenced in the blog Building data lakes with SAP on AWS; a minimal sketch follows this list.
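
The blog post above ships a full extractor; the following is only a minimal sketch of the Lambda function pattern, assuming a hypothetical ODP OData service URL, SAP credentials in environment variables, and a forecast input bucket name. The real extractor additionally handles paging, delta tokens, and error handling.

import base64
import json
import os
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Assumed environment variables: ODATA_URL, SAP_USER, SAP_PASSWORD, BUCKET
    url = os.environ["ODATA_URL"]  # e.g. an ODP-based OData service exposed through SAP Gateway
    auth = base64.b64encode(
        f"{os.environ['SAP_USER']}:{os.environ['SAP_PASSWORD']}".encode()
    ).decode()

    request = urllib.request.Request(url, headers={"Authorization": f"Basic {auth}"})
    with urllib.request.urlopen(request) as response:  # pull the extract as JSON
        payload = json.loads(response.read())

    # Land the raw extract in the forecast input bucket as a uniquely named object
    key = f"sap-extracts/demand-{context.aws_request_id}.json"
    s3.put_object(Bucket=os.environ["BUCKET"], Key=key, Body=json.dumps(payload))
    return {"records": len(payload.get("d", {}).get("results", [])), "s3_key": key}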

Related data that can potentially affect demand levels can be uploaded to Amazon S3. These could include seasonal events, promotions, and item price. Additional item metadata, such as product descriptions, color, brand, and size, may also be uploaded. Amazon Forecast provides built-in related time series data for holidays and weather. These three components together form the forecast inputs.

Forecasting and monitoring

An S3 event notification will be initiated when new datasets are uploaded to the input bucket. This, in turn, starts an AWS Step Functions state machine. The state machine orchestrates a series of AWS Lambda functions that build, train, and deploy machine learning models in Amazon Forecast. All AWS Step Functions logs are sent to Amazon CloudWatch. Administrators will be notified of the AWS Step Functions results through Amazon Simple Notification Service (Amazon SNS).
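
The deployed solution wires this trigger for you. For illustration only, the following is a minimal sketch of how an S3-notified Lambda function could start the state machine, assuming a hypothetical STATE_MACHINE_ARN environment variable.

import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    executions = []
    for record in event.get("Records", []):  # one record per uploaded dataset file
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],
            input=json.dumps({"bucket": bucket, "dataset_file": key}),
        )
        executions.append(response["executionArn"])
    return {"startedExecutions": executions}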

An AWS Glue job combines raw forecast input data, metadata, predictor backtest exports, and forecast exports into an aggregated view of forecasts in an S3 bucket. This aggregated view is then translated into the format expected by the External Forecast import interface. Amazon Athena can be used to query the forecast output in S3 with standard SQL.
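
For example, the aggregated output can be queried programmatically with Athena. The database, table, column names, and result bucket below are illustrative placeholders for whatever the Glue job registers in your account.

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="""
        SELECT item_id, date, p50, p90
        FROM forecast_output
        WHERE date >= date '2021-06-01'
        ORDER BY item_id, date
    """,
    QueryExecutionContext={"Database": "forecast"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/forecast/"},
)
print("Query execution ID:", response["QueryExecutionId"])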

Data visualization

Amazon QuickSight analyses can be created on a per-forecast basis. This provides users with forecast output visualization across hierarchies and categories of forecasted items. It also displays item-level accuracy metrics. Dashboards can be created from these analyses and shared within the organization. Additionally, data scientists and developers can prepare and process data, and evaluate Forecast outputs using an Amazon SageMaker Notebook Instance.

Forecast import and utilization in SAP

Amazon Forecast outputs located in Amazon S3 will be imported into the Unified Demand Forecast (UDF) module within the SAP Customer Activity Repository (CAR). See the SAP documentation for how to import external forecasts. An AWS Lambda function will be initiated when aggregated forecasts are uploaded to the S3 bucket. The Lambda function performs a remote function call (RFC) to the SAP system through the official SAP JCo library. The SAP RFC credentials and connection information may be stored securely inside AWS Secrets Manager and read on demand to establish connectivity.
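
The following is a minimal sketch of the Secrets Manager lookup inside such an import function. The secret name and JSON keys are assumptions; the RFC call itself goes through the SAP JCo library (Java) or another SAP connector, so it appears only as a placeholder comment here.

import json

import boto3

secrets = boto3.client("secretsmanager")

def get_sap_connection_info(secret_name="sap/car/rfc"):  # assumed secret name
    secret = secrets.get_secret_value(SecretId=secret_name)
    return json.loads(secret["SecretString"])  # for example {"host": ..., "sysnr": ..., "user": ..., "password": ...}

def handler(event, context):
    conn = get_sap_connection_info()
    # Placeholder: hand the connection details and the S3 key of the aggregated
    # forecast to the RFC-capable layer (JCo) that imports it into UDF.
    print("Importing forecast into SAP host", conn["host"], "for event", event)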

Once imported, forecast values from the solution can be retrieved by SAP Forecasting and Replenishment (F&R). They will be consumed as an input to replenishment processes, which consist of requirements calculation and requirement quantity optimization. SAP F&R calculates requirements based on the forecast, the current stock, and the open purchase orders. The requirement quantity may then be improved in accordance with optimization settings defined in SAP F&R.


Additionally, you have the flexibility to adjust the system forecast as required by the demand situation, or to analyze forecasts in the respective SAP Fiori apps.

Sample use case: AnyCompany Stores, Inc.

To illustrate how beneficial this solution can be for retail organizations, let’s consider AnyCompany Stores, Inc. This is a hypothetical customer and a leader in the retail industry with 985 stores across the United States. They struggle with issues stemming from their existing forecasting implementation. That implementation only understands certain categories and does not factor in the entire product portfolio. Additionally, it is limited to available demand history and does not consider related information that may affect forecasts. AnyCompany Stores is looking to improve their demand forecasting system.

Using Improving Forecast Accuracy with Machine Learning, AnyCompany Stores can easily generate AI-based forecasts at appropriate quantiles to address sensitivities associated with respective product categories. This mitigates inconsistent inventory buys, overstocks, out-of-stocks, and margin erosion. The solution also considers all relevant related data in addition to the historical demand data. This ensures that generated forecasts are accurate for each product category.

The generated forecasts may be used to complement existing forecasting and replenishment processes. With an improved forecasting solution, AnyCompany Stores will be able to meet demand, while holding less inventory and improving customer experience. This also helps ensure that potential demand spikes are accurately captured, so staples will always be in stock. Additionally, the company will not overstock expensive items with short shelf lives that are likely to spoil.

Conclusion

In this post, we explored how to implement an accurate retail forecasting solution using a ready-to-deploy AWS Solution. We use generated forecasts to drive inventory replenishment optimization and improve customer experience. The solution can be extended to inventory, workforce, capacity, and financial planning.

We showcase one of the ways in which Improving Forecast Accuracy with Machine Learning may be extended for a use case in the retail industry. If your organization would like to improve business outcomes with the power of forecasting, explore customizing this solution to fit your unique needs.

Further reading:

Analyze Fraud Transactions using Amazon Fraud Detector and Amazon Athena

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/analyze-fraud-transactions-using-amazon-fraud-detector-and-amazon-athena/

Organizations with online businesses have to be on guard constantly for fraudulent activity, such as fake accounts or payments made with stolen credit cards. One way they try to identify fraudsters is by using fraud detection applications. Some of these applications use machine learning (ML).

A common challenge with ML is the need for a large, labeled dataset to create ML models to detect fraud. You will also need the skill set and infrastructure to build, train, deploy, and scale your ML model.

In this post, I discuss how to perform fraud detection on a batch of many events using Amazon Fraud Detector. Amazon Fraud Detector is a fully managed service that can identify potentially fraudulent online activities. These can be situations such as the creation of fake accounts or online payment fraud. Unlike general-purpose ML packages, Amazon Fraud Detector is designed specifically to detect fraud. You can analyze fraud transaction prediction results by using Amazon Athena and Amazon QuickSight. I will explain how to review fraud using Amazon Fraud Detector and Amazon SageMaker built-in algorithms.

Batch fraud prediction use cases

You can use a batch predictions job in Amazon Fraud Detector to get predictions for a set of events that do not require real-time scoring. You may want to generate fraud predictions for a batch of events, such as payment fraud, account takeover or compromise, and free tier misuse, while performing an offline proof of concept. You can also use batch predictions to evaluate the risk of events on an hourly, daily, or weekly basis depending upon your business need.

Batch fraud insights using Amazon Fraud Detector

Organizations such as ecommerce companies and credit card companies use ML to detect fraud. Some of the most common types of fraud include email account compromise (personal or business), new account fraud, and non-payment or non-delivery (which includes compromised card numbers).

Amazon Fraud Detector automates the time-consuming and expensive steps to build, train, and deploy an ML model for fraud detection. Amazon Fraud Detector customizes each model it creates to your dataset, making the accuracy of models higher than current one-size-fits-all ML solutions. And because you pay only for what you use, you can avoid large upfront expenses.

If you want to analyze fraud transactions after the fact, you can perform batch fraud predictions using Amazon Fraud Detector. Then you can store fraud prediction results in an Amazon S3 bucket. Amazon Athena helps you analyze the fraud prediction results. You can create fraud prediction visualization dashboards using Amazon QuickSight.

The following diagram illustrates how to perform fraud predictions for a batch of events and analyze them using Amazon Athena.

Figure 1. Example architecture for analyzing fraud transactions using Amazon Fraud Detector and Amazon Athena

The architecture flow follows these general steps:

  1. Create and publish a detector. First create and publish a detector using Amazon Fraud Detector. It should contain your fraud prediction model and rules. For additional details, see Get started (console).
  2. Create an input Amazon S3 bucket and upload your CSV file. Prepare a CSV file that contains the events you want to evaluate. Then upload your CSV file into the input S3 bucket. In this file, include a column for each variable in the event type associated with your detector. In addition, include columns for EVENT_ID, ENTITY_ID, EVENT_TIMESTAMP, ENTITY_TYPE. Refer to Amazon Fraud Detector batch input and output files for more details. Read Create a variable for additional information on Amazon Fraud Detector variable data types and formatting.
  3. Create an output Amazon S3 bucket. Create an output Amazon S3 bucket to store your Amazon Fraud Detector prediction results.
  4. Perform a batch prediction. You can use a batch predictions job in Amazon Fraud Detector to get predictions for a set of events that do not require real-time scoring. For more information, see Batch predictions; a minimal API sketch follows this list.
  5. Review your prediction results. Review your results in the CSV file that is generated and stored in the Amazon S3 output bucket.
  6. Analyze your fraud prediction results.
    • After creating a Data Catalog by using AWS Glue, you can use Amazon Athena to analyze your fraud prediction results with standard SQL.
    • You can develop user-friendly dashboards to analyze fraud prediction results using Amazon QuickSight by creating new datasets with Amazon Athena as your data source.
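
The batch prediction in step 4 can also be started from the AWS SDK. The following minimal sketch assumes placeholder detector, event type, role, and bucket names created in steps 1–3.

import boto3

frauddetector = boto3.client("frauddetector")

frauddetector.create_batch_prediction_job(
    jobId="registration-batch-2021-07-01",
    inputPath="s3://my-fraud-input-bucket/events.csv",
    outputPath="s3://my-fraud-output-bucket/",
    eventTypeName="registration",
    detectorName="new_account_detector",
    detectorVersion="1",
    iamRoleArn="arn:aws:iam::111122223333:role/FraudDetectorBatchRole",
)

# Check the job status until it reaches COMPLETE
jobs = frauddetector.get_batch_prediction_jobs(jobId="registration-batch-2021-07-01")
print(jobs["batchPredictions"][0]["status"])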

Fraud detection using Amazon SageMaker

The Amazon Web Services (AWS) Solutions Implementation, Fraud Detection Using Machine Learning, enables you to run automated transaction processing. This can be on an example dataset or your own dataset. The included ML model detects potentially fraudulent activity and flags that activity for review. The following diagram presents the architecture you can automatically deploy using the solution’s implementation guide and accompanying AWS CloudFormation template.

SageMaker provides several built-in machine learning algorithms that you can use for a variety of problem types. This solution leverages the built-in Random Cut Forest algorithm for unsupervised learning and the built-in XGBoost algorithm for supervised learning. In the SageMaker Developer Guide, you can see how Random Cut Forest and XGBoost algorithms work.

Figure 2. Fraud detection using machine learning architecture on AWS

This architecture can be segmented into three phases.

  1. Develop a fraud prediction machine learning model. The AWS CloudFormation template deploys an example dataset of credit card transactions contained in an Amazon Simple Storage Service (Amazon S3) bucket. The Random Cut Forest and XGBoost models are then trained on the dataset from an Amazon SageMaker notebook instance.
  2. Perform fraud prediction. The solution also deploys an AWS Lambda function that processes transactions from the example dataset. It invokes the two SageMaker endpoints that assign anomaly scores and classification scores to incoming data points. An Amazon API Gateway REST API initiates predictions using signed HTTP requests. An Amazon Kinesis Data Firehose delivery stream loads the processed transactions into another Amazon S3 bucket for storage. The solution also provides an example of how to invoke the prediction REST API as part of the Amazon SageMaker notebook. A minimal sketch of invoking the two endpoints follows this list.
  3. Analyze fraud transactions. Once the transactions have been loaded into Amazon S3, you can use analytics tools and services for visualization, reporting, ad-hoc queries, and more detailed analysis.
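
The following is a minimal sketch of scoring a single transaction against the two endpoints referenced in step 2. The endpoint names and the CSV feature payload are illustrative; the deployed stack defines its own endpoint names and feature order.

import boto3

runtime = boto3.client("sagemaker-runtime")

transaction_csv = "0.5,1.2,0.0,3.4"  # one transaction's feature vector, comma separated

anomaly = runtime.invoke_endpoint(
    EndpointName="fraud-detection-rcf",  # Random Cut Forest endpoint (assumed name)
    ContentType="text/csv",
    Body=transaction_csv,
)
classification = runtime.invoke_endpoint(
    EndpointName="fraud-detection-xgb",  # XGBoost endpoint (assumed name)
    ContentType="text/csv",
    Body=transaction_csv,
)

print("Anomaly score:", anomaly["Body"].read().decode())
print("Fraud probability:", classification["Body"].read().decode())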

By default, the solution is configured to process transactions from the example dataset. To use your own dataset, you must modify the solution. For more information, see Customization.

Conclusion

In this post, we showed you how to analyze fraud transactions using Amazon Fraud Detector and Amazon Athena. You can build fraud insights using Amazon Fraud Detector and the Amazon SageMaker built-in algorithms Random Cut Forest and XGBoost. With the information in this post, you can build your own fraud insights models on AWS, detect fraud faster, and address a variety of fraud types, such as new account fraud, online transaction fraud, and fake reviews, among others.

Read more and get started on building fraud detection models on AWS.

Establish private connectivity between Amazon QuickSight and Snowflake using AWS PrivateLink

Post Syndicated from Maxwell Moon original https://aws.amazon.com/blogs/big-data/establish-private-connectivity-between-amazon-quicksight-and-snowflake-using-aws-privatelink/

Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. QuickSight lets you easily create and publish interactive BI dashboards that include Machine Learning-powered insights. QuickSight dashboards can be accessed from any device, and seamlessly embedded into your applications, portals, and websites.

QuickSight offers several sources for data, including but not limited to Amazon Athena, Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Snowflake. This post presents solutions to enable you to set up Snowflake as a data source for QuickSight regardless of your network configuration requirements.

We cover the following configurations of Snowflake as a data source for QuickSight:

  • QuickSight connection to Snowflake via AWS PrivateLink
  • QuickSight connection to Snowflake via AWS PrivateLink and virtual private cloud (VPC) peering (same Region)
  • QuickSight connection to Snowflake via AWS PrivateLink and VPC peering (cross-Region)
  • QuickSight connection to Snowflake (public network)

Prerequisites

To complete this solution, you need the following:

QuickSight connection to Snowflake via AWS PrivateLink

First, we show you how to connect to Snowflake with QuickSight over AWS PrivateLink. The following diagram illustrates the solution architecture.

Set up the Snowflake AWS PrivateLink integration

To start, we walk through enabling AWS PrivateLink for your Snowflake account. This includes locating resources in your AWS account, accessing the Snowflake UI, and creating a support case with Snowflake.

  1. Identify the VPC you want to use to set up the AWS PrivateLink integration. To do so, retrieve a list of VPCs from the command line, then retrieve the VpcId element from the resulting JSON object for the desired VPC. See the following code:
aws ec2 describe-vpcs --output json
  1. Retrieve your AWS account ID. This post assumes that the account you’re targeting is your default account on your AWS CLI configuration.
aws sts get-caller-identity --output json
  1. If you’re setting up multiple accounts, repeat these steps for all accounts and VPCs (this post assumes you’re setting up a single account and VPC and will use this as the context moving forward).
  2. Contact Snowflake Support with your AWS account ID, VPC ID, and the corresponding account URL you use to access Snowflake (for example, <account id>.snowflakecomputing.com).

Enabling AWS PrivateLink for your Snowflake account can take up to two business days.

  1. After AWS PrivateLink is enabled, retrieve the AWS PrivateLink configuration for your Region by running the following command in a Snowflake worksheet, then retrieve the values for privatelink-vpce-id, privatelink-account-url, and privatelink_ocsp-url from the resulting JSON object. Examples of each value are as follows:
select SYSTEM$GET_PRIVATELINK_CONFIG();

privatelink-vpce-id: com.amazonaws.vpce.<region_id>.vpce-svc-xxxxxxxxxxxxxxxxx
privatelink-account-url: xxxxxxxx.<region>.privatelink.snowflakecomputing.com
privatelink_ocsp-url: ocsp.xxxxxxxx.<region>.privatelink.snowflakecomputing.com
  1. Store these values in a text editor for later use.

Next, we configure the VPC endpoint on the Amazon Virtual Private Cloud (Amazon VPC) console and create all the required security groups.

  1. On the Amazon VPC console, choose Endpoints in the navigation menu.
  2. Choose Create endpoint.
  3. Select Find AWS Service by Name.
  4. For Service Name, enter the value for privatelink-vpce-id that we retrieved earlier.
  5. Choose Verify.

A green alert with “Service Name Found” appears and the VPC and subnet options automatically expand.

Depending on your targeted Region, your resulting screen may show another Region name.

  1. Choose the same VPC ID that you sent to Snowflake.
  2. Select the subnets where you want to create endpoints.

AWS recommends using more than one subnet for high availability.

  1. For Security group, choose Create a new security group.

This opens the Security groups page on the Amazon VPC console in a new tab.

  1. Choose Create security group.

  1. Give your new security group a name (for example, quicksight-doc-snowflake-privatelink-connection) and description.
  2. Choose the VPC ID you used in previous steps.

Next, you add two rules that allow traffic from within your VPC to this VPC endpoint.

  1. Retrieve the CIDR block for your targeted VPC:
aws ec2 describe-vpcs --vpc-ids vpc-xxxxxxxxxxxxxxxxx | jq -r '.Vpcs[].CidrBlock'
  1. Choose Add rule in the Inbound rules section.
  2. Choose HTTPS for the type, leave the source as Custom, and enter the value retrieved from the preceding describe-vpcs call (for example, 10.0.0.0/16).
  3. Choose Add rule in the Inbound rules section.
  4. Choose HTTP for the type, leave the source as Custom, and enter the same CIDR value from the describe-vpcs call.
  5. Choose Create security group.

  1. Retrieve the security group ID from the newly created security group.
  2. On the VPC endpoint configuration page, remove the default security group.
  3. Search for and select the new security group ID.

  1. Choose Create endpoint.

You’re redirected to a page that has a link to your VPC endpoint configuration, specified by the VPC ID. The next page has a link to view the configuration in full.

  1. Retrieve the topmost record in the DNS names list.

This can be differentiated from other DNS names because it only includes the Region name (such as us-west-2), and no Availability Zone letter notation (such as us-west-2a).

  1. Store this record in a text editor for later use.

Configure DNS for Snowflake endpoints in your VPC

To configure your Snowflake endpoints, complete the following steps:

  1. On the Route 53 console, choose Hosted Zones in the navigation pane.
  2. Choose Create hosted zone.
  3. For Domain name, enter the value you stored for privatelink-account-url from the previous steps.

In this field, we remove the Snowflake account ID from the DNS name and only use the value starting with the Region identifier (for example, <region>.privatelink.snowflakecomputing.com). We create a resource record set later for the subdomain.

  1. For Type, select Private hosted zone.

Your Region code may not be us-west-2; reference the DNS name returned to you by Snowflake.

  1. In the VPCs to associate with the hosted zone section, choose the Region in which your VPC is located and the VPC ID used in previous steps.

  1. Choose Create hosted zone.

Next, we create two records: one for privatelink-account-url and one for privatelink_ocsp-url.

  1. On the Hosted zones page, choose Create record set.
  2. For Record name, enter your Snowflake account ID (the first eight characters in privatelink-account-url).
  3. For Record type, choose CNAME.
  4. For Value, enter the DNS name for the Regional VPC endpoint we retrieved in the previous section.
  5. Choose Create records.

  1. Repeat these steps for the OCSP record we noted as privatelink_ocsp-url earlier, starting with ocsp through the eight-character Snowflake ID for the record name (for example, ocsp.xxxxxxxx). A scripted sketch of both records follows.
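
If you prefer to script this, both CNAME records can be created in one pass with the Route 53 API. The hosted zone ID, Snowflake account ID, Region, and VPC endpoint DNS name below are placeholders for the values you collected earlier.

import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"  # private hosted zone created above
VPCE_DNS_NAME = "vpce-0abc123-xyz.vpce-svc-0abc123.us-west-2.vpce.amazonaws.com"

for name in (
    "xxxxxxxx.us-west-2.privatelink.snowflakecomputing.com",       # privatelink-account-url
    "ocsp.xxxxxxxx.us-west-2.privatelink.snowflakecomputing.com",  # privatelink_ocsp-url
):
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "CNAME",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": VPCE_DNS_NAME}],
                },
            }]
        },
    )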

Configure a Route 53 resolver inbound endpoint for your VPC

QuickSight doesn’t use the standard AWS resolver (the VPC’s .2 resolver). To resolve private DNS from QuickSight, you need to set up Route 53 resolver endpoints.

First, we create a security group for the Route 53 resolver inbound endpoint.

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name for your security group (for example, quicksight-doc-route53-resolver-sg) and a description.
  3. Choose the VPC ID used in previous steps.
  4. Create rules that allow for DNS (Port 53) over UDP and TCP from within the VPC CIDR block.
  5. Choose Create security group.
  6. Note the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.

Now we create the Route 53 resolver inbound endpoint for our VPC.

  1. On the Route 53 console, choose Inbound endpoint in the navigation pane.
  2. Choose Create inbound endpoint.
  3. For Endpoint name, enter a name (for example, quicksight-inbound-resolver).
  4. For VPC in the Region, choose the VPC ID used in previous steps.
  5. For Security group for the endpoint, choose the security group ID you saved earlier.

  1. In the IP address section, choose two Availability Zones and subnets, and leave Use an IP address that is selected automatically selected.

  1. Choose Submit.
  2. Choose the inbound endpoint after it’s created and take note of the two IP addresses for the resolvers.
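
The same inbound endpoint can be created with the Route 53 Resolver API; a minimal sketch follows, assuming placeholder subnet and security group IDs for the VPC used above.

import uuid

import boto3

resolver = boto3.client("route53resolver")

endpoint = resolver.create_resolver_endpoint(
    CreatorRequestId=str(uuid.uuid4()),  # idempotency token
    Name="quicksight-inbound-resolver",
    SecurityGroupIds=["sg-0123456789abcdef0"],  # resolver security group created above
    Direction="INBOUND",
    IpAddresses=[
        {"SubnetId": "subnet-0aaa111122223333a"},  # two subnets in different Availability Zones
        {"SubnetId": "subnet-0bbb444455556666b"},
    ],
)
print(endpoint["ResolverEndpoint"]["Id"])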

Connect a VPC to QuickSight

To connect a VPC to QuickSight, complete the following steps:

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-snowflake-privatelink-sg) and a description.
  3. Choose the VPC ID used in previous steps.

Security groups for QuickSight are different from other security groups in that they are stateless, rather than stateful. This means you must explicitly allow return traffic from the targeted security group. The inbound rule in your security group must allow traffic on all ports. It needs to do this because the destination port number of any inbound return packets is set to a randomly allocated port number. For more information, see Inbound Rules.

  1. Choose Create security group.
  2. Take note of the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.
  3. On the Security groups page, search for the security group ID that is used for the VPC endpoint.
  4. Choose Edit inbound rules.
  5. Add rules for both HTTPS and HTTP traffic, using the security group ID for the security group you created as the source.
  6. Choose Save rules.

Next, we move to the QuickSight console to configure the VPC connection.

  1. Navigate to the QuickSight console.
  2. Choose the user name and choose Manage QuickSight.

  1. In the navigation pane, choose Manage VPC connections.
  2. Choose Add a VPC connection.
  3. For VPC connection name, enter a name (for example, snowflake-privatelink).
  4. For VPC ID, choose the VPC used in previous steps.
  5. For Subnet ID, choose one of the subnets that has a VPC endpoint, as specified when you created the endpoint earlier.
  6. For Security group ID, enter the ID of the security group you created.
  7. For DNS resolver endpoints, enter the two IPs for the inbound resolver endpoint you created earlier.

  1. Choose Create.
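
Recent versions of the QuickSight API also expose a CreateVPCConnection operation, so this step can be scripted instead of using the console. The following is only a sketch under that assumption; the account ID, execution role, resolver IPs, and network IDs are placeholders, and the console flow above remains the documented path.

import boto3

quicksight = boto3.client("quicksight")

quicksight.create_vpc_connection(
    AwsAccountId="111122223333",
    VPCConnectionId="snowflake-privatelink",
    Name="snowflake-privatelink",
    SubnetIds=["subnet-0aaa111122223333a", "subnet-0bbb444455556666b"],
    SecurityGroupIds=["sg-0quicksight0000000"],
    DnsResolvers=["10.0.1.10", "10.0.2.10"],  # the two resolver IPs noted earlier
    RoleArn="arn:aws:iam::111122223333:role/quicksight-vpc-connection-role",
)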

Set up a Snowflake data source through the VPC

To set up a Snowflake data source, complete the following steps.

  1. On the QuickSight console, choose Datasets in the navigation page.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. For Data source name, enter a name (for example, snowflake).
  5. For Connection type, choose the VPC connection you created earlier (snowflake-privatelink).
  6. For Database server, enter privatelink-account-url.
  7. For Database name, enter the name of your database.
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Upon successful validation, choose Create data source.
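
The same data source can also be registered through the QuickSight CreateDataSource API. The following minimal sketch uses placeholder account, host, credential, and ARN values; in practice, prefer storing the Snowflake credentials outside of source code.

import boto3

quicksight = boto3.client("quicksight")

quicksight.create_data_source(
    AwsAccountId="111122223333",
    DataSourceId="snowflake-privatelink-ds",
    Name="snowflake",
    Type="SNOWFLAKE",
    DataSourceParameters={
        "SnowflakeParameters": {
            "Host": "xxxxxxxx.us-west-2.privatelink.snowflakecomputing.com",  # privatelink-account-url
            "Database": "MY_DATABASE",
            "Warehouse": "MY_WAREHOUSE",
        }
    },
    Credentials={"CredentialPair": {"Username": "my_user", "Password": "my_password"}},
    VpcConnectionProperties={
        "VpcConnectionArn": "arn:aws:quicksight:us-west-2:111122223333:vpcConnection/snowflake-privatelink"  # placeholder ARN
    },
)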

Create your first QuickSight dashboard

In this section, we cover creating a dataset in QuickSight, then using this data in a visualization. We’re using a dummy dataset that has information about fictional employees.

  1. For Schema, choose your schema.
  2. For Tables, select your tables.
  3. Choose Select.

In the Finish dataset creation section, you can determine if QuickSight imports your dataset into SPICE to improve query performance or directly queries your data each time a dashboard is loaded. For more information about SPICE, see Importing Data into SPICE.

  1. For this post, we select Import to SPICE for quicker analytics.
  2. Choose Visualize.

Now that we have the schema, table, and SPICE configuration for the dataset, we can create our first visualization.

  1. Choose a field from the available fields list. For this post, we choose City.
  2. Choose a visualization in the Visual types pane.

This only scratches the surface of the visualization capabilities of QuickSight. For more information, see Working with Amazon QuickSight Visuals.

Next, we cover a network configuration in which QuickSight is connected to one VPC, the AWS PrivateLink connection resides in another VPC, and VPC peering allows QuickSight to reach the AWS PrivateLink connection.

QuickSight connection to Snowflake via AWS PrivateLink and VPC peering within the same Region

In this section, we show you how to connect QuickSight to Snowflake over AWS PrivateLink with two peered VPCs. The following diagram illustrates the solution architecture.

Set up VPC peering

First, we create the VPC peering connection from the requesting VPC.

  1. On the Peering connections page of the Amazon VPC console, choose Create peering connection.
  2. For Select a local VPC to peer with, choose the VPC in which you configured your Snowflake AWS PrivateLink connection.

  1. In the Select another VPC to peer with section, leave the default options for Account and Region (My account and This Region, respectively).
  2. For VPC (Accepter), choose the VPC that QuickSight is connected to.

  1. Choose Create peering connection.

Next, we accept the VPC connection from the accepting VPC.

  1. On the Peering connections page, select the connection you created.
  2. On the Actions menu, choose Accept.
  3. Review the information about the request. If everything looks correct, choose Yes, Accept.
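
Creating and accepting the peering connection (and enabling the cross-VPC DNS resolution covered next) can also be done from the AWS SDK. The VPC IDs below are placeholders.

import boto3

ec2 = boto3.client("ec2")

peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0requester0000000",      # VPC with the Snowflake AWS PrivateLink endpoint
    PeerVpcId="vpc-0accepter00000000",  # VPC that QuickSight connects to
)
peering_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)

# Allow DNS resolution across the peering connection in both directions
ec2.modify_vpc_peering_connection_options(
    VpcPeeringConnectionId=peering_id,
    RequesterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
    AccepterPeeringConnectionOptions={"AllowDnsResolutionFromRemoteVpc": True},
)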

Next, we configure DNS to resolve between the two VPCs.

  1. On the Peering connections page, choose your new peering connection.
  2. On the DNS tab, check if the two options show as Disabled.

If they’re enabled, you can skip to the steps on creating route tables.

  1. On the Actions menu, choose Edit DNS Settings.

This requires your VPC to have DNS host name and resolution enabled.

  1. Select both check boxes to allow DNS to resolve from both the accepter and requester VPCs.
  2. Choose Save.

Next, create the route table entry to allow for routes to propagate between the two VPCs.

  1. On the Route tables page, choose the route tables in your requesting VPC.
  2. On the Route tab, choose Edit routes.
  3. Add a route for the CIDR block that your peered VPC uses (for this post, 172.31.0.0/16).
  4. Choose Save routes.

  1. Repeat for the route tables in your accepter VPC.

Configure DNS in the accepter VPC

In this section, we associate the accepter VPC with the same private hosted zone as the requester VPC (<region>.privatelink.snowflakecomputing.com).

  1. On the Route 53 console, choose Hosted zones in the navigation pane.
  2. Select the hosted zone <region>.privatelink.snowflakecomputing.com and choose Edit.
  3. In the VPCs to associate with the hosted zone section, choose Add VPC.
  4. Choose the Region and VPC ID associated with the accepter VPC.

  1. Choose Save changes.

Configure Route 53 resolver inbound endpoints in the accepter VPC

To configure your Route 53 resolver inbound endpoints, complete the following steps:

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-doc-route53-resolver-sg) and a description.
  3. Choose the VPC ID used in previous steps.
  4. Create rules that allow for DNS (port 53) over UDP and TCP from within the VPC CIDR block (for this post, 172.31.0.0/16).
  5. Choose Create security group.

  1. Take note of the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.

Next, we set up the Route 53 inbound endpoint for this VPC.

  1. On the Route 53 console, choose Inbound endpoint in the navigation pane.
  2. Choose Create inbound endpoint.
  3. Enter a name for the endpoint (for example, quicksight-inbound-resolver).
  4. For VPC in the Region, choose the VPC ID for the accepter VPC.
  5. For Security group, choose the security group ID you saved earlier.
  6. In the IP Address section, select two Availability Zones and subnets, and leave Use an IP address that is selected automatically selected.
  7. Choose Submit.
  8. Choose the inbound endpoint after it’s created.
  9. After the inbound endpoint has provisioned, note the two IP addresses for the resolvers.

Connect the accepter VPC to QuickSight

To start, we need to create a security group for QuickSight to allow traffic to the Route 53 resolver inbound endpoints, the VPC endpoint for AWS PrivateLink, and traffic within the local network.

  1. On the Security groups page of the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-snowflake-privatelink-vpc-peering-sg) and a description.
  3. Choose the VPC ID for the accepter VPC.
  4. Create the following ingress rules:
    1. One rule for the local network for all TCP ports (e.g., 172.31.0.0/16).
    2. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all TCP ports.
    3. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all UDP ports.
    4. One rule allowing traffic to the security group for the VPC endpoint (located in the peered VPC).

As discussed earlier, security groups for QuickSight are different from other security groups. You must explicitly allow return traffic from the targeted security group, and the inbound rule in your security group must allow traffic on all ports. For more information, see Inbound Rules.

Next, we modify the security group for the Route 53 resolver inbound endpoint to allow traffic from the security group we created.

  1. On the Security groups page, search for the security group ID used for the Route 53 resolver inbound endpoint.
  2. Choose Edit inbound rules.
  3. Add rules for both DNS over UDP and DNS over TCP, using the security group ID for the security group we created for QuickSight as the source.

  1. Choose Save rules.

Next, modify the security group that was created for the VPC endpoint for the AWS PrivateLink connection.

  1. On the Security groups page, search for the security group ID used for the VPC endpoint for the AWS PrivateLink connection.
  2. Choose Edit inbound rules.
  3. Add rules for both HTTPS and HTTP, using the security group ID for the security group created for QuickSight as the source.
  4. Choose Save rules.

Next, we set up the VPC connection in QuickSight.

  1. On the QuickSight console, choose the user name and choose Manage QuickSight.
  2. In the navigation pane, choose Manage VPC connections.
  3. Choose Add a VPC connection.
  4. For VPC connection name, enter a name (for example, snowflake-privatelink-vpc-peering).
  5. For Subnet, choose a subnet ID that has a route table with a peering connection to the requester VPC where the AWS PrivateLink connection resides.
  6. For Security group ID, enter the ID of the security group created earlier.
  7. For DNS resolver endpoints, enter the two IPs for the inbound resolver endpoint you created.
  8. Choose Create.

Set up a Snowflake data source in QuickSight through the VPC

To set up a Snowflake data source in QuickSight, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. Enter a data source name (for example, snowflake-dataset).
  5. Choose the VPC connection you created (snowflake-privatelink-vpc-peering).
  6. For Database server, enter the privatelink-account-url.
  7. For Database name, enter the name of your database.
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Upon successful validation, choose Create data source.

For steps to create a dashboard, see the earlier section, Create your first QuickSight dashboard.

In the next section, we cover a similar network configuration, with the difference being that we use cross-Region VPC peering.

QuickSight connection to Snowflake via AWS PrivateLink and VPC peering across Regions

In this section, we show you how to connect to Snowflake with QuickSight over AWS PrivateLink with two VPCs peered across Regions.

We refer to Regions generically throughout this post, denoting the Region that has the Snowflake AWS PrivateLink connection as Region A and the Region in which QuickSight is set up as Region B.

The following diagram illustrates our solution architecture.

Set up VPC peering between two Regions

First, we create the VPC peering connection from the requesting VPC.

  1. Navigate to the Peering connections page on the Amazon VPC console in Region B (the Region in which you plan to use QuickSight to deploy dashboards).
  2. Choose Create peering connection.
  3. In the Select a local VPC to peer with section, for VPC (Requester), choose the VPC in which you have connected or intend to connect QuickSight.

  1. For Select another VPC to peer with, select My account and Another Region.
  2. Choose the Region in which your Snowflake AWS PrivateLink connection exists.
  3. For VPC ID (Accepter), enter the VPC ID for the VPC in which your Snowflake AWS PrivateLink exists.

  1. Choose Create peering connection.
  2. Copy the VPC peering connection ID so we can easily locate it in the next steps (it looks like pcx-xxxxxxxxxxxx).

Next, we accept the VPC peering connection from the Region in which you created your AWS PrivateLink connection.

  1. Navigate to the Amazon VPC console in Region A (where your Snowflake AWS PrivateLink connection exists).
  2. Search for and select the peering connection you created.
  3. On the Actions menu, choose Accept Request.

  1. Review the information about the request. If everything looks correct, choose Yes, Accept.

Next, we configure DNS to resolve between the two VPCs.

  1. On the Peering connections page of the Amazon VPC console, choose your newly created VPC peering connection.
  2. On the DNS tab, check if the two options show Disabled.

If they’re enabled, skip to the steps on creating route tables.

  1. On the Actions menu, choose Edit DNS settings.

This requires your VPC to have DNS host name and resolution enabled.

  1. Select both check boxes to allow DNS to resolve from both the accepter and requester VPCs.
  2. Choose Save.

Next, we create the route table entry to allow for routes to propagate between the two VPCs for Region B.

  1. Navigate to the Amazon VPC console in Region B (the Region in which you plan to use QuickSight to deploy dashboards).
  2. In the navigation pane, choose Route tables.
  3. Select the route tables in your requesting VPC.
  4. On the Route tab, choose Edit routes.
  5. Add a route for the CIDR block that your peered VPC uses (for this post, 10.0.0.0/16 is the CIDR block for the VPC in which the Snowflake AWS PrivateLink connection resides).
  6. Choose Save routes.

Next, create the route table entry to allow for routes to propagate between the two VPCs for Region A.

  1. Navigate to the Amazon VPC console in Region A (where your Snowflake AWS PrivateLink connection exists).
  2. Repeat the previous steps, using the CIDR block for the peered VPC (in this post, 172.16.0.0/16).

Configure DNS in the VPC in Region B

First, we need to associate the VPC in Region B (where you deploy QuickSight) with the same private hosted zone as the VPC in Region A where your Snowflake AWS PrivateLink connection exists (<region>.privatelink.snowflakecomputing.com).

  1. On the Route 53 console, choose Hosted zones in the navigation pane.
  2. Select the private hosted zone <region>.privatelink.snowflakecomputing.com and choose Edit.
  3. In the VPCs to associate with the hosted zone section, choose Add VPC.
  4. Choose the Region and VPC ID associated with the accepter VPC.

  1. Choose Save changes.

Configure the Route 53 resolver inbound endpoint for your VPC in Region B

To configure the resolver inbound endpoint in Region B, complete the following steps:

  1. On the Security groups page on the Amazon VPC console, choose Create security group.
  2. Enter a name (for example, quicksight-doc-route53-resolver-sg) and a description.
  3. Choose the VPC ID used in previous steps.
  4. Create rules that allow for DNS (port 53) over UDP and TCP from within the VPC CIDR block (for this post, 172.16.0.0/16).

  1. Choose Create security group.
  2. Take note of the security group ID, because we now add a rule to allow traffic to the VPC endpoint security group.

Next, we set up the Route 53 inbound endpoint for this VPC.

  1. On the Route 53 console, choose Inbound endpoint in the navigation pane.
  2. Choose Create inbound endpoint.
  3. Enter a name for the endpoint (for example, quicksight-inbound-resolver).
  4. For VPC in the Region, choose the VPC ID used in previous steps.
  5. For Security group, choose the security group ID from the previous step.
  6. In the IP Address section, select two Availability Zones and subnets, and leave Use an IP address that is selected automatically selected.
  7. Choose Submit.
  8. Choose the inbound endpoint after it’s created.
  9. After the inbound endpoint has provisioned, note the two IP addresses for the resolvers.

Connect the VPC to QuickSight in Region B

To start, we need to create a security group for QuickSight to allow traffic to the Route 53 resolver inbound endpoints, the VPC endpoint for AWS PrivateLink, and traffic within the local network.

  1. On the Security groups page of the Amazon VPC console in Region B, choose Create security group.
  2. Enter a name (for example, quicksight-snowflake-sg) and a description.
  3. Choose the VPC ID for the VPC where you previously created the VPC peering connection.
  4. Create the following ingress rules:
    1. One rule for the local network for all TCP ports (for example, 172.16.0.0/16).
    2. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all TCP ports.
    3. One rule allowing DNS traffic from the security group for the Route 53 resolver inbound endpoint for all UDP ports.
    4. One rule allowing traffic for all TCP ports to the CIDR block for the VPC located in Region A, where your Snowflake AWS PrivateLink connection exists (for this post, 10.0.0.0/16).

As discussed earlier, security groups for QuickSight are different from other security groups. You must explicitly allow return traffic from the targeted security group, and the inbound rule in your security group must allow traffic on all ports. For more information, see Inbound Rules.

Next, we modify the security group for the Route 53 resolver inbound endpoint in Region B to allow traffic from the security group we created.

  1. On the Security groups page, search for the security group ID used for the Route 53 resolver inbound endpoint.
  2. Choose Edit inbound rules.
  3. Add rules for both DNS over UDP and DNS over TCP, using the CIDR block for the VPC in Region B (for this post, 172.16.0.0/16).

  1. Choose Save rules.

Next, we need to modify the security group we’re using for the AWS PrivateLink connection.

  1. Navigate to the Security groups page on the Amazon VPC console in Region A.
  2. Search for the security group ID that is used for the VPC endpoint for the AWS PrivateLink connection.
  3. Choose Edit inbound rules.
  4. Add rules for both HTTPS and HTTP, using the CIDR Block for the VPC in Region B as the source (for this post, 172.16.0.0/16).

  1. Choose Save rules.

Finally, we set up the QuickSight VPC connection.

  1. Navigate to the QuickSight console in Region B.
  2. Choose the user name and choose Manage QuickSight.
  3. In the navigation pane, choose Manage VPC connections.
  4. Choose Add a VPC connection.
  5. For VPC connection name, enter a connection name (for example, snowflake-privatelink-cross-region).
  6. For VPC ID, choose the VPC ID of the VPC in Region B.
  7. For Subnet, choose a subnet ID from the VPC in Region B that has a route table with a peering connection to the VPC where the AWS PrivateLink connection resides.
  8. For Security group ID, enter the ID of the security group you created.
  9. For DNS resolver endpoints, enter the two IPs for the inbound resolver endpoint created earlier.

  1. Choose Create.

Set up a Snowflake data source in QuickSight through the VPC

To set up a Snowflake data source in QuickSight, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. Enter a name for your data source (for example, snowflake-dataset).
  5. Choose the VPC connection you created (snowflake-privatelink-cross-region).
  6. For Database server, enter the privatelink-account-url.
  7. For Database name, enter the name of your database.
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Upon successful validation, choose Create data source.

For steps to create a dashboard, see the earlier section, Create your first QuickSight dashboard.

For our last configuration, we cover how to set up a QuickSight connection to Snowflake without AWS PrivateLink.

QuickSight connection to Snowflake without AWS PrivateLink

In this section, we show you how to connect to Snowflake with QuickSight without using AWS PrivateLink.

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose the Snowflake option.
  4. Enter a data source name (for example, snowflake-dataset).
  5. Leave the connection type as Public network.
  6. For Database name, enter the name of your database.
  7. For your database server, enter the URL you use to log in to your Snowflake account (xxxxxxxx.snowflakecomputing.com).
  8. For Warehouse, enter the name of a running Snowflake warehouse.
  9. For Username, enter your Snowflake username.
  10. For Password, enter your Snowflake password.
  11. Choose Validate.
  12. Choose Create data source.

For steps to create a dashboard, see the earlier section, Create your first QuickSight dashboard.

Clean up

If your work with QuickSight, Snowflake, and AWS PrivateLink is complete, remove your Route 53 resolver inbound endpoint, Route 53 private hosted zone, and the VPC endpoint for Snowflake in order to avoid incurring additional fees.

Conclusion

In this post, we covered four ways to connect QuickSight to Snowflake as a data source: over AWS PrivateLink directly, over AWS PrivateLink with VPC peering in the same Region, over AWS PrivateLink with VPC peering across Regions, and over the public network without AWS PrivateLink.

After you set up the data source, you can gain further insights from your data by setting up ML Insights in QuickSight, creating graphical representations of your data using QuickSight visuals, or joining data from multiple datasets, along with all other QuickSight features.


About the Author

Maxwell Moon is a Senior Solutions Architect at AWS working with Independent Software Vendors (ISVs) to design and scale their applications on AWS. Outside of work, Maxwell is a dad to two cats, is an avid supporter of the Wolverhampton Wanderers Football Club, and is patiently waiting for a new wave of ska music.

 

 

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS with over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped large technology companies design data analytics solutions, and has led engineering teams in designing and implementing data analytics platforms and data products.

Enable federation to multiple Amazon QuickSight accounts with Microsoft Azure Active Directory

Post Syndicated from Srikanth Baheti original https://aws.amazon.com/blogs/big-data/enable-federation-to-multiple-amazon-quicksight-accounts-with-microsoft-azure-active-directory/

Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML)-powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Organizations are working towards centralizing their identity and access strategy across all of their applications, including on-premises, third-party, and applications on AWS. Many organizations use Microsoft Azure Active Directory (Azure AD) to control and manage user authentication and authorization centrally. If your organization uses Azure AD for cloud applications and multiple QuickSight accounts, you can enable federation to all of your QuickSight accounts without needing to create and manage users multiple times. This authorizes users to access QuickSight assets—analyses, dashboards, folders, and datasets—through centrally managed Azure AD.

In this post, we go through the steps to configure federated single sign-on (SSO) between a single Azure AD instance and multiple QuickSight accounts. We demonstrate registering an SSO application in Azure AD, creating roles in Azure AD, and assigning these roles to map to QuickSight roles (admin, author, and reader). These QuickSight roles represent three different personas supported in QuickSight. Administrators can publish the QuickSight app in the Azure App portal to enable users to SSO to QuickSight using their Azure AD credentials.

Prerequisites

To complete this walkthrough, you must have the following prerequisites:

  • An Azure AD subscription
  • One or more QuickSight account subscriptions

Solution overview

The walkthrough includes the following steps:

  1. Register an AWS Single Sign-On (AWS SSO) application in Azure AD.
  2. Configure the application in Azure AD.
  3. Add Azure AD as your SAML identity provider (IdP) in AWS.
  4. Configure AWS Identity and Access Management (IAM) policies.
  5. Configure IAM roles.
  6. Create roles in Microsoft Graph Explorer.
  7. Assign the newly created roles through Graph Explorer to users in Azure AD.
  8. Test the application from Azure AD.

Register an AWS SSO application in Azure AD

To configure the integration of an AWS SSO application in Azure AD, you need to add AWS SSO to your list of managed software as a service (SaaS) apps.

  1. Sign in to the Azure portal using a Microsoft account.
  2. Under Azure services, choose Azure Active Directory.

  1. In the navigation pane, under Manage, choose Enterprise Applications.

  1. Choose All applications.
  2. Choose New application.

  1. In the Browse Azure AD Gallery section, search for AWS Single Sign-On.
  2. Choose AWS Single Sign-On from the results panel and add the application.

  1. For Name, enter Amazon QuickSight.
  2. After the application is created, copy the Object ID value from the application overview.

You need this object ID in the later steps.

Configure an AWS SSO application in Azure AD

Follow these steps to enable Azure AD SSO in the Azure portal.

  1. In the Azure portal, on the AWS SSO application registered in the first step, in the Manage section, choose single sign-on.
  2. On the Select a single sign-on method page, choose SAML.
  3. Choose the pencil icon.
  4. For Identifier (Entity ID), enter URN:AMAZON:WEBSERVICES.
  5. For Reply URL, enter https://signin.aws.amazon.com/saml.
  6. Leave Sign on URL blank.
  7. For Relay State, enter https://quicksight.aws.amazon.com.
  8. Leave Logout URL blank.
  9. Choose Save.

  1. On the Set up Single Sign-On with SAML page, under User Attributes & Claims, choose Edit.

  1. In the Additional Claims section, configure SAML token attributes by using the values in the following table.
Name | Source attribute | Namespace
RoleSessionName | user.userprincipalname | https://aws.amazon.com/SAML/Attributes
Role | user.assignedroles | https://aws.amazon.com/SAML/Attributes
SessionDuration | Provide a value from 900 seconds (15 minutes) to 43,200 seconds (12 hours) | https://aws.amazon.com/SAML/Attributes
  1. In the SAML Signing Certificate section, choose Download to download the federation metadata XML file.

You use this XML document later when setting up the SAML provider in IAM.

Add Azure AD as your SAML IdP in AWS

To configure Azure AD as your SAML IdP, complete the following steps:

  1. Open a new tab in your browser.
  2. Sign in to the IAM console in your AWS account with admin permissions.
  3. On the IAM console, under Access Management in the navigation pane, choose Identity providers.
  4. Choose Add provider.

  1. For Provider name, enter AzureActiveDirectory.
  2. Choose Choose file to upload the metadata document you downloaded in the earlier step.
  3. Choose Add provider.

  1. In the banner message that appears, choose View provider.

  1. Copy the ARN to use in a later step.

  1. Repeat these steps in other accounts where you want to enable SSO.
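
If you're repeating this step across several accounts, the provider can also be created with the IAM API. This minimal sketch assumes the federation metadata XML was downloaded to the local file shown.

import boto3

iam = boto3.client("iam")

with open("azure-ad-federation-metadata.xml") as f:
    metadata = f.read()

provider = iam.create_saml_provider(
    SAMLMetadataDocument=metadata,
    Name="AzureActiveDirectory",
)
print(provider["SAMLProviderArn"])  # save this ARN for the role trust policy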

Configure IAM policies

In this step, you create three IAM policies for mapping to three different roles with permissions in QuickSight (admin, author, and reader).

Use the following steps to set up the QuickSight-Admin-Account1 policy. This policy grants admin privileges in QuickSight to the federated user.

  1. On the IAM console, choose Policies.
  2. Choose Create policy.
  3. Choose JSON and replace the existing text with the code from the following table for QuickSight-Admin-Account1.
Policy name: QuickSight-Admin-Account1

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "quicksight:CreateAdmin",
            "Resource": "*"
        }
    ]
}

Policy name: QuickSight-Author-Account1

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "quicksight:CreateUser",
            "Resource": "*"
        }
    ]
}

Policy name: QuickSight-Reader-Account1

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "quicksight:CreateReader",
            "Resource": "*"
        }
    ]
}

  1. Choose Review policy.
  2. For Name, enter QuickSight-Admin-Account1.
  3. Choose Create policy.
  4. Repeat the steps for QuickSight-Author-Account1 and QuickSight-Reader-Account1.
  5. Repeat these steps in other accounts where you want to enable SSO.
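
To avoid repeating the console steps in every account, the three policies above can also be created with a short script against the IAM API, for example:

import json

import boto3

iam = boto3.client("iam")

POLICIES = {
    "QuickSight-Admin-Account1": "quicksight:CreateAdmin",
    "QuickSight-Author-Account1": "quicksight:CreateUser",
    "QuickSight-Reader-Account1": "quicksight:CreateReader",
}

for name, action in POLICIES.items():
    iam.create_policy(
        PolicyName=name,
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{"Effect": "Allow", "Action": action, "Resource": "*"}],
        }),
    )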

Configure IAM roles

Next, create the roles that your Azure AD users assume when federating into QuickSight. Use the following steps to set up the admin role:

  1. On the IAM console, choose Roles.
  2. Choose Create role.
  3. For Select type of trusted entity, choose SAML 2.0 federation.
  4. For SAML provider, choose the provider you created earlier (AzureActiveDirectory).
  5. Select Allow programmatic and AWS Management Console access.
  6. For Attribute, choose SAML:aud.
  7. For Value, enter https://signin.aws.amazon.com/saml.

  1. Choose Next: Permissions.
  2. Choose the QuickSight-Admin-Account1 IAM policy you created in the previous step.
  3. Choose Next: Tags.
  4. Choose Next: Review.
  5. For Role name, enter QuickSight-Admin-Role.
  6. For Role description, enter a description.
  7. Choose Create role.

  1. On the IAM console, in the navigation pane, choose Roles.
  2. Choose the QuickSight-Admin-Role role you created to open the role’s properties.
  3. Copy the role ARN to the notepad.
  4. On the Trust Relationships tab, choose Edit Trust Relationship.
  5. Under Trusted Entities, verify that the IdP you created is listed.
  6. Under Conditions, verify that SAML:aud with a value of https://signin.aws.amazon.com/saml is present.

  1. Repeat these steps to create your author and reader roles and attach the appropriate policies:
    1. For QuickSight-Author-Role, use the policy QuickSight-Author-Account1.
    2. For QuickSight-Reader-Role, use the policy QuickSight-Reader-Account1.
  2. Repeat these steps in other accounts where you want to enable SSO.
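
The admin role and its SAML trust policy can likewise be created with the IAM API. The account ID below is a placeholder; the trust policy mirrors the SAML:aud condition you verified in the console.

import json

import boto3

iam = boto3.client("iam")
account_id = "111122223333"  # placeholder AWS account ID

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": f"arn:aws:iam::{account_id}:saml-provider/AzureActiveDirectory"},
        "Action": "sts:AssumeRoleWithSAML",
        "Condition": {"StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}},
    }],
}

iam.create_role(
    RoleName="QuickSight-Admin-Role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Admin role assumed by Azure AD federated users",
)
iam.attach_role_policy(
    RoleName="QuickSight-Admin-Role",
    PolicyArn=f"arn:aws:iam::{account_id}:policy/QuickSight-Admin-Account1",
)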

Create roles in Microsoft Graph Explorer

Optionally, you can create roles within Azure AD with paid subscriptions. Open Microsoft Graph Explorer, and then do the following:

  1. Sign in to the Microsoft Graph Explorer site with the domain account for your tenant.

You need sufficient permissions to create the roles.

  1. To grant permissions, choose the ellipsis (three dots) next to your name and choose Select permissions.

  1. On the Permission list, expand Directory.
  2. Select the three directory-level permissions as shown in the following screenshot and choose Consent.

  1. Sign in to Graph Explorer again, and accept the site usage conditions.
  2. Choose GET for the method, and 1.0 for the version.
  3. In the query box, enter https://graph.microsoft.com/v1.0/servicePrincipals/<objectId> (use the object ID you saved earlier).
  4. In the Response preview pane, copy the response to an editor of your choice to modify.

  1. Extract the appRoles property from the service principal object.

Now you generate new roles for your application. These roles must match the IAM roles in AWS that you created earlier.

  1. From the notepad, use the format <Role ARN>, <IdP ARN> to create your roles:
    1. arn:aws:iam::5xxxxxxxxxx9:role/QuickSight-Admin-Role,arn:aws:iam::5xxxxxxxxxx9:saml-provider/AzureActiveDirectory
    2. arn:aws:iam::5xxxxxxxxxx9:role/QuickSight-Author-Role,arn:aws:iam::5xxxxxxxxxx9:saml-provider/AzureActiveDirectory
    3. arn:aws:iam::5xxxxxxxxxx9:role/QuickSight-Reader-Role,arn:aws:iam::5xxxxxxxxxx9:saml-provider/AzureActiveDirectory
    4. arn:aws:iam::0xxxxxxxxxx2:role/QS-Admin-AZAd-Role,arn:aws:iam::0xxxxxxxxxx2:saml-provider/AzureAd-Acct2
    5. arn:aws:iam::0xxxxxxxxxx2:role/QS-Author-AZAd-Role,arn:aws:iam::0xxxxxxxxxx2:saml-provider/AzureAd-Acct2
    6. arn:aws:iam::0xxxxxxxxxx2:role/QS-Reader-AZAd-Role,arn:aws:iam::0xxxxxxxxxx2:saml-provider/AzureAd-Acct2
  2. The following JSON code is an example of the appRoles property. Create a similar object to add the roles for your application:
            "appRoles": [
                {
                    "allowedMemberTypes": [
                        "User"
                    ],
                    "description": "User",
                    "displayName": "User",
                    "id": "8774f594-1d59-4279-b9d9-59ef09a23530",
                    "isEnabled": true,
                    "origin": "Application",
                    "value": null
                },
                {
                    "allowedMemberTypes": [
                        "User"
                    ],
                    "description": "msiam_access",
                    "displayName": "msiam_access",
                    "id": "e7f1a7f3-9eda-48e0-9963-bd67bf531afd",
                    "isEnabled": true,
                    "origin": "Application",
                    "value": null
                },
                {
                    "allowedMemberTypes": [
                        "User"
                    ],
                    "description": "Raji Quicksight Admin",
                    "displayName": "RajiQSAdmin",
                    "id": "9a07d03d-667f-405d-b5d7-68bec5b64584",
                    "isEnabled": true,
                    "origin": "ServicePrincipal",
                    "value": "arn:aws:iam::0xxxxxxxxxx2:role/QS-Admin-AZAd-Role,arn:aws:iam::0xxxxxxxxxx2:saml-provider/AzureAd-Acct2"
                },
                {
                    "allowedMemberTypes": [
                        "User"
                    ],
                    "description": "Sri Quicksight Admin",
                    "displayName": "SriQSAdmin",
                    "id": "77dd76d1-f897-4093-bf9a-8f3aaf25f30e",
                    "isEnabled": true,
                    "origin": "ServicePrincipal",
                    "value": "arn:aws:iam::5xxxxxxxxxx9:role/QuickSight-Admin-Role,arn:aws:iam::5xxxxxxxxxx9:saml-provider/AzureActiveDirectory"
                }
            ]

New roles must be followed by msiam_access for the patch operation. You can also add multiple roles, depending on your organization’s needs. Azure AD sends the value of these roles as the claim value in the SAML response.

When adding new roles, you must provide a new GUID for each ID attribute in the JSON payload. You can use an online GUID generation tool to generate a new unique GUID for each role.

  1. In Microsoft Graph Explorer, change the method from GET to PATCH.
  2. Patch the service principal object with the roles you want by updating the appRoles property, like the one shown in the preceding example.
  3. Choose Run Query to run the patch operation. A success message confirms the creation of the role for your AWS application.
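
If you prefer to script this update instead of using Graph Explorer, you can issue the same patch with any HTTP client against the Microsoft Graph API. The following curl sketch assumes you have a valid access token with the directory permissions granted earlier, and that the file approles-patch.json contains an appRoles object like the preceding example.

curl -X PATCH "https://graph.microsoft.com/v1.0/servicePrincipals/<objectId>" \
     -H "Authorization: Bearer <access-token>" \
     -H "Content-Type: application/json" \
     --data @approles-patch.json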

After the service principal is patched with new roles, you can assign users and groups to their respective roles.

  1. In the Azure portal, go to the QuickSight application you created and choose Users and Groups.
  2. Create your groups.

We recommend creating a new group for every AWS role in order to assign a particular role to the group. This one-to-one mapping means that one group is assigned to one role. You can then add members to the group.

  1. After you create the groups, choose the group and assign it to the application.

Nested groups are not allowed.

  1. To assign the role to the group, choose the role, then choose Assign.

Test the application

In this section, you test your Azure AD SSO configuration by using Microsoft Applications.

  1. Navigate to Microsoft Applications.
  2. On the My Apps page, choose AWS Single Sign-On.

  1. Choose a specific role for the QuickSight account you want to use.

You’re redirected to the QuickSight console.

Summary

This post provided step-by-step instructions to configure federated SSO between a single Azure AD instance and multiple QuickSight accounts. We also discussed how to create new roles and map users and groups in Azure AD to IAM for secure access into multiple QuickSight accounts.

For information about federating from Azure AD to a single QuickSight account, see Enabling Amazon QuickSight federation with Azure AD.


About the Authors

 

Srikanth Baheti is a Specialized World Wide Sr. Solution Architect for Amazon QuickSight. He started his career as a consultant and worked for multiple private and government organizations. Later he worked for PerkinElmer Health and Sciences & eResearch Technology Inc, where he was responsible for designing and developing high traffic web applications, highly scalable and maintainable data pipelines for reporting platforms using AWS services and Serverless computing.

 

Raji Sivasubramaniam is a Specialist Solutions Architect at AWS, focusing on Analytics. Raji has 20 years of experience in architecting end-to-end Enterprise Data Management, Business Intelligence and Analytics solutions for Fortune 500 and Fortune 100 companies across the globe. She has in-depth experience in integrated healthcare data and analytics with wide variety of healthcare datasets including managed market, physician targeting and patient analytics. In her spare time, Raji enjoys hiking, yoga and gardening.

 

Padmaja Suren is a Senior Solutions Architect specialized in QuickSight. She has 20+ years of experience in building scalable data platforms for Reporting, Analytics and AI/ML using a variety of technologies. Prior to AWS, in her role as BI Architect at ERT, she designed, engineered and cloud-enabled the BI and Analytics platform for the management of large scale clinical trial data conducted across the world. She dedicates her free time on her passion project SanghWE which helps sexual trauma survivors in developing nations heal and recover.

Migrate Amazon QuickSight across AWS accounts

Post Syndicated from Abhinav Sarin original https://aws.amazon.com/blogs/big-data/migrate-amazon-quicksight-across-aws-accounts/

This blog post is co-written by Glen Douglas and Alex Savchenko from Integrationworx.

Enterprises that follow an Agile software development lifecycle (SDLC) process for their dashboard development and deployment typically have distinct environments for development, staging, QA and test, and production. One recommended approach when developing on AWS is to create multiple AWS accounts corresponding to the various environments. Amazon QuickSight is a fully managed, serverless business intelligence service offered by AWS for building interactive dashboards. With QuickSight, you can share dashboards with your internal users, or embed dashboards into your applications for external users or customers, scaling to hundreds of thousands of users with no servers or infrastructure to maintain. A QuickSight account corresponds to the underlying AWS account, so when dashboards are created in the development environment, you need to migrate these assets to a higher environment to align with current DevOps practices. This requires cross-account dashboard migration. This post outlines the steps involved in migrating dashboards from one account to another.

Solution overview

The following diagram shows the architecture of how QuickSight accounts are mapped to AWS accounts. In this post, we outline the steps involved in migrating QuickSight assets in the dev account to the prod account.

Migrating QuickSight dashboards from a dev account to a prod account involves converting the underlying dev dashboard assets into JSON and then recreating them in prod. QuickSight provides a robust set of APIs to create QuickSight assets, such as dashboards, analysis, datasets, and themes. You can run these APIs via the AWS Command Line Interface (AWS CLI) or through SDKs available for various programming languages. For this post, we use the AWS CLI for illustration; for regular usage, we recommend you implement this with the AWS SDK.

To set up the AWS CLI, see Developing Applications with the QuickSight API.

The following diagram illustrates the high-level steps to migrate QuickSight assets from one account to another. First, we need to prepare the dataset and data sources, and create the analysis, template, and dashboards in the dev account.

 Next, we use the dev template with the test dataset and data sources to promote the dashboards.

You can create any QuickSight entity programmatically using APIs (for example, create-data-set and create-data-source). Other APIs generate a JSON representation of an existing entity (such as describe-data-set and describe-data-source). The analysis entity is an exception; as of this writing, there are no APIs for creating an analysis or generating its JSON representation. An analysis must be created using the AWS Management Console the first time. From that point on, you can manage the analysis programmatically using templates.

Definitions

The following table defines the parameters used in this post.

Environment Name   Demo Reference    AWS Account ID   QuickSight Account Name
Source (Dev)       Source account    31********64     QS-Dev
Target (Test)      Target account    86********55     QS-Test

The following figure summarizes the QuickSight objects and AWS Identity and Access Management (IAM) resources that are referenced in the migration between accounts. The QuickSight objects from the source account are referenced and recreated in the target account.

The following table summarizes the QuickSight objects used in this post for migration from the source account (dev) to the target account (test). As part of the migration, the name of the data source is changed in order to denote the change in underlying data used in the target environment. As we demonstrate in this post, a QuickSight template is created in the source account only. This template contains the information about the source dashboard that is to be created in the target account.

QuickSight Object Type   Object Name (Source)                    Object Name (Target)
Data Source              QS_Dev                                  QS_Test
Dataset                  sporting_event_ticket_info              sporting_event_ticket_info
Dashboard                Sporting_event_ticket_info_dashboard    Sporting_event_ticket_info_dashboard
Template                 sporting_event_ticket_info_template     (created in the source account only)

For this post, we use an Amazon RDS for PostgreSQL database as the dataset and create a QuickSight visualization using the database table sporting_event_ticket_info. You can use any of the data sources that QuickSight supports or easily test this process using a spreadsheet. The dashboard to be migrated from the development environment shows data from the corresponding dev database (QS_Dev).

Prerequisites

Consider the situation in which you need to migrate a QuickSight dashboard from a source (or dev) environment to a target (or test) environment. The migration requires the following prerequisites:

  • Two AWS accounts (source and target), each with an active QuickSight subscription
  • Permissions to create IAM policies, roles, and users in both accounts
  • The AWS CLI installed and configured to call the QuickSight APIs

Step 1: Enable permissions to migrate QuickSight objects

To facilitate using AWS CLI commands and sharing QuickSight templates from the source account to the target account, we perform the following actions in the source environment:

1a) Create an IAM policy.

1b) Create a new IAM group and attach the policy.

1c) Create a new IAM user and assign it to the group.

1d) Invite the IAM user created to the QuickSight (dev) account.

1e) Create another reader policy in the source (dev) account to grant access to the target (test) account.

1f) Create a deployer role in the source account.

Step 1a: Create an IAM policy

You start by creating IAM resources. For this post, we create an IAM policy called Dev-QuickSightAdmin, which is used by IAM users and roles in the source account. The purpose of this policy is to allow a user in the source account to perform various actions on QuickSight objects.

  1. To get started, as the admin user, sign in to the IAM console.
  2. In the navigation pane, choose Policies.
  3. Choose Create policy.
  4. In the Service section, select Choose a service and choose QuickSight.
  5. Under Actions, select the following permissions:
    1. List – ListAnalysis, ListDashboards, ListDataSets, ListDataSources, ListTemplates
    2. Read – DescribeAnalysis, DescribeDashboard, DescribeDataSet, DescribeDataSource, DescribeTemplate
    3. Write – CreateTemplate, UpdateTemplate
    4. Permissions management – DescribeTemplatePermissions, UpdateTemplatePermissions
  6. Select the appropriate resources.

You can restrict the resources and Region based on your requirement. We allow all resources for this post.

  1. Review and create the policy.

Step 1b: Create a new IAM group

Create a new IAM group Dev-grp-QuickSightAdmin and assign the Dev-QuickSightAdmin policy (from Step 1a) to the group in the source account.

Step 1c: Create a new IAM user

Create a new IAM user called Dev-qs-admin-user and assign it to the Dev-grp-QuickSightAdmin group. You use this user later to run the AWS CLI commands in the source account. Alternatively, you can use an existing IAM user for this purpose.
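
If you are scripting the setup, the equivalent AWS CLI commands are shown in the following sketch:

aws iam create-user --user-name Dev-qs-admin-user

aws iam add-user-to-group \
    --user-name Dev-qs-admin-user \
    --group-name Dev-grp-QuickSightAdmin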

Step 1d: Invite the IAM user to the QuickSight (dev) account

Sign in to QuickSight in the source (dev) account and invite the user from Step 1c to QuickSight. Assign the role of ADMIN and for IAM user, choose Yes to indicate that this is an IAM user.
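
Alternatively, you can register the IAM user with QuickSight through the API. The following sketch uses the register-user command; the email address is a placeholder, and the account ID should be that of the source (dev) account.

aws quicksight register-user \
    --aws-account-id 31********64 \
    --namespace default \
    --identity-type IAM \
    --iam-arn arn:aws:iam::31********64:user/Dev-qs-admin-user \
    --user-role ADMIN \
    --email dev-qs-admin@example.com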

Step 1e: Create another reader policy in the source (dev) account

In the source (dev) account, create another IAM policy, called Dev-QuickSightReader, to grant access to the target (test) account. The purpose of this policy is to allow the target account to perform list and read actions on QuickSight objects in the source account.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. In the Service section, select Choose a service and choose QuickSight.
  4. Under Actions, make sure that All QuickSight actions is not selected.
  5. Under Access level, select List and Read.
  6. Select the following permissions:
    1. List – ListAnalysis, ListDashboards, ListDataSets, ListDataSources, ListTemplates
    2. Read – DescribeAnalysis, DescribeDashboard, DescribeDataSet, DescribeDataSource, DescribeTemplate
  7. Review and create the policy.

Verify the reader IAM policy Dev-QuickSightReader shows only the list and read access level for QuickSight services when complete.

Step 1f: Create a deployer role in the source account (dev)

You now create an IAM role called Dev-QuickSight-Deployer in the source account (dev). This role is specifically assigned to the target account ID and assigned the QuickSightReader policy, as noted in the previous step. This allows the external AWS target (test) account to read the QuickSight template contained in the source (dev) account.

  1. On the IAM console, choose Roles in the navigation pane.
  2. Choose Create role.
  3. Select Another AWS account and provide the account ID of the target account.
  4. In the Attach permissions policies section, attach the Dev-QuickSightReader policy.
  5. In the Add tags section, add any tags (optional).
  6. Choose Next.
  7. In the Review section, assign a role name (we use Dev-QuickSight-Deployer).
  8. Enter a role description.
  9. Choose Create role.

You have completed the creation of the policy, group, and user in the source account. The next step is to configure the permissions in the target account.

  1. Switch from the source account to the target account.
  2. In the target account, sign in to the IAM console and repeat the steps in this section to create the Test-QuickSightAdmin policy and Test-grp-QuickSightAdmin group, and assign the policy to the group. Test-QuickSightAdmin should have the following permissions:
    1. List – ListAnalysis, ListDashboards, ListDataSets, ListDataSources, ListTemplates
    2. Read – DescribeAnalysis, DescribeDashboard, DescribeDataSet, DescribeDataSource, DescribeTemplate
    3. Write – CreateTemplate, UpdateTemplate, CreateDataSource, CreateDashboard, UpdateDataSet
    4. Permissions management – DescribeTemplatePermissions, UpdateTemplatePermissions
    5. Tag – TagResource, UntagResource
  1. Create the IAM user Test-qs-admin-user and add it to the Test-grp-QuickSightAdmin group
  2. Sign in to QuickSight and invite the test user.

One final step is to add Test-qs-admin-user as a trusted entity in the Dev-QuickSight-Deployer role, because the target account needs cross-account access to the template in the source account. We use the IAM user Test-qs-admin-user to create the dashboard in the test account.

  1. Switch back to the source (dev) account.
  2. Select role and search for the Dev-QuickSight-Deployer role.
  3. Choose Trust relationships.
  4. Edit the trust relationship and add the following ARN in the Principal section of the policy: arn:aws:iam::86XXXXX55:user/Test-qs-admin-user.
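
After the edit, the trust policy of the Dev-QuickSight-Deployer role should look similar to the following sketch (account IDs are masked; the root principal for the target account was added when the role was created). If you prefer the AWS CLI, you can apply the same document with update-assume-role-policy; the file name trust-policy.json is an assumption.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::86XXXXX55:root",
                    "arn:aws:iam::86XXXXX55:user/Test-qs-admin-user"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

aws iam update-assume-role-policy \
    --role-name Dev-QuickSight-Deployer \
    --policy-document file://trust-policy.json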

Step 2: Prepare QuickSight objects in the source account

To migrate QuickSight objects to a target environment, we perform the following actions in the source environment:

2a) Prepare the data source file.

2b) Prepare the dataset file.

2c) Create a dashboard template.

Step 2a: Prepare the data source file

Data sources represent connections to specific sources of data and are made available within a QuickSight account. Multiple data source types are supported within QuickSight, including a variety of relational databases, flat files, JSON semi-structured data files, and software as a service (SaaS) data providers.

Any QuickSight dashboards and datasets to be migrated to the target environment must have their corresponding data sources also migrated within the target environment.

In this step, you create a JSON file with the data source information from the source environment. Then you use the create-data-source AWS CLI command in the target environment with the JSON file as input. 

  1. First, identify the data source for your implementation by running the list-data-sources command in the source (dev) account: 
aws quicksight list-data-sources --aws-account-id 31********64

The following code shows the top portion of the command output:

{
    "Status": 200,
    "DataSources": [
        {
            "Arn": "arn:aws:quicksight:us-east-1:31********64:datasource/4b98fee7-4df1-4dc2-8ca3-115c1c1839ab"",
            "DataSourceId": "4b98fee7-4df1-4dc2-8ca3-115c1c1839ab",
            "Name": "QS_Dev",
            "Type": "POSTGRESQL",
            "Status": "CREATION_SUCCESSFUL",
            "CreatedTime": "2020-12-16T20:42:37.280000-08:00",
            "LastUpdatedTime": "2020-12-16T20:42:37.280000-08:00",
            "DataSourceParameters": {
                "RdsParameters": {
                    "InstanceId": "dmslabinstance",
                    "Database": "sportstickets"
                }
            },
            "SslProperties": {
                "DisableSsl": false
            }
        },

  1. Run the describe-data-source command, using the DataSourceId from the previous list-data-source command output.

This command provides details about the data source, which we use to create the data source in the target account.

aws quicksight describe-data-source --aws-account-id 31********64 --data-source-id "4b98fee7-4df1-4dc2-8ca3-115c1c1839ab"

The following is the resulting output:

{
    "Status": 200,
    "DataSource": {
        "Arn": "arn:aws:quicksight:us-east-1:31*******64:datasource/4b98fee7-4df1-4dc2-8ca3-115c1c1839ab",
        "DataSourceId": "4b98fee7-4df1-4dc2-8ca3-115c1c1839ab",
        "Name": "QS_Dev",
        "Type": "POSTGRESQL",
        "Status": "CREATION_SUCCESSFUL",
        "CreatedTime": "2020-12-16T20:42:37.280000-08:00",
        "LastUpdatedTime": "2020-12-16T20:42:37.280000-08:00",
        "DataSourceParameters": {
            "RdsParameters": {
                "InstanceId": "dmslabinstance",
                "Database": "sportstickets"
            }
        },
        "SslProperties": {
            "DisableSsl": false
        }
    },
    "RequestId": "c2455720-118c-442d-9ba3-4f446cb543f1"
}

If you’re migrating more than one data source, you need to repeat the step for each data source.

  1. Use the output from the describe-data-source command to create a JSON file called create-data-source-cli-input.json, which represents the data source that is being migrated.

The contents of the following JSON file reference the data source information (name, host, credentials) for the target environment:

{
    "AwsAccountId": "86********55",
    "DataSourceId": "QS_Test",
    "Name": "QS_Test",
    "Type": "POSTGRESQL",
    "DataSourceParameters": {
        "PostgreSqlParameters": {
            "Host": "dmslabinstance.***********.us-east-1.rds.amazonaws.com",
            "Port": 5432,
            "Database": "sportstickets"
        }
    },
    "Credentials": {
        "CredentialPair": {
            "Username": "xxxxxxx",
            "Password": "yyyyyyy"
        }
    },
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:86********55:user/default/Test-qs-admin-user",
            "Actions": [
                "quicksight:UpdateDataSourcePermissions",
                "quicksight:DescribeDataSource",
                "quicksight:DescribeDataSourcePermissions",
                "quicksight:PassDataSource",
                "quicksight:UpdateDataSource",
                "quicksight:DeleteDataSource"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "QS_Test"
        }
    ]
}

For this post, because the target environment is connecting to the same data source as the source environment, the values in the JSON can simply be provided from the previous describe-data-source command output.

Step 2b: Prepare the dataset file

Datasets provide an abstraction layer in QuickSight, which represents prepared data from a data source in the QuickSight account. The intent of prepared datasets is to enable reuse in multiple analyses and sharing amongst QuickSight users. A dataset can include calculated fields, filters, and changed file names or data types. When based on a relational database, datasets can join tables within QuickSight, or as part of the underlying SQL query used to define the dataset.

The sample dataset sporting_event_ticket_info represents a single table; however, in a relational database, datasets can join tables within QuickSight, or as part of the underlying SQL query used to define the dataset.

Similar to the process used for data sources, you create a JSON file representing the datasets from the source account.

  1. Run the list-data-sets command to get all datasets from the source account:
aws quicksight list-data-sets --aws-account-id 31********64

The following code is the output:

{
            "Arn": "arn:aws:quicksight:us-east-1:31********64:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
            "DataSetId": "24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
            "Name": "sporting_event_ticket_info",
            "CreatedTime": "2020-12-16T20:45:36.672000-08:00",
            "LastUpdatedTime": "2020-12-16T20:45:36.835000-08:00",
            "ImportMode": "SPICE"
}

  1. Run the describe-data-set command, specifying the DataSetId from the previous command’s response:
aws quicksight describe-data-set --aws-account-id 31********64 --data-set-id "24b1b03a-86ce-41c7-9df7-5be5343ff9d9"

The following code shows the output:

{
    "Status": 200,
    "DataSet": {
        "Arn": "arn:aws:quicksight:us-east-1:31********64:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
        "DataSetId": "24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
        "Name": "sporting_event_ticket_info",
        "CreatedTime": "2020-12-16T20:45:36.672000-08:00",
        "LastUpdatedTime": "2020-12-16T20:45:36.835000-08:00",
        "PhysicalTableMap": {
            "d7ec8dff-c136-4c9a-a338-7017a95a4473": {
                "RelationalTable": {
                    "DataSourceArn": "arn:aws:quicksight:us-east-1:31********64:datasource/f72f8c2a-f9e2-4c3c-8221-0b1853e920b2",
                    "Schema": "dms_sample",
                    "Name": "sporting_event_ticket_info",
                    "InputColumns": [
                        {
                            "Name": "ticket_id",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "event_id",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "sport",
                            "Type": "STRING"
                        },
                        {
                            "Name": "event_date_time",
                            "Type": "DATETIME"
                        },
                        {
                            "Name": "home_team",
                            "Type": "STRING"
                        },
                        {
                            "Name": "away_team",
                            "Type": "STRING"
                        },
                        {
                            "Name": "location",
                            "Type": "STRING"
                        },
                        {
                            "Name": "city",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat_level",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "seat_section",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat_row",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat",
                            "Type": "STRING"
                        },
                        {
                            "Name": "ticket_price",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "ticketholder",
                            "Type": "STRING"
                        }
                    ]
                }
            }
        },
        "LogicalTableMap": {
            "d7ec8dff-c136-4c9a-a338-7017a95a4473": {
                "Alias": "sporting_event_ticket_info",
                "DataTransforms": [
                    {
                        "TagColumnOperation": {
                            "ColumnName": "city",
                            "Tags": [
                                {
                                    "ColumnGeographicRole": "CITY"
                                }
                            ]
                        }
                    }
                ],
                "Source": {
                    "PhysicalTableId": "d7ec8dff-c136-4c9a-a338-7017a95a4473"
                }
            }
        },
        "OutputColumns": [
            {
                "Name": "ticket_id",
                "Type": "DECIMAL"
            },
            {
                "Name": "event_id",
                "Type": "INTEGER"
            },
            {
                "Name": "sport",
                "Type": "STRING"
            },
            {
                "Name": "event_date_time",
                "Type": "DATETIME"
            },
            {
                "Name": "home_team",
                "Type": "STRING"
            },
            {
                "Name": "away_team",
                "Type": "STRING"
            },
            {
                "Name": "location",
                "Type": "STRING"
            },
            {
                "Name": "city",
                "Type": "STRING"
            },
            {
                "Name": "seat_level",
                "Type": "DECIMAL"
            },
            {
                "Name": "seat_section",
                "Type": "STRING"
            },
            {
                "Name": "seat_row",
                "Type": "STRING"
            },
            {
                "Name": "seat",
                "Type": "STRING"
            },
            {
                "Name": "ticket_price",
                "Type": "DECIMAL"
            },
            {
                "Name": "ticketholder",
                "Type": "STRING"
            }
        ],
        "ImportMode": "SPICE",
        "ConsumedSpiceCapacityInBytes": 318386511
    },
    "RequestId": "8c6cabb4-5b9d-4607-922c-1916acc9da1a"
}
  1. Based on the dataset description, create a JSON file based on the template file (create-data-set-cli-input-sql.json) with the details listed in the describe-data-set command output:
{

    "AwsAccountId": "86********55",
    "DataSetId": "24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
    "Name": "person",
    "PhysicalTableMap": {
        "23fb761f-df37-4242-9f23-ba61be20a0df": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1: 86********55:datasource/QS_Test",
                "Schema": "dms_sample",
                "Name": " sporting_event_ticket_info ",
                "InputColumns": [

                        {
                            "Name": "ticket_id",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "event_id",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "sport",
                            "Type": "STRING"
                        },
                        {
                            "Name": "event_date_time",
                            "Type": "DATETIME"
                        },
                        {
                            "Name": "home_team",
                            "Type": "STRING"
                        },
                        {
                            "Name": "away_team",
                            "Type": "STRING"
                        },
                        {
                            "Name": "location",
                            "Type": "STRING"
                        },
                        {
                            "Name": "city",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat_level",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "seat_section",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat_row",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat",
                            "Type": "STRING"
                        },
                        {
                            "Name": "ticket_price",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "ticketholder",
                            "Type": "STRING"
                        }
                    ]
                }
            }
        },
        "LogicalTableMap": {
            "23fb761f-df37-4242-9f23-ba61be20a0df": {
                "Alias": "person",
                "Source": {
                    "PhysicalTableId": "23fb761f-df37-4242-9f23-ba61be20a0df"
                }
            }
        },
    "ImportMode": "SPICE",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1: 86********55:user/default/Test-qs-admin-user",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "QS_Test"
        }
    ]
}

The DataSourceArn should reference the ARN of the existing data source (in this case, the data source created in the previous step).

Step 2c: Create a QuickSight template

A template in QuickSight is an entity that encapsulates the metadata that describes an analysis. A template provides a layer of abstraction from a specific analysis by using placeholders for the underlying datasets used to create the analysis. When we replace dataset placeholders in a template, we can recreate an analysis for a different dataset that follows the same schema as the original analysis.

You can also share templates across accounts. This feature, combined with the dataset placeholders in a template, provides the means to migrate a dashboard from one account to another.

  1. To create a template, begin by using the list-dashboards command to get list of available dashboards in the source environment:
aws quicksight list-dashboards --aws-account-id 31********64

The command output can be lengthy, depending on the number of dashboards in the source environment.

  1. Search for the “Name” of the desired dashboard in the output file and copy the corresponding DashboardId to use in the next step:
{
            "Arn": "arn:aws:quicksight:us-east-1:31********64:dashboard/c5345f81-e79d-4a46-8203-2763738489d1",
            "DashboardId": "c5345f81-e79d-4a46-8203-2763738489d1",
            "Name": "Sportinng_event_ticket_info_dashboard",
            "CreatedTime": "2020-12-18T01:30:41.209000-08:00",
            "LastUpdatedTime": "2020-12-18T01:30:41.203000-08:00",
            "PublishedVersionNumber": 1,
            "LastPublishedTime": "2020-12-18T01:30:41.209000-08:00"
        }
  1. Run the describe-dashboard command for the DashboardId copied in the preceding step:
aws quicksight describe-dashboard --aws-account-id 31********64 --dashboard-id "c5345f81-e79d-4a46-8203-2763738489d1"

The response should look like the following code:

{
    "Status": 200,
    "Dashboard": {
        "DashboardId": "c5345f81-e79d-4a46-8203-2763738489d1",
        "Arn": "arn:aws:quicksight:us-east-1:31********64:dashboard/c5345f81-e79d-4a46-8203-2763738489d1",
        "Name": "Sportinng_event_ticket_info_dashboard",
        "Version": {
            "CreatedTime": "2020-12-18T01:30:41.203000-08:00",
            "Errors": [],
            "VersionNumber": 1,
            "SourceEntityArn": "arn:aws:quicksight:us-east-1:31*******64:analysis/daddefc4-c97c-4460-86e0-5b488ec29cdb",
            "DataSetArns": [
                "arn:aws:quicksight:us-east-1:31********64:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9"
            ]
        },
        "CreatedTime": "2020-12-18T01:30:41.209000-08:00",
        "LastPublishedTime": "2020-12-18T01:30:41.209000-08:00",
        "LastUpdatedTime": "2020-12-18T01:30:41.203000-08:00"
    },
    "RequestId": "be9e60a6-d271-4aaf-ab6b-67f2ba5c1c20"
}
  1. Use the details obtained from the describe-dashboard command to create a JSON file based on the file create-template-cli-input.json.

The following code represents the input for creating a QuickSight template:

 {
    "AwsAccountId": "31********64",
    "TemplateId": "Sporting_event_ticket_info_template",
    "Name": "Sporting event ticket info template",
    "SourceEntity": {
        "SourceAnalysis": {
            "Arn": "arn:aws:quicksight:us-east-1:31********64:analysis/daddefc4-c97c-4460-86e0-5b488ec29cdb",
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "TicketInfo",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:31********64:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9"
                }
            ]
        }
    },
    "VersionDescription": "1"
}
  1. Run the create-template command to create a template object based on the file you created.

For example, the JSON file named create-template-cli-input.json would be run as follows:

aws quicksight create-template --cli-input-json file://./create-template-cli-input.json

The following is the expected response for the create-template command:

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:31********64:template/Sporting_event_ticket_info_template",
    "VersionArn": "arn:aws:quicksight:us-east-1:31********64:template/Sporting_event_ticket_info_template/version/1",
    "TemplateId": "Sporting_event_ticket_info_template",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "c32e4eb1-ecb6-40fd-a2da-cce86e15660a" 
}

The template is created in the background, as noted by the CreationStatus. Templates aren’t visible within the QuickSight UI; they’re a developer-managed or admin-managed asset that is only accessible via the APIs.
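
Although templates don't appear in the UI, you can list the templates in an account with the CLI, for example:

aws quicksight list-templates --aws-account-id 31********64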

  1. To check status of the template, run the describe-template command:
aws quicksight describe-template --aws-account-id 31********64 --template-id "Sporting_event_ticket_info_template"

The expected response for the describe-template command should indicate a Status of CREATION_SUCCESSFUL:

{
    "Status": 200,
    "Template": {
        "Arn": "arn:aws:quicksight:us-east-1:31********64:template/Sporting_event_ticket_info_template",
        "Name": "Sporting event ticket info template",
        "Version": {
            "CreatedTime": "2020-12-18T01:56:49.135000-08:00",
            "VersionNumber": 1,
            "Status": "CREATION_SUCCESSFUL",
            "DataSetConfigurations": [
                {
                    "Placeholder": "TicketInfo",
                    "DataSetSchema": {
                        "ColumnSchemaList": [
                        {
                            "Name": "ticket_id",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "event_id",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "sport",
                            "Type": "STRING"
                        },
                        {
                            "Name": "event_date_time",
                            "Type": "DATETIME"
                        },
                        {
                            "Name": "home_team",
                            "Type": "STRING"
                        },
                        {
                            "Name": "away_team",
                            "Type": "STRING"
                        },
                        {
                            "Name": "location",
                            "Type": "STRING"
                        },
                        {
                            "Name": "city",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat_level",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "seat_section",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat_row",
                            "Type": "STRING"
                        },
                        {
                            "Name": "seat",
                            "Type": "STRING"
                        },
                        {
                            "Name": "ticket_price",
                            "Type": "DECIMAL"
                        },
                        {
                            "Name": "ticketholder",
                            "Type": "STRING"
                        }
                        ]
                    },
                    "ColumnGroupSchemaList": []
                }
            ],
            "Description": "1",
            "SourceEntityArn": "arn:aws:quicksight:us-east-1:31********64:analysis/daddefc4-c97c-4460-86e0-5b488ec29cdb"
        },
        "TemplateId": "Sporting_event_ticket_info_template",
        "LastUpdatedTime": "2020-12-18T01:56:49.125000-08:00",
        "CreatedTime": "2020-12-18T01:56:49.125000-08:00"
    },
    "RequestId": "06c61098-2a7e-4b0c-a27b-1d7fc1742e06"
}
  1. Take note of the TemplateArn value in the output to use in subsequent steps.
  2. After you verify the template has been created, create a second JSON file (TemplatePermissions.json) and replace the Principal value with the ARN for the target account:
[
{
"Principal": "arn:aws:iam::86*******55:root",
"Actions": ["quicksight:UpdateTemplatePermissions","quicksight:DescribeTemplate"]
}
]
  1. Use this JSON file as the input for the update-template-permissions command, which allows cross-account read access from the source template (source account) to the target account:
aws quicksight update-template-permissions --aws-account-id 31********64 --template-id "Sporting_event_ticket_info_template" --grant-permissions file://./TemplatePermissions.json --profile default

This command permits the target account to view the template in the source account. The expected response for the update-template-permissions command should look like the following code:

{
    "Status": 200,
    "TemplateId": "Sporting_event_ticket_info_template",
    "TemplateArn": "arn:aws:quicksight:us-east-1:31********64:template/Sporting_event_ticket_info_template",
    "Permissions": [
        {
            "Principal": "arn:aws:iam::86********55:root",
            "Actions": [
                "quicksight:UpdateTemplatePermissions",
                "quicksight:DescribeTemplate"
            ]
        }
    ],
    "RequestId": "fa153511-5674-4891-9018-a8409ad5b8b2"
}

At this point, all the required work in the source account is complete. The next steps use the AWS CLI configured for the target account.

Step 3: Create QuickSight resources in the target account

To create the data sources and data templates in the target account, you perform the following actions in the target environment using Test-qs-admin-user:

3a) Create a data source in the target account.

3b) Create datasets in the target account.

3c) Create dashboards in the target account.

Step 3a: Create a data source in the target account

To create a data source in your target account, complete the following steps:

  1. Use the data source file created in Step 2a to run the create-data-source command in the target environment:
aws quicksight create-data-source --cli-input-json file://./create-data-source-cli-input.json

The response from the command should indicate the creation is in progress:

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:86********55:datasource/QS_Test",
    "DataSourceId": "QS_Test",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "3bf160e2-a5b5-4c74-8c67-f651bef9b729"
}
  1. Use the describe-data-source command to validate that the data source was created successfully:
aws quicksight describe-data-source --aws-account-id 86********55 --data-source-id "QS_Test"

The following code shows the response:

{
    "Status": 200,
    "DataSource": {
        "Arn": "arn:aws:quicksight:us-east-1:86********55:datasource/QS_Test",
        "DataSourceId": "QS_Test",
        "Name": "QS_Test",
        "Type": "POSTGRESQL",
        "Status": "CREATION_SUCCESSFUL",
        "CreatedTime": "2020-12-21T12:11:30.734000-08:00",
        "LastUpdatedTime": "2020-12-21T12:11:31.236000-08:00",
        "DataSourceParameters": {
            "PostgreSqlParameters": {
                "Host": "dmslabinstance.*************.us-east-1.rds.amazonaws.com",
                "Port": 5432,
                "Database": "sportstickets"
            }
        },
        "SslProperties": {
            "DisableSsl": false
        }
    },
    "RequestId": "fbb72f11-1f84-4c57-90d2-60e3cbd20454"}
  1. Take note of the DataSourceArn value to reference later when creating the dataset.

The data source should now be available within QuickSight. Keep in mind that the data source is visible to the Test-qs-admin-user user, so you must sign in as Test-qs-admin-user and open QuickSight. For this post, the input JSON file renamed the data source to reflect the test environment. Alternatively, sign in to the target QuickSight account and choose Create new dataset to view the available data source.

Step 3b: Create datasets in the target account

Now that the data source is in the target environment, you’re ready to create your datasets.

  1. Use the create-data-set command to create the dataset using the create-data-set-cli-input-sql.json created in Step 2b.

Make sure to replace the DataSourceArn in create-data-set-cli-input-sql.json with the Arn value shown in the describe-data-source command output in Step 3a.

aws quicksight create-data-set --cli-input-json file://./create-data-set-cli-input-sql.json

The following code shows our results:

{
    "Status": 201,
    "Arn": "arn:aws:quicksight:us-east-1:86********55:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
    "DataSetId": "24b1b03a-86ce-41c7-9df7-5be5343ff9d9",
    "IngestionArn": "arn:aws:quicksight:us-east-1:86********55:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9/ingestion/d819678e-89da-4392-b550-04221a0e4c11",
    "IngestionId": "d819678e-89da-4392-b550-04221a0e4c11",
    "RequestId": "d82caa28-ca43-4c18-86ab-5a92b6b5aa0c"
}
  1. Make note of the ARN of the dataset to use in a later step.
  2. Validate that the dataset was created using the describe-data-set command:
aws quicksight describe-data-set --aws-account-id 86********55 --data-set-id "24b1b03a-86ce-41c7-9df7-5be5343ff9d9"

Alternatively, sign in to QuickSight to see the new dataset in the list.

Step 3c: Create dashboards in the target account

Now that you shared the template with the target account in Step 2c, the final step is to create a JSON file that contains details about the dashboard to migrate to the target account.

  1. Create a JSON file (create-dashboard-cli-input.json) based on the following sample code, and provide the target account and the source account that contains the template:
{
    "AwsAccountId": "86********55",
    "DashboardId": "TicketanalysisTest",
    "Name": "Sportinng_event_ticket_info_dashboard",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:86********55:user/default/Test-qs-admin-user",
            "Actions": [
                "quicksight:DescribeDashboard",
                "quicksight:ListDashboardVersions",
                "quicksight:UpdateDashboardPermissions",
                "quicksight:QueryDashboard",
                "quicksight:UpdateDashboard",
                "quicksight:DeleteDashboard",
                "quicksight:DescribeDashboardPermissions",
                "quicksight:UpdateDashboardPublishedVersion"
            ]
        }
    ],
    "SourceEntity": {
        "SourceTemplate": {
            "DataSetReferences": [
                {
                   "DataSetPlaceholder": "TicketInfo",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:86********55:dataset/24b1b03a-86ce-41c7-9df7-5be5343ff9d9"
                }
            ],
            "Arn": "arn:aws:quicksight:us-east-1:31********64:template/Sporting_event_ticket_info_template"
        }
    },
    "VersionDescription": "1",
    "DashboardPublishOptions": {
        "AdHocFilteringOption": {
            "AvailabilityStatus": "DISABLED"
        },
        "ExportToCSVOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "SheetControlsOption": {
            "VisibilityState": "EXPANDED"
        }
    }
}

The preceding JSON file has a few important values:

  • The Principal value in the Permissions section references a QuickSight user (Test-qs-admin-user) in the target QuickSight account, which is assigned various actions on the new dashboard to be created.
  • The DataSetPlaceholder in the SourceTemplate must use the same name as specified in the template created in Step 2c. This applies to all DataSetPlaceholder values if more than one is referenced in the dashboard.
  • The DataSetArn value is the ARN of the dataset created in Step 3b.
  • The ARN value in the SourceTemplate section references the ARN of the template created in the source account in Step 2c.
  1. After you create the file, run the create-dashboard command to create the dashboard in the target QuickSight account:
aws quicksight create-dashboard --cli-input-json file://./create-dashboard-cli-input.json

The following code shows the response:

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:86********55:dashboard/TicketanalysisTest",
    "VersionArn": "arn:aws:quicksight:us-east-1:86********55:dashboard/TicketanalysisTest/version/1",
    "DashboardId": "TicketanalysisTest",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "d57fa0fb-4736-441d-9e74-c5d64b3e4024"
}
  1. Open QuickSight for the target account to see the newly created dashboard.

Use the same Test-qs-admin-user to sign in to QuickSight. The following screenshot shows that the provided DashboardId is part of the URL.

Like any QuickSight dashboard, you can share the dashboard with users and groups within the target QuickSight account.

The QuickSight CLI includes additional commands for performing operations to update existing datasets and dashboards in a QuickSight account. You can use these commands to promote new versions of existing dashboards.
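
For example, promoting a new version of the dashboard might look like the following sketch. The input file update-dashboard-cli-input.json is assumed to mirror create-dashboard-cli-input.json (pointing at an updated template version), and version 2 is assumed to be the new version number returned by update-dashboard.

aws quicksight update-dashboard --cli-input-json file://./update-dashboard-cli-input.json

aws quicksight update-dashboard-published-version \
    --aws-account-id 86********55 \
    --dashboard-id TicketanalysisTest \
    --version-number 2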

Clean up your resources

As a matter of good practice, when migration is complete, you should revoke the cross-account role created in Step 1 that allows trusted account access. Similarly, you should disable or remove any other user accounts and permissions created as part of this process.

Conclusion

This post demonstrated an approach to migrate QuickSight objects from one QuickSight account to another. You can use this solution as a general-purpose method to move QuickSight objects between any two accounts or as a way to support SDLC practices for managing and releasing versions of QuickSight solutions in operational environments.

For more information about automating dashboard deployment, customizing access to the QuickSight console, configuring for team collaboration, and implementing multi-tenancy and client user segregation, check out the video Admin Level-Up Virtual Workshop, V2 on YouTube.


About the authors

Abhinav Sarin is a Senior Partner Solutions Architect at Amazon Web Services; his core interests include databases, data analytics, and machine learning. He works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.

 

Michael Heyd is a Solutions Architect with Amazon Web Services and is based in Vancouver, Canada. Michael works with enterprise AWS customers to transform their business through innovative use of cloud technologies. Outside work he enjoys board games and biking.

 

 

Glen Douglas is an Enterprise Architect with over 25 years IT experience and is a Managing Partner at Integrationworx. He works with clients to solve challenges with data integration, master data management, analytics and data engineering, in a variety of industries and computing platforms. Glen is involved in all aspects of client solution definition through project delivery and has been TOGAF 8 & 9 certified since 2008.

 

Alex Savchenko is a Senior Data Specialist with Integrationworx. He is TOGAF 9 certified and has over 22 years experience applying deep knowledge in data movement, processing, and analytics in a variety of industries and platforms.

Power operational insights with Amazon QuickSight

Post Syndicated from Luis Gerardo Baeza original https://aws.amazon.com/blogs/big-data/power-operational-insights-with-amazon-quicksight/

Organizations need a consolidated view of their applications, but typically application health status is siloed: end-users complain on social media platforms, operational data from application logs is stored in complex monitoring tools, formal ticketing systems track reported issues, and synthetic monitoring data is only available to the tool administrators.

In this post, we show how to use Amazon QuickSight, AWS’ fully managed, cloud-native Business Intelligence service to quickly build a dashboard that consolidates:

  • Operational data from application logs coming from Amazon CloudWatch.
  • Issues reported on Jira software as a service (SaaS) edition.
  • Public posts on Twitter.
  • Synthetic monitoring performed using a CloudWatch Synthetics canary.

This dashboard can provide a holistic view of the health status of a workload. For example, you can:

  • Trace end-users complaining on Twitter or creating issues on Jira back to the application logs where the error occurred.
  • Identify when customers are complaining about system performance or availability on social media.
  • Correlate reports with monitoring metrics (such as availability and latency).
  • Track down errors in application code using log information.
  • Prioritize issues already being addressed based on Jira information.

Solution overview

The following architecture for the solution consists of a subscription filter for CloudWatch Logs to continuously send the application logs to an Amazon Simple Storage Service (Amazon S3) bucket, an AWS Glue crawler to update Amazon S3 log table metadata, a view on Amazon Athena that formats the data on the bucket, and three QuickSight datasets: Athena, Jira, and Twitter.

To implement this solution, complete the following steps:

  1. Set up CloudWatch and AWS Glue resources.
  2. Set up a QuickSight dataset for Athena.
  3. Set up a QuickSight dataset for CloudWatch Synthetics.
  4. Set up a QuickSight dataset for Twitter.
  5. Set up a QuickSight dataset for Jira.
  6. Create a QuickSight overview analysis.
  7. Create a QuickSight detailed analysis.
  8. Publish your QuickSight dashboard.

Prerequisites

To get started, make sure you have the following prerequisites:

  • An AWS account.
  • Previous experience working with the AWS Management Console.
  • A Twitter account.
  • A Jira SaaS account. Make sure that the DNS name of your Jira Cloud is accessible to QuickSight.
  • Access to the Athena engine v2.

Set up CloudWatch and AWS Glue resources

Start by deploying a Lambda transformation function to use with your Amazon Kinesis Data Firehose delivery stream:

  1. On the Lambda console, launch a new function using the kinesis-firehose-cloudwatch-logs-processor blueprint.
  2. Enter a function name and choose Create function.
  3. Modify the transformLogEvent function in the Lambda code:
function transformLogEvent(logEvent) {
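// Emit each CloudWatch Logs event as a CSV line: <timestamp>,<message>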
return Promise.resolve(`${logEvent.timestamp},${logEvent.message}\n`);
}
  1. Choose Deploy to update the function.

As part of these steps, you create a new AWS Identity and Access Management (IAM) role with basic permissions and will attach an IAM policy created by AWS CloudFormation later.

  5. Choose Copy ARN and save the ARN temporarily for later use.

To create the sample resources, complete the following steps:

  1. Choose Launch Stack:

  2. Choose Next.
  3. Enter a stack name.
  4. For TransformationLambdaArn, enter the function ARN you copied earlier.
  5. Choose Next twice, then acknowledge the message about IAM capabilities.
  6. Choose Create stack.

By default, when you launch the template, you’re taken to the AWS CloudFormation Events page. After 5 minutes, the stack launch is complete.

Test the Lambda function

We can test the function to write sample logs into the log group.

  1. On the Resources page of the AWS CloudFormation console search for LogGenerator.
  2. Choose the physical ID of the Lambda function.
  3. On the Lambda console, choose the function you created.
  4. Choose Test, enter sample for the Event name and leave the other configurations at their default.
  5. Choose Create.
  6. Choose Test again and you should receive the message “Logs generated.”
  7. On the Functions page of the Lambda console, choose the transformation function you created.

Change the AWS Lambda function configuration

  1. On the Configuration tab, choose General configuration.
  2. Choose Edit.
  3. For Memory, set to 256 MB.
  4. For Timeout, increase to 5 minutes.
  5. Choose Save.
  6. Choose Permissions, then choose the role name ID to open the IAM console.
  7. Choose Attach policies.
  8. Search for and select InsightsTransformationFunctionPolicy.
  9. Choose Attach policy.

The policy you attached allows your Lambda transformation function to put records into your Kinesis Data Firehose delivery stream.

Partitioning has emerged as an important technique for organizing datasets so that they can be queried efficiently by a variety of big data systems. Data is organized in a hierarchical directory structure based on the distinct values of one or more columns. The Firehose delivery stream automatically partitions data by date.

  1. On the Amazon S3 console, locate and choose the bucket insightslogsbucket.
  2. Choose the cwlogs prefix and navigate through the partitions created by Kinesis Data Firehose (year/month/day/hour).

Make sure the bucket contains at least one file. (It may take up to 5 minutes to show because Kinesis Data Firehose buffers the data by default.)
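
If you prefer a quick script over browsing the console, the following boto3 sketch lists the partitioned keys; the bucket name is an assumption based on the insightslogsbucket resource (the bucket the template actually creates will carry a generated suffix).

import boto3

s3 = boto3.client("s3")

# Assumed bucket name; replace with the bucket created in your account.
BUCKET = "insightslogsbucket"

# Keys follow the cwlogs/<year>/<month>/<day>/<hour>/ layout written by Kinesis Data Firehose.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix="cwlogs/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])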

To optimize the log storage, you can later enable record transformation to Parquet, a columnar data format. For more information, see Converting Input Record Format (Console).

Run the AWS Glue crawler

To complete this section, run the AWS Glue crawler to discover bucket metadata:

  1. On the AWS Glue console, choose Crawlers.
  2. Select the crawler InsightsLogCrawler and choose Run crawler.

  3. Wait for the crawler to complete, then choose Tables.
  4. Choose the filter bar and choose the resource attribute Database.
  5. Enter insightsdb and press Enter.
  6. You should see a table with a CSV classification.
  7. On the Athena console, enter the following into the query editor:
select * from cwlogs limit 5;
  8. Choose Run query.

You should see a table like the following screenshot.

Set up a QuickSight dataset for Athena

If you haven’t signed up for a QuickSight subscription, do so before creating your dataset.

To use the table created in the AWS Glue Data Catalog, you have to authorize connections to Athena.

  1. On the QuickSight console, choose your QuickSight username and choose Manage QuickSight.
  2. Choose Security & permissions in the navigation pane.
  3. Choose Add or remove to set up access to Amazon S3.

  4. Choose your S3 bucket insightslogsbucket created by the CloudFormation template to allow QuickSight to access it.
  5. Choose the QuickSight logo to exit the management screen.

Add the Athena dataset

Now we can set up the Athena dataset.

  1. On the QuickSight console, choose Datasets in the navigation pane, then choose New dataset.
  2. Choose Athena from the available options.
  3. For Data source name, enter cwlogs.
  4. Leave the default workgroup selected (primary).
  5. Choose Create data source.
  6. Open the list of databases and choose insightsdb.
  7. Choose the cwlogs table and choose Use custom SQL.
  8. Replace New custom SQL with cwlogs and enter the following SQL code:
SELECT col1 as logmessage, col0 as datetime FROM "insightsdb"."cwlogs"
  9. Choose Confirm query.

You receive a confirmation of the dataset creation.

  1. Choose Edit/Preview Data.
  2. For Dataset name, enter cwlogs.

Separate severity levels

The logs contain a severity level (INFO, WARN, ERRR) embedded in the message. To enable analysis of the logs based on this value, you separate the severity level from the message using QuickSight calculated fields.
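
The two calculated fields you create in the following steps perform the equivalent of this small Python sketch, assuming each log line looks like ERRR - Database connection timed out (shown only to make the string manipulation explicit; QuickSight evaluates the calculated fields itself):

def split_log(logmessage: str):
    # substring(logmessage, 1, 4) -> the first four characters hold the severity level
    level = logmessage[:4]
    # replace(logmessage, concat(level, " - "), "") -> strip the level prefix
    message = logmessage.replace(f"{level} - ", "")
    return level, message

print(split_log("ERRR - Database connection timed out"))
# ('ERRR', 'Database connection timed out')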

  1. Choose Dataset below the query editor and choose the datetime field.
  2. Change the type to Date.

  3. Open the Fields panel on the left and choose Add calculated field.
  4. For Add name, enter level.
  5. Enter the following code:
substring(logmessage,1,4)
  6. Choose Save.
  7. Choose Add calculated field.
  8. For Add name, enter message.
  9. Enter the following code:
replace(logmessage,concat(level," - "),"")
  10. Choose Save.
  11. Choose the options icon (three dots) next to the logmessage field and choose Exclude field.

The final dataset should be similar to the following screenshot.

  12. Choose Save.

Set up a QuickSight dataset for CloudWatch Synthetics

You add Synthetics monitoring metrics to your QuickSight dashboard to gain visibility into the availability of your website. For this, you use the Athena CloudWatch connector.

Create a CloudWatch Synthetics canary

To create a CloudWatch Synthetics canary that monitors your website using heartbeats (sample requests) to test availability, complete the following steps:

  1. On the CloudWatch Synthetics console, choose Create canary.
  2. Make sure the blueprint Heartbeat monitoring is selected.
  3. For Name, enter webstatus.
  4. For Application or endpoint URL, enter your website’s URL.
  5. Under Schedule, enter a frequency that works best for you (1–60 minutes).

The default setup is every 5 minutes.

  1. Enter an S3 location to store artifacts.
  2. Choose Create canary.

Set up the Athena CloudWatch connector

CloudWatch Synthetics sends availability data from the canary to CloudWatch Metrics. To query the metrics, you can use the Athena CloudWatch Metrics connector.

  1. On the Athena console, choose Data sources.
  2. Choose Connect data source and choose Query a data source.
  3. Choose Amazon CloudWatch Metrics and choose Next.
  4. Choose Configure new AWS Lambda function.

 A new tab opens in the browser, which you return to after deploying the Lambda function.

  1. For SpillBucket, enter the name of your S3 bucket insightslogsbucket created by the CloudFormation template you deployed.
  2. For AthenaCatalogName, enter cwmetrics.
  3. Select I acknowledge that this app creates custom IAM roles.
  4. Choose Deploy.
  5. Close the browser tab and go to the Athena tab you were on before.
  6. Refresh the list of Lambda functions by choosing the refresh icon.
  7. Under Lambda function, choose the Lambda function you just created.
  8. For Catalog name, enter cwmetrics.
  9. Choose Connect.

Set up permissions for QuickSight to use the connector

The CloudWatch connector runs on Lambda, which uses a spill bucket to handle large queries, so QuickSight needs permission to invoke the Lambda function and write to the spill bucket. For more information, see the GitHub repo. Let's set up permissions to allow QuickSight to use the CloudWatch connector.

  1. On the QuickSight console admin page, choose Security & permissions.
  2. Choose Add or remove.
  3. Choose Athena.
  4. On the S3 tab, specify write permissions for your insightslogsbucket S3 bucket
  5. On the Lambda tab, choose your Lambda function cwmetrics.

Create the new QuickSight dataset

Now we set up the QuickSight dataset for CloudWatch Synthetics.

  1. On the QuickSight console, choose Datasets.
  2. Choose New dataset and choose Athena.
  3. For Data source name, enter cwmetrics.
  4. Choose Create data source.
  5. Open the list of catalogs and choose cwmetrics.
  6. Choose the metric_samples table and choose Use custom SQL.
  7. For New custom SQL field, enter cwmetrics.
  8. Enter the following SQL code:
SELECT timestamp, value, dimension.dim_name, dimension.dim_value
    FROM cwmetrics.default.metric_samples CROSS JOIN UNNEST(dimensions) AS t(dimension)
    WHERE namespace='CloudWatchSynthetics' AND metric_name='SuccessPercent'
        AND statistic='Average' AND dim_name='StepName'
        AND dimension.dim_name='CanaryName' AND dimension.dim_value='webstatus'
        AND timestamp BETWEEN To_unixtime(Now() - INTERVAL '7' DAY)
        AND To_unixtime(Now())
  9. Choose Confirm query.

You receive a confirmation of the dataset creation.

  1. Choose Edit/Preview data.
  2. Choose Dataset below the query editor and choose the timestamp field.
  3. Change the type to Date.
  4. Choose Save.

Set up a QuickSight dataset for Twitter

To set up the Twitter dataset, complete the following steps:

  1. On the QuickSight console, create a new dataset with Twitter as the source.
  2. For Data source name, enter twitterds.
  3. For Query, enter a hashtag or keyword to analyze from Twitter posts.
  4. Choose Create data source.

A new window opens requesting you to give QuickSight OAuth authorization for Twitter.

  1. Sign in to Twitter and choose Authorize application.

  1. Choose the table Twitt, then choose Edit/Preview data.

It might take a couple of minutes for the records to be imported into SPICE, the QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. You can continue with the tutorial while SPICE finishes in the background.

Let’s create a calculated field that classifies tweets as a Good experience or Bad experience by searching for the words “error,” “problem,” or “expensive.” You can choose other words that fit your use case.

  1. For Dataset name, enter twitterds.
  2. Choose Add calculated field.
  3. For Add name, enter Experience.
  4. Enter the following code:
ifelse(locate(toLower({Text}),"error")<>0,"BAD",locate(toLower({Text}),"problem")<>0,"BAD",locate(toLower({Text}),"expensive")<>0,"BAD","GOOD")
  5. Choose Save, then choose Save & visualize.

The Twitter Standard Search API returns data for 7 days only. For more information, see Supported Data Sources.

Set up a QuickSight dataset for Jira Cloud

QuickSight can connect to SaaS data sources, including Jira Cloud. To set up the Jira dataset, complete the following steps:

  1. To create an API token, open the Jira website within an authenticated browser.
  2. Choose Create API token.
  3. For Label, enter QuickSight.
  4. Choose Create.

  1. Create a new dataset in QuickSight and choose Jira as the data source.
  2. For Data source name, enter jirads.
  3. For Site base URL, enter the URL you use to access Jira.
  4. For Username, enter your Jira username.
  5. For API token or password, enter the token you created.
  6. Choose Create and choose the Issues table.
  7. Choose Select and then choose Visualize.

You’re redirected to a newly created QuickSight analysis.

Create a QuickSight overview analysis

We use the previously created analysis as a starting point for our overview analysis.

  1. Choose the Edit data icon (pencil).

  1. Choose Add dataset.
  2. Choose the Issues dataset and choose Select.
  3. Repeat the steps for the twitterds, cwlogs, and cwmetrics datasets.

You should see the four datasets added to the QuickSight visual.

In the navigation pane, you can find the available fields for the selected dataset and on the right, the visuals. Each visual is tied to a particular dataset. When the tutorial instructs you to create a new visual from a dataset, you must choose the dataset from the list first, then choose + Add.

  1. Choose the visual, then the options icon (three dots).
  2. Choose Delete.

  1. Update the QuickSight analysis name to Operational Dashboard.

Add a visual for the Twitter dataset

To add your first visual, complete the following steps:

  1. Add a new visual from the Twitter dataset.
  2. Change the visual type to KPI.
  3. Choose the field RetweetCount.
  4. Name the visual Bad experience retweet count.
  5. Use the resize button to set the size to a fourth of the screen width.

  1. In the navigation pane, choose Filter and then choose Create one.
  2. Choose the Created field and then choose the filter.
  3. Change the filter type to relative dates.
  4. Choose Last N hours and enter 24 for Number of hours.
  5. Choose Apply.

This filter allows us to see only the most up-to-date information from the last 24 hours.

  1. Add a second filter and choose the Experience field.
  2. Leave BAD selected and choose Apply.

Now you only see information about Twitter customers with a bad experience.

Add a visual for CloudWatch metrics

Next, we add a new visual for our CloudWatch metrics.

  1. Choose Visualize and then choose the cwmetrics dataset.
  2. Add a new gauge chart.
  3. Choose Field wells to open the visual field configuration.
  4. Choose Value and change Aggregate to Average.

  1. Drag and drop the value field from the field list into Target value and change the aggregate.
  2. Name the visual System health status.

  1. Similar to what you did on the previous visual, add a filter to include only the last 24 hours based on the timestamp field.

Add a visual for CloudWatch error logs

Next, we create a visual for our CloudWatch logs dataset.

  1. Choose the cwlogs dataset and add a new KPI visual.
  2. Drag and drop the message field into the Value box.
  3. Name the visual Error log count.
  4. Create a filter using the level field and from the list of values.
  5. Deselect all but ERRR.
  6. Choose Apply.

This filters only logs where there was an error found.

  1. Create a filter for the last 24 hours.

Add a visual for Jira issues

Now we add a visual for open Jira issues.

  1. Choose the Issues dataset.
  2. Create a KPI visual using the Id field for Value.
  3. Add a filter for issues of type bug.
    1. Use the IssueType_Name field and select only the records with value Error.
  4. Add a filter for issues open.
    1. Use the Status_Name field and select only the records with the value On-going or To-do.
  5. Add a filter for the last 24 hours using the Date_Created
  6. Name the filter Bug Issues open.
  7. Resize the visuals and organize them as needed.

Complete the overview dashboard

We're almost done with our overview dashboard.

  1. Create four new visuals as specified in the following table.
  • twitterds – Stacked line chart; X-axis: Created (aggregate: hour); Value: RetweetCount (aggregate: sum)
  • cwmetrics – Stacked line chart; X-axis: timestamp (aggregate: hour); Value: Value (aggregate: avg)
  • cwlogs – Stacked line chart; X-axis: datetime (aggregate: hour); Value: message (aggregate: count)
  • Issues – Vertical stacked bar; X-axis: Date_Created (aggregate: hour); Value: Id (aggregate: count); Color: Status_Name
  1. Modify every filter you created to apply them to all the visuals.

  1. Choose the Sheet 1 tab twice to edit its name.
  2. Enter Overview.
  3. Press Enter.

The overview analysis of our application health dashboard is complete.

QuickSight provides a drill-up and drill-down feature to view data at different levels of a hierarchy; the feature is added automatically for date fields. For more information, see Adding Drill-Downs to Visual Data in Amazon QuickSight.

In the previous step, you applied a drill-down when you changed the aggregation to hour.

Create a QuickSight detailed analysis

To create a detailed analysis, we create new tabs for CloudWatch logs, Tweets, and Jira issues.

Create a tab for CloudWatch logs

To create a tab to analyze our CloudWatch logs, complete the following steps:

  1. Choose the add icon next to the Overview tab and name it Logs.
  2. Create three visuals from the cwlogs dataset:
    1. Donut chart with the level field for Group/Color and message (count) for Value.
    2. Stacked combo bar with the datetime (aggregate: hour) field for X axis box, level (count) for Bars, and level for Group/Color for bars.
    3. Table with the fields level, message, and datetime for Value.

To improve the analysis of log data, format the level field based on its content using QuickSight conditional formatting: red for ERRR and green for DEBG.

  1. Choose the table visual, choose the visual options icon (three dots), then choose Conditional formatting.
  2. Choose the add icon and select the level field, then choose Add Text color.
  3. For Value, enter ERRR.
  4. For Color, choose red.
  5. Choose Add condition.
  6. For Value, enter DEBG.
  7. For Color, choose green.

  1. Choose Apply.
  2. Resize the visuals and update their titles as needed.

To enable data exploration, let’s set up an action on the Logs tab.

  1. Choose the Top Logs visual and choose Actions.
  2. Choose Filter same-sheet visuals.
  3. Choose ERRR on the donut chart.

The Latest Logs table now shows only the ERRR level rows.

  1. Create a filter for the last 24 hours applicable to all visuals from the dataset.

Create a tab for Tweets

To create a tab for Twitter analysis and add visuals, complete the following steps:

  1. Choose the add icon next to the Logs tab and name the new tab Tweets.
  2. Delete the visual added and create a new donut chart from the twitterds dataset.
  3. Choose the field Source and name the visual Twit Source.
  4. Create a new word cloud visual and choose the Username field.

With QuickSight, you can exclude visual elements to focus the analysis on certain data. If you see a big “Other” username on the word cloud, choose it and then choose Hide “other” categories.

  1. To narrow down the elements on the word cloud to the top 50, choose the mesh icon.
  2. Under the Group by panel, enter 50 for Number of words.
  3. Choose the mesh icon again and choose Allow vertical words.
  4. Name the visual Top 50 Users.

Let’s create a table with the Twitter details.

  1. Add a new table visual.
  2. Drag and drop the field Text into the Group by box and RetweetCount into Value.
  3. Name the visual Top Retweet.
  4. Resize the columns on the table using the headers border as needed.
  5. To sort the table from the top retweeted posts, choose the header of the field RetweetCount and choose the sort descending icon.

Let’s add a color and an icon based on number of retweets.

  1. Choose the configuration icon (three dots) and choose conditional formatting.
  2. Choose the RetweetCount field, then choose the add icon and choose the three bars icon set.

  1. Choose the Custom conditions option and enter the values as follows:
    1. Condition #1 – Value greater than 10000; color: red
    2. Condition #2 – Start value: 2000; End value: 10000; color: orange
    3. Condition #3 – Value less than 2000; color: keep the default

Now you can see the field RetweetCount formatted with an icon and color based on the value.

Now we add the user location to the analysis.

  1. Add a new horizontal bar chart visual.
  2. Use the UserLocation field for Y axis and the RetweetCount as Value.
  3. Sort descending by RetweetCount.
  4. Choose the mesh icon to expand the Y-axis panel and enter 10 for Number of data points to show.
  5. If you see an empty country, choose it and choose Exclude empty.
  6. Name the visual Top 10 Locations.

To complete this tab of your analysis, resize the visuals and organize them as needed.

  1. Similarly, as you did before, choose every visual and add the action Filter same-sheet visuals, which allows you to explore your data.

For example, you can choose one location or source and the Top users and Retweet tables are filtered.

  1. Create a filter for the last 24 hours applicable to all visuals from the dataset.

Create a tab for Jira issues

Finally, we create a tab for Jira issue analysis.

  1. Choose the add icon next to the Tweets tab and name the new tab Issues.
  2. Delete the visual created and create a new horizontal stacked 100% bar chart visual from the Issues dataset.
  3. Drag and drop the fields as follows (because this dataset has many fields, you can find them using the Field list search bar):
    1. Y-axis: Status_Name
    2. Value: Id (count)
    3. Group/Color: Assigne_DisplayName

This visual shows how issues are progressing, broken down by assignee.

  1. Add a new area line chart visual with the field Date_Updated for X axis and TimeEstimate for Value.
  2. Add another word cloud visual to find out who the top issue reporters are; use Reporter_DisplayName for Group by and Id (count) for Size.
  3. The last visual you add for this tab is a table; include all the fields you need for investigation in the Value box. We suggest including Id, Key, Summary, Votes, WatchCount, Priority, and Reporter_DisplayName.
  4. Resize and rearrange the visuals as needed.

  1. As you did before, choose every visual and add the action Filter same-sheet visuals, which allows you to explore your data.

For example, you can choose one reporter display name or a status and the other visuals are filtered.

  1. Create the filter for the last 24 hours applicable to all visuals from the dataset.

Publish your QuickSight dashboard

The analysis that you’ve been working on is automatically saved. To enable other people to view your findings with read-only capabilities, publish your analysis as a dashboard.

  1. Choose Share and choose Publish dashboard.
  2. Enter a name for your dashboard, such as holistic health status, and choose Publish dashboard.
  3. Optionally, select a person or group to share the dashboard with by entering a name in the search bar.
  4. Choose Share.

Your dashboard is now published and ready to use. You can easily correlate errors in application logs with posts on Twitter and availability data from your website, and quickly identify which errors are already being addressed based on open Jira bug issues.

By default, this dashboard can only be accessed by you, but you can share your dashboard with other people in your QuickSight account.

When you created the QuickSight datasets for Twitter and Jira, the data was automatically imported into SPICE, accelerating query performance. You can also set up SPICE for other dataset types. Remember that data imported into SPICE must be refreshed to stay current.
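
If you later want to refresh a SPICE dataset programmatically instead of from the console, one possible approach (a sketch only, not part of the original setup) is to start an ingestion with boto3; the dataset ID below is a placeholder you would look up in your account.

import time
import boto3

quicksight = boto3.client("quicksight")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Placeholder dataset ID -- list your datasets with quicksight.list_data_sets().
DATASET_ID = "example-twitter-dataset-id"

# Start a SPICE refresh (ingestion) for the dataset.
quicksight.create_ingestion(
    AwsAccountId=account_id,
    DataSetId=DATASET_ID,
    IngestionId=f"manual-refresh-{int(time.time())}",
)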

Conclusion

In this post, you created a dashboard with a holistic view of your workload health status, including application logs, issue tracking on Jira, social media comments on Twitter, and monitoring data from CloudWatch Synthetics. To expand on this solution, you can include data from Amazon CloudFront logs or Application Load Balancer access logs to get a complete view of your application. You could also easily embed your dashboard into a custom application.

You can also use machine learning to discover hidden data trends, saving hours of manual analysis with QuickSight ML Insights, or use QuickSight Q to power data discovery using natural language questions on your dashboards. Both features are ready to use in QuickSight without machine learning experience required.


About the author

Luis Gerardo Baeza is an Amazon Web Services solutions architect with 10 years of experience in business process transformation, enterprise architecture, agile methodologies adoption, and cloud technologies integration. Luis has worked with education, healthcare, and financial companies in México and Chile.

Building a Showback Dashboard for Cost Visibility with Serverless Architectures

Post Syndicated from Peter Chung original https://aws.amazon.com/blogs/architecture/building-a-showback-dashboard-for-cost-visibility-with-serverless-architectures/

Enterprises with centralized IT organizations and multiple lines of businesses frequently use showback or chargeback mechanisms to hold their departments accountable for their technology usage and costs. Chargeback involves actually billing a department for the cost of their division’s usage. Showback focuses on visibility to make the department more cost conscientious and encourage operational efficiency.

Building a showback mechanism can be potentially challenging for business and financial analysts of an AWS Organization. You may not have the scripting or data engineering skills needed to coordinate workflows and build reports at scale. Although you can use AWS Cost Explorer as a starting point, you may want greater customizability, larger datasets beyond a one-year period, and more of a business intelligence (BI) experience.

In this post, we discuss the benefits of building a showback dashboard using the AWS Cost and Usage Report (AWS CUR). You can track costs by cost center, business unit, or project using managed services. Using a showback strategy, you can consolidate and present costs to a business unit to show resource use over a set period of time for your entire AWS Organization. Building this solution with managed services allows you to spend time understanding your costs rather than maintaining the underlying infrastructure.

This solution highlights AWS Glue DataBrew to prepare your data into the appropriate format for dashboards. We recommend DataBrew because it provides a no-code environment for data transformation. It allows anyone to create dashboards similar to those built in the Cloud Intelligence Dashboards Workshop for your Organization.

Figure 1. QuickSight showback dashboard using CUR data transformed by Glue DataBrew and leveraging QuickSight insights

Tags for cost allocation

The success of your showback dashboard partially depends on your cost allocation tagging strategy. Typically, customers use business tags such as cost center, business unit, or project to associate AWS costs with traditional financial reporting dimensions within their organization.

The CUR supports the ability to break down AWS costs by tag. For example, if a group of resources are labeled with the same tag, you’ll be able to see the total cost and usage of that group of resources. Read more about Tagging Best Practices to develop a tagging strategy for your organization.

A serverless data workflow for showback dashboards

You can build showback dashboards with managed services such as Amazon QuickSight, without the need to write any code or manage any servers.

Figure 2. A serverless architecture representing data workflow

AWS automatically delivers the data you need for showback dashboards through the CUR. Once this data arrives in an Amazon Simple Storage Service (S3) bucket, you can transform the data without the need to write any code by using DataBrew. You can also automatically identify the data schema, and catalog the data’s properties to run queries using Amazon Athena. Lastly, you can visualize the results by publishing and sharing dashboards to key stakeholders within your organization using Amazon QuickSight.

The key benefits of this approach are:

  • Automatic data delivery
  • No-code data transformation
  • Automatic cataloging and querying
  • Serverless data visualization

Let’s take a look at each in more detail.

Automatic data delivery

The CUR is the source for historical cost and usage data. The CUR provides the most comprehensive set of cost and usage data available and will include your defined cost allocation tags for your entire Organization. You configure CUR to deliver your billing data to an Amazon S3 bucket at the payer account level. This will consolidate data for all linked accounts. After delivery starts, Amazon updates the CUR files at least once a day.

No-code data transformation

You can use DataBrew to transform the data in the Amazon S3 bucket aggregating cost and usage according to your tagging strategy. DataBrew summarizes your data for discovery. You can also run transformations called “jobs” in DataBrew without writing any code, using over 250 built-in transforms. Figures 3 through 5 show several job examples.

Figure 3. DataBrew recipe action: rename column

Figure 4. DataBrew recipe action: Create column from function

Figure 5. DataBrew recipe action: fill missing values

For a full list of columns available in CUR, review the CUR Data Dictionary. Following is a list of relevant columns for an executive summary showback dashboard:

  • bill_billing_period_start_date
  • line_item_usage_account_id
  • line_item_line_item_type
  • product_product_name
  • product_product_family
  • product_product_transfer_type
  • savings_plan_savings_plan_effective_cost
  • reservation_effective_cost
  • line_item_unblended_cost

Based on data refresh and business requirements, DataBrew can run a job on a recurring basis (for example, every 12 hours). This can be run at a particular time of day, or as defined by a valid CRON expression. This helps you automate your transformation workflows.
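
As a hedged illustration, the boto3 sketch below attaches a 12-hour schedule to an existing DataBrew job; the job and schedule names are placeholders rather than resources defined in this post.

import boto3

databrew = boto3.client("databrew")

# Placeholder names -- replace with your own DataBrew job.
databrew.create_schedule(
    Name="cur-transform-every-12-hours",
    JobNames=["cur-showback-transform"],
    # AWS cron syntax: minutes, hours, day-of-month, month, day-of-week, year.
    CronExpression="cron(0 */12 * * ? *)",
)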

Automatic cataloging and querying

You can use a Glue crawler to automatically classify your data to determine the data’s format, schema, and associated properties. The crawlers write metadata to an AWS Glue Data Catalog to help data users find the data they need.

With the results in Amazon S3 and the metadata in the Glue Data Catalog, you can run standard SQL queries with Athena. This will help you make more informed business decisions by tracking financial metrics and optimizing costs. This is done directly in Amazon S3 without having to move data around. Using standard SQL, you can create views that aggregate cost and usage by your defined tags.
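
For example, a view along the lines of the sketch below (submitted here through boto3) could sum unblended cost by a cost center tag and billing period; the database, table, and tag column names are assumptions that depend on how your CUR and crawler are set up.

import boto3

athena = boto3.client("athena")

# Assumed names -- adjust to your Data Catalog database, CUR table, and tag column.
CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW showback_by_cost_center AS
SELECT
    bill_billing_period_start_date,
    resource_tags_user_cost_center AS cost_center,
    SUM(line_item_unblended_cost)  AS unblended_cost
FROM cur_database.cur_table
GROUP BY 1, 2
"""

athena.start_query_execution(
    QueryString=CREATE_VIEW_SQL,
    QueryExecutionContext={"Database": "cur_database"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)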

Serverless data visualization

You can use Amazon QuickSight to create and share dashboards with your teams for cost visibility. QuickSight provides native integration with Athena and S3, and lets you easily create and publish interactive BI dashboards that include ML-powered insights. When building a showback dashboard such as the example in Figure 1, QuickSight authors create visuals and publish interactive dashboards.

Readers log in using your preferred authentication mechanism to view the shared dashboard. You can then filter data based on billing periods, account number, or cost allocation tags. You can also drill down to details using a web browser or mobile app.

Conclusion

In this blog, we’ve discussed designing and building a data transformation process and a showback dashboard. This gives you highly granular cost visualization without having to provision and manage any servers. You can use managed services such as AWS Glue DataBrew, Amazon Athena, and Amazon QuickSight to crawl, catalog, analyze, and visualize your data.

We recommend defining your organization tagging strategy to be able to view costs by tags. You can then get started by creating Cost and Usage Reports. With the data in Amazon S3, you can use the services described in this post to transform the data that works for your business. Additionally, you can get started today by experimenting with the Cloud Intelligence Dashboards Workshop. This workshop provides examples of visualizations that you can build using native AWS services on top of your Cost and Usage Report. You will be able to get cost, usage, and operational insights about your AWS Cloud usage.

Enhancing Existing Building Systems with AWS IoT Services

Post Syndicated from Lewis Taylor original https://aws.amazon.com/blogs/architecture/enhancing-existing-building-systems-with-aws-iot-services/

With the introduction of cloud technology and by extension the rapid emergence of Internet of Things (IoT), the barrier to entry for creating smart building solutions has never been lower. These solutions offer commercial real estate customers potential cost savings and the ability to enhance their tenants’ experience. You can differentiate your business from competitors by offering new amenities and add new sources of revenue by understanding more about your buildings’ operations.

There are several building management systems to consider in commercial buildings, such as air conditioning, fire, elevator, security, and grey/white water. Each system continues to add more features and become more automated, meaning that control mechanisms use all kinds of standards and protocols. This has led to fragmented building systems and inefficiency.

In this blog, we’ll show you how to use AWS for the Edge to bring these systems into one data path for cloud processing. You’ll learn how to use AWS IoT services to review and use this data to build smart building functions. Some common use cases include:

  • Provide building facility teams a holistic view of building status and performance, alerting them to problems sooner and helping them solve problems faster.
  • Provide a detailed record of the efficiency and usage of the building over time.
  • Use historical building data to help optimize building operations and predict maintenance needs.
  • Offer enriched tenant engagement through services like building control and personalized experiences.
  • Allow building owners to gather granular usage data from multiple buildings so they can react to changing usage patterns in a single platform.

Securely connecting building devices to AWS IoT Core

AWS IoT Core supports connections with building devices, wireless gateways, applications, and services. Devices connect to AWS IoT Core to send and receive data from AWS IoT Core services and other devices. Buildings often use different device types, and AWS IoT Core has multiple options to ingest data and enable connectivity within your building. AWS IoT Core is made up of the following components:

  • Device Gateway is the entry point for all devices. It manages your device connections and supports HTTPS and MQTT (3.1.1) protocols.
  • Message Broker is an elastic and fully managed pub/sub message broker that securely transmits messages (for example, device telemetry data) to and from all your building devices.
  • Registry is a database of all your devices and associated attributes and metadata. It allows you to group devices and services based upon attributes such as building, software version, vendor, class, floor, etc.

The architecture in Figure 1 shows how building devices can connect into AWS IoT Core. AWS IoT Core supports multiple connectivity options:

  • Native MQTT – Many building management systems and device controllers already support MQTT natively.
  • AWS IoT Device SDK – This option supports MQTT protocol and multiple programming languages.
  • AWS IoT Greengrass – The previous options assume that devices are connected to the internet, but this isn’t always possible. AWS IoT Greengrass extends the cloud to the building’s edge. Devices can connect directly to AWS IoT Greengrass and send telemetry to AWS IoT Core.
  • AWS for the Edge partner products – There are several partner solutions, such as Ignition Edge from Inductive Automation, that offer protocol translation software to normalize in-building sensor data.

Figure 1. Data ingestion options from on-premises devices to AWS

Challenges when connecting buildings to the cloud

There are two common challenges when connecting building devices to the cloud:

  • You need a flexible platform to aggregate building device communication data
  • You need to transform the building data to a standard protocol, such as MQTT

Building data is made up of various protocols and formats. Many of these are system-specific or legacy protocols. To overcome this, we suggest processing building device data at the edge, extracting important data points/values before transforming to MQTT, and then sending the data to the cloud.

Transforming protocols can be complex because they can abstract naming and operation types. AWS IoT Greengrass and partner products such as Ignition Edge make it possible to read that data, normalize the naming, and extract useful information for device operation. Combined with AWS IoT Greengrass, this gives you a single way to validate the building device data and standardize its processing.
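
As a simplified sketch of that normalization step (the raw payload shape and topic naming below are assumptions, not a published standard), the logic running at the edge might look like the following before the message is published over MQTT by the AWS IoT Device SDK or an AWS IoT Greengrass component:

import json
import time

def normalize_reading(raw: dict):
    """Map a vendor-specific sensor reading to a common MQTT topic and JSON payload."""
    topic = f"buildings/{raw['building']}/floors/{raw['floor']}/{raw['sensor_type']}"
    payload = {
        "device_id": raw["device_id"],
        "value": float(raw["value"]),
        "unit": raw.get("unit", "unknown"),
        "timestamp": int(time.time()),
    }
    return topic, json.dumps(payload)

# Example reading from a hypothetical building controller.
topic, payload = normalize_reading(
    {"building": "hq-1", "floor": "3", "sensor_type": "temperature",
     "device_id": "ahu-17", "value": "21.4", "unit": "celsius"}
)
print(topic, payload)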

Using building data to develop smart building solutions

The architecture in Figure 2 shows an in-building lighting system. It is connected to AWS IoT Core and reports on devices’ status and gives users control over connected lights.

The architecture in Figure 2 has two data paths, which we’ll provide details on in the following sections, but here’s a summary:

  1. The “cold” path gathers all incoming data for batch data analysis and historical dashboarding.
  2. The “warm” bidirectional path is for faster, real-time data. It gathers devices’ current state data. This path is used by end-user applications for sending control messages, real-time reporting, or initiating alarms.

Figure 2. Architecture diagram of a building lighting system connected to AWS IoT Core

Cold data path

The cold data path gathers all lighting device telemetry data, such as power consumption, operating temperature, health data, etc. to help you understand how the lighting system is functioning.

Building devices can often deliver unstructured, inconsistent, and large volumes of data. AWS IoT Analytics helps clean up this data by applying filters, transformations, and enrichment from other data sources before storing it. By using Amazon Simple Storage Service (Amazon S3), you can analyze your data in different ways. Here we use Amazon Athena and Amazon QuickSight for building operational dashboard visualizations.

Let's discuss a real-world example. For building lighting systems, understanding your energy consumption is important for evaluating energy and cost efficiency. Data ingested into AWS IoT Core can be stored long term in Amazon S3, making it available for historical reporting. Athena and QuickSight can quickly query this data and build visualizations that show lighting state (on or off) and annual energy consumption over a set period of time. You can also overlay this data with sunrise and sunset data to provide insight into whether you are using your lighting systems efficiently. For example, you could adjust the lighting schedule according to the darker winter months versus the brighter summer months.

Warm data path

In the warm data path, the AWS IoT Device Shadow service makes the device state available. Shadow updates are forwarded by an AWS IoT rule into downstream services such as AWS IoT Events, which tracks and monitors multiple devices and data points and then initiates actions based on specific events. Further, you could build APIs that interact with AWS IoT Device Shadow. In this architecture, we have used AWS AppSync and AWS Lambda to enable building controls via a tenant smartphone application.

Let’s discuss a real-world example. In an office meeting room lighting system, maintaining a certain brightness level is important for health and safety. If that space is unoccupied, you can save money by turning the lighting down or off. AWS IoT Events can take inputs from lumen sensors, lighting systems, and motorized blinds and put them into a detector model. This model calculates and prompts the best action to maintain the room’s brightness throughout the day. If the lumen level drops below a specific brightness threshold in a room, AWS IoT Events could prompt an action to maintain an optimal brightness level in the room. If an occupancy sensor is added to the room, the model can know if someone is in the room and maintain the lighting state. If that person leaves, it will turn off that lighting. The ongoing calculation of state can also evaluate the time of day or weather conditions. It would then select the most economical option for the room, such as opening the window blinds rather than turning on the lighting system.
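
The decision logic such a detector model encodes can be sketched in a few lines of Python (an illustration of the rules described above, not an AWS IoT Events model definition; the threshold value is arbitrary):

def lighting_action(lumens: float, occupied: bool, daylight_available: bool,
                    target_lumens: float = 500.0) -> str:
    """Return the most economical action that keeps an occupied room bright enough."""
    if not occupied:
        return "turn_lights_off"
    if lumens >= target_lumens:
        return "no_change"
    # Prefer free daylight (open the blinds) before spending energy on lighting.
    return "open_blinds" if daylight_available else "increase_lighting"

print(lighting_action(lumens=320, occupied=True, daylight_available=True))   # open_blinds
print(lighting_action(lumens=320, occupied=False, daylight_available=True))  # turn_lights_off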

Conclusion

In this blog, we demonstrated how to collect and aggregate the data produced by on-premises building management platforms. We discussed how augmenting this data with the AWS IoT Core platform allows for development of smart building solutions such as building automation and operational dashboarding. AWS products and services can enable your buildings to be more efficient while also providing engaging tenant experiences. For more information on how to get started, check out the getting started with AWS IoT Core developer guide.

Building a Cloud-based OLAP Cube and ETL Architecture with AWS Managed Services

Post Syndicated from Peter Chung original https://aws.amazon.com/blogs/architecture/building-a-cloud-based-olap-cube-and-etl-architecture-with-aws-managed-services/

For decades, enterprises used online analytical processing (OLAP) workloads to answer complex questions about their business by filtering and aggregating their data. These complex queries were compute and memory-intensive. This required teams to build and maintain complex extract, transform, and load (ETL) pipelines to model and organize data, oftentimes with commercial-grade analytics tools.

In this post, we discuss building a cloud-based OLAP cube and ETL architecture that will yield faster results at lower costs without sacrificing performance by:

  • Connecting your on-premises database to the cloud for data profiling, discovery, and transformation
  • Running OLAP workloads without costly third-party software licenses, dedicated infrastructure, or the need to migrate data
  • Using AWS Glue Data Catalog, Amazon Athena, Amazon QuickSight, and Amazon SageMaker to catalog and visualize data with machine learning (ML)

Data analytics pipeline with AWS Managed Services

The proposed architecture in Figure 1 relies on AWS Managed Services. AWS Glue DataBrew is a no-code data transformation service that you can use to quickly build your transformation jobs. AWS Glue crawlers collect metadata from the transformed data and catalog it for analytics and visualization using Athena and QuickSight. SageMaker will build, train, and deploy ML models.

This architecture will help you get answers from your data to your users as fast as possible without needing to migrate your data to AWS. There is no coding required, so you can leverage data transformation, cataloging, analytics, and ML quickly.

Figure 1. Example architecture using AWS Managed Services

Benefits of AWS Managed Services for data analytics

Immediate connectivity to on-premises databases

The example architecture in Figure 1 begins with an online transaction processing (OLTP) database running in your corporate data center. Figure 2 shows how you can establish a Java database connectivity (JDBC) connection from the OLTP database to DataBrew running in AWS to run OLAP workloads. DataBrew supports data sources using JDBC for common data stores such as Microsoft SQL Server, MySQL, Oracle, and PostgreSQL.

Figure 2. DataBrew – JDBC connection to data source

Automatic data discovery

Figures 3 through 6 show how DataBrew summarizes your data for discovery. You can profile your data to understand patterns and detect anomalies. You can also run transformations called “jobs” in DataBrew without writing any code using over 250 built-in transforms.

Figure 3. DataBrew – dataset profiling overview

Figure 4. DataBrew – data correlation patterns

Figure 5. DataBrew – data points distribution

No-code data transformation and cataloging

To run OLAP-type transactions, you can create jobs based on the transformation steps shown in Figure 6. These steps are collectively referred to as a DataBrew recipe. The recipe can be run as a job, with the results output to an Amazon Simple Storage Service (Amazon S3) bucket.
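
If you would rather create the recipe job programmatically than through the console, a rough boto3 sketch could look like the following; the dataset, recipe, role, and bucket names are placeholders.

import boto3

databrew = boto3.client("databrew")

# Placeholder names -- replace with your own DataBrew resources.
databrew.create_recipe_job(
    Name="olap-transform-job",
    DatasetName="oltp-orders",
    RecipeReference={"Name": "olap-transform-recipe", "RecipeVersion": "1.0"},
    RoleArn="arn:aws:iam::123456789012:role/DataBrewServiceRole",
    Outputs=[{
        "Format": "CSV",
        "Location": {"Bucket": "example-olap-output-bucket", "Key": "curated/"},
        "Overwrite": True,
    }],
)

# Run the job once on demand; a schedule can run it on a recurring basis.
databrew.start_job_run(Name="olap-transform-job")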

Figure 6. A DataBrew project user interface view with sample data and transformation functions

Scheduled DataBrew jobs act similarly to scheduled ETL pipelines in OLAP. Based on data refresh and business requirements, DataBrew can run a job on a recurring basis (for example, every 12 hours). This can be run at a particular time of day, or as defined by a valid CRON expression. This helps you automate your transformation workflows.

The OLAP catalog is a set of metadata that sits between the actual OLAP data stored and applications. To create a Data Catalog, you can use AWS Glue crawlers to automatically classify your data to determine the data’s format, schema, and associated properties. Figure 7 shows the results of a crawler’s results written to Data Catalog as metadata to help data users find the data they need.

Figure 7. AWS Glue crawler metadata table output of column names and data types

Data analytics without third-party software licenses

You can run analytics on your data by referring to the metadata definitions in the Data Catalog as references to the actual data in Amazon S3 using Athena. Athena is well suited for running one-time queries using standard SQL to query the transformed data directly in Amazon S3 without having to move data around. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Enterprises often supplement their OLAP workloads with separate visualization and business intelligence (BI) tools. These tools often come with their own licensing, server management, and security considerations.

You can visualize curated data using QuickSight, a scalable, serverless, embeddable, ML-powered BI service. QuickSight lets you easily create and publish interactive BI dashboards that include ML-powered insights, as shown in Figure 8. These dashboards can be shared with other users and embedded within your own applications.

Figure 8. A sample of data visualization options with Amazon QuickSight

Finally, you can incorporate ML workloads to OLAP workloads using SageMaker. In the past, ML workloads were often expensive, resource-intensive, and inaccessible. SageMaker provides a fully managed ML service to quickly and easily build and train ML models and directly deploy them into a production-ready hosted environment.

Conclusion

In this post, we show you how to connect your on-premises database using a JDBC connection to DataBrew for data profiling, discovery, and transformation. We looked at how you can use DataBrew recipes and jobs to run OLAP workloads without costly third-party software licenses, dedicated infrastructure, or the need to migrate any data. We also looked at AWS capabilities in data cataloging, visualization, and machine learning using Data Catalog, Athena, QuickSight, and SageMaker without having to manage any servers.

Laying the foundation to modernize an analytics workflow is critical for many enterprises that are looking to reduce the time it takes to understand their business. With AWS, you can perform enterprise-scale analytics with our portfolio of analytics services.

DOCOMO empowers business units with self-service knowledge access thanks to agile AWS QuickSight business intelligence

Post Syndicated from Daiki Itoh original https://aws.amazon.com/blogs/big-data/docomo-empowers-business-units-with-self-service-knowledge-access-thanks-to-agile-aws-quicksight-business-intelligence/

NTT DOCOMO is the largest telecom company in Japan. It provides innovative, convenient, and secure mobile services that enable customers to realize smarter lives. More than 73 million customers in Japan connect through its advanced wireless networks, including a nationwide LTE network and one of the world’s most progressive LTE Advanced networks. In addition to a wide range of communications-related services, DOCOMO offers value-added Smart Life offerings, including those under its +d initiative. These include the d POINT CLUB customer loyalty point program and d Payment, which enables customers to shop online and make electronic payments using their mobile devices.

All of these services create tremendous amounts of data, providing an opportunity for the company to extract insights that drive business value. To accomplish this goal, the company uses Amazon QuickSight and AWS data technologies to help better understand customers and support sales teams.

Many products, one data team

The company’s data team manages the marketing platform, which includes capturing, analyzing, and reporting on data for DOCOMO Smart Life businesses. For data collection and aggregation, it uses Amazon Redshift as a data warehouse. Amazon Redshift is ideal for storing and querying large amounts of structured data with high performance and reliability.

“With millions of connected customers, we need a highly capable data platform,” says Issei Nishimura, Manager, Marketing Platform Planning Department at DOCOMO. “AWS delivers the right levels of performance.”

In addition to regular reporting from the data warehouse, the company’s business leadership is interested in real-time key performance indicators (KPIs). For d Payment, these include metrics such as active users on a monthly and daily basis. Based on these analyses, leadership can decide how to improve the usage or the sales of each service.

Helping business users access analytics

However, when non-technical decision-makers requested self-service access, the data team had no easy way to provide it—until it decided to adopt QuickSight. QuickSight is a fast, cloud-powered business intelligence service that was easy to deploy and required no on-premises infrastructure.

“Because of native integration with existing AWS services, especially Amazon Redshift, we were able to roll out Amazon QuickSight quickly and easily,” Nishimura says. “In fact, it only took one day to build our first QuickSight dashboards.”

The automated data pipeline starts with Amazon Elastic Compute Cloud (Amazon EC2) to perform extract, transform, and load (ETL) on data, which is then pushed to Amazon Redshift. Amazon SageMaker aggregates and exports it to Amazon Simple Storage Service (Amazon S3), from which QuickSight accesses data for dashboards.

The following is a sample dashboard from NTT DOCOMO. For every marketing campaign, NTT DOCOMO analyzes the number of new unique users of d Payment relative to the number of registrations to the campaign. This allows them to understand how much effect the campaign had on each user category.

With pay-per-session pricing, the company can provide ad hoc data access for line of business decision-makers without capital investment and at low total cost. At the same time, QuickSight can scale to support as many users as needed.

The dashboards can be accessed from any device, providing convenience to the product management teams. It’s easy and intuitive enough for non-technical users—there’s no need for them to write SQL or scripts.

Faster insights to compete in an accelerated marketplace

Previously, it would take a few days to meet requests for ad hoc reports. Now, when people want to check a KPI, they can do it instantly.

“Our team can focus on managing the data warehouse and the regular reporting cadence because the volume of out-of-band requests has been reduced,” Nishimura says.

The solution has had immediate benefits for d Payment sales representatives, who work closely with retailers that use the payment service. These sales representatives want to be able to present relevant KPIs and demographics to the retailers to show trends and improve the services. With QuickSight, the sales team can generate appealing, up-to-date visualizations of the relevant information. They no longer have to spend time building graphics because the QuickSight visualizations are ready to use right away.

Summary

DOCOMO is a data-driven company, using insights to continuously improve its services. AWS enables them to run an enterprise data warehouse that ingests millions of data points with unlimited scale—and provides services such as QuickSight that give non-technical users rapid access to the real-time information they need.

“With these solutions, DOCOMO is breaking down barriers to enable greater use of analytics across the organization,” Nishimura says.


About the Authors

Daiki Itoh is a Business Development Manager for Amazon QuickSight in Japan.

Chirag Dhull is a Principal Product Marketing Manager for Amazon QuickSight.

Build a data quality score card using AWS Glue DataBrew, Amazon Athena, and Amazon QuickSight

Post Syndicated from Nitin Aggarwal original https://aws.amazon.com/blogs/big-data/build-a-data-quality-score-card-using-aws-glue-databrew-amazon-athena-and-amazon-quicksight/

Data quality plays an important role when building an extract, transform, and load (ETL) pipeline for sending data to downstream analytical applications and machine learning (ML) models. The adage “garbage in, garbage out” aptly describes why it's important to filter out bad data before further processing. Continuously monitoring data quality and comparing it with predefined target metrics helps you comply with your governance frameworks.

In November 2020, AWS announced the general availability of AWS Glue DataBrew, a new visual data preparation tool that helps you clean and normalize data without writing code. This reduces the time it takes to prepare data for analytics and ML by up to 80% compared to traditional approaches to data preparation.

In this post, we walk through a solution in which we apply various business rules to determine the quality of incoming data and separate good and bad records. Furthermore, we publish a data quality score card using Amazon QuickSight and make records available for further analysis.

Use case overview

For our use case, we use a public dataset that is available for download at Synthetic Patient Records with COVID-19. It contains 100,000 synthetic patient records in CSV format. Data hosted within SyntheticMass has been generated by SyntheaTM, an open-source patient population simulation made available by The MITRE Corporation.

When we unzip the 100k_synthea_covid19_csv.zip file, we see the following CSV files:

  • Allergies.csv
  • Careplans.csv
  • Conditions.csv
  • Devices.csv
  • Encounters.csv
  • Imaging_studies.csv
  • Immunizations.csv
  • Medications.csv
  • Observations.csv
  • Organizations.csv
  • Patients.csv
  • Payer_transitions.csv
  • Payers.csv
  • Procedures.csv
  • Providers.csv
  • Supplies.csv

We perform the data quality checks categorized by the following data quality dimensions:

  • Completeness
  • Consistency
  • Integrity

For our use case, these CSV files are maintained by your organization’s data ingestion team, which uploads the updated CSV file to Amazon Simple Storage Service (Amazon S3) every week. The good and bad records are separated through a series of data preparation steps, and the business team uses the output data to create business intelligence (BI) reports.

Architecture overview

The following architecture uses DataBrew for data preparation and building key performance indicators (KPIs), Amazon Athena for data analysis with standard SQL, and QuickSight for building the data quality score card.

The workflow includes the following steps:

  1. The ingestion team receives CSV files in an S3 input bucket every week.
  2. The DataBrew job scheduled to run every week triggers the recipe job.
  3. DataBrew processes the input files and generates output files that contain additional fields depending on the recipe job logic.
  4. After the output data is written, we create an external table on top of it by creating and running an AWS Glue crawler.
  5. The good and bad records are separated by creating views on top of the external table.
  6. Data analysts can use Athena to analyze good and bad records.
  7. The records can also be separated directly using QuickSight calculated fields.
  8. We use QuickSight to create the data quality score card in the form of a dashboard, which fetches data through Athena.

Prerequisites

Before beginning this tutorial, make sure you have the permissions required to create the resources used in this solution.

Additionally, create the S3 input and output buckets to capture the data, and upload the input data into the input bucket.

Create DataBrew datasets

To create a DataBrew dataset for the patient data, complete the following steps:

  1. On the DataBrew console, choose Datasets.
  2. Choose Connect new dataset.
  3. For Dataset name, enter a name (for this post, Patients).
  4. For Enter your source from S3, enter the S3 path of the patients input CSV.
  5. Choose Create Dataset.

Repeat these steps to create datasets for other CSV files, such as encounters, conditions, and so on.

Create a DataBrew project

To create a DataBrew project for the patient data, complete the following steps:

  1. On the DataBrew console, choose Projects.
  2. Choose Create a project.
  3. For Project name, enter a name (for this post, patients-data-quality).
  4. For Select a dataset, select My datasets.
  5. Select the patients dataset.
  6. Under Permissions, for Role name, choose an AWS Identity and Access Management (IAM) role that allows DataBrew to read from your Amazon S3 input location.

You can choose a role if you already created one, or create a new one. For more information, see Adding an IAM role with data resource permissions.

  1. Wait till the dataset is loaded (about 1–2 minutes).
  2. To create a consistency check, choose the BIRTHDATE column.
  3. On the Create menu, choose Flag column.
  4. Under Create column, for Values to flag, select Custom value.
  5. For Source column, choose BIRTHDATE.
  6. For Values to flag, enter the regular expression (?:(?:18|19|20)[0-9]{2}).
  7. For Flag values as, choose Yes or no.
  8. For Destination column, enter BIRTHDATE_flagged.

The new column BIRTHDATE_FLAGGED now displays Yes for a valid four-digit year within BIRTHDATE.
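
The following small Python sketch illustrates the same consistency check outside DataBrew, using the regular expression from the recipe step above; the sample birthdates are hypothetical.

import re

# Same pattern used in the DataBrew flag column: a four-digit year starting with 18, 19, or 20.
pattern = re.compile(r"(?:(?:18|19|20)[0-9]{2})")

# Hypothetical sample values; in the recipe, the input comes from the BIRTHDATE column.
for birthdate in ["1987-04-12", "0087-04-12", ""]:
    flagged = "Yes" if pattern.search(birthdate) else "No"
    print(birthdate or "<missing>", "->", flagged)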

  1. To create a completeness check, repeat the preceding steps to create a DRIVERS_FLAGGED column by choosing the DRIVERS column to mark missing values.
  2. To create an integrity check, choose the JOIN transformation.
  3. Choose the encounters dataset and choose Next.
  4. For Select join type, select Left join.
  5. For Join keys, choose Id for Table A and Patient for Table B.
  6. Under Column list, unselect all columns from Table B except for Patient.
  7. Choose Finish.
  8. Choose the Patient column and create another flag column PATIENT_FLAGGED to mark missing values from the Patient column.

For our use case, we created three new columns to demonstrate data quality checks for data quality dimensions in scope (consistency, completeness, and integrity), but you can integrate additional transformations on the same or additional columns as needed.

  1. After you finish applying all your transformations, choose Publish on the recipe.
  2. Enter a description of the recipe version and choose Publish.

Create a DataBrew job

Now that our recipe is ready, we can create a job for it, which runs the recipe against the full dataset on the weekly schedule described in the architecture.

  1. On the DataBrew console, choose Jobs.
  2. Choose Create a job.
  3. For Job name, enter a name (for example, patient-data-quality).

Your recipe is already linked to the job.

  1. Under Job output settings, for File type, choose your final storage format (for this post, we choose CSV).
  2. For S3 location, enter your final S3 output bucket path.
  3. For Compression, choose the compression type you want to apply (for this post, we choose None).
  4. For File output storage, select Replace output files for each job run.

We choose this option because our use case is to publish a data quality score card for every new set of data files.

  1. Under Permissions, for Role name, choose your IAM role.
  2. Choose Create and run job.
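
If you prefer to start the recipe job programmatically (for example, from a scheduler), you can do so with the AWS SDK for Python (Boto3). The following is a minimal sketch; it assumes the job name used above.

import boto3

# Start the DataBrew recipe job created in the previous steps.
databrew = boto3.client("databrew")

response = databrew.start_job_run(Name="patient-data-quality")
print("Started job run:", response["RunId"])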

Create an Athena table

If you’re familiar with Apache Hive, you may find that creating tables in Athena works in much the same way. You can create tables by writing a DDL statement in the query editor, or by using the wizard or JDBC driver. To use the query editor, enter the following DDL statement to create a table:

CREATE EXTERNAL TABLE `blog_output`(
  `id` string, 
  `birthdate` string, 
  `birthdate_flagged` string, 
  `deathdate` string, 
  `ssn` string, 
  `drivers` string, 
  `drivers_flagged` string, 
  `passport` string, 
  `prefix` string, 
  `first` string, 
  `last` string, 
  `suffix` string, 
  `maiden` string, 
  `marital` string, 
  `race` string, 
  `ethnicity` string, 
  `gender` string, 
  `birthplace` string, 
  `address` string, 
  `city` string, 
  `state` string, 
  `county` string, 
  `zip` bigint, 
  `lat` double, 
  `lon` double, 
  `healthcare_expenses` double, 
  `healthcare_coverage` double, 
  `patient` string, 
  `patient_flagged` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://<your-bucket>/blog_output/';

Let’s validate the table output in Athena by running a simple SELECT query. The following screenshot shows the output.

Create views to filter good and bad records (optional)

To create a good records view, enter the following code:

CREATE OR REPLACE VIEW good_records AS
SELECT * FROM "databrew_blog"."blog_output"
where 
birthdate_flagged = 'Yes' AND
drivers_flagged = 'No' AND
patient_flagged = 'No'

To create a bad records view, enter the following code:

CREATE OR REPLACE VIEW bad_records AS
SELECT * FROM "databrew_blog"."blog_output"
where 
birthdate_flagged = 'No' OR
drivers_flagged = 'Yes' OR 
patient_flagged = 'Yes'

Now you have the ability to query the good and bad records in Athena using these views.
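
As a quick illustration, the following Boto3 sketch runs an ad hoc Athena query against these views to compute the share of bad records. The database name databrew_blog comes from the views above; the S3 output location is a placeholder you should replace.

import time

import boto3

athena = boto3.client("athena")

query = """
SELECT (SELECT count(*) FROM bad_records) * 100.0 /
       ((SELECT count(*) FROM good_records) + (SELECT count(*) FROM bad_records)) AS pct_bad
"""

# Start the query; the output location is a placeholder bucket for Athena results.
execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "databrew_blog"},
    ResultConfiguration={"OutputLocation": "s3://<your-bucket>/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then read the single result value.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print("Percent bad records:", rows[1]["Data"][0]["VarCharValue"])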

Create a score card using QuickSight

Now let’s complete our final step of the architecture, which is creating a data quality score card through QuickSight by connecting to the Athena table.

  1. On the QuickSight console, choose Athena as your data source.
  2. For Data source name, enter a name.
  3. Choose Create data source.
  4. Choose your catalog and database.
  5. Select the table you have in Athena.
  6. Choose Select.

Now you have created a dataset.

To build the score card, you add calculated fields by editing the dataset blog_output.

  1. Locate your dataset.
  2. Choose Edit dataset.
  3. Choose Add calculated field.
  4. Add the field DQ_Flag with value ifelse({birthdate_flagged} = 'No' OR {drivers_flagged} = 'Yes' OR {patient_flagged} = 'Yes' , 'Invalid', 'Valid').

Similarly, add other calculated fields.

  1. Add the field % Birthdate Invalid Year with value countIf({birthdate_flagged}, {birthdate_flagged} = 'No')/count({birthdate_flagged}).
  2. Add the field % Drivers Missing with value countIf({drivers_flagged}, {drivers_flagged} = 'Yes')/count({drivers_flagged}).
  3. Add the field % Patients missing encounters with value countIf({patient_flagged}, {patient_flagged} = 'Yes')/count({patient_flagged}).
  4. Add the field % Bad records with the value countIf({DQ_Flag}, {DQ_Flag} = 'Invalid')/count({DQ_Flag}).

Now we create the analysis blog_output_analysis.

  1. Change the format of the calculated fields to display the Percent format.
  2. Start adding visuals by choosing Add visual on the + Add menu.

Now you can create a quick report to visualize your data quality score card, as shown in the following screenshot.

If QuickSight is using SPICE storage, you need to refresh the dataset in QuickSight after you receive notification about the completion of the data refresh. If the QuickSight report is running an Athena query for every request, you might see a “table not found” error when data refresh is in progress. We recommend using SPICE storage to get better performance.
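
If you want to trigger the SPICE refresh programmatically after a new DataBrew output lands in Amazon S3, a minimal Boto3 sketch follows. The dataset ID is a placeholder; you can look it up on the dataset details page or through the list_data_sets API.

import uuid

import boto3

quicksight = boto3.client("quicksight")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Kick off a SPICE ingestion (refresh) for the dataset backing the score card.
response = quicksight.create_ingestion(
    AwsAccountId=account_id,
    DataSetId="<blog-output-dataset-id>",  # placeholder for your dataset ID
    IngestionId=str(uuid.uuid4()),
)
print("Ingestion status:", response["IngestionStatus"])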

Cleaning up

To avoid incurring future charges, delete the resources created during this walkthrough.

Conclusion

This post explains how to create a data quality score card using DataBrew, Athena queries, and QuickSight.

This gives you a great starting point for using this solution with your datasets and applying business rules to build a complete data quality framework to monitor issues within your datasets. We encourage you to use various built-in transformations to get the maximum value for your project.


About the Authors

Nitin Aggarwal is a Senior Solutions Architect at AWS, where he helps digital native customers architect data analytics solutions and provides technical guidance on various AWS services. He brings more than 16 years of experience in software engineering and architecture roles for various large-scale enterprises.

 

 

 

Gaurav Sharma is a Solutions Architect at AWS. He works with digital native business customers providing architectural guidance on AWS services.

 

 

 

Vivek Kumar is a Solutions Architect at AWS. He works with digital native business customers providing architectural guidance on AWS services.

Create threshold-based alerts in Amazon QuickSight

Post Syndicated from Lillie Atkins original https://aws.amazon.com/blogs/big-data/create-threshold-based-alerts-in-amazon-quicksight/

Every business has a set of key metrics that stakeholders focus on to make the most accurate, data-driven decisions, such as sales per week, inventory turnover rate, daily website visitors, and so on. With threshold-based alerts in Amazon QuickSight, we’re making it simpler than ever for consumers of QuickSight dashboards to stay informed about their key metrics. This blog post walks you through the process of setting up threshold-based alerts to track important metrics on QuickSight dashboards.

Set up an alert

Threshold alerts are set up on dashboards and can be created from KPIs or gauge charts. To set a threshold alert, choose the visual and then choose the alert icon. The alert takes into account all of the filters currently applied to the visual and creates a new alert rule.

The following two screenshots show first a KPI visual and then a gauge visual as well as where to locate the alert icon:

You can set up multiple alerts from a visual, which lets you monitor the data for different sets of changes or conditions.

After you choose the alert icon, you must provide a few configuration details. The alert name auto-fills to the name of the visual. The Alert value is the value in the data that the threshold is checked against, based on your rule; this defaults to the primary value of the visual. For the gauge, this means the percent of the goal already achieved (currently 79.81%). For the KPI in the following screenshot, it means the week-over-week difference in the forecasted revenue (currently -$367,456).

Let’s say you want to be alerted whenever the forecasted new monthly revenue dips below $300,000—even if that is multiple times a day. To configure this alert, complete the following steps:

  1. For Alert value¸ choose the actual value rather than the difference value (which was the default).
  2. For Condition, choose Is below.
  3. Enter the value 300,000.
  4. For Notification preference, choose As frequently as possible.
  5. Choose Save.

Let’s now say you change your mind and want to be notified at most once a week when this alert triggers. This is controlled through the notification preference. To make changes to the alert, go to the management portal, which you can find by choosing Alerts on the navigation bar.

Here you can edit, delete, enable, or disable the alert. After the alert has been triggered, you can see it in a list along with other historical (90-day) alerts. This alert doesn’t have any history yet, because it hasn’t been triggered.

To update your notification preference, choose Edit, under Notification preference choose Weekly at most, and then choose Save.

When an alert is triggered, you receive an email notification customized to the alert value you configured. You can quickly get to the dashboard by choosing View Dashboard.

Alerts are created based on the visual at that point in time and don’t update with changes to the visual in the future. This means the visual can change or be deleted and the alert continues to work as long as the data in the dataset remains valid.

The evaluation schedule for threshold alerts is based on the dataset. For SPICE datasets, alert rules are checked against the data after a successful data refresh. For datasets that query your data sources directly, alerts are evaluated daily at a random time between 6 PM and 8 AM, based on the Region of the dataset. We’re working on a control for direct query dataset owners to set their own schedules for checking alerts and increase the frequency up to hourly.

The admin for the QuickSight account can restrict who has access to set threshold alerts through custom permissions. For more information, see Customizing user permissions in Embed multi-tenant analytics in applications with Amazon QuickSight.

Pricing and availability

Threshold alerts are billed for each evaluation, and follow the familiar pricing used for anomaly detection, starting at $0.50 per 1,000 evaluations. For example, if you set up an alert on a SPICE dataset that refreshes daily, you have 30 evaluations of the alert rule in a month, which costs 30 * $0.5/1000 = $0.015 in a month. For more information, see Amazon QuickSight Pricing.

Threshold alerts are a QuickSight Enterprise Edition feature and available for dashboards consumed in the QuickSight website. Threshold alerts aren’t yet available in embedded QuickSight dashboards or on the mobile app.

Conclusion

In this post, we demonstrated how to set up threshold-based alerts to track important metrics on QuickSight dashboards. This makes it even easier for consumers of QuickSight dashboards to stay up to date on their key metrics. For more information, see the Amazon QuickSight Documentation.


About the Author

Lillie Atkins is a Product Manager for Amazon QuickSight, Amazon Web Services’ cloud-native, fully managed BI service.

 

Securely analyze your data with AWS Lake Formation and Amazon QuickSight

Post Syndicated from Julia Soscia original https://aws.amazon.com/blogs/big-data/securely-analyze-your-data-with-aws-lake-formation-and-amazon-quicksight/

Many useful business insights can arise from analyzing customer preferences, behavior, and usage patterns. With this information, businesses can innovate faster and improve the customer experience, leading to better engagement and accelerating product adoption. More and more businesses are looking for ways to securely store and restrict access to customer data, which may include personally identifiable information (PII) and other sensitive information. Accessing customer data for use cases such as analytics and Machine Learning requires careful management of access permissions and the ability to audit usage.

This post explores the ways in which you can securely store, analyze, and visualize sensitive customer data. We start by storing encrypted sample test data in our Amazon Simple Storage Service (Amazon S3) based data lake. We use AWS Lake Formation to configure fine-grained permissions to restrict user access, and finally analyze the data and visualize business insights using Amazon QuickSight.

Prerequisites

For this post, you should be familiar with the following:

The AWS CloudFormation template associated with this post automatically sets up the different architecture components. You then need to take additional steps to configure user permissions directly in Lake Formation. If you already manage access using AWS Glue resource permissions and IAM, launching this CloudFormation template in the same account could result in access being denied to existing users. We recommend launching this template in a new AWS account or one not running business-critical workloads.

Architecture overview

The following diagram illustrates our solution’s architecture.

The workflow includes the following steps:

  1. Use Lake Formation to create a data lake on Amazon S3 that is protected with IAM and encrypted with AWS Key Management Service (AWS KMS).
  2. AWS Glue crawlers scan your datasets and populate the Data Catalog.
  3. The Data Catalog serves as a central repository to store the metadata of the datasets.
  4. Athena accesses the data for ad hoc queries, using the Data Catalog.
  5. You can securely visualize your data with QuickSight.

Use case and business value

The following use case helps illustrate the challenge we’re trying to solve.

Assume you run an ecommerce company and, to help improve the customer experience, you need to collect and store customers’ purchasing history. It’s common for the credit card holder to be different from the individual purchasing a product. If you can identify the relationship between the account holder and the cardholder, you may be able to create targeted recommendations.

For example, the account holder and cardholder share the same last name and home address but have a different first name, and the cardholder’s age is greater. You can use this information to deduce with high certainty that the account holder is using the credit card of a family member. You could analyze the account holder’s purchasing history and correlate it with third-party information collected on the cardholder to create a holistic view of the customer and their circle of close family influencers.

With this new information, you can now make personal product recommendations to improve the customer experience. In the following sections, we demonstrate how to accomplish this by setting up a secure data lake, encrypting data at rest, masking sensitive fields, and restricting access to data using fine-grained permissions.

For similar customer use cases related to security and data analytics on AWS, see AWS Cloud Security.

Set up the environment

After you sign in to your test AWS account, launch the CloudFormation template by choosing Launch Stack:  

This template configures the following resources:

  • An encrypted S3 bucket representing the data lake that is registered with Lake Formation
  • Two IAM groups that represent the data lake admin and analyst
  • Two IAM users ([email protected] and [email protected])
  • IAM roles and policies that go along with these services
  • An AWS Glue crawler to populate the Data Catalog with tables
  • An AWS Glue role for running within the data lake
  • An AWS Glue extract, transform, and load (ETL) job to mask the credit card information
  • A KMS customer master key (CMK) for the Amazon S3 server-side encryption (SSE)

When you launch the template, enter values for the following parameters:

  • AdminPassword – An initial password for the [email protected] user
  • AnalystPassword – An initial password for the [email protected] user
  • BucketNameParam – A unique name to be used for the data lake bucket
  • CFNExecRoleOrUserArnParam – The ARN of the principal (user or role) executing the CloudFormation stack. You can find the ARN of the logged-in user on the IAM console

The passwords you provide must comply with your account’s password policy. When you later log in as those users, you’re asked to reset these passwords. Make note of the passwords you choose.

Configure your data lake’s security

This section doesn’t cover all the available security offerings in AWS, but rather focuses on restricting user access and protecting data at rest. You should always start by securing all access to your data and resources. It’s important to create policies that use least privilege from the start, even before any data lands in the data lake. You can do this with IAM and AWS KMS.

Set up IAM access policies for your users

First, consider the types of users that access the AWS account that hosts a data lake and use analytic services to access the data assets in the data lake. For this post, we discuss two types of AWS users:

  • Data lake administrator – Responsible for configuring the data lake and managing Lake Formation permissions to the data lake users
  • Data analyst – Responsible for data visualization

When you work in your production account (not the test account), you consider a third user type: the data engineer. The data engineer is responsible for data transformation, loading data into Amazon S3, and making the necessary changes to the data structures so the analysts don’t have access to sensitive data directly. For this post, the data lake administrator acts as the data engineer.

On the IAM console, on the Users page, you can see the following users created by the CloudFormation template.

The recommended best practice is to provide your users with the least access privileges.

For central access control for data in the data lake, we use the Lake Formation permission model. This requires us to create a data lake administrator who can grant any principal (including themselves) any permission on any Data Catalog resource and data location. The designated data lake administrator can then grant more granular permissions on resources to other principals. The permission scope of the data lake administrator should be limited to managing the data lake within Lake Formation and the AWS Glue Data Catalog. The data lake administrator should be able to add existing IAM users and roles to the data lake, but not create IAM users. The permissions scope of the data engineer user should be limited to the IAM policies required to do their job, following the principle of least privilege. Specifically, this means running ETL jobs to mask sensitive information in the data records prior to analysis. Also, because the aim of the use case is to maintain compliance, it’s imperative that no individual users have access to cardholder data, either visually or programmatically.

It’s considered a best practice to use AWS managed policies whenever possible to reduce the operational overhead of maintaining your own policies. The data lake admin group role should be granted the AWSLakeFormationDataAdmin, AWSGlueConsoleFullAccess, and AmazonS3ReadOnlyAccess managed policies to administer data pipelines with AWS Glue, configure Lake Formation permissions, run AWS Glue crawlers, and store data on Amazon S3. For more information about IAM managed permissions, see Data Lake Administrator Permissions.

Encryption at rest within the data lake

In addition to using IAM to delegate permissions to your data lake, we recommend encrypting the data at rest. In this post, Amazon S3 acts as the basis for storage of our data lake. Amazon S3 offers several options for encryption based on your preference and compliance requirements. For more information, see Protecting data using encryption.

In this post, we use server-side encryption with AWS KMS (SSE-KMS) because it provides low operational overhead. SSE-KMS uses a CMK stored in AWS KMS to protect customer data. AWS KMS provides fine-grained audit and access control over requests for encryption keys. SSE-KMS and AWS Glue both use data keys protected by your CMK. You can view the default encryption details by navigating to the Amazon S3 console, choosing the bucket created as part of the CloudFormation template, and viewing its properties.

Encryption in transit within the data lake

After you secure the data at rest, let’s make sure that the S3 service endpoints are protected with SSL. By default, public S3 service endpoints can be accessed via HTTPS or HTTP. Therefore, to ensure that traffic is encrypted in transit, use the default HTTPS endpoint for the Amazon S3 API. To ensure that data in Amazon S3 is protected with SSL, you must force SSL on the bucket using a bucket policy. This policy is enforced on all of your buckets where encryption is required to meet your security needs. For details on encrypting your data in transit as you move your data into your data lake in the AWS Well-Architected Framework, see How do you anticipate, respond to, and recover from incidents?
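
The following Boto3 sketch applies such a policy, denying any request that is not sent over SSL. The bucket name is a placeholder for your data lake bucket; if the bucket already has a policy, merge this statement into it instead of overwriting it.

import json

import boto3

s3 = boto3.client("s3")
bucket = "<your-data-lake-bucket>"  # placeholder

# Deny all S3 actions on the bucket and its objects when the request does not use SSL.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))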

Preprocess data to restrict access to sensitive data

Not all consumers of the data are allowed to see sensitive information, so we need to split the data in a way that gives us more control over how data is exposed to users. To do that, we transform it using PySpark running in AWS Glue. This allows us to aggregate the data for our analysts and mask the sensitive information before saving it back to Amazon S3 for analysis.
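
The following is a minimal PySpark sketch of the kind of masking the AWS Glue ETL job performs: it keeps only the last four digits of the card number before writing the cleaned data back to Amazon S3. The column name card_number and the S3 paths are assumptions for illustration; the actual job created by the CloudFormation template may differ.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mask-card-numbers").getOrCreate()

# Read the raw cards data from the data lake (assumed path and schema).
cards = spark.read.option("header", True).csv("s3://<your-data-lake-bucket>/cards/")

# Replace the full card number with a masked value that keeps only the last four digits.
masked = cards.withColumn(
    "card_number",
    F.concat(F.lit("XXXX-XXXX-XXXX-"), F.substring(F.col("card_number"), -4, 4)),
)

masked.write.mode("overwrite").option("header", True).csv(
    "s3://<your-data-lake-bucket>/clean_cards/"
)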

Configure Lake Formation

First, we use Lake Formation to create a central data lake repository on Amazon S3 to store and analyze your data.

  1. On the Lake Formation console, under Data Catalog, choose Settings.
  2. Deselect the two check boxes associated with the Data Catalog using only IAM permissions.

To maintain backward compatibility with AWS Glue, these settings are enabled by default for new Data Catalog resources. These settings effectively cause access to Data Catalog resources to be controlled solely by IAM policies. Deselect the permissions so that you can give individual permissions to Data Catalog resources from Lake Formation.

  1. Choose Save.

Next, we revoke generic IAM principal access to tables in our database. This makes sure that only permissions applied using Lake Formation will apply.

  1. On the Lake Formation console, choose Administrative roles and tasks.
  2. Under Database Creators, select IAMAllowedPrincipals.
  3. Choose Revoke.
  4. For Catalog permissions, select Create database.
  5. Leave all other settings at their default and choose Revoke.

We now need to revoke permissions for IAMAllowedPrincipals.

  1. On the Data permissions page, revoke all grants to the group IAMAllowedPrincipals.

Lake Formation is now the single pane of glass for data governance within your data lake. To configure user permissions in Lake Formation, you must be a data lake admin. The CloudFormation template already created [email protected] as our data lake admin. When you’re logged in as the admin, you need to grant them the ability to manage permissions for users.

  1. On the IAM console, choose Users.
  2. Choose the [email protected] user.
  3. On the Security Credentials tab, copy the link for that user to log in.
  4. Open the link in a new browser or private browser window.
  5. Reset the password (on your first login).
  6. On the Lake Formation console, choose Data permissions.
  7. Choose Grant.
  8. Make sure the admin user has both database and grantable Super permissions on the db1 database.
  9. On the Databases page, select the db1 database.
  10. On the Actions menu, choose Edit.
  11. Choose the S3 bucket created as part of the CloudFormation stack as the database location.

The naming convention of the S3 bucket is <account_number><region_name><bucket_name>. For example, it should appear similar to 111122223333virginiamybucketname. Do not choose the bucket name ending with athenaoutput.

  1. Deselect Use only IAM access control for new tables in this database.
  2. Choose Save.

After this step, if you see IAMAllowedPrincipals under Data permissions, follow the steps explained earlier to revoke its permissions.

The next permission we want to grant is the ability for our AWS Glue execution role to create new tables in our db1 database.

  1. On the Data permissions page, choose Grant.
  2. For IAM users and roles, choose the AWS Glue role created as part of the CloudFormation stack.
  3. For Database, choose the db1 database.
  4. For Database permissions, select Create table.
  5. Make sure that no options are selected for Grantable permissions.
  6. Choose Grant.

Now that the data lake admin is set up and Lake Formation is managing permissions, we can create table definitions for the cards, customers, and sales data in the Lake Formation Data Catalog. Let’s verify the files created by the CloudFormation template in the S3 bucket folders.

  1. On the Amazon S3 console, choose the bucket that you chose for the db1 location.

The following CSV files are in their respective folders cards, customers, and sales:

  • cards.csv
  • customers.csv
  • sales.csv

Now that we’ve verified the files, let’s catalog it in the Lake Formation Data Catalog using AWS Glue crawlers.

  1. On the AWS Glue console, choose Crawlers.
  2. Select the crawler security-blog-crawler and choose Run crawler.

This crawler was created by the CloudFormation template. It can crawl multiple data stores like cards, customers, and sales to populate the Data Catalog.

After you run the crawler, you should see the tables in the Data Catalog. To view the tables, switch to the Lake Formation console, choose Databases, select the db1 database, and choose View tables.

Configure data access controls

Now that our datasets are cataloged, it’s important to define access controls to ensure only authorized users are allowed to see sensitive information. We use Lake Formation to define these fine-grained controls for users who need access to data in the data lake.

Configure data encryption for AWS Glue

We use an AWS Glue security configuration to secure the data accessed by AWS Glue crawlers and ETL jobs, the data written by the ETL jobs to Amazon S3 targets, and the logs sent to Amazon CloudWatch. The security settings were automatically configured by the CloudFormation template and can be viewed on the AWS Glue console.

Process the result set

Our dataset includes information about our customer demographics and references between customers that share credit cards when making purchases. We develop a simple job using PySpark to combine the purchasing user’s information with the cardholder. You can perform other transformations and enrichment such as masking sensitive fields or looking up additional details in external systems. When the job is complete, it outputs the data in columnar format to give us better performance and reduce cost when we later analyze it. For more information on how this process works and the tools required, see How to extract, transform, and load data for analytic processing using AWS Glue (Part 2).

To demonstrate this capability in action, you run the AWS Glue ETL jobs created by the CloudFormation template. To run the script, you log in as an admin user, but ideally, a data engineer would manage the ETL at this point. For the sake of simplicity, we configured the data lake administrator to have these permissions.

Let’s run the ETL jobs to clean the cards and sales data. They create new files under the clean_cards and clean_sales S3 folders with the modifications. We start with cleaning the card data. The job replaces full card numbers with the last four digits of the card numbers and creates a new file in the clean_cards folder.

  1. Make sure you’re signed in as the data lake admin with username [email protected].
  2. On the AWS Glue console, choose Jobs.
  3. Select the job clean_cards_data and on the Action menu, choose Run job.
  4. Expand Security configuration, script libraries, and job parameters.
  5. Under Job parameters, add the key --output_s3_bucket_name and the value as the bucket name that contains the CSV files.
  6. Choose Run job.

Next, we clean up our sales data. The dollar amounts for the purchase prices are cast as strings with a dollar sign ($) in them. To make analytics easier downstream, we want those cast as decimals without the dollar signs.

  1. Follow the same procedure to run the clean_sales_data job.

Now that we have generated our clean cards and clean sales data in the S3 bucket, we run security-blog-crawler to add the clean cards and clean sales tables to our Data Catalog.

  1. In the navigation pane, choose Crawlers.
  2. Select the crawler called security-blog-crawler and choose Run crawler.

Now that we have our new tables with masked card data and cleaned sales data, you grant the analyst user permission to access it in Lake Formation.

  1. On the Lake Formation console, grant the Select permission to the clean_cards and clean_sales tables for the user [email protected].

This completes the permissions scope for the analyst user.

Query and visualize data with QuickSight

Now that our data is ready, we use QuickSight to visualize the data. We first add [email protected] to QuickSight as an author who can create different analyses for viewers. We use Lake Formation fine-grained permissions to grant secure access to the analyst user, who can prepare analyses in QuickSight.

You need to be logged in as your account administrator, not the analyst or data lake admin. Your account must be subscribed to QuickSight as an Enterprise user to enable integration with Lake Formation fine-grained access control.

Enable fine-grained permission for QuickSight users

Fine-grained permissions defined in Lake Formation are mapped to QuickSight authors or a group of viewers. To enable fine-grained permissions for viewers, you must first add them to a QuickSight group that can be configured in Lake Formation with fine-grained permissions. For this post, we create the QuickSight user [email protected].

  1. Sign in to your AWS account with the admin user (not the Lake Formation admin user).
  2. In a new tab, open the QuickSight console.
  3. Choose the logged-in user and choose Manage QuickSight.
  4. Add the user [email protected] with the author role.

Now we create the QuickSight group Analyst and add the QuickSight user [email protected] to the group. We use the AWS Command Line Interface (AWS CLI) for this purpose.

  1. Run the following command to create the group (provide your account ID):
    aws quicksight create-group --aws-account-id=<account_id> --namespace=default --group-name="Analyst" --description="Data Analyst Group"

  2. Run the following command to add the user to the group:
    aws quicksight create-group-membership --group-name 'Analyst' --aws-account-id <account_id> --namespace default [email protected]

  3. Run the following command to get the ARN of the QuickSight group:
    aws quicksight describe-group --group-name=Analyst --aws-account-id <account_id> --namespace=default

  4. Make a note of this ARN.

We use this ARN to configure access permissions to this QuickSight group in Lake Formation.

Because we configured the data lake bucket and Athena output bucket with CMKs, we need to grant the following key operations to the QuickSight role.

  1. Enter the following AWS CLI command to create a grant for the QuickSight role that was created when you subscribed to QuickSight (also provide the KMS key ID, created by the CloudFormation stack):
    aws kms create-grant --key-id <kms-key> --grantee-principal arn:aws:iam::<accountid>:role/service-role/aws-quicksight-s3-consumers-role-v0 --operations Decrypt Encrypt DescribeKey GenerateDataKey GenerateDataKeyPair

  2. Sign in with the [email protected] user.
  3. On the Lake Formation console, choose Data permissions.
  4. Choose Grant.
  5. For SAML and Amazon QuickSight users and groups, enter the Analyst group ARN you copied earlier.
  6. For Database, choose db1.
  7. For Tables, choose clean_cards and clean_sales.
  8. For Table permissions, select Select.
  9. Choose Grant.

Now let’s grant permissions to the customers table by excluding the address and email fields.

  1. On the Data permissions page, choose Grant.
  2. For SAML and Amazon QuickSight users and groups, enter the Analyst group ARN you copied earlier.
  3. For Database, choose db1.
  4. For Tables, choose customers.
  5. For Columns, choose Exclude columns.
  6. For Exclude columns, choose address and email.
  7. For Table permissions, select Select.
  8. Choose Grant.

Now the Analyst group has the Lake Formation permission for the proper datasets.
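
You can apply the same grants programmatically. The following Boto3 sketch grants SELECT on the customers table to the Analyst group while excluding the address and email columns; the group ARN placeholder is the one you noted earlier, and the database and table names are the ones used in this post.

import boto3

lakeformation = boto3.client("lakeformation")

# Grant SELECT on the customers table, excluding the sensitive columns,
# to the QuickSight Analyst group (use the group ARN noted earlier).
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "<analyst-group-arn>"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "db1",
            "Name": "customers",
            "ColumnWildcard": {"ExcludedColumnNames": ["address", "email"]},
        }
    },
    Permissions=["SELECT"],
)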

Additional security measures

In addition to configuring permissions for QuickSight, you can implement security measures to ensure that you’re visualizing sensitive data properly.

  1. On the QuickSight console, choose Security & permissions.

Because we’re using Athena to connect to source data in data lake, we grant write permissions to the S3 bucket for the Athena query result output. The naming convention of the Athena S3 bucket is <account_number><region_name><bucket_name>. For example, it should appear similar to 111122223333virginiamybucketname.

Visualize the user data

This post would be incomplete without some data science that demonstrates how we can create correlations between users. We have secured our infrastructure, scrubbed our sensitive data in preprocessing, output the results to columnar tables in our AWS Glue Data Catalog managed by Lake Formation, and configured security settings in QuickSight. The next step is to import the dataset into QuickSight to analyze the data. For more information, see Working with Datasets.

In our use case, we logically grouped our users and cardholders into a columnar format in preprocessing, which we can now use to sort through and derive value. In QuickSight, we use the heat map visualization to map the buyer to the number of times they used someone else’s credit card. This gives us a visual of which users used different credit cards the most.

  1. Sign in using [email protected].
  2. On the QuickSight console, choose New analysis.
  3. Choose New datasets.
  4. For Create a Dataset, choose Athena.
  5. Enter ABCCompany Analysis as the data source.
  6. Choose Create data source.
  7. Choose the database db1.

As shown in the following screenshot, you’re only shown the table permissions granted to the QuickSight Analyst group.

  1. Choose Use custom SQL.
  2. Enter the query name as Sales-Cards-Query.

We now run queries to verify that the analyst user doesn’t have access to the cards and sales tables.

  1. Enter the following query for the cards table:
    SELECT * from "db1"."cards"

The following screenshot shows that we get a permission error.

  1. Similarly, you can verify the permissions for the sales table by running the following query. You should see the same permission error as for the cards table.
    SELECT * from "db1"."sales"

  2. Enter the following query for the customers table:
    SELECT * from "db1"."customers"

The following screenshot shows that the analyst only has access to customer fields other than address and email.

The analyst only has access to the clean_sales, clean_cards, and customers tables (excluding the address and email fields).

  1. Enter the following SQL and choose Edit/Preview Data:
    SELECT "db1"."clean_sales"."customer_id", COUNT("db1"."clean_sales"."customer_id") as num_diff_card_used FROM "db1"."clean_sales" JOIN "db1"."customers" ON "db1"."clean_sales"."customer_id"="db1"."customers"."customer_id" AND "db1"."clean_sales"."card_id" != "db1"."customers"."card_id" GROUP BY "db1"."clean_sales"."customer_id" ORDER BY num_diff_card_used DESC

The following screenshot shows our query results.

  1. Choose Save & visualize to create a visualization.
  2. Choose + Add and choose Add visual.
  3. Choose the heat map visual type.
  4. Set Rows to customer_id.
  5. Set Columns and Values to num_diff_card_used.
  6. On the Values drop-down menu, choose Aggregate sum.

The following screenshot shows our QuickSight analysis. You can change the color by choosing Format visual.

From this visualization, we can see that several customers are making purchases with more than one card that they don’t own. We can also add further visualizations that add more context to our data like customer IDs and the total number of purchases made with cards that customers don’t own.

The following are some of the additional datasets and visualizations that you can add to your analysis.

This data may provide valuable insights into the relationships between users and also provide a starting point for forensic investigations into customers that may be making fraudulent purchases.

For instructions on creating a similar dataset, see Creating a Dataset Using Amazon Athena Data. For instructions on creating visualizations with your datasets, see Creating an Amazon QuickSight Visual.

Clean up

When you’re done with this exercise, complete the following steps to delete your resources and stop incurring costs:

  1. On the Amazon S3 console, delete the objects in the buckets created by the CloudFormation stack.
  2. On the CloudFormation console, select your stack and choose Delete.

This cleans up all the resources created by the stack.

  1. Cancel your QuickSight account.

Conclusion

In this post, we used an example of credit card purchases to discuss different ways to protect sensitive data, based on separation of responsibilities with least privileges, encryption, and fine-grained access control. With AWS, you gain the control and confidence you need to securely run your business with the most flexible and secure cloud computing environment available today. To show this in action, we created separate IAM users and granted permissions based on the principle of least privilege to allow users to perform their duties. We reviewed the required AWS KMS configuration to protect data at rest using server-side encryption. We used AWS Glue to prepare the data, remove sensitive information, and further protect data access using Lake Formation’s fine-grained access controls. After we applied the security controls, we showed you how the analyst user can safely create different analyses using QuickSight for visualization.




About the Authors

Julia Soscia is a Solutions Architect Manager with Amazon Web Services on the Startup team, based out of New York City. Her main focus is to help startups create well-architected environments on the AWS cloud platform and build their business. She enjoys skiing on the weekends in Vermont and visiting the many art museums across New York City.

 

 

 

Mitesh Patel is a Senior Solutions Architect at AWS. He works with customers in SMB to help them develop scalable, secure and cost effective solutions in AWS. He enjoys helping customers in modernizing applications using microservices and implementing serverless analytics platform.

Run usage analytics on Amazon QuickSight using AWS CloudTrail

Post Syndicated from Sunil Salunkhe original https://aws.amazon.com/blogs/big-data/run-usage-analytics-on-amazon-quicksight-using-aws-cloudtrail/

Amazon QuickSight is a cloud-native BI service that allows end users to create and publish dashboards in minutes, without provisioning any servers or requiring complex licensing. You can view these dashboards on the QuickSight product console or embed them into applications and websites. After you deploy a dashboard, it’s important to assess how it and other assets are being adopted, accessed, and used across various departments or customers.

In this post, we use a QuickSight dashboard to present the following insights:

  • Most viewed and accessed dashboards
  • Most updated dashboards and analyses
  • Most popular datasets
  • Active users vs. idle users
  • Idle authors
  • Unused datasets (wasted SPICE capacity)

You can use these insights to reduce costs and create operational efficiencies in a deployment. The following diagram illustrates this architecture.


Solution components

The following list summarizes the AWS services and resources that this solution uses:

  • AWS CloudTrail logs (CloudTrailMultiAccount) – Capture all API calls for all AWS services across all AWS Regions for this account. You can use AWS Organizations to consolidate trails across multiple AWS accounts.
  • AWS Glue crawlers (QSCloudTrailLogsCrawler and QSProcessedDataCrawler) – Ensure that all CloudTrail data is crawled periodically and that partitions are updated in the AWS Glue Data Catalog.
  • AWS Glue ETL job (QuickSightCloudTrailProcessing) – Reads catalogued data from the crawler, processes, transforms, and stores it in an S3 output bucket.
  • AWS Lambda function (ExtractQSMetadata_func) – Extracts event data using the AWS SDK for Python, Boto3. The event data is enriched with QuickSight metadata objects like user, analysis, datasets, and dashboards.
  • Amazon Simple Storage Service (Amazon S3) buckets (CloudTrailLogsBucket and QuickSight-BIonBI-processed) – One bucket stores CloudTrail data. The other stores processed data.
  • Amazon QuickSight analysis (Quicksight_BI_On_BO_Analysis) – Visualizes the processed data.

 Solution walkthrough

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. You can use CloudTrail to log, continuously monitor, and retain account activity related to actions across your AWS infrastructure. You can define a trail to collect API actions across all AWS Regions. Although we have enabled a trail for all Regions in our solution, the dashboard shows data for a single Region only.

After you enable CloudTrail, it starts capturing all API actions and then, at 15-minute intervals, delivers logs in JSON format to a configured Amazon Simple Storage Service (Amazon S3) bucket. Before the logs are made available to our ad hoc query engine, Amazon Athena, they must be parsed, transformed, and processed by the AWS Glue crawler and ETL job.


The AWS Glue crawler and AWS Glue ETL job handle this step. The AWS Glue crawler crawls through the data every day and populates new partitions in the Data Catalog. The data is later made available as a table on the Athena console for processing by the AWS Glue ETL job QuickSightCloudtrail_GlueJob.txt, which filters the logs and processes only those events where the event source is QuickSight (that is, eventSource = 'quicksight.amazonaws.com').


The following screenshot shows the sample JSON for the QuickSight API calls.


The job processes those events and creates a Parquet file with the following fields and data types:

  • eventtime – Datetime
  • eventname – String
  • awsregion – String
  • accountid – String
  • username – String
  • analysisname – String
  • date – Date

The processed data is stored in an S3 folder at s3://<BucketName>/processedlogs/. For performance optimization during querying and when connecting this data to QuickSight for visualization, these logs are partitioned by the date field. For this reason, we recommend that you configure the AWS Glue crawler to detect the new data and partitions and update the Data Catalog for subsequent analysis. We have configured the crawler to run once a day.

We need to enrich this log data with metadata from QuickSight, such as a list of analyses, users, and datasets. This metadata can be extracted using describe_analysis, describe_user, and describe_data_set in the AWS SDK for Python (Boto3).

We provide an AWS Lambda function that is ideal for this extraction. We configured it to be triggered once a day through Amazon EventBridge. The extracted metadata is stored in the S3 folder at s3://<BucketName>/metadata/.
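
A minimal Boto3 sketch of that kind of extraction follows; it lists the analyses in the account and writes the result to the metadata folder. The bucket name and file name are placeholders, and the Lambda function provided with the solution is more complete (it also extracts users and datasets).

import json

import boto3

quicksight = boto3.client("quicksight")
s3 = boto3.resource("s3")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Collect analysis summaries, following pagination tokens.
analyses, token = [], None
while True:
    kwargs = {"AwsAccountId": account_id}
    if token:
        kwargs["NextToken"] = token
    page = quicksight.list_analyses(**kwargs)
    analyses.extend(page.get("AnalysisSummaryList", []))
    token = page.get("NextToken")
    if not token:
        break

# Store the metadata for later enrichment in Athena.
s3.Object("<BucketName>", "metadata/analyses.json").put(
    Body=json.dumps(analyses, default=str)
)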

Now that we have processed logs and metadata for enrichment, we need to prepare the data visualization in QuickSight. Athena allows us to build views that can be imported into QuickSight as datasets.

We build the following views based on the tables populated by the Lambda function and the ETL job:

CREATE VIEW vw_quicksight_bionbi 
AS 
  SELECT Date_parse(eventtime, '%Y-%m-%dT%H:%i:%SZ') AS "Event Time", 
         eventname  AS "Event Name", 
         awsregion  AS "AWS Region", 
         accountid  AS "Account ID", 
         username   AS "User Name", 
         analysisname AS "Analysis Name", 
         dashboardname AS "Dashboard Name", 
         Date_parse(date, '%Y%m%d') AS "Event Date" 
  FROM   "quicksightbionbi"."quicksightoutput_aggregatedoutput" 

CREATE VIEW vw_users 
AS 
  SELECT usr.username "User Name", 
         usr.role     AS "Role", 
         usr.active   AS "Active" 
  FROM   (quicksightbionbi.users 
          CROSS JOIN Unnest("users") t (usr)) 

CREATE VIEW vw_analysis 
AS 
  SELECT aly.analysisname "Analysis Name", 
         aly.analysisid   AS "Analysis ID" 
  FROM   (quicksightbionbi.analysis 
          CROSS JOIN Unnest("analysis") t (aly)) 

CREATE VIEW vw_analysisdatasets 
AS 
  SELECT alyds.analysesname "Analysis Name", 
         alyds.analysisid   AS "Analysis ID", 
         alyds.datasetid    AS "Dataset ID", 
         alyds.datasetname  AS "Dataset Name" 
  FROM   (quicksightbionbi.analysisdatasets 
          CROSS JOIN Unnest("analysisdatasets") t (alyds)) 

CREATE VIEW vw_datasets 
AS 
  SELECT ds.datasetname AS "Dataset Name", 
         ds.importmode  AS "Import Mode" 
  FROM   (quicksightbionbi.datasets 
          CROSS JOIN Unnest("datasets") t (ds))

QuickSight visualization

Follow these steps to connect the prepared data with QuickSight and start building the BI visualization.

  1. Sign in to the AWS Management Console and open the QuickSight console.

You can set up QuickSight access for end users through SSO providers such as AWS Single Sign-On (AWS SSO), Okta, Ping, and Azure AD so they don’t need to open the console.


  1. On the QuickSight console, choose Datasets.
  2. Choose New dataset to create a dataset for our analysis.


  1. For Create a Data Set, choose Athena.

In the previous steps, we prepared all our data in the form of Athena views.

  1. Configure permission for QuickSight to access AWS services, including Athena and its S3 buckets. For information, see Accessing Data Sources.


  1. For Data source name, enter QuickSightBIbBI.
  2. Choose Create data source.


  1. On Choose your table, for Database, choose quicksightbionbi.
  2. For Tables, select vw_quicksight_bionbi.
  3. Choose Select.


  1. For Finish data set creation, there are two options to choose from:
    1. Import to SPICE for quicker analytics – Built from the ground up for the cloud, SPICE uses a combination of columnar storage, in-memory technologies enabled through the latest hardware innovations, and machine code generation to run interactive queries on large datasets and get rapid responses. We use this option for this post.
    2. Directly query your data – You can connect to the data source in real time, but if the data query is expected to bring bulky results, this option might slow down the dashboard refresh.
  2. Choose Visualize to complete the data source creation process.


Now you can build your visualization sheets. QuickSight refreshes the data source first. You can also schedule a periodic refresh of your data source.


The following screenshot shows some examples of visualizations we built from the data source.


 

This dashboard presents us with two main areas for cost optimization:

  • Usage analysis – We can see how analyses and dashboards are being consumed by users. This area highlights the opportunity for cost savings by looking at datasets that have not been used in any analysis for the last 90 days but are still holding a major chunk of SPICE capacity.
  • Account governance – Because author subscriptions are charged on a fixed fee basis, it’s important to monitor if they are actively used. The dashboard helps us identify idle authors for the last 60 days.

Based on the information in the dashboard, we could, for example, remove unused datasets to free up SPICE capacity and review the subscriptions of idle authors to save costs.

Conclusion

In this post, we showed how you can use CloudTrail logs to review the use of QuickSight objects, including analyses, dashboards, datasets, and users. You can use the information available in the dashboards to save money on storage and subscriptions, understand the maturity of QuickSight adoption, and more.


About the Author

Sunil Salunkhe is a Senior Solutions Architect working with Strategic Accounts on their vision to leverage the cloud to drive aggressive growth strategies. He practices customer obsession by solving their complex challenges in all aspects of the cloud journey, including scale, security, and reliability. While not working, he enjoys playing cricket and cycling with his wife and son.

Building an administrative console in Amazon QuickSight to analyze usage metrics

Post Syndicated from Ying Wang original https://aws.amazon.com/blogs/big-data/building-an-administrative-console-in-amazon-quicksight-to-analyze-usage-metrics/

Given the scalability of Amazon QuickSight to hundreds and thousands of users, a common use case is to monitor QuickSight group and user activities, analyze the utilization of dashboards, and identify usage patterns of an individual user and dashboard. With timely access to interactive usage metrics, business intelligence (BI) administrators and data team leads can efficiently plan for stakeholder engagement and dashboard improvements. For example, you can remove inactive authors to reduce license cost, as well as analyze dashboard popularity to understand user acceptance and stickiness.

This post demonstrates how to build an administrative console dashboard and serverless data pipeline. We combine QuickSight APIs with AWS CloudTrail logs to create the datasets to collect comprehensive information of user behavior and QuickSight asset usage patterns.

This post provides a detailed workflow that covers the data pipeline, sample Python code, and a sample dashboard of this administrative console. With the guidance of this post, you can configure this administrative console in your own environment.

Let’s look at Forwood Safety, an innovative, values-driven company with a laser focus on fatality prevention. An early adopter of QuickSight, they have collaborated with AWS to deploy this solution to collect BI application usage insights.

“Our engineers love this admin console solution,” says Faye Crompton, Leader of Analytics and Benchmarking at Forwood. “It helps us to understand how users analyze critical control learnings by helping us to quickly identify the most frequently visited dashboards in Forwood’s self-service analytics and reporting tool, FAST.”

Solution overview

The following diagram illustrates the workflow of the solution.


The workflow involves the following steps:

  1. The AWS Lambda function Data_Prepare is scheduled to run hourly. This function calls QuickSight APIs to get QuickSight namespace, group, user, and assets access permissions information and saves the results to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. CloudTrail logs are stored in S3 bucket.
  3. Based on the file in Amazon S3 that contains the user-group information and the QuickSight asset access permissions, as well as the view-dashboard and user login events in the CloudTrail logs, three Amazon Athena tables and several views are created. Optionally, the BI engineer can combine these tables with employee information tables to display human resource information about the users.
  4. Two QuickSight datasets fetch the data in the Athena tables created in Step 3 through SPICE mode. Then, based on these datasets, a QuickSight dashboard is created.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • Access to the following AWS services:
    • Amazon QuickSight
    • Amazon Athena
    • AWS Lambda
    • Amazon S3
  • Basic knowledge of Python
  • Optionally, Security Assertion Markup Language 2.0 (SAML 2.0) or OpenID Connect (OIDC) single sign-on (SSO) configured for QuickSight access

Creating resources

Create your resources by launching the following AWS CloudFormation stack:

After the stack creation is successful, you have one Amazon CloudWatch Events rule, one Lambda function, one S3 bucket, and the corresponding AWS Identity and Access Management (IAM) policies.

To create the resources in a Region other than us-east-1, download the Lambda function.

Creating Athena tables

The Data_Prepare Lambda function is scheduled to run hourly with the CloudWatch Events rule admin-console-every-hour. This function calls the QuickSight APIs list_namespaces, list_users, list_user_groups, list_dashboards, list_datasets, list_datasources, list_analyses, list_themes, describe_data_set_permissions, describe_dashboard_permissions, describe_data_source_permissions, describe_analysis_permissions, and describe_theme_permissions to get QuickSight users and assets access permissions information. Finally, this function creates two files, group_membership.csv and object_access.csv, and saves these files to an S3 bucket.
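
The following is a simplified Boto3 sketch of how the Data_Prepare function can build group_membership.csv; the deployed Lambda function additionally handles pagination, errors, and the asset permission calls that produce object_access.csv.

import csv
import io

import boto3

quicksight = boto3.client("quicksight")
s3 = boto3.resource("s3")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Build (namespace, group, user) rows for every QuickSight user.
rows = []
for namespace in quicksight.list_namespaces(AwsAccountId=account_id)["Namespaces"]:
    users = quicksight.list_users(
        AwsAccountId=account_id, Namespace=namespace["Name"]
    )["UserList"]
    for user in users:
        groups = quicksight.list_user_groups(
            UserName=user["UserName"],
            AwsAccountId=account_id,
            Namespace=namespace["Name"],
        )["GroupList"]
        for group in groups:
            rows.append([namespace["Name"], group["GroupName"], user["UserName"]])

# Write the result as CSV to the prefix used by the group_membership Athena table.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
s3.Object(
    f"admin-console{account_id}",
    "monitoring/quicksight/group_membership/group_membership.csv",
).put(Body=buffer.getvalue())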

Run the following SQL query to create two Athena tables (group_membership and object_access):

CREATE EXTERNAL TABLE `group_membership`(
`namespace` string,   
`group` string, 
`user` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://admin-console<aws_account_id>/monitoring/quicksight/group_membership/'
TBLPROPERTIES (
  'areColumnsQuoted'='false', 
  'classification'='csv', 
  'columnsOrdered'='true', 
  'compressionType'='none', 
  'delimiter'=',',
  'typeOfData'='file');

CREATE EXTERNAL TABLE `object_access`(
`aws_region` string,   
`object_type` string, 
`object_name` string,
`object_id` string,
`principal_type` string,
`principal_name` string,
`namespace` string,
`permissions` string
)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://admin-console<aws_account_id>/monitoring/quicksight/object_access/'
TBLPROPERTIES (
  'areColumnsQuoted'='false', 
  'classification'='csv', 
  'columnsOrdered'='true', 
  'compressionType'='none', 
  'delimiter'=',',
  'typeOfData'='file')

The following screenshot is sample data of the group_membership table.

The following screenshot is sample data of the object_access table.

For instructions on building an Athena table with CloudTrail events, see Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail. For this post, we create the table cloudtrail_logs in the default database.

Creating views in Athena

Now we have the tables ready in Athena and can run SQL queries against them to generate some views to analyze the usage metrics of dashboards and users.

Create a view of a user’s role status with the following code:

CREATE OR REPLACE VIEW users AS
(select Namespace,
 Group,
 User,
(case 
when Group in ('quicksight-fed-bi-developer', 'quicksight-fed-bi-admin') 
then 'Author' 
else 'Reader' 
end) 
as author_status
from "group_membership" );

Create a view of GetDashboard events that happened in the last 3 months with the following code:

CREATE OR REPLACE VIEW getdashboard AS 
(SELECT 
"useridentity"."type",
"split_part"("useridentity"."sessioncontext"."sessionissuer"."arn", '/', 2) AS "assumed_role",
COALESCE("useridentity"."username", "concat"("split_part"("useridentity"."arn", '/', 2), '/', "split_part"("useridentity"."arn", '/', 3))) AS "user_name",
awsregion,
"split_part"("split_part"("serviceeventdetails", 'dashboardName":', 2), ',', 1) AS dashboard_name,
"split_part"("split_part"("split_part"("split_part"("serviceeventdetails", 'dashboardId":', 2), ',', 1), 'dashboard/', 2), '"}', 1) AS dashboardId,
date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time,
max(date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ')) AS latest_event_time
FROM cloudtrail_logs
WHERE 
eventsource = 'quicksight.amazonaws.com' 
AND
eventname = 'GetDashboard' 
AND
DATE_TRUNC('day', date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ')) > cast(current_date - interval '3' month AS date)
GROUP BY 1,2,3,4,5,6,7)

In the preceding query, the conditions defined in the where clause only fetch the records of GetDashboard events of QuickSight.

How can we design queries to fetch records of other events? We can review the CloudTrail logs to look for the information. For example, let’s look at the sample GetDashboard CloudTrail event:

{
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "<principal_id>: <user_name>",
        "arn": "arn:aws:sts:: <aws_account_id>:assumed-role/<IAM_role_ name>/<user_name>",
        "accountId": "<aws_account_id>",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "<principal_id>",
                …
            }
        }
    },
    "eventTime": "2021-01-13T16:55:36Z",
    "eventSource": "quicksight.amazonaws.com",
    "eventName": "GetDashboard",
    "awsRegion": "us-east-1",
    "eventID": "a599c8be-003f-46b7-a40f-2319efb6b87a",
    "readOnly": true,
    "eventType": "AwsServiceEvent",
    "serviceEventDetails": {
        "eventRequestDetails": {
            "dashboardId": "arn:aws:quicksight:us-east-1: <aws_account_id>:dashboard/<dashboard_id>"
        },
        "eventResponseDetails": {
            "dashboardDetails": {
                "dashboardName": "Admin Console",
                "dashboardId": "arn:aws:quicksight:us-east-1: <aws_account_id>:dashboard/<dashboard_id>",
                "analysisIdList": [
                    "arn:aws:quicksight:us-east-1: <aws_account_id>:analysis/<analysis_id>"
            }
        }
    }
}

With eventSource="quicksight.amazonaws.com" and eventName="GetDashboard", we can get all of the QuickSight dashboard view events.

Similarly, we can define the condition as eventname = 'AssumeRoleWithSAML' to fetch the user login events. (This solution assumes that the users log in to their QuickSight account with identity federation through SAML.) For more information about querying CloudTrail logs to monitor other interesting user behaviors, see Using administrative dashboards for a centralized view of Amazon QuickSight objects.
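As an illustration only (this view is not part of the deployed solution), the following Python sketch creates a login-events view with that condition through the Athena API; the view name userlogin, the default database, and the query result location are assumptions.

import boto3

athena = boto3.client('athena', region_name='us-east-1')

# Mirrors the getdashboard view, but keeps only SAML federation login events
CREATE_LOGIN_VIEW = """
CREATE OR REPLACE VIEW userlogin AS
SELECT COALESCE("useridentity"."username",
                "split_part"("useridentity"."arn", '/', 3)) AS user_name,
       awsregion,
       date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
FROM cloudtrail_logs
WHERE eventname = 'AssumeRoleWithSAML'
"""

athena.start_query_execution(
    QueryString=CREATE_LOGIN_VIEW,
    QueryExecutionContext={'Database': 'default'},                              # cloudtrail_logs lives in the default database
    ResultConfiguration={'OutputLocation': 's3://your-athena-query-results/'})  # placeholder results location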

Furthermore, we can join with employee information tables to get a QuickSight user’s human resources information.

Finally, we can generate a view called admin_console with QuickSight group and user information, assets information, CloudTrail logs, and, optionally, employee information. The following screenshot shows an example preview.

Creating datasets

With the Athena views ready, we can build some QuickSight datasets. We can load the view called admin_console to build a SPICE dataset called admin_console and schedule this dataset to be refreshed hourly. Optionally, you can create a similar dataset called admin_console_login_events with the Athena table based on eventname = 'AssumeRoleWithSAML' to analyze QuickSight user login events. Depending on the usage metrics requirements in your organization, you can create other datasets to serve different requests.
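If you prefer to script this step rather than use the QuickSight UI, the create_data_set API can build the SPICE dataset on top of the admin_console view. The following is a minimal sketch; the data source ARN, dataset ID, SQL, and column list are placeholders, and the real view contains more columns.

import boto3

qs = boto3.client('quicksight', region_name='us-east-1')

qs.create_data_set(
    AwsAccountId='111122223333',                         # placeholder account ID
    DataSetId='admin-console-dataset',                   # placeholder dataset ID
    Name='admin_console',
    ImportMode='SPICE',                                  # load the view into SPICE
    PhysicalTableMap={
        'admin-console-sql': {
            'CustomSql': {
                'DataSourceArn': 'arn:aws:quicksight:us-east-1:111122223333:datasource/<athena_datasource_id>',  # placeholder
                'Name': 'admin_console',
                'SqlQuery': 'SELECT namespace, "group", "user", dashboard_name, latest_event_time FROM admin_console',  # placeholder columns
                'Columns': [
                    {'Name': 'namespace', 'Type': 'STRING'},
                    {'Name': 'group', 'Type': 'STRING'},
                    {'Name': 'user', 'Type': 'STRING'},
                    {'Name': 'dashboard_name', 'Type': 'STRING'},
                    {'Name': 'latest_event_time', 'Type': 'DATETIME'}]}}})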

Building dashboards

Now we can build a QuickSight dashboard as the administrative console to analyze usage metrics. The following steps are based on the dataset admin_console. The schema of the optional dataset admin_console_login_events is the same as admin_console. You can apply the same logic to create the calculated fields to analyze user login activities.

  1. Create parameters.

For example, we can create a parameter called InActivityMonths, as in the following screenshot. Similarly, we can create other parameters such as InActivityDays, Start Date, and End Date.

  2. Create controls based on the parameters.

  3. Create calculated fields.

For instance, we can create a calculated field to detect the active or inactive status of QuickSight authors. If the time span between the latest dashboard view activity and now is greater than or equal to the number defined in the Inactivity Months control, the author status is Inactive. The following screenshot shows the relevant code.

According to end users' requirements, we can define several calculated fields to perform the analysis.

  4. Create visuals.

For example, we create an insight to display the top three dashboards viewed by readers and a visual to display the authors of these dashboards.

  5. Add URL actions to define some extra features, such as emailing inactive authors or checking the details of users.

The following sample code defines the action to email inactive authors:

mailto:<<email>>?subject=Alert to inactive author! &body=Hi, <<username>>, any author without activity for more than a month will be deleted. Please log in to your QuickSight account to continue accessing and building analyses and dashboards!

The following screenshots show an example dashboard that you can make using our data.

The following is the administrative console landing page. This page provides an overview, a terminology explanation, and thumbnails of the other two tabs.

The following screenshots show the User Analysis tab.

The following screenshots show the Dashboards Analysis tab.

You can interactively play with the sample dashboard in the following Interactive Dashboard Demo.

You can reference the public template of the preceding dashboard in create-template, create-analysis, and create-dashboard API calls to create this dashboard and analysis in your account. The public template of this dashboard has the following template ARN: 'TemplateArn': 'arn:aws:quicksight:us-east-1:889399602426:template/admin-console'.
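For example, a create_dashboard call along the following lines creates a dashboard from that public template. This is a sketch: the account ID, dashboard ID, dataset ARN, and dataset placeholder name are assumptions, and the placeholder name must match the one defined in the template.

import boto3

qs = boto3.client('quicksight', region_name='us-east-1')

qs.create_dashboard(
    AwsAccountId='111122223333',                 # placeholder account ID
    DashboardId='admin-console',                 # placeholder dashboard ID
    Name='Admin Console',
    SourceEntity={
        'SourceTemplate': {
            'Arn': 'arn:aws:quicksight:us-east-1:889399602426:template/admin-console',
            'DataSetReferences': [{
                'DataSetPlaceholder': 'admin_console',                                          # assumed placeholder name
                'DataSetArn': 'arn:aws:quicksight:us-east-1:111122223333:dataset/<dataset_id>'  # your admin_console dataset
            }]}})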

Additional usage metrics

Additionally, we can perform some more complicated analyses to collect advanced usage metrics. For example, Forwood Safety raised a unique request to analyze the readers who log in but don't view any dashboards (see the following code). This helps their clients identify and prevent wasted reader session fees. Leadership teams value the ability to minimize uneconomical user activity.

CREATE OR REPLACE VIEW "loginwithoutviewdashboard" AS
with login as
(SELECT COALESCE("useridentity"."username", "split_part"("useridentity"."arn", '/', 3)) AS "user_name", awsregion,
date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
FROM cloudtrail_logs
WHERE
eventname = 'AssumeRoleWithSAML'
GROUP BY  1,2,3),
dashboard as
(SELECT COALESCE("useridentity"."username", "split_part"("useridentity"."arn", '/', 3)) AS "user_name", awsregion,
date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
FROM cloudtrail_logs
WHERE
eventsource = 'quicksight.amazonaws.com'
AND
eventname = 'GetDashboard'
GROUP BY  1,2,3),
users as 
(select Namespace,
Group,
User,
(case
when Group in ('quicksight-fed-bi-developer', 'quicksight-fed-bi-admin')
then 'Author'
else 'Reader'
end)
as author_status
from "group_membership" )
select l.* 
from login as l 
join dashboard as d 
join users as u 
on l.user_name=d.user_name 
and 
l.awsregion=d.awsregion 
and 
l.user_name=u.user_name
where d.event_time>(l.event_time + interval '30' minute ) 
and 
d.event_time<l.event_time 
and 
u.author_status='Reader'

Cleaning up

To avoid incurring future charges, delete the resources you created with the CloudFormation template.

Conclusion

This post discussed how BI administrators can use QuickSight, CloudTrail, and other AWS services to create a centralized view to analyze QuickSight usage metrics. We also presented a serverless data pipeline to support the administrative console dashboard.

You can request a demo of this administrative console to try for yourself.


About the Authors

Ying Wang is a Data Visualization Engineer with the Data & Analytics Global Specialty Practice in AWS Professional Services.

Jill Florant manages Customer Success for the Amazon QuickSight Service team.

Create a custom data connector to Slack’s Member Analytics API in Amazon QuickSight with Amazon Athena Federated Query

Post Syndicated from Pablo Redondo Sanchez original https://aws.amazon.com/blogs/big-data/create-a-custom-data-connector-to-slacks-member-analytics-api-in-amazon-quicksight-with-amazon-athena-federated-query/

Amazon QuickSight recently added support for Amazon Athena Federated Query, which allows you to query data in place from various data sources. With this capability, QuickSight can extend support to query additional data sources like Amazon CloudWatch Logs, Amazon DynamoDB, and Amazon DocumentDB (with MongoDB compatibility) via their existing Amazon Athena data source. You can also use the Athena Query Federation SDK to write custom connectors and query any source accessible with a Java API, whether it is relational, non-relational, object, or a custom data endpoint.

A common analytics use case is to access data from a REST API endpoint and blend it with information from other sources. In this post, I walk you through the process of setting up a custom federated query connector in Athena to query data from a REST API endpoint and build a QuickSight dashboard that blends data from the REST API endpoint with other data sources.

To illustrate this use case, we work with Slack, the makers of a leading channel-based messaging platform, to test their Member Analytics API, which can help our mock company, Example Corp, understand Slack adoption and member engagement across different teams.

How the Slack Member Analytics API works

The following diagram illustrates the Slack Member Analytics API.

The Slack Member Analytics API is a REST API endpoint available for Slack Enterprise Grid customers. Authorized users and services can access the member usage stats dataset via the admin.analytics.getFile endpoint of the Slack Web API. The data consists of a newline-delimited JSON file with daily Slack activity stats at the member level. A record looks like the following code:

{ 
    "enterprise_id":"AAAAAAA",
    "date":"2020-11-10",
    "user_id":"U01ERHY4589",
    "email_address":"[email protected]",
    "is_guest":false,
    "is_billable_seat":false,
    "is_active":true,
    "is_active_ios":false,
    "is_active_android":false,
    "is_active_desktop":true,
    "reactions_added_count":3,
    "messages_posted_count":10, 
    "channel_messages_posted_count":0,
    "files_added_count":0
}

To request data, you must provide a date argument in the format of YYYY-MM-DD, a type argument with the value member, and an OAuth bearer token as the header. The response is a compressed (.gzip) JSON file with data for the requested date. See the following code of a sample request:

curl -X GET -H "Authorization: Bearer xoxp-..."  https://slack.com/api/admin.analytics.getFile?date=2020-09-01&type=member > data.gzip
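The equivalent request in Python might look like the following sketch, which uses the requests library, reads the bearer token from an environment variable, and omits error handling:

import gzip
import json
import os

import requests

SLACK_TOKEN = os.environ['SLACK_TOKEN']   # assumption: token exported as an environment variable

response = requests.get(
    'https://slack.com/api/admin.analytics.getFile',
    headers={'Authorization': f'Bearer {SLACK_TOKEN}'},
    params={'date': '2020-09-01', 'type': 'member'})

# On success, the body is a gzip-compressed, newline-delimited JSON file
records = [json.loads(line) for line in gzip.decompress(response.content).splitlines() if line]
print(records[0]['messages_posted_count'])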

Building the solution for Example Corp

For our use case, Example Corp has recently purchased Slack for 1,000 users and as the Collaboration team onboards new teams to Slack, they want to measure Slack adoption and engagement within each new team. If they see low adoption or engagement within a group at the company, they can work with that group to understand why they aren’t using Slack and provide education and support, as needed.

Example Corp wants to provide analysts access to the Slack member usage stats to run ad hoc queries in place (directly from the source) without maintaining a new extract, transform, and load (ETL) pipeline. They use the QuickSight cross data source join feature to blend their Slack usage stats with their HR dataset.

To achieve this, Example Corp implements the following steps:

  1. Authorize the custom federated query connector with Slack to access the Member Analytics API.
  2. Develop and deploy a custom federated query connector in the Example Corp AWS account.
  3. Create a dataset in the Example Corp QuickSight environment that reads Slack member usage data for the last 30 days and blends it with an HR dataset.
  4. Create a QuickSight dashboard that shows usage trends of provisioned vs. active users.

Example Corp program managers can now monitor Slack engagement using their QuickSight dashboard (see the following screenshot).

The following diagram illustrates the overall architecture of the solution.

The following sections describe the components in detail and provide sample code to implement the solution in your environment.

Authorizing the custom federated query connector to access the Slack Analytics API

Data REST API endpoints typically have an authentication mechanism such as standard HTTP authentication or a bearer token. In the case of the Slack Web API, a bearer token is required on every request. The Slack Member Analytics API uses an OAuth protocol to authorize applications’ read access to data from an organization’s Slack environment.

To perform the OAuth handshake, Example Corp deploys a custom web application on Amazon Elastic Compute Cloud (Amazon EC2) and registers it as a new Slack application. When it’s deployed, Example Corp Slack admins can access the web application UI to authenticate with Slack and authorize read access to the custom federated query connector. After successful authentication, the custom web application stores the bearer token as a secret in AWS Secrets Manager. Only the custom application server and the federated query connector have access to this secret.

The following is an architecture diagram and brief description of the OAuth authorization workflow between Slack.com and the custom web application. As a prerequisite, you need to register your custom application with Slack.

  1. The Slack admin accesses the custom application UI from their browser and chooses Add to Slack to begin the authorization process.

  2. The custom application redirects the admin to Slack.com to authenticate and authorize the client with admin.analytics:read access for the Example Corp Slack Enterprise Grid.

  3. Slack.com redirects the admin back to the custom application UI, passing a temporary authorization code in the request.
  4. On the backend, the custom application retrieves Slack client secrets from a Secrets Manager secret. The Slack client secrets are obtained during the Slack application registration.
  5. The custom application server makes a request for a bearer token to the Slack API, passing both the temporary authorization code and the Slack client secrets.
  6. If both the temporary authorization code and the client secrets are valid, then the Slack API returns a bearer token to the custom application server.
  7. The custom application saves the bearer token in the Secrets Manager secret.
  8. Finally, the application sends a confirmation of successful authorization to the admin.

Slack admins can revoke access to the application from the organization’s console at any time.

You can find the source code and detailed instructions to deploy this sample OAuth web application in the GitHub repo. When the authorization workflow is complete, you can pause or stop the resources running the web application. Going forward, the federated query connector accesses the token from Secrets Manager.
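Although the connector itself is built with the Athena Query Federation Java SDK, the token lookup it performs is simple. The following Python sketch shows the equivalent call; the secret name slack/bearer_token and the JSON key access_token are assumptions about how the web application stores the secret.

import json

import boto3

secrets = boto3.client('secretsmanager', region_name='us-east-1')

# Retrieve the bearer token that the OAuth web application saved in Secrets Manager
secret = secrets.get_secret_value(SecretId='slack/bearer_token')       # hypothetical secret name
bearer_token = json.loads(secret['SecretString'])['access_token']      # assumed key in the secret JSON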

Deploying the custom federated query connector

When the OAuth workflow is complete, we can deploy the custom federated query connector in the Example Corp AWS environment. For Example Corp, we develop a custom AWS Lambda function using the Athena Query Federation Java SDK and a Java HTTP client to connect with the Slack Member Analytics REST API. Finally, we register it as a new data source within Athena.

The following is a diagram of how the custom connector workflow operates.

The workflow includes the following steps:

  1. Users submit a query to Athena, such as select * from <catalog_name>.slackanalytics.member_analytics where date='2020-11-10', where <catalog_name> is the name specified when creating the Athena data source (see the sketch after this list).
  2. Athena compiles the query and runs the Lambda function to retrieve the Slack authorization token from Secrets Manager and determine the number of partitions based on the query predicates (where clause).
  3. The Slack Member Analytics Connector partitions the data by date and runs a Lambda function for each partition (date) specified in the query. For example, if the predicate is WHERE date IN ('2020-11-10', '2020-11-12'), Athena runs two instances of the Lambda function. When no dates are specified in the where clause, the connector gets data for the last 30 days.
  4. Each instance of the Lambda function makes a request to the Slack Member API to retrieve data for each day.
  5. Finally, Athena performs any aggregation and computation specified in the query and returns the results to the client.
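As referenced in step 1, the following Python sketch submits the example query through the Athena API instead of the console; the catalog name slackanalytics_catalog and the query result location are assumptions.

import boto3

athena = boto3.client('athena', region_name='us-east-1')

query = athena.start_query_execution(
    QueryString="SELECT * FROM slackanalytics_catalog.slackanalytics.member_analytics WHERE date='2020-11-10'",
    QueryExecutionContext={'Catalog': 'slackanalytics_catalog'},                 # the data source name registered in Athena
    ResultConfiguration={'OutputLocation': 's3://your-athena-query-results/'})   # placeholder results location
print(query['QueryExecutionId'])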

You can deploy this sample Slack Member Analytics Lambda function in your AWS environment via AWS CloudFormation with the following template. If you want to modify and build the connector from scratch, you can find the source code and instructions in the GitHub repo.

After the Lambda function has been deployed, create a new data source in Athena. For step-by-step instructions, see Deploying a Connector and Connecting to a Data Source.

  1. On the Athena console, in the query editor, choose Connect data source.

  2. Select All other data sources.
  3. Point your catalog to your new Lambda function.

You should be able to browse your new catalog within Athena from the Athena console and query the Slack Member Analytics API using SQL.

Creating a dataset that reads Slack member usage data and blends it with an HR dataset

As a prerequisite to query the Slack Member Analytics API from QuickSight, we must provide the proper permission for QuickSight to access the federated query data source in Athena. We do this directly from the QuickSight admin UI following these steps:

  1. As an admin, on the Admin menu, choose Manage QuickSight.
  2. Under Security & Permissions, choose QuickSight access to AWS services.
  3. Choose Add or Remove services.
  4. Select Athena.
  5. Choose Next when prompted to set the Amazon Simple Storage Service (Amazon S3) bucket and Lambda function permissions.

QuickSight browses the Athena catalogs and displays any Lambda functions associated with your account. If you don’t see a Lambda function, it means you haven’t mapped a data source within Athena.

  6. Select the function.
  7. Choose Finish.

When the Example Corp QuickSight environment has the proper permissions, analysts can query the Slack Analytics Member API using their existing Athena data source. For instructions on creating your own dataset, see Creating a Dataset Using Amazon Athena Data.

The custom connector appears as a new Catalog, Database, and Tables option.

  1. In QuickSight, on the Datasets page, choose New dataset.

  2. Choose Athena as your data source.
  3. Choose Create dataset.

  4. Choose your table or, for this use case, choose Use custom SQL.

For this analysis, we write a custom SQL that gets member activity for the last 30 days:

SELECT date,
       is_active,
       email_address,
       messages_posted_count
FROM   slackanalytics_catalog.slackanalytics.member_analytics
WHERE  date >= date_format(date_trunc('month',current_date),'%Y-%m-%d')

With the QuickSight cross data source join feature, analysts can enrich the Slack member stats with their HR info. For this use case, we imported a local HR_dataset.csv file containing the list of subscribed users with their respective Example Corp department, and joined them via the employee_email field.

The result is a dataset with Slack activity by employee and department. We’ve also updated the date field from a String type to a Date type using the QuickSight Data Prep page to take advantage of additional visualization features with Date type fields.

Creating a QuickSight dashboard that shows usage trends of provisioned vs. active users

Example Corp Analysts want to visualize the trend of provisioned users vs. active users and understand Slack adoption by department. To support these visualizations, we created the following calculated fields within our QuickSight analysis:

  • active: distinct_countIf(employee,{is_active}='true')
  • provisioned: distinct_count(employee)

You can also create these calculated fields when you create your dataset. This way, you can reuse them in other QuickSight analyses. 

We use QuickSight narrative insights, a line chart, a bar chart, and a pivot table with conditional formatting to create the following analysis.

From this analysis, Example Corp can see that the adoption trend is positive; however, there is an adoption gap within the Marketing team. The program managers can engage the Marketing department leads and focus their training resources to improve their adoption.

This dashboard can now be published to stakeholders within the organization as needed—either within the QuickSight app or embedded within existing enterprise applications. 

Conclusion

With the recent integration of QuickSight and Athena Federated Query, organizations can access additional data sources beyond those already supported by QuickSight. Analysts can leverage QuickSight capabilities to analyze and build dashboards that blend data from a variety of data sources, and with the Athena Query Federation SDK, you can build custom connectors to access relational, non-relational, object, and custom data endpoints using standard SQL.

To get started, try the lab Athena Deploying Custom Connector.


About the Author

Pablo Redondo Sanchez is a Senior Solutions Architect at Amazon Web Services. He is a data enthusiast and works with customers to help them achieve better insights and faster outcomes from their data analytics workflows. In his spare time, Pablo enjoys woodworking and spending time outdoors with his family in Northern California.

How the Yahoo! JAPAN Smart Devices Team is improving voice user interfaces with Amazon QuickSight business intelligence

Post Syndicated from Kazuhide Fujita original https://aws.amazon.com/blogs/big-data/how-the-yahoo-japan-smart-devices-team-is-improving-voice-user-interfaces-with-amazon-quicksight-business-intelligence/

This is a guest blog post by Kazuhide Fujita, Product Manager at Yahoo! JAPAN.

Yahoo! JAPAN is a large internet search and media company, with Yahoo! JAPAN's web portal being one of the most commonly used websites in Japan. Our smart devices team is responsible for building and improving Yahoo! JAPAN apps for voice user interfaces (VUI) such as Amazon Alexa and Google Assistant. We see VUI as a market that will grow exponentially in the future, and we want to be ready to lead the consumer experience with such devices. In this post, I discuss how we're using Amazon QuickSight business intelligence (BI) to help our product teams improve these services.

Enhanced access to insights at lower cost

To continuously improve our services, we use data to understand how consumers are interacting with the software and to identify growth trends. However, the data we get directly from smart device makers is limited. So, we built our own log system to capture more granular data, such as the types of commands customers are using, the time of day they use the application, and how frequently they use it.

Early on, we used Amazon Elasticsearch Service (Amazon ES) and Kibana to analyze data. Although this solution was very capable, it came at a higher price point than we were targeting. Another option was to export data directly to Microsoft Excel for ad hoc analysis. However, this was very time consuming and limited us to working with extracts of historical data rather than the latest information.

We decided to look for a solution that would suit the full spectrum of our needs while being cost-effective for our specific use case. While we were searching, our data team made the decision to standardize on a data lake architecture using Amazon Simple Storage Service (Amazon S3) and Amazon Athena. This approach provided a high level of flexibility and scalability. To visualize our data, it made sense to use QuickSight, the serverless BI solution with pay-per-session pricing on AWS.

Unifying data to understand customers better

This system has proven to be a good fit for our needs. The data lake allows us to accumulate different types of data from monitoring many KPIs and VUI products. For example, we might want to know the number of active users over a given period, and then drill down into how active those users were in the 2 weeks from when they registered. The data lake makes this possible. It’s easy to maintain even though the data is very diverse. For aggregating and performing calculations on the data, we use Athena because it provides optimal performance for complex queries thanks to the distributed computing model.

For ad hoc analysis, dashboards, and reporting, QuickSight connects seamlessly to our data lake. QuickSight makes it easy to view trends in customer behavior such as the time of usage, method of interaction, typical settings, and so on. The following screenshot shows a sample dashboard in QuickSight.

For example, the default wake word for Alexa-powered devices is to say the name of the voice assistant: “Hey, Alexa.” However, Japanese customers may prefer to say “ohayō,” which means “good morning” in Japanese. Which setting customers prefer could be an important trend for us to know when we configure our offerings. With QuickSight, it’s easy to compare trends for this type of behavior across other user characteristics.

This is only one small example of the kinds of insights we glean by using QuickSight. Another use case is regarding initiatives to increase product usage through marketing or incentives. We can track the outcome of these programs using QuickSight by tracking whether they result in an uptick in usage relative to the communications we send out.

The freedom to focus on what matters to the business

One of the big advantages of using QuickSight and other AWS services is that we don’t have to worry about maintaining on-premises systems for our data lake and analytics. It’s easy to manage and we can focus on gaining insights and improving our products—not running data center infrastructure. Building our end-to-end data-to-insights pipeline on AWS ensures that we can easily apply security and governance policies to all our data.

Overall, QuickSight provides us with the flexibility to analyze all kinds of data quickly, so we can aim to be the market leader in the VUI marketplace. We’re excited to see what the future holds for this powerful tool—and to apply the knowledge we gain to improving our services.


About the Author

Kazuhide Fujita is the Skill Team Product Manager, Smart Device Division, at Yahoo Japan Corporation

How Edmunds GovTech unifies data and analytics data for municipalities with Amazon QuickSight

Post Syndicated from Edmunds GovTech original https://aws.amazon.com/blogs/big-data/how-edmunds-govtech-unifies-data-and-analytics-data-for-municipalities-with-amazon-quicksight/

This is a guest post from an Amazon QuickSight customer, Edmunds GovTech

Over the past 30 years, Edmunds GovTech has grown to provide enterprise resource planning (ERP) solutions to thousands of East Coast municipalities. We also serve cities and towns in 25 other states. In this blog, I’ll talk about how we used Amazon QuickSight embedded business intelligence (BI) to quickly bring powerful dashboards to our on-premises and cloud-based customers.

Unifying insights

Our customers rely on our suite of solutions to manage finances, personnel, revenue, and municipal management activities such as permits, land management, business licensing, and fleet maintenance. They can access a wide variety of reports and data analysis tools tailored to the needs of users in finance, operations, and other departments. Recent acquisitions have also added new capabilities to our offering, each with its own set of reporting tools.

These reports serve specialist users well. However, we wanted to add the ability to aggregate and visualize information in one easy-to-consume service. Time-starved executives, boards, and decision-makers needed a better way to gain key insights into spending trends and implement better cost and cash management strategies. They strive to better achieve the financial goals of their municipalities without having to spend time running reports in different areas of the solution.

Production-ready in record time

With this vision in place, our primary directive was speed to market, with the aim of releasing a production-ready solution in just 4 months. We carefully evaluated our priorities and functional requirements, and ultimately decided that outsourcing infrastructure management was key. QuickSight, a fully managed, cloud-native BI service from AWS, was the only option that allowed us to deliver so quickly.

Just as importantly, our professional services team saves an extensive amount of time implementing the solution and training customers. That means immediate value for the customer and more time for our professional services team to spend on other activities, increasing profitability. We sell the embedded dashboard service as a subscription-based add-on, so customers can easily purchase and use it.

Flexible and future-proof

Although many of our customers use traditional client/server configurations in their own data centers, our cloud-hosted solution is becoming increasingly popular, especially with increasing numbers of remote workers. We’re also developing a software as a service (SaaS) version of our suite and continue to acquire other vendors to add functionality. All these factors mean our QuickSight dashboard service needs to be platform-agnostic. It must work with any source application, whether in AWS or on premises.

We accomplished this using Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Storage Service (Amazon S3). The source application emits events about finance accounts, vendors, and yearly budgets using Amazon SQS, with Amazon S3 available to ingest large messages that exceed the limits of Amazon SQS. We rely on AWS Lambda serverless functions to handle the ingestion and routing of the messages. Each customer has an individual reporting store, separate from the database of the source system.

This system transforms data from the customer’s system into a format that is normalized for QuickSight reporting. By pointing QuickSight at these schemas, we enable it to report on that data. The customer dashboard is embedded in the ERP application, so the customer doesn’t need to go to the QuickSight website to access it.

Any source application that can adhere to the messaging format can be reported on. The source system is responsible for the number-crunching, so any customizations the customer has applied are reflected in the reports.

The following diagram illustrates this architecture.

The high-level architecture is as follows:

  1. The application sends a JSON message to the SQS queue. If the message is too large, it is added to an S3 bucket and the message contains a reference to the large object. Note that the source can be any application, as long as it produces messages adhering to the predefined JSON schema.
  2. A Lambda consumer ingests a batch of messages, validates the payloads, and transforms them from JSON into the tenant's Aurora MySQL reporting database, which uses a star schema. The consumer can ingest small messages directly from the SQS event or retrieve large messages from S3 (see the sketch after this list).
  3. The QuickSight namespace for the tenant contains a dashboard created from a master template that points to the appropriate reporting database.
  4. The source application requests the dashboard on the user's behalf. The dashboard is embedded within the source application UI.
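The following Python sketch illustrates the large-message handling described in step 2: the Lambda consumer reads each SQS record and, when the body only carries an S3 reference, fetches the full payload from the bucket. The field names s3_bucket and s3_key are assumptions about the message contract, and the database write is left as a stub.

import json

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:                      # batch of SQS messages delivered to Lambda
        body = json.loads(record['body'])
        if 's3_key' in body:                             # assumption: large payloads are referenced by bucket/key
            obj = s3.get_object(Bucket=body['s3_bucket'], Key=body['s3_key'])
            payload = json.loads(obj['Body'].read())
        else:
            payload = body
        load_into_reporting_store(payload)

def load_into_reporting_store(payload):
    # Stub: validate the payload and upsert it into the tenant's Aurora MySQL star schema
    pass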

Because the system relies on Lambda functions, it’s a modern, decoupled architecture that is inherently future-proof and scalable. We don’t have to manage cloud or on-premises servers, and we only pay for what clients actually use.

Additionally, we were able to build a user interface that makes it easy to deploy new customers with just a few clicks. We use the installer to create the infrastructure for new clients using the AWS Command Line Interface (AWS CLI). The customer simply pushes a button from the source system to push data to the dashboard. They can be using the dashboard in less than an hour from start to finish.
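The installer itself isn't described in detail in this post, but as a rough illustration of per-tenant provisioning (shown here with boto3 rather than the AWS CLI), a deployment step could create each client's stack from a shared template; the stack name, template URL, and parameter key below are all hypothetical.

import boto3

cfn = boto3.client('cloudformation', region_name='us-east-1')

cfn.create_stack(
    StackName='dashboards-tenant-0042',                                            # hypothetical tenant stack name
    TemplateURL='https://example-bucket.s3.amazonaws.com/tenant-template.yaml',    # hypothetical template location
    Parameters=[{'ParameterKey': 'TenantId', 'ParameterValue': 'tenant-0042'}],    # hypothetical parameter
    Capabilities=['CAPABILITY_NAMED_IAM'])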

Continuously increasing customer value

QuickSight has rolled out a lot of new features just in the short time we’ve been using it, and we’re already taking advantage of them. For example, QuickSight now remembers filter selections by user, so that the choices a user makes are remembered from session to session. That saves our customers time and effort and helps them get the information they need faster.

Embedded authoring is another significant feature that we’re looking forward to rolling out soon. As of this writing, we manage and maintain reporting templates for customers and push them out to clients using the AWS CLI. With the new embedded authoring capability of QuickSight, customers will be able to explore data in a self-service manner, perform ad hoc analysis, and create new dashboards. This will greatly increase the utility of the service while maintaining ease of use for customers and simplicity of management for our team. We’re also adopting the new namespace functionality to help customers maintain data separation from others in our multi-tenant solution.

Together today and tomorrow

Working with AWS has been a great experience. Our account representative has always been available for questions and feedback, which helped us succeed especially on such an accelerated timeframe. In addition to bringing QuickSight to our customers, we value the relationship we’ve developed with AWS and look forward to building on it as we move forward with our cloud solutions. Partnering with AWS has led to many benefits across our entire organization.

Marketing and sales teams in our organization are leading client demos with the QuickSight dashboard because it looks great and works seamlessly, and it’s something a lot of customers have been asking for. For department heads, executives, and other leaders, the ability to quickly visualize current and historical budget information is huge. They can also show their boards the information they need in a very easy-to-consume way. By giving customers one place to go for a high-level strategic view across their municipality, we’re helping them make better decisions and ultimately serve their constituents more effectively.


About the Author

Thomas Mancini is the VP, Concept Development at Edmunds GovTech

Centrally tracking dashboard lineage, permissions, and more with Amazon QuickSight administrative dashboards

Post Syndicated from Jesse Gebhardt original https://aws.amazon.com/blogs/big-data/centrally-tracking-dashboard-lineage-permissions-and-more-with-amazon-quicksight-administrative-dashboards/

This post is co-written with Shawn Koupal, an Enterprise Analytics IT Architect at Best Western International, Inc.

A common ask from Amazon QuickSight administrators is to understand the lineage of a given dashboard (what analysis is it built from, what datasets are used in the analysis, and what data sources do those datasets use). QuickSight APIs allow us to capture the metadata from each object and build a complete picture of the linkages between each object. As a QuickSight administrator, you can build a dashboard that displays the lineage from dashboard to data source, along with the permissions for each asset type. It can be helpful to see all permissions assigned to each of your assets as well as the relationships between them, all in one place.

Solution overview

In this solution, you build an end-to-end data pipeline using QuickSight to ingest data from an AWS Glue table.

The following diagram illustrates the architecture of the solution.

You can invoke the QuickSight APIs via the AWS Software Development Kit (AWS SDK) or the AWS Command Line Interface (AWS CLI). For this post, we use the AWS SDK.

The solution starts with an AWS Lambda function that calls the QuickSight list APIs (list_data_sources, list_data_sets, list_analyses, list_templates, and list_dashboards) depending on the event message to build lists of assets in chunks of 100, which are iterated through by a second Lambda function. The reason for splitting the work into two functions is to work around the 15-minute time limit in Lambda. You can schedule the Lambda function to run on each asset type based on an event rule trigger. See the following code:

import boto3
import os
import time
import datetime
import json

# Configuration passed in through Lambda environment variables
AWS_ACCOUNT_ID=os.environ['AWS_ACCOUNT_ID']
AWS_REGION=os.environ['AWS_REGION']
QS_S3_BUCKET=os.environ['QS_S3_BUCKET']
DownloaderFunctionName=os.environ['DownloaderFunctionName']

# boto3 clients for QuickSight and Lambda
client = boto3.client('quicksight', region_name=AWS_REGION)
lambda_client = boto3.client('lambda')

# Invoke the downloader Lambda function asynchronously with one page (up to 100 objects) of list results
def invoke_downloader(iteration, apicall, list_results):
    # Map the list API name to the object type used in the S3 folder structure
    apicall=apicall.replace("list_data_sources","datasource").replace("list_data_sets","dataset").replace("list_analyses","analysis").replace("list_dashboards","dashboard").replace("list_templates","template")
    msg = {"Iteration": iteration, "api": apicall, "Objects":  list_results }
    invoke_response = lambda_client.invoke(FunctionName=DownloaderFunctionName,
                                           InvocationType='Event',
                                           Payload=json.dumps(msg, default=datetime_handler))

# Serialize datetime objects returned by the QuickSight APIs to ISO-8601 strings
def datetime_handler(x):
    if isinstance(x, datetime.datetime):
        return x.isoformat()
    raise TypeError("Unknown type")

# Delete the CSV files produced by a previous run for this object type
def file_cleanup(apicall):
    #Replace the apicall with the S3 folder name
    object_type=apicall.replace("list_data_sources","datasource").replace("list_data_sets","dataset").replace("list_analyses","analysis").replace("list_dashboards","dashboard").replace("list_templates","template")
    
    s3_path='quicksight_lineage/'+object_type+'/'
    s3_path2='quicksight_lineage/'+object_type+'_permissions/'
    fileprefix="QuickSight_"+object_type
    botoSession = boto3.Session (region_name = 'us-west-2')
    s3_session = botoSession.resource('s3')
    bucket = s3_session.Bucket(QS_S3_BUCKET)
    #Delete Any files with prefix in s3_path and s3_path2
    bucket.objects.filter(Prefix=s3_path+fileprefix).delete()
    bucket.objects.filter(Prefix=s3_path2+fileprefix).delete()

# Entry point: decide which QuickSight list APIs to call based on the incoming event
def lambda_handler(event, context):
    if event == {}:
        #Call All APIs assests 
        apicall_list=['list_data_sources','list_data_sets','list_analyses','list_dashboards','list_templates']
    elif  event["api"] == 'datasource':
        apicall_list=['list_data_sources']
    elif event["api"] == 'dataset':
        apicall_list=['list_data_sets']
    elif event["api"] == 'analysis':
        apicall_list=['list_analyses']
    elif event["api"] == 'dashboard':
        apicall_list=['list_dashboards']
    elif event["api"] == 'template':
        apicall_list=['list_templates']
    else:
        print("[WARN] Exception: Invalid Event Type.")
        return
    for apicall in apicall_list: 
        try:
            #Clean up files from previous run
            file_cleanup(apicall)
            #Reset variables for each apicall
            iteration=0
            user_token = None
            list_results={}

            # Page through the list API 100 results at a time until NextToken is no longer returned
            while True:
                iteration+=1
                print("Calling ",apicall, iteration)
                
                if user_token is None:
                    exec("""results=client."""+apicall+"""(AwsAccountId='"""+AWS_ACCOUNT_ID+"""', MaxResults=100);""",globals(), list_results)
                else:
                    exec("""results=client."""+apicall+"""(AwsAccountId='"""+AWS_ACCOUNT_ID+"""', MaxResults=100,NextToken='"""+user_token+"""');""",globals(), list_results)
​
                invoke_downloader(iteration, apicall, list_results["results"])
                user_token=list_results["results"]["NextToken"]
                print(user_token)
        except KeyError:
            print("NextToken not found.")

The second Lambda function consumes the list of assets from the event parameter from the first function and uses the QuickSight describe APIs (describe_datasource, describe_dataset, describe_analysis, describe_template, and describe_dashboard). The details of each QuickSight asset are written to CSV files in an Amazon Simple Storage Service (Amazon S3) bucket in groups of 100. Because the first function calls the second function in parallel, it’s recommended to set the reserved concurrency to 2 in the second Lambda function to avoid throttling errors (if you use the AWS CloudFormation template provided later in this post, this is automatically configured for you). See the following code:

import boto3
import os
import time
import datetime
import json

AWS_ACCOUNT_ID=os.environ['AWS_ACCOUNT_ID']
AWS_REGION=os.environ['AWS_REGION']
QS_S3_BUCKET=os.environ['QS_S3_BUCKET']

client = boto3.client('quicksight', region_name=AWS_REGION)
lambda_client = boto3.client('lambda')
s3 = boto3.client('s3')

def process_dashboards(list_dashboard,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DashboardId,Name,SourceEntityArn,VersionCreatedTime,VersionNumber,CreatedTime,DataSetArns,LastPublishedTime,LastUpdatedTime" + '\n')

    for dashboard in list_dashboard["DashboardSummaryList"]:
        dashboard_desc= client.describe_dashboard(AwsAccountId=AWS_ACCOUNT_ID,DashboardId=dashboard["DashboardId"])
        
        source_entity_arn = dashboard_desc["Dashboard"]["Version"]["SourceEntityArn"]
        version_created_time = dashboard_desc["Dashboard"]["Version"]["CreatedTime"].isoformat()
        version_number = str(dashboard_desc["Dashboard"]["Version"]["VersionNumber"])
        created_time = dashboard_desc["Dashboard"]["CreatedTime"].isoformat()

        last_published_time = dashboard_desc["Dashboard"]["LastPublishedTime"].isoformat()
        last_updated_time = dashboard_desc["Dashboard"]["LastUpdatedTime"].isoformat()
        try:
            for arn in dashboard_desc["Dashboard"]["Version"]["DataSetArns"]:
                f.write(dashboard["DashboardId"]+',"'+ dashboard["Name"] + '",' + source_entity_arn + ',' + version_created_time + ',' + version_number + ','   + created_time + ','+ arn + ',' + last_published_time + ',' + last_updated_time +'\n')
        except Exception as e:
            print(e)
            dataset_arn=''
            f.write(dashboard["DashboardId"]+',"'+ dashboard["Name"] + '",' + source_entity_arn + ',' + version_created_time + ',' + version_number + ','   + created_time + ','+ dataset_arn + ',' + last_published_time + ',' + last_updated_time +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            
        

def process_dashboards_permissions(list_dashboard,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DashboardId,Name,Principal,Permission,Iteration" + '\n')
	
    for dashboard in list_dashboard["DashboardSummaryList"]:

        try:
            list_permissions = client.describe_dashboard_permissions(AwsAccountId=AWS_ACCOUNT_ID,DashboardId=dashboard["DashboardId"])
        except:
            print("Error Listing Permissions for:"+dashboard["DashboardId"])
            continue

        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteDashboard" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"

            f.write(dashboard["DashboardId"]+',"'+ dashboard["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)      


def process_analysis(list_analyses,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("AnalysisId,Name,AnalysisArn,CreatedTime,LastUpdatedTime,DataSetArn,Iteration" + '\n')

    for analysis in list_analyses["AnalysisSummaryList"]:
        #Call describe_analysis
        analysis_desc= client.describe_analysis(AwsAccountId=AWS_ACCOUNT_ID,AnalysisId=analysis["AnalysisId"])

        analysis_arn = analysis_desc["Analysis"]["Arn"]
        created_time = analysis_desc["Analysis"]["CreatedTime"].isoformat()
        last_updated_time = analysis_desc["Analysis"]["LastUpdatedTime"].isoformat()

        try:
            for arn in analysis_desc["Analysis"]["DataSetArns"]:
                f.write(analysis["AnalysisId"]+',"'+ analysis["Name"] + '",' + analysis_arn + ',' + created_time + ','  + last_updated_time + ',' + arn + ',' + iteration  +'\n')
        except Exception as e:
            print(e)
            dataset_arn=''
            f.write(analysis["AnalysisId"]+',"'+ analysis["Name"] + '",' + analysis_arn + ',' + created_time + ','  + last_updated_time + ',' + dataset_arn  + ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            

        
def process_analysis_permissions(list_analyses,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("AnalysisId,Name,Principal,Permission,Iteration" + '\n')
	
    for analysis in list_analyses["AnalysisSummaryList"]:

        try:
            list_permissions = client.describe_analysis_permissions(AwsAccountId=AWS_ACCOUNT_ID,AnalysisId=analysis["AnalysisId"])
        except:
            print("Error Listing Permissions for:"+analysis["AnalysisId"])
            continue
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteAnalysis" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"

            f.write(analysis["AnalysisId"]+',"'+ analysis["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')
    
    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)      


def process_templates(list_templates,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("TemplateId,Name,TemplateArn,CreatedTime,LastUpdatedTime,SourceEntityArn,VersionNumber,Iteration" + '\n')

    for template in list_templates["TemplateSummaryList"]:
        #Call describe_template
        template_desc= client.describe_template(AwsAccountId=AWS_ACCOUNT_ID,TemplateId=template["TemplateId"])

        template_arn = template_desc["Template"]["Arn"]
        created_time = template_desc["Template"]["CreatedTime"].isoformat()
        last_updated_time = template_desc["Template"]["LastUpdatedTime"].isoformat()
        source_entity_arn = template_desc["Template"]["Version"]["SourceEntityArn"]
        version_number = str(template_desc["Template"]["Version"]["VersionNumber"])
        f.write(template["TemplateId"]+',"'+ template["Name"] + '",' + template_arn + ',' + created_time + ','  + last_updated_time + ',' + source_entity_arn + ',' + version_number +  ',' + iteration  +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            

        
def process_templates_permissions(list_templates,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("TemplateId,Name,Principal,Permission,Iteration" + '\n')
	
    for template in list_templates["TemplateSummaryList"]:

        try:
            list_permissions = client.describe_template_permissions(AwsAccountId=AWS_ACCOUNT_ID,TemplateId=template["TemplateId"])
        except:
            print("Error Listing Permissions for:"+template["TemplateId"])
            continue
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteTemplate" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"

            f.write(template["TemplateId"]+',"'+ template["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')
    
    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)      


def process_datasources(list_data_sources,iteration,object_type):
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DataSourceId,DataSourceArn,Name,Type,LastUpdatedTime,CreatedTime,Status,ErrorInfo,Iteration" + '\n')

    global datasource_list
    datasource_list=[]
    for datasource in list_data_sources["DataSources"]:
        datasource_id=datasource["DataSourceId"]
        name=datasource["Name"]
        datasource_type=datasource["Type"]
        try:
            status=datasource["Status"]
        except:
            status=''
        CreatedTime=str(datasource["CreatedTime"])
        LastUpdatedTime=str(datasource["LastUpdatedTime"])
        try:
            ErrorInfo="Type: "+datasource["ErrorInfo"]["Type"]+" Message: "+datasource["ErrorInfo"]["Message"]
        except:
            ErrorInfo="Null"

        f.write( datasource_id + ',' + datasource["Arn"] + ',"' + name + '",'  + datasource_type + ',' + LastUpdatedTime+ ',' + CreatedTime + ',' + status + ',' + ErrorInfo+ ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            


def process_datasources_permissions(list_data_sources,iteration,object_type):
    
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write("DataSourceID,Name,Principal,Permission,Iteration" + '\n')

    for datasource in list_data_sources["DataSources"]:
        try:
            list_permissions = client.describe_data_source_permissions(AwsAccountId=AWS_ACCOUNT_ID,DataSourceId=datasource["DataSourceId"])
        except:
            print("Error Listing Permissions for:"+datasource["DataSourceId"])
            continue
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteDataSource" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"
                
            f.write(datasource["DataSourceId"]+',"'+ datasource["Name"] + '",' + permission["Principal"] +  ',' + action +  ',' + iteration +'\n')
    
    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            
    

def process_datasets(list_datasets,iteration,object_type):

    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    #CSV Header
    f.write('DatasetId,DataSetArn,Name,SpiceSize,ImportMode,LastUpdatedTime,CreatedTime,DataSourceArn,DataSourceName,DataSourceType,Source,Columns,Iteration' + '\n')
    
    for dataset in list_datasets["DataSetSummaries"]:
        
        try:
            response= client.describe_data_set(AwsAccountId=AWS_ACCOUNT_ID,DataSetId=dataset["DataSetId"])
        except Exception as e:
            print("Dataset ID: ", dataset["DataSetId"], e)
            f.write( dataset["DataSetId"] + ',' + dataset["Arn"] + ',"' + dataset["Name"] + '",' + '0' + ',' + dataset["ImportMode"] + ',' + str(dataset["LastUpdatedTime"])+ ','+ str(dataset["CreatedTime"])+ ',' + 'n/a' + ',"' + 'n/a' + '",' +  'n/a'  + ',' +  'n/a' + ',"'  + 'n/a'+ '",' + iteration +'\n')
            continue

        dataset_id=response["DataSet"]["DataSetId"]
        dataset_name=response["DataSet"]["Name"]
        dataset_size=response["DataSet"]["ConsumedSpiceCapacityInBytes"]
        ImportMode=response["DataSet"]["ImportMode"]
        LastUpdatedTime=response["DataSet"]["LastUpdatedTime"].isoformat()
        CreatedTime=response["DataSet"]["CreatedTime"].isoformat()

        #Default in case the physical table structure is not recognized below
        Columns=""
        try:
            for key in response["DataSet"]["PhysicalTableMap"].keys():
                
                if key == 's3PhysicalTable':
                    
                    source='S3Source'
                    DataSourceArn=response["DataSet"]["PhysicalTableMap"]["s3PhysicalTable"]["S3Source"]["DataSourceArn"]
                    Columns=response["DataSet"]["PhysicalTableMap"]["s3PhysicalTable"]["S3Source"]["InputColumns"]
                    #SqlQuery="Null"

                else:

                    try:
                        DataSourceArn=response["DataSet"]["PhysicalTableMap"][key]["RelationalTable"]["DataSourceArn"]
                        Columns=""
                        source="VisualEditor"
                    except:
                        DataSourceArn=response["DataSet"]["PhysicalTableMap"][key]["CustomSql"]["DataSourceArn"]
                        Columns=response["DataSet"]["PhysicalTableMap"][key]["CustomSql"]["Columns"]
                        source="CustomSql"

                DataSourceName=""
                DataSourceType=""
                
                f.write( dataset_id + ',' + dataset["Arn"] + ',"' + dataset_name + '",' + str(dataset_size) + ',' + ImportMode + ',' + LastUpdatedTime+ ','+ CreatedTime+ ',' + DataSourceArn + ',"' + DataSourceName + '",' +  DataSourceType  + ',' +  source + ',"'  + str(Columns) + '",' + iteration +'\n')
                
        except:
            print("[DEBUG]: Exception in main write for: " + str(dataset))
            f.write( dataset_id  + ',' + dataset["Arn"] +',"' + dataset_name + '",' + str(dataset_size) + ',' + ImportMode + ',' + LastUpdatedTime+ ',' + CreatedTime + ',,,,Unknown,"'  + str(Columns) + '",' + iteration +'\n')

    f.close()
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)


def process_datasets_permissions(list_datasets,iteration,object_type):
    
    filename="QuickSight_"+object_type+"_"+iteration+".csv"
    filePath = os.path.join("/tmp",filename)
    f=open(filePath,"w")
    f.write('DataSetID,Name,Principal,Permission,Iteration'+'\n')

    for dataset in list_datasets["DataSetSummaries"]:
        try:
            list_permissions = client.describe_data_set_permissions(AwsAccountId=AWS_ACCOUNT_ID,DataSetId=dataset["DataSetId"])
        except:
            print("Error Listing Permissions for:"+dataset["DataSetId"])
            continue
        
        for permission in list_permissions["Permissions"]:
            #If Action includes delete operation then principal has co-owner permissions
            if "quicksight:DeleteDataSet" in permission["Actions"]:
                action = "Co-Owner"
            else:
                action = "View"
                
            f.write(dataset["DataSetId"]+',"'+ dataset["Name"] + '",' + permission["Principal"] +  ',' + action+  ',' + iteration +'\n')

    f.close()        
    s3_path='quicksight_lineage/'+object_type+'/'
    s3.upload_file("{}/{}".format("/tmp",  filename), QS_S3_BUCKET, s3_path + filename)            


def lambda_handler(event, context):

    list_objects=event["Objects"]
    iteration=str(event["Iteration"])
    
    print("Iteration: ", iteration)
    print("[INFO]Processing QuickSight:", event["api"] )
    
    if  event["api"] == 'datasource':
        process_datasources(list_objects, iteration, event["api"])
        process_datasources_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'dataset':
        process_datasets(list_objects, iteration, event["api"])
        process_datasets_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'analysis':
        process_analysis(list_objects, iteration, event["api"])
        process_analysis_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'dashboard':
        process_dashboards(list_objects, iteration, event["api"])
        process_dashboards_permissions(list_objects, iteration, event["api"]+'_permissions')
    elif event["api"] == 'template':
        process_templates(list_objects, iteration, event["api"])
        process_templates_permissions(list_objects, iteration, event["api"]+'_permissions')
    else:
        print("[WARN] Exception: Invalid Event Type.")
        return

Afterwards, the S3 bucket has the directory structure under the quicksight_lineage folder as shown in the following screenshot.

You then use AWS Glue to store the metadata of each file in an AWS Glue table, which allows you to query the information from QuickSight using an Amazon Athena or Amazon Redshift Spectrum data source. (If you run the CloudFormation stack, the tables are set up for you.)
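
For example, once the AWS Glue tables are in place, you can query the extracted metadata directly through the Athena APIs. The following is a minimal sketch; the quicksight-lineage database name comes from this walkthrough, while the S3 output location is a placeholder you should change to your own query results bucket:

    import time
    import boto3

    athena = boto3.client("athena")

    # Assumed names from this walkthrough; adjust to your environment
    DATABASE = "quicksight-lineage"
    OUTPUT_LOCATION = "s3://<your-athena-query-results-bucket>/quicksight_lineage/"

    query = 'SELECT dashboardid, name, lastupdatedtime FROM "quicksight-lineage"."dashboard"'

    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )

    # Poll until the query finishes, then print the first page of results
    query_id = execution["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        for row in rows[1:]:  # skip the header row
            print([col.get("VarCharValue") for col in row["Data"]])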

The following diagram shows the tables and relationships.

Walkthrough overview

The workflow comprises the following high-level steps:

  1. Deploy the CloudFormation template to build the Lambda functions, AWS Identity and Access Management (IAM) roles, S3 bucket, AWS Glue database, and AWS Glue tables.
  2. Run the Python Lambda functions to build CSV files that contain the QuickSight object details.
  3. Visualize the data in QuickSight. To do so, you must create your data source, dataset, and then analysis.

For this post, we use Athena as the query engine. To use Redshift Spectrum, you must modify the provided queries.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • An IAM user with access to AWS resources used in this solution (CloudFormation, IAM, Amazon S3, AWS Glue, Athena, QuickSight)
  • Athena configured with a query result location
  • QuickSight Enterprise Edition

Creating resources

Create your resources by launching the following CloudFormation stack:

During the stack creation process, you must provide an S3 bucket name in the S3BucketName parameter (AWSAccountNumber is appended to the bucket name provided to make it unique).

After the stack creation is successful, you have two Lambda functions, two S3 buckets, an AWS Glue database and tables, and the corresponding IAM roles and policies.
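
If you prefer to create the stack from the command line instead of the console, the following sketch shows the general shape. The stack name and template URL are placeholders only; use the template referenced by the Launch Stack link, and supply your own value for the S3BucketName parameter described above:

    import boto3

    cloudformation = boto3.client("cloudformation")

    # Placeholder values -- substitute the template URL from the Launch Stack link
    # and a bucket name prefix of your choice
    cloudformation.create_stack(
        StackName="quicksight-lineage",
        TemplateURL="https://<template-bucket>.s3.amazonaws.com/<path>/quicksight-lineage.yaml",
        Parameters=[{"ParameterKey": "S3BucketName", "ParameterValue": "my-quicksight-lineage"}],
        Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates IAM roles and policies
    )

    # Wait for the stack to finish creating
    waiter = cloudformation.get_waiter("stack_create_complete")
    waiter.wait(StackName="quicksight-lineage")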

Running the Lambda function

To run your Lambda function, complete the following steps:

  1. On the Lambda console, navigate to the QuickSight-Lineage-Dispatcher function.
  2. From the Select a test event menu, choose Configure test events.
  3. Select Create new test event.

You create one test event for all QuickSight assets.

  4. For Event name, enter all.
  5. Enter an empty JSON object ({}).
  6. Choose Test to run the Lambda function and generate CSV files of the assets.

Alternatively, you can create test events for each QuickSight object (Data Source, DataSet, Analysis, Dashboard, and Template) for larger QuickSight environments:

  • Test event DataSource code:
    {
      "api": "datasource"
    }

  • Test event DataSet code:
    {
      "api": "dataset"
    }

  • Test event Analysis code:
    {
      "api": "analysis"
    }

  • Test event Dashboard code:
    {
      "api": "dashboard"
    }

  • Test event Template code:
    {
      "api": "template"
    }

The following screenshot shows the configuration of a test event for Analysis.
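
You can also invoke the dispatcher function programmatically with the same payloads rather than using console test events. The following sketch assumes the QuickSight-Lineage-Dispatcher function name created by the CloudFormation stack:

    import json
    import boto3

    lambda_client = boto3.client("lambda")

    # Same payloads as the console test events; an empty object ({}) processes all asset types
    payloads = [{"api": "datasource"}, {"api": "dataset"}, {"api": "analysis"},
                {"api": "dashboard"}, {"api": "template"}]

    for payload in payloads:
        response = lambda_client.invoke(
            FunctionName="QuickSight-Lineage-Dispatcher",
            InvocationType="Event",  # asynchronous; use RequestResponse to wait for completion
            Payload=json.dumps(payload),
        )
        print(payload["api"], response["StatusCode"])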

Creating your data source and lineage data set

In this step, you use QuickSight to access the tables in your AWS Glue database.

  1. Log in to QuickSight.
  2. Choose Manage QuickSight.
  3. Choose Security & permissions.
  4. Ensure that access to the S3 bucket (that was created through CloudFormation) is enabled.
  5. Choose New analysis.
  6. Choose New dataset.
  7. For the data source, choose Athena.

  8. For your data source name, enter QuickSight-Lineage.
  9. Choose Create data source.

QuickSight prompts you to select your schema or database.

  10. Choose Use custom SQL.

  11. Update the query name from New custom SQL to QuickSight Lineage.
  12. Enter the following code into the query box:
    select 
       a.analysisid      as analysis_id,
       a.name            as analysis_name,
       a.analysisarn     as analysis_arn,
       date_parse(substr(a.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')     as analysis_createdtime,
       date_parse(substr(a.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as analysis_lastupdatedtime,
       a.datasetarn      as analysis_datasetarn,
       r.dashboardid        as dashboard_id,
       r.name               as dashboard_name,
       r.name||' - ID: '||r.dashboardid as dashboard_name_w_id,
       date_parse(substr(r.versioncreatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as dashboard_versioncreatedtime,
       r.versionnumber      as dashboard_versionnumber     ,
       date_parse(substr(r.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')  as dashboard_createdtime,
       date_parse(substr(r.lastpublishedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as dashboard_lastpublishedtime ,
       date_parse(substr(r.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as dashboard_lastupdatedtime,
       d.datasetid        as dataset_id,
       d.datasetarn       as dataset_arn,
       d.name             as dataset_name,
       d.spicesize        as dataset_spicesize,
       d.importmode       as dataset_importmode,
       date_parse(substr(d.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')  as dataset_lastupdatedtime,
       date_parse(substr(d.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')      as dataset_createdtime,
       d.source           as dataset_source,
       d.columns          as dataset_columns,
       s.datasourceid     as datasource_id,
       s.datasourcearn    as datasource_arn,
       s.name             as datasource_name,
       s.type             as datasource_type,
       date_parse(substr(s.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as datasource_lastupdatedtime,
       date_parse(substr(s.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f') as datasource_createdtime,
       s.status           as datasource_status,
       s.errorinfo        as datasource_errorinfo,
       t.templateid       as template_id,
       t.name             as template_name,
       t.templatearn      as template_arn,
       date_parse(substr(t.createdtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')      as template_createtime,
       date_parse(substr(t.lastupdatedtime,1,26),'%Y-%m-%dT%H:%i:%s.%f')  as template_lastupdatedtime
    from "quicksight-lineage"."dashboard" r
    left join  "quicksight-lineage"."analysis" a
    on a.analysisarn = r.sourceentityarn and a.datasetarn=r.datasetarns
    left join "quicksight-lineage"."template" t
    on t.templatearn = r.sourceentityarn
    left join  "quicksight-lineage"."dataset" d
    on d.datasetarn = r.datasetarns
    left join  "quicksight-lineage"."datasource" s
    on s.datasourcearn = d.datasourcearn

  13. Choose Confirm query.
  14. Select Import to SPICE for quicker analytics.
  15. Choose Visualize.

In the new analysis, one empty visual is loaded by default.

  16. Change the visual type to pivot table.
  17. Choose (single-click) dashboard_name, analysis_name, template_name, dataset_name, and datasource_name in the Fields list.

You can search for name in the Fields list to make this step easier.

  18. Confirm that all fields were also added to the Rows field well.

If you have assets with duplicate names, it can be helpful to add the corresponding ID columns to the visual; for example, dashboard_id, analysis_id, template_id, dataset_id, and datasource_id.

Visualizing your assets and lineage

You now create five new visuals, one for each asset type (Dashboard, Analysis, Template, Dataset, Data Source), to display the additional columns pulled from the APIs.

  1. From the Add menu, choose Add visual.

  2. For the first new visual, choose the table visual type.
  3. Search for dashboard_ in the Field List.
  4. Choose (single-click) all matching columns.
  5. For the second visual, choose the table visual type.
  6. Search for analysis_ in the Field List.
  7. Choose (single-click) all matching columns.
  8. Move the second visual underneath the first visual.
  9. Repeat the same steps for template_, dataset_, and datasource_.

Creating your permissions data set

You now create your permissions data set.

  1. Leave the analysis by choosing the QuickSight logo on the top left.
  2. In the navigation pane, choose Datasets.
  3. Choose New dataset.
  4. Locate and choose the QuickSight-Lineage data source created earlier in the FROM EXISTING DATA SOURCES section.
  5. In the QuickSight Lineage data source window, choose Create data set.
  6. Choose Use custom SQL.

  7. Update the name from New custom SQL to QuickSight Lineage Permissions.
  8. Enter the following code into the query box:
    select distinct 'datasource' as QuickSightObjectType, sp.datasourceid as "QuickSightID",sp.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id
     from "quicksight-lineage"."datasource_permissions"  sp
     inner join  "quicksight-lineage"."datasource" s
      on s.datasourceid = sp.datasourceid
     left join  "quicksight-lineage"."dataset" d
       on s.datasourcearn = d.datasourcearn
     left join  "quicksight-lineage"."dashboard" r
       on d.datasetarn = r.datasetarns
    union
    select distinct 'dataset' as QuickSightObjectType, dp.datasetid as "QuickSightID",dp.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id 
     from "quicksight-lineage"."dataset_permissions" dp
     inner join  "quicksight-lineage"."dataset" d
       on d.datasetid = dp.datasetid
     left join  "quicksight-lineage"."dashboard" r
       on d.datasetarn = r.datasetarns
    union
    select distinct 'analysis' as QuickSightObjectType, ap.analysisid as "QuickSightID",ap.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id  
     from "quicksight-lineage"."analysis_permissions" ap
      inner join  "quicksight-lineage"."analysis" a
       on a.analysisid = ap.analysisid
      left join  "quicksight-lineage"."dashboard" r
       on a.analysisarn = r.sourceentityarn  
    union
    select distinct 'template' as QuickSightObjectType, tp.templateid as "QuickSightID",tp.name,
    split_part(principal,':',6) as principal,permission, r.name||' - ID: '||r.dashboardid as dashboard_name_w_id  
     from "quicksight-lineage"."template_permissions" tp
      inner join  "quicksight-lineage"."template" t
       on t.templateid = tp.templateid
      left join  "quicksight-lineage"."dashboard" r
       on t.templatearn = r.sourceentityarn     
    union
    select distinct 'dashboard' as QuickSightObjectType, dashboardid as "QuickSightID",name,
    split_part(principal,':',6) as principal,permission, name||' - ID: '||dashboardid as dashboard_name_w_id
     from "quicksight-lineage"."dashboard_permissions"

  9. Choose Edit / Preview data.
  10. Choose Apply.
  11. For Query mode, select SPICE.
  12. Choose Save.
  13. Navigate to the Analyses page and open the analysis created earlier.
  14. Choose the pencil icon to add the new dataset.
  15. Choose Add data set.
  16. Select QuickSight Lineage Permissions.
  17. Choose Select.
  18. Make sure the new QuickSight Lineage Permissions dataset is active in the Data set drop-down menu.

Visualizing your permissions

You now add a new visual to display permissions. 

  1. Choose the table visual type.
  2. Choose (single-click) name, principal, and permission in the Field List.
  3. In the navigation pane, choose Filter.
  4. Choose +.
  5. Choose quicksightobjecttype.
  6. Choose the new filter.
  7. Deselect Select all.
  8. Select dashboard.
  9. Choose Apply.
  10. Choose Close.
  11. Move the new permissions visual so it’s to the right of the dashboard visual.
  12. On the new permissions visual, choose the menu options (…).
  13. Choose Duplicate visual.
  14. Repeat this step four times.
  15. Modify the quicksightobjecttype filter on each new permission visual so you have one visual for each asset type.
  16. Move each visual to the right of the corresponding asset type visual.


Creating parameters for filtering

At this point all the visuals are created; next you need to create a parameter. You can simplify the following steps by using the new simplified filter control creation process. For more information, see Amazon QuickSight adds support for on-sheet filter controls. The following steps still work fine, but to add filter controls to an analysis, you don’t need to create parameters anymore. 

  1. Navigate to the Parameters menu.
  2. Choose Create one.
  3. For Name, enter DashboardNameWithID.
  4. Choose Create.
  5. Choose Create a new control for a filter or a calculated field.
  6. For Display name, enter Dashboard Name with ID.
  7. For Style, choose Single select drop down.
  8. For Values, select Link to a data set field.
  9. For Select a data set, choose QuickSight Lineage Permissions.
  10. For Select a column, choose dashboard_name_w_id.
  11. Choose Add.
  12. Choose the first visual (Count of Records by Dashboard_name, Template_name, Dataset_name, Datasource_name, and Analysis_name).
  13. Add a filter on the dashboard_name_w_id field.
  14. Choose the newly added filter.
  15. Set the filter scope to All visuals.
  16. For Filter type, choose Custom filter.
  17. Select Use parameters.
  18. From the drop-down menu, choose DashboardNameWithID.
  19. Choose Apply.
  20. Choose Close.
  21. Choose the first permissions visual (Permission, Principal, and Name).
  22. Add a filter on the dashboard_name_w_id field.
  23. Set the filter scope to All visuals.
  24. For Filter type, choose Custom filter.
  25. Select Use parameters.
  26. From the drop-down menu, choose DashboardNameWithID.
  27. Choose Apply.
  28. Choose Close.

The analysis build is complete and can be published as a dashboard.
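
You can publish the dashboard from the analysis Share menu; if you prefer to script it, the QuickSight API can publish a dashboard from a template. The following sketch is illustrative only: the analysis and dataset ARNs are placeholders (you can look them up in the lineage CSVs produced earlier), and the template and dashboard IDs are arbitrary names:

    import boto3

    quicksight = boto3.client("quicksight")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # Placeholders -- look these up in the lineage data or the QuickSight console
    analysis_arn = "arn:aws:quicksight:us-east-1:123456789012:analysis/<analysis-id>"
    dataset_arn = "arn:aws:quicksight:us-east-1:123456789012:dataset/<dataset-id>"

    # 1. Create a template from the analysis; add a DataSetReference for each
    #    dataset the analysis uses (this analysis uses two)
    template = quicksight.create_template(
        AwsAccountId=account_id,
        TemplateId="quicksight-lineage-template",
        Name="QuickSight Lineage Template",
        SourceEntity={
            "SourceAnalysis": {
                "Arn": analysis_arn,
                "DataSetReferences": [
                    {"DataSetPlaceholder": "lineage", "DataSetArn": dataset_arn}
                ],
            }
        },
    )

    # Template creation is asynchronous; a robust script should poll
    # describe_template until the version reports CREATED before publishing.

    # 2. Publish a dashboard from that template
    quicksight.create_dashboard(
        AwsAccountId=account_id,
        DashboardId="quicksight-lineage-dashboard",
        Name="QuickSight Lineage",
        SourceEntity={
            "SourceTemplate": {
                "Arn": template["Arn"],
                "DataSetReferences": [
                    {"DataSetPlaceholder": "lineage", "DataSetArn": dataset_arn}
                ],
            }
        },
    )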

Creating additional visuals

You can also create additional visuals for different use cases.

Visualizing SPICE usage across all your SPICE datasets

To visualize SPICE usage across all your SPICE datasets, complete the following steps.

  1. Use the QuickSight Lineage dataset and choose the donut chart visual.
  2. For Group/Color, add dataset_name.
  3. For Value, add dataset_spicesize.
  4. Change the aggregation of dataset_spicesize to Average because a dataset can be listed multiple times in the data if it is reused across multiple dashboards.

This visual can be useful to track down what is consuming SPICE storage.
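
The same number is available directly from the API if you want to check capacity outside the dashboard. A minimal sketch, using the ConsumedSpiceCapacityInBytes field that the lineage Lambda function already records:

    import boto3

    quicksight = boto3.client("quicksight")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    total_bytes = 0
    paginator = quicksight.get_paginator("list_data_sets")
    for page in paginator.paginate(AwsAccountId=account_id):
        for summary in page["DataSetSummaries"]:
            try:
                detail = quicksight.describe_data_set(
                    AwsAccountId=account_id, DataSetId=summary["DataSetId"]
                )["DataSet"]
            except Exception:
                continue  # some dataset types (for example, file uploads) can't be described
            size = detail.get("ConsumedSpiceCapacityInBytes", 0)
            total_bytes += size
            print(f'{detail["Name"]}: {size / 1024 ** 3:.2f} GB')

    print(f"Total SPICE usage: {total_bytes / 1024 ** 3:.2f} GB")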

Visualizing SPICE refreshes by hour

To visualize SPICE refreshes by hour, complete the following steps:

  1. Use the QuickSight Lineage dataset to create a vertical stacked bar chart.
  2. For X axis, add dataset_lastupdatedtime aggregated by HOUR.
  3. For Value, add dataset_id aggregated by Count distinct.
  4. For Group/Color, add dataset_name.
  5. Create a filter on dataset_importmode equal to SPICE.

This visual can be useful to see when all the SPICE dataset refreshes last occurred. The source data is a snapshot in time, so you need to update the source data by running the Lambda function on a regular basis.
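
One way to keep the snapshot current is an Amazon EventBridge schedule that invokes the dispatcher function on an interval. A sketch, assuming the QuickSight-Lineage-Dispatcher function and an hourly schedule (the rule name is arbitrary):

    import json
    import boto3

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    function_arn = lambda_client.get_function(
        FunctionName="QuickSight-Lineage-Dispatcher"
    )["Configuration"]["FunctionArn"]

    # Hourly schedule; the empty payload processes all QuickSight asset types
    rule_arn = events.put_rule(
        Name="quicksight-lineage-hourly",
        ScheduleExpression="rate(1 hour)",
    )["RuleArn"]

    events.put_targets(
        Rule="quicksight-lineage-hourly",
        Targets=[{"Id": "dispatcher", "Arn": function_arn, "Input": json.dumps({})}],
    )

    # Allow EventBridge to invoke the function
    lambda_client.add_permission(
        FunctionName="QuickSight-Lineage-Dispatcher",
        StatementId="quicksight-lineage-hourly",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )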

Cleaning up

To avoid incurring future charges, delete the resources you created in this walkthrough by deleting the CloudFormation stack. Also, be sure to delete the analysis and dataset (to free up SPICE usage).

Conclusion

In this post, we showed how to use the QuickSight APIs to collect metadata and permissions for your QuickSight assets and visualize the lineage from data source to dashboard. We also created some visuals to display SPICE usage by data set as well as the last refresh time per data set, allowing you to view the health of your SPICE refreshes and to free up SPICE capacity by cleaning up older data sets.

Give this technique of building administrative dashboards from data collected via the QuickSight APIs a try, and share your feedback and questions in the comments.


About the Authors

Jesse Gebhardt is a senior global business development manager focused on analytics. He has spent over 10 years in the Business Intelligence industry. At AWS, he helps customers around the globe gain insight and value from the data they have stored in their data lakes and data warehouses. Jesse lives in sunny Phoenix, and is an amateur electronic music producer.


Arun Santhosh is a Specialized World Wide Solution Architect for Amazon QuickSight. Arun started his career at IBM as a developer and progressed on to be an Application Architect. Later, he worked as a Technical Architect at Cognizant. Business Intelligence has been his core focus in these prior roles as well.


Shawn Koupal is an Enterprise Analytics IT Architect at Best Western International, Inc.


Accessing and visualizing data from multiple data sources with Amazon Athena and Amazon QuickSight

Post Syndicated from Saurabh Bhutyani original https://aws.amazon.com/blogs/big-data/accessing-and-visualizing-data-from-multiple-data-sources-with-amazon-athena-and-amazon-quicksight/

Amazon Athena now supports federated query, a feature that allows you to query data in sources other than Amazon Simple Storage Service (Amazon S3). You can use federated queries in Athena to query the data in place or build pipelines that extract data from multiple data sources and store them in Amazon S3. With Athena Federated Query, you can run SQL queries across data stored in relational, non-relational, object, and custom data sources. Athena queries including federated queries can be run from the Athena console, a JDBC or ODBC connection, the Athena API, the Athena CLI, the AWS SDK, or AWS Tools for Windows PowerShell.

The goal for this post is to discuss how we can use different connectors to run federated queries with complex joins across different data sources with Athena and visualize the data with Amazon QuickSight.

Athena Federated Query

Athena uses data source connectors that run on AWS Lambda to run federated queries. A data source connector is a piece of code that translates between your target data source and Athena. You can think of a connector as an extension of the Athena query engine. Prebuilt Athena data source connectors exist for data sources like Amazon CloudWatch Logs, Amazon DynamoDB, Amazon DocumentDB (with MongoDB compatibility), Amazon Elasticsearch Service (Amazon ES), Amazon ElastiCache for Redis, and JDBC-compliant relational data sources such as MySQL, PostgreSQL, and Amazon Redshift under the Apache 2.0 license. You can also use the Athena Query Federation SDK to write custom connectors. After you deploy data source connectors, the connector is associated with a catalog name that you can specify in your SQL queries. You can combine SQL statements from multiple catalogs and span multiple data sources with a single query.

When a query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the tables that need to be read, manages parallelism, and pushes down filter predicates. Based on the user submitting the query, connectors can provide or restrict access to specific data elements. Connectors use Apache Arrow as the format for returning data requested in a query, which enables connectors to be implemented in languages such as C, C++, Java, Python, and Rust. Because connectors run in Lambda, you can use them to access data from any data source in the cloud or on premises that is accessible from Lambda.
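
As with any Athena query, federated queries can also be run programmatically. The following sketch is illustrative: it assumes a connector already registered under the catalog name dynamo, and the database, table, and output location are placeholders:

    import boto3

    athena = boto3.client("athena")

    # "dynamo" is the catalog name the connector was registered under;
    # the database, table, and output location are placeholders
    response = athena.start_query_execution(
        QueryString='SELECT * FROM "dynamo"."default"."lineitem" LIMIT 10',
        ResultConfiguration={"OutputLocation": "s3://<your-query-results-bucket>/results/"},
        WorkGroup="primary",  # a workgroup running Athena engine version 2
    )
    print(response["QueryExecutionId"])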

Prerequisites

Before creating your development environment, you must have the following prerequisites:

Configuring your data source connectors

After you deploy your CloudFormation stack, follow the instructions in the post Extracting and joining data from multiple data sources with Athena Federated Query to configure various Athena data source connectors for HBase on Amazon EMR, DynamoDB, ElastiCache for Redis, and Amazon Aurora MySQL.

You can run Athena federated queries in the AmazonAthenaPreviewFunctionality workgroup created as part of the CloudFormation stack or you could run them in the primary workgroup or other workgroups as long as you’re running with Athena engine version 2. As of this writing, Athena Federated Query is generally available in the Asia Pacific (Mumbai), Asia Pacific (Tokyo), Europe (Ireland), US East (N. Virginia), US East (Ohio), US West (N. California), and US West (Oregon) Regions. If you’re running in other Regions, use the AmazonAthenaPreviewFunctionality workgroup.

For information about changing your workgroup to Athena engine version 2, see Changing Athena Engine Versions.
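
If you manage workgroups programmatically, the engine version can also be set through the API. A minimal sketch, assuming the primary workgroup:

    import boto3

    athena = boto3.client("athena")

    # Pin the workgroup to engine version 2 (use "AUTO" to let Athena choose)
    athena.update_work_group(
        WorkGroup="primary",
        ConfigurationUpdates={
            "EngineVersion": {"SelectedEngineVersion": "Athena engine version 2"}
        },
    )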

Configuring QuickSight

The next step is to configure QuickSight to use these connectors to query and visualize the data.

  1. On the AWS Management Console, navigate to QuickSight.
  2. If you’re not signed up for QuickSight, you’re prompted with the option to sign up. Follow the steps to sign up to use QuickSight.
  3. After you log in to QuickSight, choose Manage QuickSight under your account.
  4. In the navigation pane, choose Security & permissions.
  5. Under QuickSight access to AWS services, choose Add or remove.

A page appears for enabling QuickSight access to AWS services.

  6. Choose Athena.
  7. In the pop-up window, choose Next.
  8. On the S3 tab, select the necessary S3 buckets. For this post, I select the athena-federation-workshop-<account_id> bucket and another one that stores my Athena query results.
  9. For each bucket, also select Write permission for Athena Workgroup.
  10. On the Lambda tab, select the Lambda functions corresponding to the Athena federated connectors that Athena federated queries use. If you followed the post Extracting and joining data from multiple data sources with Athena Federated Query when configuring your Athena federated connectors, you can select dynamo, hbase, mysql, and redis.

For information about registering a data source in Athena, see the appendix in this post.

  11. Choose Finish.
  12. Choose Update.
  13. On the QuickSight console, choose New analysis.
  14. Choose New dataset.
  15. For Datasets, choose Athena.
  16. For Data source name, enter Athena-federation.
  17. For Athena workgroup, choose primary.
  18. Choose Create data source.

As stated earlier, you can use the AmazonAthenaPreviewFunctionality workgroup or another workgroup as long as you’re running Athena engine version 2 in a supported Region.

  19. For Catalog, choose the catalog that you created for your Athena federated connector.

For information about creating and registering a data source in Athena, see the appendix in this post.

  20. For this post, I choose the dynamo catalog, which does a federation to the Athena DynamoDB connector.

I can now see the database and tables listed in QuickSight.

  21. Choose Edit/Preview data to see the data.
  22. Choose Save & Visualize to start using this data for creating visualizations in QuickSight.
  23. To do a join with another Athena data source, choose Add data and select the catalog and table.
  24. Choose the join link between the two datasets and choose the appropriate join configuration.
  25. Choose Apply.


You should be able to see the joined data.


Running a query in QuickSight

Now we use the custom SQL option in QuickSight to run a complex query with multiple Athena federated data sources.

  1. On the QuickSight console, choose New analysis.
  2. Choose New dataset.
  3. For Datasets, choose Athena.
  4. For Data source name, enter Athena-federation.
  5. For the workgroup, choose primary.
  6. Choose Create data source.
  7. Choose Use custom SQL.
  8. Enter the query for ProfitBySupplierNation.
  9. Choose Edit/Preview data.


Under Query mode, you have the option to view your query in either SPICE or direct query. SPICE is the QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. Using SPICE can save time and money because your analytical queries process faster, you don’t need to wait for a direct query to process, and you can reuse data stored in SPICE multiple times without incurring additional costs. You also can refresh data in SPICE on a recurring basis as needed or on demand. For more information about refresh options, see Refreshing Data.

With direct query, QuickSight doesn’t use SPICE data and sends the query every time to Athena to get the data.
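
Refreshes of a SPICE dataset can also be triggered on demand through the QuickSight API, for example from a schedule or after an upstream load completes. A minimal sketch with a placeholder dataset ID:

    import uuid
    import boto3

    quicksight = boto3.client("quicksight")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # Placeholder dataset ID -- use the ID of the SPICE dataset you want to refresh
    ingestion = quicksight.create_ingestion(
        AwsAccountId=account_id,
        DataSetId="<your-dataset-id>",
        IngestionId=str(uuid.uuid4()),  # any unique ID for this refresh
    )
    print(ingestion["IngestionStatus"])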

  10. Select SPICE.
  11. Choose Apply.
  12. Choose Save & visualize.


  13. On the Visualize page, under Fields list, choose nation and sum_profit.

QuickSight automatically chooses the best visualization type based on the selected fields. You can change the visual type based on your requirement. The following screenshot shows a pie chart for Sum_profit grouped by Nation.


You can add more datasets using Athena federated queries and create dashboards. The following screenshot is an example of a visual analysis over various datasets that were added as part of this post.


When your analysis is ready, you can choose Share to create a dashboard and share it within your organization.

Summary

QuickSight is a powerful visualization tool, and with Athena federated queries, you can run analysis and build dashboards on various data sources like DynamoDB, HBase on Amazon EMR, and many more. You can also easily join relational, non-relational, and custom object stores in Athena queries and use them with QuickSight to create visualizations and dashboards.

For more information about Athena Federated Query, see Using Amazon Athena Federated Query and Query any data source with Amazon Athena’s new federated query.


Appendix

To register a data source in Athena, complete the following steps:

  1. On the Athena console, choose Data sources.
  2. Choose Connect data source.
  3. Select Query a data source.
  4. For Choose a data source, select a data source (for this post, I select Redis).
  5. Choose Next.
  6. For Lambda function, choose your function.

For this post, I use the redis Lambda function, which I configured as part of configuring the Athena federated connector in the post Extracting and joining data from multiple data sources with Athena Federated Query.

  7. For Catalog name, enter a name (for example, redis).

The catalog name you specify here is the one that is displayed in QuickSight when selecting Lambda functions for access.

  8. Choose Connect.

When the data source is registered, it’s available in the Data source drop-down list on the Athena console.
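
The same registration can be scripted with the Athena API. The following sketch assumes the redis connector’s Lambda function ARN (a placeholder here) and registers it under the catalog name redis:

    import boto3

    athena = boto3.client("athena")

    # Placeholder ARN -- use the ARN of the deployed redis connector function
    athena.create_data_catalog(
        Name="redis",
        Type="LAMBDA",
        Description="Redis federated connector",
        Parameters={
            "function": "arn:aws:lambda:us-east-1:123456789012:function:redis"
        },
    )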



About the Author

Saurabh Bhutyani is a Senior Big Data Specialist Solutions Architect at Amazon Web Services. He is an early adopter of open-source big data technologies. At AWS, he works with customers to provide architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena, AWS Glue, and AWS Lake Formation. In his free time, he likes to watch movies and spend time with his family.