Tag Archives: Q/Machine Learning.

Secure generative SQL with Amazon Q

2025-07-25 Gregory Knowles

Post Syndicated from Gregory Knowles original https://aws.amazon.com/blogs/big-data/secure-generative-sql-with-amazon-q/

Amazon Q generative SQL brings generative AI capabilities to help speed up deriving insights from your Amazon Redshift data warehouses and AWS Glue Data Catalog, generating SQL for Amazon Redshift or Amazon Athena. With Amazon Q, you get SQL commands generated with your context. This means you can focus on deriving insights faster, rather than having to first learn potentially complex schemas. Without generative SQL, your data analysts might have to frequently switch between different types of SQL, which can further slow analysis down. Amazon Q generative SQL can help by generating SQL statements from natural language and speeding up development. This can help onboard analysts faster and improve analyst productivity. The generative SQL experience is available through Amazon SageMaker Unified Studio and Amazon Redshift Query Editor v2.

To scale the use of generative SQL in production scenarios, you need to consider how relevant and accurate SQL is generated. In doing so, it’s important to understand what data is used and how your information is protected. Amazon Q generative SQL is designed to keep your data secure and private. Your queries, data, and database schemas are not used to train generative AI foundation models (FMs). For more information, see Considerations when interacting with Amazon Q generative SQL.

In the post Write queries faster with Amazon Q generative SQL for Amazon Redshift, we provided general advice around getting started with generative SQL. In this post, we discuss the design and security controls in place when using generative SQL and its use in both SageMaker Unified Studio and Amazon Redshift Query Editor v2.

Solution overview

Generating relevant SQL requires context from your data warehouse or data catalog schemas. Your analysts can ask free text or natural language questions in the Amazon Q chat window and have SQL statements returned that reference your tables and columns. It’s important that the generated SQL is consistent with your schema so that it can find the most relevant fields to answer questions and generate queries that accurately reference data. In SageMaker Unified Studio or Amazon Redshift Query Editor v2, when the Amazon Q chat window is open, database metadata that is viewable under the connection context is made available to Amazon Q for SQL generation. This means that only the schema information that the connecting user can access is used. Tables or database objects the user doesn’t have access to are excluded.

When a user submits questions in the Amazon Q chat window, a search algorithm is used to find the most relevant context from the available database schema metadata information. This context is combined with the user’s question and used as a prompt to a large language model (LLM) to generate a SQL statement. The supporting information is cached so that your data source doesn’t need to be queried every time a user initiates SQL generation. Instead, data source metadata will be periodically refreshed if it remains in use, or you can trigger a manual refresh. If the data is not being used, Amazon Q will automatically delete it. Where applicable, the information used to support SQL generation is encrypted with an AWS Key Management Service (AWS KMS) customer managed KMS key where one has been specified in the SageMaker Unified Studio or Amazon Redshift Query Editor v2 settings. Otherwise, an AWS managed key is used. Your information is encrypted in transit and at rest.

The following diagram shows the process flow for SQL generation when using SageMaker Unified Studio or the Amazon Redshift Query Editor and using Amazon Redshift or Data Catalog source data.

Process diagram for SQL generation

The Amazon Q generative SQL process can be summarized as the following steps:

A user interacts with the Amazon Q chat pane through SageMaker Unified Studio or the Amazon Redshift Query Editor.
The SQL chat frontend sends the prompt along with the connection configuration to Amazon Q.
Amazon Q uses the connection context to retrieve information that will support SQL generation if this data is not already available.
Amazon Q encrypts the retrieved information under the appropriate AWS managed or customer managed KMS key. The information is subsequently decrypted on retrieval.
The information is stored along with custom context information, if this has been provided.
Relevant context from the combined information is selected and added to the user’s questions and sent to an LLM to generate a SQL statement, which is returned to the user.
The user can decide whether to run the statement and can provide feedback on usefulness and accuracy.

Additional context to enhance SQL generation

You can provide further context to supplement the database schema information, which can help improve the accuracy and relevancy of the generated SQL.

One option is to provide custom context. Custom context gives the option to specify instructions and extra information, such as descriptions of tables and columns. These descriptions can then be used to help the selection of relevant tables and attributes when generating SQL statements. This is particularly relevant when your schema uses more obscure naming that might not directly relate to business terms or uses non-standard abbreviations. For example, consider a table called sls_r1_2024. With custom context, you can add a table description specifying that, for example, the table includes sales information across stores in the US region for the calendar year 2024. This information can help the LLM generate SQL referencing the correct tables. The same approach can be applied to columns within the table. Your custom context is encrypted using a customer managed KMS key if one has been specified (during Amazon Redshift Query Editor account creation or SageMaker Unified Studio project creation) or an AWS managed key otherwise.

You can also introduce constraints using custom context. For example, you can explicitly include or exclude specific schemas, tables, or columns from SQL query generation. Similarly, specific topics can also be disallowed, such as not generating SQL statements to support financial reporting. For more details about the information that can be supplied, refer to Custom context.

Another option is to grant SQL query history access to the user establishing the connection. This information is then also made available to enhance SQL generation and to provide the LLM with examples of relevant queries. Be aware that granting wider SQL query history access to the connecting user, and therefore also the generative SQL workflow, allows viewing of queries over tables or objects the user might not have access to. Furthermore, string literals might be present in historic statements that might contain sensitive information. To help mitigate this risk, you could instead use the CuratedQueries section of custom context to provide predefined question and answer examples, without exposing all user queries.

Generated statement response

Before a SQL statement is returned to the user, Amazon Q tries to detect syntax issues. This step helps improve the likelihood that only valid SQL syntax is returned. Amazon Q will use the available information for the user to return statements that align with user permissions, to reduce scenarios where users can’t run generated statements. For example, if you have given access to SQL query history information, then the SQL generation step might produce a query statement referencing a table that the user asking the question doesn’t have access to. Amazon Q minimizes the occurrence of this scenario by assessing if the generated SQL aligns with user permissions and updating the statement if not. User permissions are not bypassed through the use of Amazon Q generative SQL. If a statement was returned referencing a table the user doesn’t have access to, the authorization applied to the user will enforce access control when the statement is executed.

Statements generated by Amazon Q that could potentially change your database, such as DML or DDL statements, are returned with a warning. The warning highlights to the user that running the statement could potentially modify the database. Again, these statements are only executable if the user has the required permissions.

Prerequisites

Amazon Q generative SQL works with your Redshift data warehouses and Data Catalog tables. To get started, you should have data available in either or both of these environments. To use Amazon Q generative SQL with your AWS Glue tables, you need a SageMaker Unified Studio domain. Within your domain, you can use the Amazon Q chat integration to ask questions of your data and have SQL generated. This also works for Amazon Redshift data sources available in the domain. You can use Amazon Q generative SQL without a SageMaker Unified Studio domain using the Amazon Redshift Query Editor. Access to the editor enables Amazon Q chat integration against your Amazon Redshift data sources.

Enable Amazon Q generative SQL

You can control access to generative SQL at the account-Region level in the Amazon Redshift Query Editor or at the SageMaker Unified Studio domain level. To enable this feature, an account admin must explicitly turn on Amazon Q generative SQL. By default, the feature is not accessible to your users. Administrators that have permission for the sqlworkbench:UpdateAccountQSqlSettings AWS Identity and Access Management (IAM) action can turn the Amazon Q generation SQL feature on or off through the admin window, as illustrated in the following sections. When turned off, this will restrict users from opening the Amazon Q chat pane and help prevent interaction with generative SQL.

Enable Amazon Q in your SageMaker domain

To enable Amazon Q in your SageMaker domain, you can navigate to the Amazon Q tab on the domain settings page and choose to enable the service. For more information, see Amazon Q in Amazon SageMaker Unified Studio.

Enable Amazon Q in SageMaker Unified Studio domain

Enable Amazon Q in Amazon Redshift

To enable Amazon Q generative SQL from the Amazon Redshift Query Editor, access the Amazon Q generative SQL settings. This requires the administrator to have the sqlworkbench:UpdateAccountQSqlSettings permission in their IAM policy. For more information, see Updating generative SQL settings as an administrator.

Enabling Amazon Q generative SQL from Redshift query editor

With generative SQL enabled at the account-Region level, you can restrict access to specific users with IAM controls. IAM administrators can build IAM policies that allow or deny access to the action sqlworkbench:GetQSqlRecommendations. For more information, refer to Actions, resources, and condition keys for AWS SQL Workbench. Policies can then be associated with IAM users or roles to control access to SQL generation at a more granular level. An appropriately scoped service control policy (SCP) can be used to limit access to SQL generation to specific accounts within your organization if required.

The following is an example policy denying access to use SQL generation:

{
"Version": "2012-10-17",
    "Statement": [
        {
"Sid": "DenyAccessToAmazonQGenerativeSql",
            "Effect": "Deny",
            "Action": [
                "sqlworkbench:GetQSqlRecommendations"
            ],
            "Resource": "*",
        }
    ]
}

Cross-Region inference

Amazon Q Developer uses cross-Region inference to distribute traffic across different AWS Regions, which provides increased throughput and resilience during high demand periods, improved performance, and access to the latest Amazon Q Developer capabilities.

When a request is made from an Amazon Q Developer profile, it is kept within the Regions in the same geography as the original data. Although this doesn’t change where the data is stored, the requests and output results might move across Regions during the inference process. Data is encrypted when transmitted across Amazon’s network. For more information on cross-Region inference, see Cross-region processing in Amazon Q Developer.

Monitoring

To monitor which IAM users or roles are interacting with generative SQL, you can use AWS CloudTrail. CloudTrail monitors API calls and logs which identities have performed particular actions. When a user first asks a question, a CloudTrail event is emitted called IngestQSqlMetadata. This is a result of Amazon Q starting the metadata ingest process. Ingestion is an asynchronous operation, so there might be a series of GetQSqlMetadataStatus events. This is due to the workflow checking the ingestion process status.

After the workflow has completed successfully, each question sees a GetQSqlRecommendation event. This is the result of users submitting questions and triggering generation of SQL statements. The following is an example CloudTrail event for GetQSqlRecommendation. In this example, Amazon Q emits detailed CloudTrail events highlighting the warehouse being queried, IAM principal calling Amazon Q, and the entire response structure from Amazon Q in responseElements:

{
    "eventVersion": "1.09",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROA123456789EXAMPLE:demouser",
        "arn": "arn:aws:sts::111122223333:assumed-role/DemoUser",
        "accountId": "111122223333",
        "accessKeyId": "ASIAIOSFODNN7EXAMPLE",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AROA123456789EXAMPLE",
                "arn": "arn:aws:iam::111122223333:role/DemoUser",
                "accountId": "111122223333",
                "userName": "DemoUser"
            },
            "attributes": {
                "creationDate": "2025-01-17T05:31:01Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2025-01-17T05:34:51Z",
    "eventSource": "sqlworkbench.amazonaws.com",
    "eventName": "GetQSqlRecommendation",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "122.171.17.139",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:133.0) Gecko/20100101 Firefox/133.0",
    "requestParameters": {
        "dbConfig": {
            "database": "sample_data_dev"
        },
        "databaseConfiguration": {
            "redshiftConfig": {
                "clusterIdentifier": "redshift-cluster-1",
                "database": "sample_data_dev"
            }
        },
        "prompt": "HIDDEN_DUE_TO_SECURITY_REASONS",
        "clientToken": "HIDDEN_DUE_TO_SECURITY_REASONS",
        "logConfig": {},
        "sqlworkbenchConnectionArn": "arn:aws:sqlworkbench:us-east-1:111122223333:connection/47ahg61-ce0b-4646-831b-a140ea4055ae"
    },
    "responseElements": {
        "data": {
            "extractionErrors": false,
            "guardRails": {
                "isDml": false
            },
            "sqlStatement": "HIDDEN_DUE_TO_SECURITY_REASONS",
            "syntaxErrors": "HIDDEN_DUE_TO_SECURITY_REASONS"
        },
        "logSessionId": "623318ad-dbcc-4f69-ae08-f85d1b63a70f",
        "questionId": "623318ad-dbcc-4f69-ae08-f85d1b63a70f",
        "originalQuestionId": "623318ad-dbcc-4f69-ae08-ae08asd1a"
    },
    "requestID": "623318ad-dbcc-4f69-ae08-f85d1b63a70f",
    "eventID": "ac2c1932-49b1-41b3-a1af-20fa4461cf7d",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.3",
        "cipherSuite": "TLS_AES_128_GCM_SHA256",
        "clientProvidedHostHeader": "qsql.sqlworkbench.us-east-1.amazonaws.com"
    },
    "sessionCredentialFromConsole": "true"
}

Conclusion

In this post, we discussed the Amazon Q generative SQL workflow. We highlighted the process around using your schema context alongside metadata such as historic SQL queries and custom context. Using this metadata allows the generation of relevant SQL that helps accelerate your analyst’s productivity. Although it’s important to assist analysts, it’s also imperative to make sure data remains secure and protected. To support this, generative SQL uses only the data the connected user has access to. This helps prevent exposure to information beyond their authorization.When you’re looking to increase the relevance of generated SQL through sharing additional query history, it’s important to consider the trade-off of exposing additional information to the user. Deciding your approach here should take into account the domain context of the data and the possible exposure of metadata the user doesn’t have access to, or potentially sensitive information that might appear in query strings. Keeping these considerations in mind can help you achieve the appropriate security posture for your workloads.

To get started with Amazon Q generative SQL, see Write queries faster with Amazon Q generative SQL for Amazon Redshift and Interacting with Amazon Q generative SQL.

About the authors

Gregory Knowles is a data and AI specialist solution architect at AWS, focusing on the UK public sector. With extensive experience in cloud-based architectures, Greg guides public sector customers in implementing modern data solutions. His expertise spans governance, analytics, and AI/ML. Greg’s passion lies in accelerating transformation and innovation to improve productivity and outcomes. He has successfully led projects that moved data systems into the cloud, adopted new data architectures, and implemented AI at scale in production.

Abhinav Tripathy is a Software Engineer and Security Guardian at AWS, where he develops Amazon Q generative SQL by combining machine learning, databases, and web systems. Abhinav is passionate about building scalable web systems from scratch that solve real customer challenges. Outside of work, he enjoys traveling, watching soccer, and playing badminton.

Erol Murtezaoglu is a Technical Product Manager at AWS, is an inquisitive and enthusiastic thinker with a drive for self-improvement and learning. He has a strong and proven technical background in software development and architecture, balanced with a drive to deliver commercially successful products. Erol highly values the process of understanding customer needs and problems, in order to deliver solutions that exceed expectations.

Best practices for enabling business users to answer questions about data using natural language in Amazon QuickSight

2023-06-16 Amy Laresch

Post Syndicated from Amy Laresch original https://aws.amazon.com/blogs/big-data/best-practices-for-enabling-business-users-to-answer-questions-about-data-using-natural-language-in-amazon-quicksight/

In this post, we explain how you can enable business users to ask and answer questions about data using their everyday business language by using the Amazon QuickSight natural language query function, Amazon QuickSight Q.

QuickSight is a unified BI service providing modern interactive dashboards, natural language querying, paginated reports, machine learning (ML) insights, and embedded analytics at scale. Powered by ML, Q uses natural language processing (NLP) to answer your business questions quickly. Q empowers any user in an organization to start asking questions using their own language. Q uses the same QuickSight datasets you use for your dashboards and reports so your data is governed and secured. Just as data is prepared visually using dashboards and reports, it can be readied for language-based interactions using a topic. Topics are collections of one or more datasets that represent a subject area that your business users can ask questions about. To learn how to create a topic, refer to Creating Amazon QuickSight Q topics.

With automated data preparation in QuickSight Q, the model will do a lot of the topic setup for you, but there is some context that is specific to your business that you need to provide. To learn more about the initial setup work that Q does behind the scenes, check out New – Announcing Automated Data Preparation for Amazon QuickSight Q.

Business users can access Q from the QuickSight console or embedded in your website or application. To learn how to embed the Q bar, refer to Embedding the Amazon QuickSight Q search bar for registered users or anonymous (unregistered) users. To see examples of embedded dashboards with Q, refer to the QuickSight DemoCentral.

Once you have a topic shared with your business users, they can ask their own questions and save questions to their pinboard as seen in GIF 1.

QuickSight authors can also add their Q visuals straight to an analysis to speed up dashboard creation, as seen in GIF 2.

This post assumes you’re familiar with building visual analytics in dashboards or reports, and shares new and different strategies needed to build natural language interfaces that are simple to use.

In this post, we discuss the following:

The importance of starting with a narrow and focused use case
Why and how to teach the system your unique business language
How to get success by providing support and having a feedback loop

If you don’t have Q enabled yet, refer to Getting started with Amazon QuickSight Q or watch the following video.

Follow along

In the following examples, we often refer to two out-of-the-box sample topics, Product Sales and Student Enrollment Statistics, so you can follow along as you go. We recommend creating the topics now before continuing with this post, because they take a few minutes to be ready.

Understand your users

Before we jump into solutions, let’s talk about when natural language query (NLQ) capabilities are right for your use case. NLQ is a fast way for a business user who is an expert in their business area to flexibly answer a large variety of questions from a scoped data domain. NLQ doesn’t replace the need for dashboards. Instead, when designed to augment a dashboard or reporting use case, NLQ helps business users get customized answers about specific details without asking a business analyst for help.

It’s critical to have a well-understood use case because language is inherently complex. There are many ways to refer to the same concept. For example, a university might refer to “classes” several ways, such as “courses,” “programs,” or “enrollments.” Language also has inherent ambiguity—“top students” might mean by highest GPA to one person and highest number of extracurriculars to another. By understanding the use case up front, you can uncover areas of potential ambiguity and build that knowledge directly into the topic.

For example, the AWS Analytics sales leadership team uses QuickSight and Q to track key metrics for their region as part of their monthly business review. When I worked with the sales leaders, I learned their preferred terminology and business language through our usability sessions. One observation I made was that they referred to the data field Sales Amortized Revenue as “adrr”. With these learnings, I could easily add this context to the topic using synonyms, which I cover in detail below. One of the sales leaders shared, “This will be awesome for next month when I write my MBR. What previously took a couple of hours, I can now do in a few minutes. Now I can spend more time working to deliver my customer’s outcomes.” If the sales leader asked a question about “adrr” but that connection was not included in their Q topic, then the leader would feel misunderstood and revert back to their original, but slower, ways of finding the answer. Check out more QuickSight use cases and success stories on the AWS Big Data Blog.

Start small

In this section, we share a few common challenges and considerations when getting started with Q.

Data can contain overlapping words

One pitfall to look out for is any fields with long strings, like survey write-in responses, product descriptions, and so on. This type of data introduces additional lexical complexity for readers to navigate. In other words, when an end-user asks a question, there is a higher chance that a word in one of the strings will overlap with other relevant fields, such as a survey write-in that mentions a product name in your Product field. Other non-descriptor fields can also contain overlaps. You can have two or more field names with lexical overlap, and the same across values, and even between fields and values. For example, let’s say you have a topic with a Product Order Status field with the values Open and Closed and a Customer Complaint Status field also with the values Open and Closed. To help avoid this overlap, consider alternate names that would be natural to your end-users to avoid the potential ambiguity. In our example, I’d keep the Product Order Status values and change the Customer Complaint Status to Resolved and Unresolved.

Avoid including aggregation names in your fields and values

Another common pitfall that introduces unnecessary ambiguity is including calculated fields for basic aggregations that Q can do on the fly. For example, business users might track average clickthrough rates for a website or month-to-date free to paid conversions. Although these types of calculations are necessary in a dashboard, with Q, these calculated fields are not needed. Q can aggregate metrics using natural language, like simply asking “year over year sales” or “top customers by sales” or “average product discount,” as you can see in Figure 1. Defining a field with the name YoY Sales adds an additional potential answer choice to your topic, leaving end-users to select between the pre-defined YoY Sales field, or using Q’s built-in YoY aggregation capability, whereas you may already know which of these choices is likely to bring them the best outcome. If you have complex business logic built into calculated fields, those are still relevant to include (and if you create the topic from your existing analysis, then Q will bring them over.)

Q answer showing MoM sales

Figure 1: Q visual showing MoM sales for EMEA

Start with a single use case

For this post, we recommend defining a use case as a well-defined set of questions that actual business users will ask. Q gives the ability to answer questions not already answered in dashboards and reports, so simply having a dashboard or a dataset doesn’t mean you necessarily have a Q-ready use case. These questions are the real words and phrases used by business users, like “how are my customers performing?” where the word “performing” might map in the data to “sales amortized revenue,” but a business user might not ask questions using the precise data names.

Start with a single use case and the minimum number of fields to meet it. Then incrementally layer in more as needed. It’s better to introduce a topic with, for example, 10 fields and a 100% success rate of answering questions as expected vs. starting with 30 fields and a 70% success rate to help users feel confident.

To help you start small, Q enables you to create your topic in one click from your existing analysis (Figure 2).

Enable a Q topic from a QuickSight analysis

Figure 2: Enable a Q topic from a QuickSight analysis

Q will scan the underlying metadata in your analysis and automatically select high-value columns based on how they are used in the analysis. You’ll also get all your existing calculated fields ported over to the new topic so you don’t have to re-create them.

Add lexical context

Q knows English well. It understands a variety of phrases and different forms of the same word. What it doesn’t know is the unique terms from your business, and only you can teach it.

There are some key ways to provide Q this context, including adding synonyms, semantic types, default aggregations, primary date, named filters, and named entities. If you created your Q topic as described in the previous section, you will be a few steps ahead, but it’s always good to check the model’s work.

Add synonyms

In a dashboard, authors use visual titles, text boxes, and filter names to help business users navigate and find their answers. With NLQ, language is the interface. NLQ empowers business users to ask their questions in their own words. The author needs to make those business lexicon connections for Q using synonyms. Your business users might refer to revenue as “gross sales,” “amortized revenue,” or any number of terms specific to your business. From the topic authoring page, you can add relevant terms (Figure 3).

Add Q synonyms

Figure 3: Adding relevant synonyms

If your business users refer to the data values in multiple ways, you can use value synonyms to create those connections for Q (Figure 4). For example, in the Student Enrollment topic, let’s say your business users sometimes use First Years to map to Freshmen and so on for each classification type. If you don’t have that data directly in your dataset, you can create those mappings using value synonyms (Figure 5).

Configure Q value synonyms

Figure 4: Configure field value synonyms

Add Q value synonyms

Figure 5: Example value synonyms for Student Enrollment topic

Check semantic types

When you create a topic using automatic data prep, Q will automatically select relevant semantic types that it can detect. Q uses semantic types to understand which specific fields to use to answer vague question like who, where, when, and how many. For example, in the student enrollment statistics example, Q already set Home of Origin as Location so if someone asks “where,” Q knows to use this field (Figure 6). Another example is adding Person for the Student Name and Professor fields so Q knows what fields to use when your business users ask for “who.”

Home of origin semantic type

Figure 6: Semantic Type set to “Location”

Another important semantic type is the Identifier. This tells Q what to count when your business users ask questions like “How many were enrolled in biology in 2021?” (Figure 7). In this example, Student ID is set as the Identifier.

Q answer showing a KPI of 3

Figure 7: Q visual showing a “how many” question

Here is a list of semantic types that map to implicit question phrases:

Location: Where?
Person or Organization: Who?
- If there are no person or organization fields, then Q will use the identifier
Identifier: How many? What is the number of?
Duration: How long?
Date Part: When?
Age: How old?
Distance: How far?

Semantic types also help the model in several other ways, including mapping terms like “most expensive” or “cheapest” to Currency. There is not always a relevant semantic type, so it’s okay to leave those empty.

Set default aggregations

Q will always aggregate measure values a business user asks for, so it’s important to use measures that retain their meaning when brought together with other values. As of this writing, Q works best with underlying data that is summative, for example, a currency value or a count. Examples of metrics that are not summative are percentages, percentiles, and medians. Measures of this type can produce misleading or statistically inaccurate results when added with one another. Q can be used to produce averages, percentiles, and medians by end-users without first performing those calculations in underlying data.

Help Q understand the business logic behind your data by setting default aggregations. For example, in the Student Enrollment topic, we have student test scores for every course, which should be averaged and not summed, because it’s a percentage. Therefore, we set Average as the default and set Sum as a not allowed aggregation type (Figure 8).

Percentage semantic type

Figure 8: Setting “Sum” as a “Not allowed aggregation” for a percentage data field

To ensure end-users get a correct count, consider whether the default aggregation type for each dimensional field should be Distinct Count or Count and set accordingly. For example, if we wanted to ask “how many courses do we offer,” we would want to set Courses to Distinct Count because the underlying data contains multiple records for the same course to track each student enrolled.

If we have a count, we get over 6,000 courses, which is a count of all rows that have data in the Courses field, covering every student in the dataset (Figure 9).

Q KPI showing 6,277

Figure 9: Q visual showing a count of courses

If we set the default aggregation to Distinct Count, we get the count of unique course names, which is more likely to be what the end-user expects (Figure 10).

Q KPI showing 15

Figure 10: Q visual showing the unique count of courses

Review the primary date field

Q will automatically select a primary date field for answering time related questions like “when” or “yoy”. If your data includes more than one date field, you may want to choose a different date than Q’s default choice. End-users can also ask about additional date fields by explicitly naming them (Figure 12). You can always specify a different date if you’d like. To review or change the primary date, go to the topic page, navigate to the Data section, and choose the Datasets tab. Expand the dataset and review the value for Default date (Figure 11).

Reviewing the Q default date

Figure 11: Reviewing the default date

You can change the date as needed.

Changing the date field

Figure 12: Asking about non-default dates

Add named filters

In a dashboard, filters are critical to allow users to focus in on their area of interest. With Q, traditional filters aren’t required because users can automatically ask to filter any field values included in the Q topic. For example, you could ask “What were sales last week for Acme Inc. for returning shoppers?” Instead of building the filters in a dashboard (date, customer name, and returning vs. new customer), Q does the filtering on the fly to instantly provide the answer.

With Q, a filter is a specific word or phrase your business users will use to instruct Q to filter returned results. For example, you have student test scores but you want a way for your users to ask about failing test scores. You can set up a filter for “Failing” defined as test scores less than 70% (Figure 13).

Q filter configuration

Figure 13: Filter configuration example using a measure

Additionally, maybe you have a field for Student Classification, which includes Freshmen, Sophomore, Junior, Senior, and Graduate, and you want to let users ask about “undergrads” vs. “graduates” (Figure 14). You can make a filter that includes the relevant values.

Q undergrad named filter example

Figure 14: Filter configuration example using a dimension

Add named entities

Named entities are a way to get Q to return a set of fields as a table visual when a user asks for a specific word or phrase. If someone wanted to know “sales for retail december” and they get a KPI saying $6,169 without any extra context, it is hard to understand all data this number includes (Figure 15).

Q KPI showing $6,169

Figure 15: A Q visual showing “sales for retail december”

By presenting the KPI in a table view with other relevant dimensions, the data includes additional context making it easier to understand meaning (Figure 16).

Q named entity as a table visual

Figure 16: A Q visual showing “sales details for retail december”

By building these table views, you can happily surprise your business users by anticipating the information they want to see without having to explicitly ask for each piece of data. The best part is your business users can easily filter the table using language to answer their own data questions. For example, in the Student Enrollment topic, we created a Student information named entity with some important student details like their name, major, email, and test scores per course.

Q named entity for student information

Figure 17: Named entity example

If a university administrator wanted to reach out to students who are failing biology, they can simply ask for “student information for failing biology majors.” In one step, they get a filtered list that already includes their emails and test scores so they can reach out (Figure 18).

Q named entity filtered down for failing biology students

Figure 18: Filtering a named entity

If the university administrator wanted to also see the phone numbers of the students to send texts offering free tutoring, they could simply ask Q “Student information for failing biology majors with phone numbers.” Now, Mobile is added as the first column (Figure 19).

Q named entity adding phone number

Figure 19: Adding a column to a named entity

Entities can also be referenced using synonyms in order to capture all the ways your business users might refer to this group of data. In our example, we could also add “student contact info” and “academic details” based on the common terminology the university admins use.

Besides looking for patterns in the data fields, ask yourself about what your business users care about. For example, let’s assume we have data for our HR specialists, and we know they care about job postings, candidates, and recruiters. Each author might think of the groups slightly differently, but as long as it’s rooted in your business jobs to be done, then your groupings are providing value. With those three groups in mind, we can sort all the data into one of those buckets. For this use case, our Candidate bucket is pretty large, with about 20 fields. We can scan the list and notice that we track information for rejected and accepted candidates, so we start splitting the metrics into two groups: Successful Candidates and Rejected Candidates. Now information like Offer Letter Date, Accept Date, and Final Salary are all in the Successful Candidate group, and related fields about Rejected Candidates are clearly grouped together.

If you’re curious about strategies for how to create entities, check out card sorting techniques.

In the Product Sales sample topic, after scanning the data, we would start with Sales, Product, and Customer as three key groupings of information to analyze. Try out the exercise on your own data and feel free to ask any questions on the QuickSight Community. To learn how to create named entities, refer to Adding named entities to a topic dataset.

Drive NLQ adoption

After you have refined your topic, tested it out with some readers, and made it available for a larger audience, it’s important to follow two strategies to drive adoption.

First, provide your business users with support. Support might look like a short tutorial video or newsletter announcement. Consider keeping an open channel like a Slack or Teams chat where active users can post questions or enhancements.

Here at Amazon, the Prime team has a dedicated Product Manager (PM) for their embedded Q application that they call PrimeQ. The PM hosts regular demo and training sessions where the Prime team can ask them any questions and get ideas about what types of answers they can get. The PM also sends out a monthly newsletter to announce the availability of new data and topics along with sample questions, FAQs, and quotes from Prime team members who get value out of Q. The PM also has an active Slack channel where every single question gets answered within 24 hours, either by the PM or a data engineer on the Prime team.

Pro tip: Make sure your business users know who they can reach out to if they get stuck. Avoid the black box of “reach out to your author” so readers feel confident their questions will be answered by a known person. For embedded applications, be sure to build an easy way to get support.

Second, maintain a healthy feedback loop. Look at the usage data directly in the product and schedule 1-on-1 sessions with your readers. Use the usage data to track adoption and identify readers who are asking unanswerable questions (Figure 20). Engage with both your successful and struggling readers to learn how to continue to iterate and improve the experience. Talking to business users is especially important to uncover the implicit ambiguity of language.

Another example here at Amazon, after first launching the Revenue Insights topic for the AWS Analytics sales team, a QuickSight Solution Architect (SA) and myself checked the usage tab on a daily basis to track unanswerable questions and directly reach out to the sales team member to let them know how to adjust their question or that we made a change so their question would now work. For example, we initially had a field turned off for Market Segment and noticed a question from a sales leader asking about sales by segment. We turned the field on and let him know those questions would now work. The SA and I have a Slack channel with other stakeholders so we can troubleshoot asynchronously with ease. Now that the topic has been available for several months, we check the usage tab on a weekly basis.

Q user activity tab

Figure 20: User Activity tab in Q

Conclusion

In this post, we discussed how language is inherently complex and what context you need to provide Q to teach the system about your unique business language. Q’s automated data prep will get you started, but you need to add the context that is specific to your business user’s language. As we mentioned at the start of the post, consider the following:

Start with a narrow and focused use case
Teach the system your unique business language
Get success by providing support and having a feedback loop

Follow this post to enable your business users to answer questions of data using natural language in QuickSight.

Ready to get started with Q? Watch our quick tutorial on enabling QuickSight Q.

Want some tutorial videos to share with your team? Check out the following:

To see how Q can answer the “Why” behind data changes and forecast future business performance, refer to New analytical questions available in Amazon QuickSight Q: “Why” and “Forecast”.

About the Author

Amy Laresch is a product manager for Amazon QuickSight Q. She is passionate about analytics and is focused on delivering the best experience for every QuickSight Q reader. Check out her videos on the @AmazonQuickSight YouTube channel for best practices and to see what’s new for QuickSight Q.

AWS Analytics sales team uses QuickSight Q to save hours creating monthly business reviews

2022-12-13 Amy Laresch

Post Syndicated from Amy Laresch original https://aws.amazon.com/blogs/big-data/aws-analytics-sales-team-uses-quicksight-q-to-save-hours-creating-monthly-business-reviews/

The AWS Analytics sales team is a group of subject-matter experts who work to enable customers to become more data driven through the use of our native analytics services like Amazon Athena, Amazon Redshift, and Amazon QuickSight. Every month, each sales leader is responsible for reporting on observations and trends in their business. To support their observations, the leaders track key metrics for their region as part of their monthly business review (MBR).

Today, sales leaders use a QuickSight dashboard to analyze these key metrics. Establishing a baseline is a time-intensive process that requires navigating various tabs and filters. To save time, analytics sales managers for the Americas regions have been eager to ask QuickSight Q, in their own business language, questions like “Who are my top customers by month-over-month revenue?” or “How much did Customer X spend on Amazon Redshift this month compared with last?”

Rather than manually filtering their views to understand the underlying signals, they now use the native capabilities of QuickSight Q, resulting in many hours saved per leader.

These sales leaders can instead focus on “why it happened” and “what’s coming next” (spoiler alert: Q supports “why?” and forecast questions).

Since each leader reports on the same metrics each month, they would like to save each QuickSight Q answer, curated for their region, so they can focus on growing their business. With QuickSight Q pinboards, they can do just that. They can pin visuals for one-click access to frequently asked questions. Every time the dataset updates, the visual will reflect the latest data, all of which gets rendered in seconds because of SPICE (Super-fast, Parallel, In-memory Calculation Engine).

The features explored in this post are part of Amazon QuickSight Q. Powered by machine learning (ML), Q uses natural language processing to answer your business questions quickly. If you’re an existing QuickSight user, be sure that the Q add-on is enabled. For steps on how to do this, see Getting started with Amazon QuickSight Q.

Personalized data for sales managers

Kellie Burton, Sr. QuickSight Solutions Architect, and Amy Laresch, a Product Manager for QuickSight Q, worked with sales leaders Patrick Callahan, US West, and Jeff Pratt, US Central, to build a QuickSight Q topic for Americas Analytics revenue. A topic is a collection of one or more datasets that represents a subject area that business users can ask questions about. The Americas Analytics topic is built on a revenue dataset that is protected with row-level security (RLS), so any question asked is restricted by the same rules.

To keep the topic focused and avoid potential language ambiguity, Kellie and Amy used copies of previous MBR deliverables to understand what measures, dimensions, and calculated fields were required in the topic. With QuickSight Q automated data prep, the calculated fields were automatically added to the topic, so the topic authors did not have to recreate them. With Q, readers could ask questions like “year-to-date (YTD) YoY % for us-west analytics by segment” to get the exact table view that Patrick includes in his MBR. During a usability session, the authors worked with Jeff and Patrick to ask Q each required question and save it to their pinboard.

After opening his completed pinboard, Jeff said, “Wow, that is really cool. It answers all the questions I write the MBR for in my own custom pinboard. A report that used to take me 2-3 hours to pull together will now only take me 5 minutes.” With the extra time, he’s energized to focus more on the story behind the data and planning for future.

Patrick shared Jeff’s sentiment saying, “This will be awesome for next month when I write my MBR. What previously took a couple of hours, I can now do in a few minutes. Now I can spend more time working to deliver my customer’s outcomes.”

Completed sales pinboard showing visualizations like a bar chart for top 10 customers, using sample data from the Software Sales sample topic

Sample pinboard for a sales leader for the Americas region with mock data (from the Software Sales sample topic)

Once you have an answer to a question, you might want to understand why that happened. This is where Q Why questions come into play.

Why questions

Understanding why is critical to making data-backed decisions to delight your customers and grow your business. For example, in this Software Sales sample topic, I asked Q for monthly revenue and noticed a drop in October 2022.

Amazon QuickSight Q displaying a monthly revenue trend line chart

Mock data from the Software Sales sample topic

I ask Q, “Why?” and see four key drivers: Customer Contact, Country, Product, and Industry.

Amazon QuickSight Q Why visual displaying four key drivers for why revenue dropped in October 2022

Next, I change Country to Region to see the impact at a higher level.

Amazon QuickSight Q Why visual with dropdown open to change a key driver

Forecast questions

Next, I can ask Q for a forecast that uses ML and factors, like seasonality, to predict the trend.

Amazon QuickSight Q forecast question showing trend for revenue

With pinboards, why questions, and forecast questions, QuickSight Q not only saves significant time and energy but delivers insights that previously required the help of an analyst or data scientist. Reflecting on the project, Kellie shared, “It’s been fun building on the bleeding edge of analytics. I’m so excited to see what Q will do in 2023!”

To learn more, watch What’s New for Readers with Amazon QuickSight Q and What’s New for Authors with Amazon QuickSight Q.

About the authors

Kellie Burton is a Sr. Solutions Architect for Amazon QuickSight with over 25 years of experience in business analytics helping customers across a variety of industries. Kellie has a passion for helping customers harness the power of their data to uncover insights to make decisions.

Fulfillment by Amazon uses Amazon QuickSight to deliver key reporting insights to Amazon Marketplace sellers

2022-11-17 Ravi Kiran Nidadavolu

Post Syndicated from Ravi Kiran Nidadavolu original https://aws.amazon.com/blogs/big-data/fulfillment-by-amazon-uses-amazon-quicksight-to-deliver-key-reporting-insights-to-amazon-marketplace-sellers/

Fulfillment by Amazon (FBA) was launched in 2006, allowing businesses to outsource shipping to Amazon. With this fulfillment option, Amazon stores, picks, packs, ships, and delivers the products to customer as well as handling the customer service and returns for those orders.

Within Seller Central, a website where sellers can monitor their Amazon sales activity, the FBA Reporting team manages Report Central and the associated reporting APIs accessed by FBA sellers. Report Central offers 60+ downloadable reports to help sellers make data-driven decisions across inventory, sales, payments, customer concessions, removals and global shipping services.

The FBA Reporting team receives millions of report download requests from hundreds of thousands of users, in a multi-tenant fashion, every month. Tasked with ensuring speed, efficiency, and user-friendliness in the reports that sellers need to track their business operations, we turned to Amazon QuickSight.

In this post, we discuss what influenced the decision to implement QuickSight Embedded, as well as some of the benefits to FBA sellers.

Customer obsession drives innovation

Prior to implementing QuickSight Embedded, sellers had to create visuals within the limitations of their preferred spreadsheet program, which meant additional work and manual step-by-step processes to get the data out of Report Central and into whichever tools would be used. In some cases, it also meant additional cost to FBA sellers, depending on the analytics tools they were using.

To address this pain point for FBA sellers, we began researching, comparing, and contrasting our options. Initially, we considered building our own analytics tool, but decided against it due to concerns with the effort needed for long-term maintenance, upgrades, and the time investment necessary before we’d be able to bring innovative insights to customers. Once the decision was made to move forward with implementing an existing analytics solution, we began researching QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all users to build visualizations, perform ad hoc analysis, and quickly get business insights from their data, any time, on any device.

The following screenshot shows examples of visuals created by Sales Trend data.

QuickSight enabled the us to provide data access to sellers via a QuickSight dashboard embedded directly into Seller Central.

The FBA Sales Analytics dashboard provides sellers with a summarized view of how their sales are trending over a specified time period. Sellers can filter the content on the page by SKU (or multiple SKUs), Marketplace, and time period. They can also define the granularity of the data: daily, weekly, or monthly. The analytics page includes 1 year of historical data. There are three major components:

Sales Fulfilled Chart – A chart showing the trend of fulfilled sales over the specified time period, with year-over-year comparisons. The granularity of the data is determined by the filter selection (daily, weekly, monthly). Sellers can hover over any data point on the chart to see a pop-up with details. They can also download the data to a CSV file.
Summary Table – This table displays a comparison of Sales Fulfilled ($) and Units Fulfilled reports, based on the selected filters. It also displays the year-over-year variance.
Top Selling SKUs – This shows details about the top-selling SKUs (based on total sales) and the year-over-year variance per SKU, and also includes a link to drill down per SKU.

With QuickSight, we are able to provide sellers with useful visualizations of sales-related data, allowing them to quickly and easily identify trends and track the impact of their actions, without needing to spend extra time and resources creating data visualizations via more manual processes.

Partnering with QuickSight has also enabled us to offer more detail and flexibility to sellers, allowing them to change the period of time for the data they’d like to review, filter based on useful controls, and drill down into whichever areas are most helpful for them.

Analytics at the speed of business

The primary motivation for our choosing QuickSight was that it not only provided a faster, more efficient, and less expensive experience for FBA sellers, but it also did most of the heavy lifting, freeing our team up to focus more of our time on product and dashboard design. One of the benefits of QuickSight being an Amazon product is that accessing data stored in Amazon Redshift, Amazon Relational Database Service (Amazon RDS), Amazon Athena, and other AWS services is a seamless, automatic experience.

The visual editor and toolkit provided by QuickSight allows us to make updates to established dashboards, and even create entirely new dashboards, in a matter of hours. Agility fosters experimentation with changes to visual elements by quickly copying a dashboard, tweaking it, soliciting feedback from stakeholders, and implementing it. The effort reduction from embedding QuickSight’s analytical capabilities allows our team to launch updates and features much faster.

Improving the status quo with QuickSight

By embedding QuickSight into Seller Central, we have been able to offer hundreds of thousands of FBA sellers year-over-year comparison information. QuickSight has allowed our team to iterate and launch updates more quickly, without having to worry about scaling or maintaining infrastructure resources.

Currently underway is an initiative to launch more dashboards to provide insights into inventory trends.

To learn more about how you can embed customized data visuals, interactive dashboards and natural language querying into any application, visit Amazon QuickSight Embedded.

About the Author

Ravi Kiran Nidadavolu is a Software Development Manager with Fulfillment by Amazon.

California State University Chancellor’s Office reduces cost and improves efficiency using Amazon QuickSight for streamlined HR reporting in higher education

2022-11-02 Madi Hsieh

Post Syndicated from Madi Hsieh original https://aws.amazon.com/blogs/big-data/california-state-university-chancellors-office-reduces-cost-and-improves-efficiency-using-amazon-quicksight-for-streamlined-hr-reporting-in-higher-education/

The California State University Chancellor’s Office (CSUCO) sits at the center of America’s most significant and diverse 4-year universities. The California State University (CSU) serves approximately 477,000 students and employs more than 55,000 staff and faculty members across 23 universities and 7 off-campus centers. The CSU provides students with opportunities to develop intellectually and personally, and to contribute back to the communities throughout California. For this large organization, managing a wide system of campuses while maintaining the decentralized autonomy of each is crucial. In 2019, they needed a highly secure tool to streamline the process of pulling HR data. The CSU had been using a legacy central data warehouse based on data from their financial system, but it lacked the robustness to keep up with modern technology. This wasn’t going to work for their HR reporting needs.

Looking for a tool to match the cloud-based infrastructure of their other operations, the Business Intelligence and Data Operations (BI/DO) team within the Chancellor’s Office chose Amazon QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all employees within an organization to build visualizations, perform ad hoc analysis, and quickly get business insights from their data, any time, on any device. The team uses QuickSight to organize HR information across the CSU, implementing a centralized security system.

“It’s easy to use, very straightforward, and relatively intuitive. When you couple the experience of using QuickSight, with a huge cost difference to [the BI platform we had been using], to me, it’s a simple choice,”

– Andy Sydnor, Director Business Intelligence and Data Operations at the CSUCO.

With QuickSight, the team has the capability to harness security measures and deliver data insights efficiently across their campuses.

In this post, we share how the CSUCO uses QuickSight to reduce cost and improve efficiency in their HR reporting.

Delivering BI insights across the CSU’s 23 universities

The CSUCO serves the university system’s faculty, students, and staff by overseeing operations in several areas, including finance, HR, student information, and space and facilities. Since migrating to QuickSight in 2019, the team has built dashboards to support these operations. Dashboards include COVID-related leaves of absence, historical financial reports, and employee training data, along with a large selection of dashboards to track employee data at an individual campus level or from a system-wide perspective.

The team created a process for reading security roles from the ERP system and then translating them using QuickSight groups for internal HR reporting. QuickSight allowed them to match security measures with the benefits of low maintenance and familiarity to their end-users.

With QuickSight, the CSUCO is able to run a decentralized security process where campus security teams can provision access directly and users can get to their data faster. Before transitioning to QuickSight, the BI/DO team spent hours trying to get to specific individual-level data, but with QuickSight, the retrieval time was shortened to just minutes. For the first time, Sydnor and his team were able to pinpoint a specific employee’s work history without having to take additional actions to find the exact data they needed.

Cost savings compared to other BI tools

Sydnor shares that, for a public organization, one of the most attractive qualities of QuickSight is the immense cost savings. The BI/DO team at the Chancellor’s Office estimates that they’re saving roughly 40% on costs since switching from their previous BI platform, which is a huge benefit for a public organization of this scale. Their previous BI tool was costing them extensive amounts of money on licensing for features they didn’t require; the CSUCO felt they weren’t getting the best use of their investment.

The functionality of QuickSight to meet their reporting needs at an affordable price point is what makes QuickSight the CSUCO’s preferred BI reporting tool. Sydnor likes that with QuickSight, “we don’t have to go out and buy a subscription or a license for somebody, we can just provision access. It’s much easier to distribute the product.” QuickSight allows the CSUCO to focus their budget in other areas rather than having to pay for charges by infrequent users.

Simple and intuitive interface

Getting started in QuickSight was a no-brainer for Sydnor and his team. As a public organization, the procurement process can be cumbersome, thereby slowing down valuable time for putting their data to action. As an existing AWS customer, the CSUCO could seamlessly integrate QuickSight into their package of AWS services. An issue they were running into with other BI tools was encountering roadblocks to setting up the system, which wasn’t an issue with QuickSight, because it’s a fully managed service that doesn’t require deploying any servers.

The following screenshot shows an example of the CSUCO security audit dashboard.

Sydnor tells us, “Our previous BI tool had a huge library of visualization, but we don’t need 95% of those. Our presentations look great with the breadth of visuals QuickSight provides. Most people just want the data and ultimately, need a robust vehicle to get data out of a database and onto a table or visualization.”

Converting from their original BI tool to QuickSight was painless for his team. Sydnor tells us that he has “yet to see something we can’t do with QuickSight.” One of Sydnor’s employees who was a user of the previous tool learned QuickSight in just 30 minutes. Now, they conduct QuickSight demos all the time.

Looking to the future: Expanding BI integration and adopting Amazon QuickSight Q

With QuickSight, the Chancellor’s Office aims to roll out more HR dashboards across its campuses and extend the tool for faculty use in the classroom. In the upcoming year, two campuses are joining CSUCO in building their own HR reporting dashboards through QuickSight. The organization is also making plans to use QuickSight to report on student data and implement external-facing dashboards. Some of the data points they’re excited to explore are insights into at-risk students and classroom scheduling on campus.

Thinking ahead, CSUCO is considering Amazon QuickSight Q, a machine learning-powered natural language capability that gives anyone in an organization the ability to ask business questions in natural language and receive accurate answers with relevant visualizations. Sydnor says, “How cool would that be if professors could go in and ask simple, straightforward questions like, ‘How many of my department’s students are taking full course loads this semester?’ It has a lot of potential.”

Summary

The CSUCO is excited to be a champion of QuickSight in the CSU, and are looking for ways to increase its implementation across their organization in the future.

To learn more, visit the website for the California State University Chancellor’s Office. For more on QuickSight, visit the Amazon QuickSight product page, or browse other Big Data Blog posts featuring QuickSight.

About the authors

Madi Hsieh, AWS 2022 Summer Intern, UCLA.

Tina Kelleher, Program Manager at AWS.

Measure the adoption of your Amazon QuickSight dashboards and view your BI portfolio in a single pane of glass

2022-10-28 Maitri Brahmbhatt

Post Syndicated from Maitri Brahmbhatt original https://aws.amazon.com/blogs/big-data/measure-the-adoption-of-your-amazon-quicksight-dashboards-and-view-your-bi-portfolio-in-a-single-pane-of-glass/

Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service. If you plan to deploy enterprise-grade QuickSight dashboards, measuring user adoption and usage patterns is an important ingredient for the success of your BI investment. For example, knowing the usage patterns like geo location, department, and job role can help you fine-tune your dashboards to the right audience. Furthermore, to return the investment of your BI portfolio, with dashboard usage, you can reduce license costs by identifying inactive QuickSight authors.

In this post, we introduce the latest Admin Console, an AWS packaged solution that you can easily deploy and use to create a usage and inventory dashboard for your QuickSight assets. The Admin Console helps identify usage patterns of an individual user and dashboards. It can also help you track which dashboards and groups you have or need access to, and what you can do with that access, by providing more details on QuickSight group and user permissions and activities and QuickSight asset (dashboards, analyses, and datasets) permissions. With timely access to interactive usage metrics, the Admin Console can help BI leaders and administrators make a cost-efficient plan for dashboard improvements. Another common use case of this dashboard is to provide a centralized repository of the QuickSight assets. QuickSight artifacts consists of multiple types of assets (dashboards, analyses, datasets, and more) with dependencies between them. Having a single repository to view all assets and their dependencies can be an important element in your enterprise data dictionary.

This post demonstrates how to build the Admin Console using a serverless data pipeline. With basic AWS knowledge, you can create this solution in your own environment within an hour. Alternatively, you can dive deep into the source code to meet your specific needs.

Admin Console dashboard

The following animation displays the contents of our demo dashboard.

The Admin Console dashboard includes six sheets:

Landing Page – Provides drill-down into each detailed tabs.
User Analysis – Provides detailed analysis of the user behavior and identifies active and inactive users and authors.
Dashboard Analysis – Shows the most commonly viewed dashboards.
Assets Access Permissions – Provides information on permissions applied to each asset, such as dashboard, analysis, datasets, data source, and themes.
Data Dictionary – Provides information on the relationships between each of your assets, such as which analysis was used to build each dashboard, and which datasets and data sources are being used in each analysis. It also provides details on each dataset, including schema name, table name, columns, and more.
Overview – Provides instructions on how to use the dashboard.

You can interactively play with the sample dashboard in the following Interactive Dashboard Demo.

Let’s look at Forwood Safety, an innovative, values-driven company with a laser focus on fatality prevention. An early adopter of QuickSight, they collaborated with AWS to deploy this solution to collect BI application usage insights.

“Our engineers love this admin console solution,” says Faye Crompton, Leader of Analytics and Benchmarking at Forwood. “It helps us to understand how users analyze critical control learnings by helping us to quickly identify the most frequently visited dashboards in Forwood’s self-service analytics and reporting tool, FAST.”

Solution overview

The following diagram illustrates the workflow of the solution.

The workflow involves the following steps:

The AWS Lambda function Data_Prepare is scheduled to run hourly. This function calls QuickSight APIs to get the QuickSight namespace, group, user, and asset access permissions information.
The Lambda function Dataset_Info is scheduled to run hourly. This function calls QuickSight APIs to get dashboard, analysis, dataset, and data source information.
Both the functions save the results to an Amazon Simple Storage Service (Amazon S3) bucket.
AWS CloudTrail logs are stored in an S3 bucket.
Based on the file in Amazon S3 that contains user-group information, dataset information, QuickSight assets access permissions information, as well as dashboard views and user login events from the CloudTrail logs, five Amazon Athena tables are created. Optionally, the BI engineer can combine these tables with employee information tables to display human resource information of the users.
Four QuickSight datasets fetch the data from the Athena tables created in Step 5 and import them into SPICE. Then, based on these datasets, a QuickSight dashboard is created.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
Access to the following AWS services:
- AWS Athena
- AWS CloudFormation
- AWS Lambda
- AWS QuickSight
- Amazon Simple Storage Service (Amazon S3)
Basic knowledge of Python
Basic knowledge of SQL

Create solution resources

We can create all the resources needed for this dashboard using three CloudFormation templates: one for Lambda functions, one for Athena tables, and one for QuickSight objects.

CloudFormation template for Lambda functions

This template creates the Lambda functions data_prepare and dataset_info.

Choose Launch Stack and follow the steps to create these resources.

After the stack creation is successful, you have two Lambda functions, data_prepare and dataset_info, and one S3 bucket named admin-console[AWS-account-ID]. You can verify if the Lambda function can run successfully and if the group_membership, object_access, datasets_info, and data_dictionary folders are created in the S3 bucket under admin-console[AWS-account-ID]/monitoring/quicksight/, as shown in the following screenshots.

The Data_Prepare Lambda function is scheduled to run hourly with the CloudWatch Events rule admin-console-every-hour. This function calls the QuickSight Assets APIs to get QuickSight users, assets, and the access permissions information. Finally, this function creates two files, group_membership.csv and object_access.csv, and saves these files to an S3 bucket.

The Dataset_Info Lambda function is scheduled to run hourly and calls the QuickSight Assets APIs to get datasets, schemas, tables, and fields (columns) information. Then this function creates two files, datasets_info.csv and data_dictionary.csv, and saves these files to an S3 bucket.

Create a CloudTrail log if you don’t already have one and note down the S3 bucket name of the log files for future use.
Note down all the resources created from the previous steps. If the S3 bucket name for the CloudTrail log from step 2 is different from the one in step 1’s output, use the S3 bucket from step 2.

The following table summarizes the keys and values you use when creating the Athena tables with the next CloudFormation stack.

Key	Value	Description
cloudtraillog	`s3://cloudtrail-awslogs-[aws-account-id]-do-not-delete/AWSLogs/[aws-account-id]/CloudTrail`	The Amazon S3 location of the CloudTrail log
cloudtraillogtablename	`cloudtrail_logs`	The table name of CloudTrail log
groupmembership	`s3://admin-console[aws-account-id]/monitoring/quicksight/group_membership`	The Amazon S3 location of `group_membership.csv`
objectaccess	`s3://admin-console[aws-account-id]/monitoring/quicksight/object_access`	The Amazon S3 location of `object_access.csv`
dataset info	`s3://admin-console[aws-account-id]/monitoring/quicksight/datsets_info`	The Amazon S3 location of `datsets_info.csv`
datadict	`s3://admin-console[aws-account-id]/monitoring/quicksight/data_dictionary`	The Amazon S3 location of `data_dictionary.csv`

CloudFormation template for Athena tables

To create your Athena tables, complete the following steps:

Download the following JSON file.
Edit the file and replace the corresponding fields with the keys and values you noted in the previous section.

For example, search for the groupmembership keyword.

Then replace the location value with the Amazon S3 location for the groupmembership folder.

Create Athena tables by deploying this edited file as a CloudFormation template. For instructions, refer to Get started.

After a successful deployment, you have a database called admin-console created in AwsDataCatalog in Athena and three tables in the database: cloudtrail_logs, group_membership, object_access, datasets_info and data_dict

Confirm the tables via the Athena console.

The following screenshot shows sample data of the group_membership table.

The following screenshot shows sample data of the object_access table.

For instructions on building an Athena table with CloudTrail events, see Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail. For this post, we create the table cloudtrail_logs in the default database.

After all five tables are created in Athena, go to the security permissions on the QuickSight console to enable bucket access for s3://admin-console[AWS-account-ID] and s3://cloudtrail-awslogs-[aws-account-id]-do-not-delete.
Enable Athena access under Security & Permissions.

Now QuickSight can access all five tables through Athena.

CloudFormation template for QuickSight objects

To create the QuickSight objects, complete the following steps:

Get the QuickSight admin user’s ARN by running following command in the AWS Command Line Interface (AWS CLI):
```
aws quicksight describe-user --aws-account-id [aws-account-id] --namespace default --user-name [admin-user-name]
```
For example: arn:aws:quicksight:us-east-1:12345678910:user/default/admin/xyz.

Choose Launch Stack to create the QuickSight datasets and dashboard:

Provide the ARN you noted earlier.

After a successful deployment, four datasets named Admin-Console-Group-Membership, Admin-Console-dataset-info, Admin-Console-Object-Access, and Admin-Console-CFN-Main are created and you have the dashboard named admin-console-dashboard. If modifying the dashboard is preferred, use the dashboard save-as option, then recreate the analysis, make modifications, and publish a new dashboard.

Set your preferred SPICE refresh schedule for the four SPICE datasets, and share the dashboard in your organization as needed.

Dashboard demo

The following screenshot shows the Admin Console Landing page.

The following screenshot shows the User Analysis sheet.

The following screenshot shows the Dashboards Analysis sheet.

The following screenshot shows the Access Permissions sheet.

The following screenshot shows the Data Dictionary sheet.

The following screenshot shows the Overview sheet.

You can interactively play with the sample dashboard in the following Interactive Dashboard Demo.

You can reference the public template of the preceding dashboard in create-template, create-analysis, and create-dashboard API calls to create this dashboard and analysis in your account. The public template of this dashboard with the template ARN is 'TemplateArn': 'arn:aws:quicksight:us-east-1:889399602426:template/admin-console'.

Tips and tricks

Here are some advanced tips and tricks to build the dashboard as the Admin Console to analyze usage metrics. The following steps are based on the dataset admin_console. You can apply the same logic to create the calculated fields to analyze user login activities.

Create parameters – For example, we can create a parameter called InActivityMonths, as in the following screenshot. Similarly, we can create other parameters such as InActivityDays, Start Date, and End Date.

Create controls based on the parameters – In the following screenshot, we create controls based on the start and end date.

Create calculated fields – For instance, we can create a calculated field to detect the active or inactive status of QuickSight authors. If the time span between the latest view dashboard activity and now is larger or equal to the number defined in the Inactivity Months control, the author status is Inactive. The following screenshot shows the relevant code. According to the end-user’s requirements, we can define several calculated fields to perform the analysis.

Create visuals – For example, we create an insight to display the top three dashboard views by reader and a visual to display the authors of these dashboards.

Add URL actions – You can add an URL action to define some extra features to email inactive authors or check details of users.

The following sample code defines the action to email inactive authors:

mailto:<<email>>?subject=Alert to inactive author! &body=Hi, <<username>>, any author without activity for more than a month will be deleted. Please log in to your QuickSight account to continue accessing and building analyses and dashboards!

Clean up

To avoid incurring future charges, delete all the resources you created with the CloudFormation templates.

Conclusion

This post discussed how BI administrators can use QuickSight, CloudTrail, and other AWS services to create a centralized view to analyze QuickSight usage metrics. We also presented a serverless data pipeline to support the Admin Console dashboard.

If you would like to have a demo, please email us.

Appendix

We can perform some additional sophisticated analysis to collect advanced usage metrics. For example, Forwood Safety raised a unique request to analyze the readers who log in but don’t view any dashboard actions (see the following code). This helps their clients identify and prevent any wasting of reader sessions fees. Leadership teams value the ability to minimize uneconomical user activity.

CREATE OR REPLACE VIEW "loginwithoutviewdashboard" AS
with login as
(SELECT COALESCE("useridentity"."username", "split_part"("useridentity"."arn", '/', 3)) AS "user_name", awsregion,
date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
FROM cloudtrail_logs
WHERE
eventname = 'AssumeRoleWithSAML'
GROUP BY  1,2,3),
dashboard as
(SELECT COALESCE("useridentity"."username", "split_part"("useridentity"."arn", '/', 3)) AS "user_name", awsregion,
date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
FROM cloudtrail_logs
WHERE
eventsource = 'quicksight.amazonaws.com'
AND
eventname = 'GetDashboard'
GROUP BY  1,2,3),
users as 
(select Namespace,
Group,
User,
(case
when Group in (‘quicksight-fed-bi-developer’, ‘quicksight-fed-bi-admin’)
then ‘Author’
else ‘Reader’
end)
as author_status
from "group_membership" )
select l.* 
from login as l 
join dashboard as d 
join users as u 
on l.user_name=d.user_name 
and 
l.awsregion=d.awsregion 
and 
l.user_name=u.user_name
where d.event_time>(l.event_time + interval '30' minute ) 
and 
d.event_time<l.event_time 
and 
u.author_status='Reader'

About the Authors

Ying Wang is a Manager of Software Development Engineer. She has 12 years of expertise in data analytics and science. She assisted customers with enterprise data architecture solutions to scale their data analytics in the cloud during her time as a data architect. Currently, she helps customer to unlock the power of Data with QuickSight from engineering by delivering new features.

Ian Liao is a Senior Data Visualization Architect at AWS Professional Services. Before AWS, Ian spent years building startups in data and analytics. Now he enjoys helping customer to scale their data application on the cloud.

Maitri Brahmbhatt is a Business Intelligence Engineer at AWS. She helps customers and partners leverage their data to gain insights into their business and make data driven decisions by developing QuickSight dashboards.

Solution overview

Additional context to enhance SQL generation

Generated statement response

Prerequisites

Enable Amazon Q generative SQL

Enable Amazon Q in your SageMaker domain

Enable Amazon Q in Amazon Redshift

Cross-Region inference

Monitoring

Conclusion

About the authors

Follow along

Understand your users

Start small

Data can contain overlapping words

Avoid including aggregation names in your fields and values

Start with a single use case

Add lexical context

Add synonyms

Check semantic types

Set default aggregations

Review the primary date field

Add named filters

Add named entities

Drive NLQ adoption

Conclusion

About the Author

Personalized data for sales managers

Why questions

Forecast questions

About the authors

Customer obsession drives innovation

Analytics at the speed of business

Improving the status quo with QuickSight

About the Author

Delivering BI insights across the CSU’s 23 universities

Cost savings compared to other BI tools

Simple and intuitive interface

Looking to the future: Expanding BI integration and adopting Amazon QuickSight Q

Summary

About the authors

Admin Console dashboard

Solution overview

Prerequisites

Create solution resources

CloudFormation template for Lambda functions

CloudFormation template for Athena tables

CloudFormation template for QuickSight objects

Dashboard demo

Tips and tricks

Clean up

Conclusion

Appendix

About the Authors

The collective thoughts of the interwebz