Tag Archives: Amazon QuickSight

Create advanced insights using level-aware calculations in Amazon QuickSight

Post Syndicated from Karthik Tharmarajan original https://aws.amazon.com/blogs/big-data/create-advanced-insights-using-level-aware-calculations-in-amazon-quicksight/

Performing calculations at the right granularity always needs careful handling in data analytics. When data is generated by joining across multiple tables, the resulting denormalization can make accurate calculations challenging. Amazon QuickSight recently launched a new functionality called level-aware calculations (LAC), which enables you to specify the level of granularity at which you want aggregate functions (at what level to group by) or window functions (in what window to partition by) to be computed. This brings flexibility and simplicity when you build advanced calculations and powerful analyses. Without LAC, you would have to prepare pre-aggregated tables in your original data source, or run queries in the data prep phase, to enable those calculations.

In this post, we demonstrate how to create advanced insights using LAC in QuickSight. Before we start walking through the functions, let’s first introduce the important concept of order of evaluation in QuickSight and then talk about LAC in more detail.

QuickSight order of evaluation

Every time you open or update an analysis, QuickSight evaluates everything that is configured in the analysis in a specific sequence, and translates the configuration into a query that a database engine can run. This is the order of evaluation. The level of complexity depends on how many elements are embedded in each visual, multiplied by the number of visuals, plus the interactions between visuals. If we abstract away the details, the order of evaluation logic can be demonstrated by the following chart.

In the first DEFAULT route, QuickSight evaluates simple calculations at row level for the raw dataset, then it applies the WHERE filters. After that, based on what dimensions are added to the visual, QuickSight evaluates the aggregation for the selected measures at visual dimensions and applies HAVING filters. Table calculations such as running total or percent of total then get evaluated after the visual is formed, with the subtotal and total calculated last.

Sometimes, you may want the analytical steps to be conducted in a different sequence. For example, you may want to do an aggregation before the data is filtered, or do an aggregation first for some specific dimensions and then aggregate again for the visual dimensions. Based on those different needs, QuickSight offers three variations of the order of evaluation (as shown in the preceding chart). Specifically, you can use the keyword PRE_FILTER to add a calculation step before the WHERE filter, use PRE_AGG to add a calculation step before the visual aggregation, or use the suite of level-aware calculation aggregate functions to define an aggregation at independent dimensions and then aggregate again at the visual dimensions (a nested aggregation).

Most of the time, your visuals will include more than one calculated field. Be careful to define each of them and understand how they interact with the visuals and with different filters. Applying a filter before or after a window function can generate totally different results and business meanings.

With all the background introduced, now let’s talk about the new LAC functions and their capabilities, and demonstrate several typical use cases.

Level-aware calculations (LAC)

There are two groups of LAC functions:

  • Level-aware calculation aggregate functions (LAC-A) – These are our newly launched functions. By adding one argument to an existing aggregate function (for example, sum(), max(), or count()), you can define any group-by dimensions you want for the aggregation. A typical syntax for LAC-A is sum(measure,[group_field_A]). With LAC-A, you can add an aggregation step before the visual aggregation. The added level can be fixed, independent of the visual dimensions, or it can interact dynamically with the visual dimensions. We give some detailed examples later in this post. For a list of supported aggregation functions, refer to Level-aware calculation – aggregate (LAC-A) functions.
  • Level-aware calculation window functions (LAC-W) – These are existing functions, previously called level-aware aggregations (LAA). We recently changed the name to better reflect their nature, given the underlying difference between window functions and aggregate functions. LAC-W is a group of window functions (such as sumOver(), maxOver(), and denseRank()) where, using a third parameter, you can choose to run the calculation at the PRE_FILTER or PRE_AGG stage. A typical syntax for LAC-W is sumOver(measure,[partition_field_A],pre_agg). For a list of supported window functions, refer to Level-aware calculation – window (LAC-W) functions.

The following high-level diagram shows different branches of LAC. In this post, we mostly focus on the new LAC-A functions.

With the new LAC-A functions, you can run two layers of aggregation calculations. This offers the following benefits:

  • Run aggregation calculations that are independent of the group-by fields in the visual calculation
  • Run aggregation calculations for the dimensions that are NOT in the visual
  • Remove duplication of raw data before running calculations
  • Run aggregation calculations with nested group-by fields dynamically adapting to visual group-by fields

Let’s explore how we can achieve those benefits by demonstrating a few use cases.

Use case #1: Identify orders where actual ordered quantity for a product is higher than the average quantity

In this case, our visual is at the order level; however, we want to compute the average quantity of a product and use that to display the difference at each individual row/order level. With the LAC-A function, we can easily create an aggregation that is independent of the level in the visual.

We first compute the average quantity sold at the product level using the expression avg(Quantity,[Product]). To do so, change the visual-level aggregation to Average. In this case, the visual-level aggregation doesn't matter because we have product as a column and the LAC-A is at the same level. In the result table, the average quantity value for each product is repeated across all orders because it's computed at the product level.

Now that we have computed the average quantity at the product level, we can extend this to compute the difference between actual quantity ordered and the average quantity of product using the expression sum(Quantity) - avg(avg(Quantity,[Product])). This computed difference can then be used to conditionally format the view to highlight orders that have quantity higher than the average quantity of a product.

As seen in this example, although the visual was at the order level, we easily created an aggregation like average quantity of product, which is independent of the level in the visual, and used that to display the difference at each individual row/order level.
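If it helps to see the two levels side by side outside of QuickSight, the following minimal pandas sketch mirrors the same logic. The column names and sample rows are hypothetical, not taken from the dataset used in the screenshots.

import pandas as pd

# Hypothetical order-level data: one row per order.
df = pd.DataFrame({
    "Order": ["O1", "O2", "O3", "O4"],
    "Product": ["Chair", "Chair", "Desk", "Desk"],
    "Quantity": [5, 1, 7, 3],
})

# avg(Quantity, [Product]): the average is computed at the Product level
# and repeated on every order row, independent of the visual's Order level.
df["avg_qty_per_product"] = df.groupby("Product")["Quantity"].transform("mean")

# sum(Quantity) - avg(avg(Quantity, [Product])), evaluated per order row here
# because each order is a single row in this sample.
df["diff_vs_product_avg"] = df["Quantity"] - df["avg_qty_per_product"]
print(df)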

Use case #2: Identify the average of total country sales by region

Here we want to aggregate the sales for each country and then compute the average of country-level sales at region level using the same dataset.

With the LAC-A function, we can easily create an aggregation at a dimension level that is NOT in the visual. In this example, although country is not included in the visual, the LAC-A function first aggregates the sales at the country level and then the visual-level calculation generates the average number for each region. Within QuickSight, we can implement this in two ways.

Option 1: Nest the LAC-A with visual-level aggregate functions

Create a calculated column to compute sales at the country level by using the expression sum(Sales,[Country]), then add the calculation to the visual and change the aggregation to Average, as shown in the following screenshot. Note that if we don't use LAC-A to specify the level, the average sales are calculated at the lowest granular level (the base level of the dataset) for each region. That's why the numbers are significantly smaller for the sales column.

Option 2: Use LAC-A combined with other aggregate functions and nest them in the calculated column

Create a calculated column to compute sales at the country level by using the expression sum(Sales,[Country]), then nest that with an additional aggregation, in this case Average, by using the expression avg(sum(Sales,[Country])).
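Both options express the same nested aggregation. As a rough illustration of what gets computed, here is a pandas sketch of the two layers; the sample data is hypothetical.

import pandas as pd

df = pd.DataFrame({
    "Region": ["EMEA", "EMEA", "EMEA", "APAC", "APAC"],
    "Country": ["UK", "UK", "France", "Japan", "India"],
    "Sales": [100, 50, 200, 300, 120],
})

# Inner layer: sum(Sales, [Country]) gives total sales per country.
country_sales = df.groupby(["Region", "Country"], as_index=False)["Sales"].sum()

# Outer layer: the visual-level Average at the Region dimension.
avg_country_sales_by_region = country_sales.groupby("Region")["Sales"].mean()
print(avg_country_sales_by_region)
# EMEA: mean(150, 200) = 175; APAC: mean(300, 120) = 210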

Use case #3: Calculate total and average for a denormalized dataset with duplications

LAC-A calculations are designed to handle duplicates in the data effectively while performing computations. They allow you to perform computations like average without needing to explicitly handle duplicates in the data.

Consider a dataset that has employee and project details along with each employee’s salary. An employee can be associated with multiple projects, as shown in the following example.

Now let’s calculate total employee salary, average employee salary, and minimum and maximum salary using this sample dataset.

To compute total and average, we have to consider each employee’s salary just once even though an employee can be part of multiple projects and the salary for the employee can get duplicated across these projects. We can easily achieve this using LAC-A to compute the maximum of the salary at the employee level and then use that to compute the total and average.

Create a calculated column called Total Salary using the expression sum(max(Salary,[{Employee Name}])) and create another calculated column called Average Salary using avg(max(Salary,[{Employee Name}])). We can easily calculate Min Salary using the expression min(Salary) and Max Salary using max(Salary).
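To illustrate why the inner max() removes the duplication, here is a small pandas sketch with hypothetical data, where the salary repeats once per project for each employee.

import pandas as pd

df = pd.DataFrame({
    "Employee Name": ["Ana", "Ana", "Bob"],
    "Project": ["P1", "P2", "P3"],
    "Salary": [90000, 90000, 70000],
})

# max(Salary, [{Employee Name}]) collapses the duplicate rows to one value per employee.
salary_per_employee = df.groupby("Employee Name")["Salary"].max()

total_salary = salary_per_employee.sum()     # 160000, not 250000
average_salary = salary_per_employee.mean()  # 80000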

If we try to solve this without using LAC-A, we have to explicitly handle salary duplication in our calculation, and we have to go through multiple steps to get to the final result. Refer to the QuickSight Community blog for a similar use case.

Dynamic group keys for LAC-A

In addition to defining a static level of aggregation as seen in the preceding examples, you can also dynamically add or remove dimensions from the visual level group-by fields. The following are example syntax for a dynamic level:

  • Add dimensions to the visual dimensions with sum(cost, [$visualDimensions, LevelA, LevelB])
  • Remove dimensions from the visual dimensions with sum(cost, [$visualDimensions, !LevelC, !LevelD])

This capability brings flexibility and scalability that make LAC even more powerful. First, you can define one LAC calculated field and reuse it across multiple visuals with similar business intent. Additionally, as you build visuals and keep adding or removing fields, you don't need to edit the LAC-A calculated field each time; it automatically adjusts to the visual dimensions and gives the right output.

Use case #4: Identify average sales of customers within each region or country

To compute this, we can use the dynamic expression sum(Sales, [$visualDimensions, Customers]). Because each customer has purchases in more than one region, we need to calculate each customer's sales within each region before averaging. We can reuse the same expression in a different visual with country as the visual dimension if we want to calculate the average sales of customers within each country.

Use Average as the visual-level metric with Group by as Region to get the region-level average. In this case, $visualDimensions resolves to Region, so the expression is equivalent to sum(Sales, [Region, Customers]) in this visual.

If you have a similar visual with the Country dimension, the dynamic expression is equivalent to sum(Sales, [Country, Customers]). Reusing the same expression in different visuals saves a lot of time, especially when we want to build similar visuals with slightly different business contexts.

Use case #5: Identify the sales percentage of each subregion compared to region level

In this example, we want to calculate the percentage of sales for each subregion compared with total region sales. You could use a fixed dimension as mentioned before, but imagine a situation where you want to include more dimensions in the visual, such as product, year, or supplier. Using the dynamic group key and removing only subregion from the visual dimensions makes the exploration process much easier and quicker.

First, create an expression to calculate the sum of sales without the subregion level using sum(Sales, [$visualDimensions, !subregion]), then calculate the percentage using sum(Sales) / sum(sum(Sales, [$visualDimensions, !subregion])).
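The following pandas sketch shows an equivalent percent-of-parent computation for a visual grouped by Region and Subregion; the data and dimension names are hypothetical, and dropping Subregion from the group keys plays the role of !subregion.

import pandas as pd

df = pd.DataFrame({
    "Region": ["EMEA", "EMEA", "EMEA", "APAC"],
    "Subregion": ["North", "North", "South", "East"],
    "Sales": [100, 50, 150, 300],
})

visual_dims = ["Region", "Subregion"]                        # dimensions currently in the visual
parent_dims = [d for d in visual_dims if d != "Subregion"]   # $visualDimensions minus !subregion

visual = df.groupby(visual_dims, as_index=False)["Sales"].sum()   # sum(Sales) per visual row
visual["parent_total"] = visual.groupby(parent_dims)["Sales"].transform("sum")
visual["pct_of_region"] = visual["Sales"] / visual["parent_total"]
print(visual)
# EMEA North and EMEA South each show 0.5; APAC East shows 1.0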

Conclusion

In this post, we introduced new QuickSight LAC-A functions, which enable powerful and advanced aggregations with user-defined dimensions. We introduced the QuickSight order of evaluation, and walked through three use cases for LAC-A static keys and two use cases of LAC-A dynamic keys. Level-aware calculations are now generally available in all supported QuickSight Regions.

We look forward to your feedback and stories on how you apply these calculations for your business needs.


About the Authors

Karthik Tharmarajan is a Senior Specialist Solutions Architect for Amazon QuickSight. Karthik has over 15 years of experience implementing enterprise business intelligence (BI) solutions and specializes in the integration of BI solutions with business applications, enabling data-driven decisions.

Emily Zhu is a Senior Product Manager-Tech at Amazon QuickSight, AWS’s cloud-native, fully managed SaaS BI service. She leads the development of the QuickSight analytics and query experience. Before joining AWS, she worked in the Amazon Prime Air drone delivery program and the Boeing company as senior strategist for several years. Emily is passionate about the potential of cloud-based BI solutions and looks forward to helping customers advance in their data-driven strategy making.

Feng Han is a Software Development Manager on the AWS QuickSight Query Platform team. He focuses on the query generation platform and advanced function calculations, and is leading the team toward the next generation of the calculation engine.

Create small multiples in Amazon QuickSight

Post Syndicated from Bhupinder Chadha original https://aws.amazon.com/blogs/big-data/create-small-multiples-in-amazon-quicksight/

We’re excited to announce the launch of small multiples in Amazon QuickSight at AWS re:Invent 2022! Small multiples is one of the most powerful data visualization features when it comes to comparative analysis. Previously, you had to either use a filter or create multiple visuals side by side to analyze multiple slices of the same data. With the launch of small multiples, authors can simply add a dimension to the Small multiples field well to create multiple variations of the same chart.

In this post, we explore this new feature and its benefits.

What are Small Multiples?

Small multiples (aka trellis, facet, lattice, grid, or panel charts) allows you to compare data across many values for a given dimension. It splits a base visual into multiple versions of itself, presented side by side, with its data partitioned across these versions by a dimension. It provides an easier way to get a holistic view of your business instead of looking at data in silos. For example, you can view how your business is doing in each state without needing to create over 50 different visuals. Instead, you can see one visual with multiple small charts inside of it, as shown in the following screenshot.

Common use cases for small multiples

The small multiples feature provides the following benefits to your use cases:

  • Comparative analysis with easy pattern recognition – You can use small multiples to compare different subsets (categories) of the same dataset. Compared to animation, small multiples presents all of the data at once, making it easier for viewers to naturally compare each chart with others, instead of trying to flip back and forth between views.
  • Clean and digestible data – Trying to display multiple variables in a single chart can lead to overplotting and obscure the message, which small multiples can overcome (see the following example). Also, once the reader understands the design of one chart, they can apply this knowledge as they scan the rest of the charts. This shifts the reader’s effort from understanding how the chart works to what the data says, which makes decoding the chart a lot easier.
  • Storytelling – Small multiples is a great tool often used by newspapers to depict a story and allow customers to view changes in patterns (for example, consider the following Washington Post article).
  • Track or view multiple slices of data at once – You can monitor multiple resources at once without generating multiple visualizations or filtering to slice and dice the data.

Create your own small multiples in QuickSight

Currently, you can create small multiples on vertical bar, clustered bar, line, dual axis line, and pie charts. To get started, create one of the supported visuals and drag the field by which you want to partition the data to the Small multiples field well. This creates multiples of the chart split by the dimension assigned in a grid fashion.

For charts supporting axes like line and bar, the X-axis is made common across charts. This provides a cleaner look to easily spot trends and identify the pain points or contributing factors. The axis display settings and axis titles are controlled by the individual chart settings. The following screenshot shows an example.

Styling small multiples

You can customize the style of small multiples in the following ways:

  • Choose a layout – For ease of use, small multiples by default chooses the best possible panel layout depending on the cardinality of the field. You can also adjust the number of rows and columns up to a max of 10 rows per column and 64 panels. Any multiples that don’t fit in the layout are loaded as you scroll vertically. In the following example, we have selected two rows and three columns as our layout. This means you only see six panels at a time; the rest can be accessed via vertical scroll.
  • Limit panels – You can define the maximum number of panels to be displayed based on the sort order applied on the field in your Small multiples field well.
  • Format panel titles – Panel titles add context to the multiples being viewed. You can add rich text formatting like font styling and alignment options.
  • Customize panel styling – You can further customize the small multiples look and feel by playing around with the panel border styling options, adjusting the panel spacing, and adding panel background color.

Interacting with the data

  • Sorting – Depending on the chart type, you can sort by axis, value, and small multiples fields in ascending or descending order.
  • Filtering – You can set filter actions in the same way you set them for other charts. Filtering on small multiples passes both the small multiples and axis (for line and bar charts) or group (for pie charts) context.

More to come

We’re excited about our launch, and we have lots more planned for small multiples. In upcoming releases, we will tackle the current limitations:

  • Drill-down is currently not supported for small multiples.
  • Reference lines have been disabled for now.
  • Show Other small multiples panel is not supported.
  • Small multiples isn’t supported when the primary category is sorted by a field that has the distinct count aggregation function.
  • Small multiples doesn’t support calculated values with table calculation.
  • Data zoom on the X axis for line and bar charts is disabled for now.
  • Panel title is not displayed in the tooltip by default. To see it, you have to manually customize the tooltip and add a Small multiples field in it.

Summary

In this post, we looked at the small multiples feature, learned about its various use cases, and saw how to configure it in QuickSight. You can analyze data by multiple dimension attributes in an easy-to-understand and easy-to-maintain way. With small multiples, readers can quickly see which categories are most important, and spot trends and insights. Start using small multiples to enrich your dashboards’ current visualizations and unlock new business use cases today!


About the Author

Bhupinder Chadha is a senior product manager for Amazon QuickSight focused on visualization and front end experiences. He is passionate about BI, data visualization, and low-code/no-code experiences. Prior to QuickSight, he was the lead product manager for Inforiver, responsible for building an enterprise BI product from the ground up. Bhupinder started his career in presales, followed by a short stint in consulting, and then was a PM for xViz, an add-on visualization product.

Add text boxes to your Amazon QuickSight analysis

Post Syndicated from Bhupinder Chadha original https://aws.amazon.com/blogs/big-data/add-text-boxes-to-your-amazon-quicksight-analysis/

We are excited to announce the launch of text boxes in Amazon QuickSight. Adding text for common use cases, including but not limited to titles, subtitles, annotations, and additional context for KPIs, is now simpler than ever with the new text box. You can reposition, resize, and make your text stand out with various rich text formatting options like font styling, text alignment, and text box styling. You can also add dynamic text values with the help of user-defined parameters and system-defined parameters like date and page number.

In this post, we explore the new text box feature and its benefits.

Solution overview

The new text box feature may feel like it overlaps with narrative insights, but narrative insights are focused on advanced use cases, whereas a text box is lightweight, focused on simple textual content, and doesn’t need any data assignment. You can add text boxes without being restricted by the total visual limit because they’re not counted toward the sheet visual limit.

You can do the following with a text box in QuickSight:

  • Add freehand text and display dynamic text values with the help of parameters
  • Support reporting use cases with system-defined parameters like date and page count
  • Perform rich text editing with font styling options, alignment options, and bulleting
  • Add an image to your text box, such as displaying your company logo with the analysis title text
  • Hyperlink your text to navigate to a different sheet or external URLs
  • Style your text box with background color and border styling options

Add a text box to an analysis sheet

  1. To add a text box to your analysis, you can either choose Add on the drop-down menu or choose the text box icon from the visual selector. Note that unlike other visuals, you can’t convert a text box to another visual, or vice versa.
  2. You now have an empty text box on the sheet. To position the text box, select the box and drag it from the borders. To resize the text box, select and drag any of the outline handles.
  3. Now you can add text and format it by using the various formatting options, such as:
    • Font styling, including font style, color, and size
    • Typographical emphasis, such as bold, italics, underline, and strikethrough
    • Alignment options and bulleting

  4. You can hyperlink text or add a static or dynamic URL by choosing the URL icon.
  5. Just like adding parameters to titles and subtitles, you can display parameter values in a text box as well. Choose from a list of parameters available in the analysis from the parameter drop-down menu.
  6. Along with text, you can also add images inside the text box by choosing the image icon.
  7. Apart from text formatting, you can also format the background and border color.

When you’ve finished editing the text box, click on the blank space on the sheet.

Add a text box to a reporting sheet

Although the steps to add a text box to a reporting sheet are the same as those for an analysis sheet, you have two new system parameter options when adding the text box to the header or footer of your reporting sheet:

  • Date parameter – Add the print date and time to your report by choosing the date parameter (AWS:printTime) and selecting from a list of date formats. If your organization uses a different date format, you can customize it by choosing the added date and selecting from the list of supported date formats.
  • Page parameter – Display the current page number (AWS:pageNumber), total number of pages (AWS:pageCount), or a combination of both by choosing the page parameter.

Summary

In this post, we looked at how the new text box feature in QuickSight provides a great way to add text and format it with rich text capabilities and provide parameter support, URLs, and images. The new text box is now generally available in all supported QuickSight Regions.

Give the text box a try and let us know about your experience in the comments section.


About the author

Bhupinder Chadha is a senior product manager for Amazon QuickSight focused on visualization and front end experiences. He is passionate about BI, data visualization, and low-code/no-code experiences. Prior to QuickSight, he was the lead product manager for Inforiver, responsible for building an enterprise BI product from the ground up. Bhupinder started his career in presales, followed by a short stint in consulting, and then was a PM for xViz, an add-on visualization product.

New line chart customization options in Amazon QuickSight

Post Syndicated from Bhupinder Chadha original https://aws.amazon.com/blogs/big-data/new-line-chart-customization-options-in-amazon-quicksight/

Amazon QuickSight is a serverless, cloud-based business intelligence (BI) service that brings data insights to your teams and end-users through machine learning (ML)-powered dashboards and data visualizations that can be accessed via QuickSight or embedded in apps and portals that your users access.

Line charts in QuickSight have undergone a major overhaul this year, starting with an increased limit on the number of data points that can be rendered in a single chart (from 2,500 to 10,000 points), followed by missing data treatment, and finally line and marker customization options.

With the new styling options, you can make line charts more readable and drive readers’ attention toward the line series you want to emphasize. Although colors help convey meaning, in certain cases choosing different line patterns can help differentiate between lines for people with color blindness, and making lines bold can make them stand out from the rest. Some enhancements, like step line interpolation, are more than a styling feature: you can display step lines for data that changes at a specific point in time and at irregular intervals. For example, if you were to chart postage stamp prices in the US, a step line would be preferred over a sloped line to indicate a change in price. The line between the dates would remain flat until the next price change is observed.

In this post, we discuss the updates to the line chart customization options and their benefits.

Line chart format pane updates

The data series section under the format pane has been updated with a new flyout menu consisting of line styling, marker customization, and axis settings. Authors can choose to set a base style, which is a common style across all series, and also customize individual series by either choosing Select Series to Style or by directly selecting the series on the line chart on the sheet. Previously, the long list of series names made it difficult to track series where the axis setting was changed. This has been updated to only display individually styled series. Also, you can remove and reset to default styling if you wish to go back to the previous style.

The following figure compares the old and new experience.

Line styling

The data series tab, where users adjust axis placement (left and right Y axis), now also includes options to style individual series or all series in the line chart and change presentation options. For example, you can create a dot plot chart in QuickSight by hiding the line series and displaying only markers.

Let’s look at the recently added line styling options:

  1. Line interpolation – Allow authors to choose between linear, smooth, and stepped line interpolation options:
    1. Linear – The most commonly used line interpolation for trend analysis.
    2. Smooth – A variation of line charts, which connects the data points using smooth curves instead of straight lines. Apart from the aesthetic aspect, these charts are preferred for displaying a gradual change in trends and where data accuracy isn’t critical.
    3. Stepped – A specialized version of the traditional line chart, where the data points are connected via step-like progressions instead of straight lines. You can use stepped lines to view the sudden and exact timing of changes in the data, where the vertical lines represent the magnitude of the changes and the horizontal lines represent periods of time where the data is constant. You can also use this approach to create bump charts, which are commonly used to visualize changes in rank over time.

  2. Line style – Another way to make your line charts more accessible and printer friendly is to define line styles to differentiate between series. A classic example would be visualizing actual and target values across a given time period. You can choose between solid, dotted, and dashed line styles.
  3. Color customization – Colors play an important role in conveying meaning and driving user attention. Although defining a distinct color for each series helps distinguish between the different series, doing the same for a large number of series could add visual clutter. As part of best practices in such cases, highlighting certain series makes it easier for readers to focus on the important information, especially while viewing a large number of line series.
  4. Line width – You can choose from predefined line widths to easily differentiate between line series and let important values stick out to drive reader focus. In the following example, we’re interested in showing the seasonality trends for gardening and home decor in relation to other parts of the business. To do that, instead of assigning each series a different color, we might only adjust color and increase thickness for gardening and home decor and keep other series as grey and thin. This way, reader attention is easily drawn towards the thicker, colored lines.

Marker customization

Data markers help draw attention to the data points along a line. They also come in handy for showing which points have actual data when your line chart contains missing data points. Data markers can be enabled for all series or for individual series. When enabling data markers for all series, be judicious to avoid the visual clutter of drawing attention to too many elements.

These new customizations enable new use cases such as dot plot charts (hide line series and display only markers) and slope graphs. Data point markers will only show up for data points where data exists and not for interpolated values using the missing data control. The following are the marker properties:

  1. Shape – Choose from a predefined list of marker shapes like circle, diamond, or square.
  2. Size – Define a size for the marker. Note that the marker size should be greater than the line width to be visible.
  3. Color – By default, markers are assigned the same color as individual series. However, this can be changed as desired.

How to style lines and markers

Create or choose a line chart that you want to style and choose the pencil icon to open the Format visual pane. Navigate to the data series section and select the base style to apply common styling options to all series; in our example, the color is toned down to light grey and the line interpolation is set to smooth.

  • Line styling – To style a specific series, select the series to style on the drop-down menu and set the desired line styling options. In our example, the high series line width has been increased to 4px and the color is set to purple.
  • Marker styling – Turn the marker styling toggle on to customize markers and choose marker styling options. Select your desired marker shape and size.
  • Axis placement – You can change the series axis from left to right and convert it into a dual axis chart.
  • Remove styling – If you want to reset to the default styling, choose Remove styling for an individual series and choose the reset icon for the base style.
  • Visual ingress – Apart from using the format pane for styling, authors can also style a series directly by selecting the series on the visual.

Summary

In this post, we looked at the new line and marker styling options for line charts and how you can apply a common style across all series as well as style each series individually. We also looked at the different ingress and Y axis placement options. To learn more about related chart features, see Create small multiples in Amazon QuickSight.

We look forward to learning more about your interesting use cases and feedback on how we can better serve you! Follow the QuickSight Community to stay up to date with new feature announcements, events, learning resources, and Q&A.


About the author

Bhupinder Chadha is a senior product manager for Amazon QuickSight focused on visualization and front end experiences. He is passionate about BI, data visualization, and low-code/no-code experiences. Prior to QuickSight, he was the lead product manager for Inforiver, responsible for building an enterprise BI product from the ground up. Bhupinder started his career in presales, followed by a short stint in consulting, and then was a PM for xViz, an add-on visualization product.

Amazon Alexa Audio migrates business intelligence to Amazon QuickSight for faster performance

Post Syndicated from Weijia Shou original https://aws.amazon.com/blogs/big-data/amazon-alexa-audio-migrates-business-intelligence-to-amazon-quicksight-for-faster-performance/

The Amazon Alexa Audio Data & Insights (AUDI) team strives to make business intelligence (BI) accessible for engineers across Amazon Alexa Audio. Amazon Alexa Audio helps you stay entertained with your favorite music, podcasts, books, and radio from providers such as Amazon Music, Spotify, and iHeart Radio. The AUDI team manages dashboards and reports to help Amazon Alexa Audio teams pull metrics, derive insights, and perform deep-dive analyses. To support the performance of business-critical activities, the team sought a fast, high-performing BI solution so that it could improve the user experience.

The AUDI team turned to AWS and migrated its dashboards to Amazon QuickSight, a fully managed, cloud-native BI service powered by machine learning. By moving its dashboards to QuickSight, the AUDI team reduced load times by a factor of 25, from 2–5 minutes to only 5–10 seconds. This has made it simpler than ever for Amazon Alexa Audio to quickly harness BI insights.

“Our users were blown away at how fast Amazon QuickSight loaded our dashboards,” says Kani Singaravel, BI manager at Amazon Alexa Audio. “It’s pretty sleek and very simple to use, which is why we love it.”

The following screenshot shows charts and graphs that illustrate how Amazon QuickSight presents data visualizations.

Delivering BI insights across Amazon Alexa

The AUDI team delivers nimble BI insights for the Amazon Alexa Audio organization, harnessing data to generate weekly, monthly, quarterly, and exception-based business reports. To support the needs of the users who rely on the BI service, the AUDI team needed a faster, higher-performing solution. Previously, it would take several minutes to load dashboards and view insights, and Amazon Alexa Audio teams required lower latency to perform their important work.

After recognizing these challenges, the AUDI team began to explore advanced BI options on AWS.

“We chose to migrate to AWS after we received a lot of complaints about latency,” says Weijia Shou, BI engineer at Amazon Alexa Audio. “This decision also aligned well with our usage of other AWS tools.”

In February 2022, with the goal of quickly accessing key insights, the team chose Amazon QuickSight as its preferred BI service and began to migrate its dashboards.

Reducing costs and latency on Amazon QuickSight

From research to launch, the AUDI team’s first dashboard migration took 2 months to complete. As of July 2022, it has migrated more than 10 dashboards to Amazon QuickSight.

“The first dashboard that we migrated was the Alexa Audio Summary Trend dashboard, which covers more than 80% of the use cases in our organization,” Shou says. “After that pilot migration, we saw a great improvement on latency from our dashboard.”

The team saw a more than 25-times improvement in latency, reducing dashboard loading times from 2–5 minutes to only 5–10 seconds.

To achieve this acceleration in performance, the AUDI team took advantage of the Super-fast, Parallel, In-Memory Calculation Engine (SPICE) in-memory data store in Amazon QuickSight. SPICE is a purpose-built query engine that offers consistently fast performance and automatically scales to meet high concurrency needs during peak periods, such as weekday mornings or end-of-month reporting cycles. When data is imported into SPICE, it’s converted into a highly optimized, proprietary format. Dashboard queries are then powered by SPICE, providing fast performance for dashboard users across Amazon Alexa Audio and shielding the underlying data source from the intensity of traffic generated by interactive dashboard queries. SPICE data can be refreshed on a schedule or by one-time API calls.
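For readers who want to script refreshes, a one-off SPICE refresh can be triggered programmatically with the CreateIngestion API. The following boto3 sketch shows the general pattern; the account and dataset IDs are placeholders, not the AUDI team’s actual resources.

import uuid
import boto3

quicksight = boto3.client("quicksight", region_name="us-west-2")

# Trigger a one-off SPICE refresh for a dataset.
response = quicksight.create_ingestion(
    AwsAccountId="111122223333",      # placeholder account ID
    DataSetId="example-dataset-id",   # placeholder dataset ID
    IngestionId=str(uuid.uuid4()),    # any unique ID for this refresh run
)
print(response["IngestionStatus"])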

On Amazon QuickSight, the AUDI team has vastly accelerated its querying time for the BI service, which is used by over 300 employees. “The new dashboard is much faster, so we can quickly understand how we’re tracking against our goals and key engagement metrics,” says Blake N., Senior Product Manager at Amazon Alexa International. In addition to improving speed and performance, the AUDI team has also reduced its expenses. Because Amazon QuickSight has a pay-as-you-go pricing structure, it’s much more cost effective than other BI tools on the market.

Accelerating data-driven decision-making on AWS

By migrating BI dashboards to Amazon QuickSight, the AUDI team has made it simpler and faster for hundreds of Amazon Alexa Audio engineers to access the information that they need to make critical decisions in their roles. In the future, the team will continue the migration, and plans to build new dashboards on Amazon QuickSight as it launches innovative new features and services.

“Amazon QuickSight has been a critical migration for our business,” says LeAnn Lorbiecki, senior manager at Amazon Alexa Audio. “Previously, the latency made our dashboards unusable and challenged our ability to make data-driven decisions quickly on behalf of our customers, products, and overall business. The work on this migration was an incredible team effort with returns seen very quickly.”


About the authors

Weijia Shou is a Business Intelligence Engineer II on the Amazon Alexa Audio Data & Insights team. She led the Alexa Audio migration of reporting dashboards to Amazon QuickSight. Weijia is passionate about building scalable solutions and tools to accelerate data-driven decision-making.

Kani Singaravel is a Business Intelligence Manager at Alexa Audio, where she enjoys working with the team to solve customer problems and challenges using data analytics.

Automate your Amazon QuickSight deployment with the new API-based account creation and deletion

Post Syndicated from Srikanth Baheti original https://aws.amazon.com/blogs/big-data/automate-your-amazon-quicksight-deployment-with-the-new-api-based-account-creation-and-deletion/

Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data, create interactive dashboards, and share these with tens of thousands of users, either within the QuickSight interface, or embedded in software as a service (SaaS) applications or web portals.

We’re excited to announce the availability of QuickSight APIs that enable administrators and developers to programmatically create and delete accounts with QuickSight Enterprise and Enterprise + Q editions. This allows developers and administrators to automate the deployment and teardown of QuickSight accounts in their organization at scale.

Feature overview

To create and delete QuickSight accounts programmatically, you can use the CreateAccountSubscription, DescribeAccountSubscription, DescribeAccountSettings, UpdateAccountSettings, and DeleteAccountSubscription APIs, described in the following sections.

You need to have the correct AWS Identity and Access Management (IAM) permissions to sign up for QuickSight. For more information, see IAM policy examples for Amazon QuickSight. If your IAM policy includes both the Subscribe and CreateAccountSubscription actions, make sure that both actions are set to Allow. If either action is set to Deny, the Deny action prevails and your API call fails.

Create a QuickSight account

Use the CreateAccountSubscription API to create a QuickSight account:

POST /account/AwsAccountId HTTP/1.1 
Content-type: application/json 
{ 
    "AccountName": "string", 
    "ActiveDirectoryName": "string", 
    "AdminGroup": [ "string" ], 
    "AuthenticationMethod": "string", 
    "AuthorGroup": [ "string" ], 
    "ContactNumber": "string", 
    "DirectoryId": "string", 
    "Edition": "string", 
    "EmailAddress": "string", 
    "FirstName": "string", 
    "LastName": "string", 
    "NotificationEmail": "string", 
    "ReaderGroup": [ "string" ], 
    "Realm": "string" 
}

You can choose to subscribe to Enterprise or Enterprise + Q edition of QuickSight. You can choose the user authentication method: IAM only, IAM and QuickSight, or Active Directory. If you choose Active Directory, you need to provide an Active Directory name and an admin group associated with your Active Directory. Provide the notification email address that you want QuickSight to send notifications to regarding your QuickSight account or QuickSight subscription.

The following code shows the response for CreateAccountSubscription:

HTTP/1.1 Status
Content-type: application/json
{
   "RequestId": "string",
   "SignupResponse": { 
      "accountName": "string",
      "directoryType": "string",
      "IAMUser": boolean,
      "userLoginName": "string"
   }
}

Describe a QuickSight account

Use the DescribeAccountSubscription API to receive a description of a QuickSight account’s subscription:

GET /account/AwsAccountId HTTP/1.1

A successful API call returns an AccountInfo object that includes an account’s name, subscription status, authentication type, edition, and notification email address:

HTTP/1.1 Status 
Content-type: application/json 
{ 
    "AccountInfo": { 
        "AccountName": "string", 
        "AccountSubscriptionStatus": "string", 
        "AuthenticationType": "string", 
        "Edition": "string", 
        "NotificationEmail": "string" 
    },
    "RequestId": "string" 
}

Update and delete a QuickSight account

To avoid accidental deletion, a termination protection flag has been introduced. When a new QuickSight account is provisioned through the CreateAccountSubscription API or the user interface, the termination protection flag is set to True by default. You can use the existing DescribeAccountSettings API to check the current value of the termination protection flag:

GET /accounts/{AwsAccountId}/settings HTTP/1.1

The following code shows the response for DescribeAccountSettings:

HTTP/1.1 Status 
Content-type: application/json
{ 
    "AccountSettings": { 
        "AccountName": "string", 
        "Edition": "string", 
        "DefaultNamespace": "string", 
        "PublicSharingEnabled": "boolean", 
        "NotificationEmail": "string",
        "TerminationProtectionEnabled": "boolean"
    },
    "RequestId": "string" 
}

Use the UpdateAccountSettings API to disable the termination protection flag before proceeding with deleting the account. Optionally, you can also use this API to update the notification email:

PUT /accounts/{AwsAccountId}/settings HTTP/1.1 
Content-type: application/json 
{ 
    "AwsAccountId": "string", 
    "DefaultNamespace": "string", //Currently only supports default namespace: default 
    "TerminationProtectionEnabled": "boolean",
     "NotificationEmail": "string"
}

After the termination protection has been set to False, use the DeleteAccountSubscription API to delete the QuickSight account:

DELETE /account/{AwsAccountId} HTTP/1.1 
Content-type: application/json
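If you script this flow with an AWS SDK such as boto3, the sequence looks roughly like the following sketch, assuming the SDK exposes these operations with the same parameter names as the REST API shown above; the account ID is a placeholder.

import boto3

quicksight = boto3.client("quicksight", region_name="us-west-2")
account_id = "111122223333"  # placeholder AWS account ID

# Check whether termination protection is still enabled.
settings = quicksight.describe_account_settings(AwsAccountId=account_id)["AccountSettings"]

if settings.get("TerminationProtectionEnabled", True):
    # Disable the flag before attempting deletion.
    quicksight.update_account_settings(
        AwsAccountId=account_id,
        DefaultNamespace="default",
        TerminationProtectionEnabled=False,
    )

# With protection disabled, the subscription can be deleted.
quicksight.delete_account_subscription(AwsAccountId=account_id)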

Sample use case

Let’s consider a fictional company, AnyCompany, which owns healthcare facilities across the country. The central IT team of AnyCompany is responsible for setting up and maintaining the IT infrastructure and services for all the facilities in each state. Because AnyCompany is scaling their business, they want to automate the onboarding and maintenance of the IT infrastructure as much as possible. QuickSight is one of the services used by AnyCompany sites, and central IT needs to be able to set up and tear down QuickSight accounts automatically.

The following code shows their sample CreateAccountSubscription API call for setting up a QuickSight Enterprise account:

aws quicksight create-account-subscription --edition ENTERPRISE 
--authentication-method IAM_AND_QUICKSIGHT --aws-account-id XXXXXXXXXXXX 
--account-name quicksight-enterprise-reporting --notification-email XXXXXXXXXX 
--region us-west-2

The following screenshot shows the response.

They use the DescribeAccountSubscription API to monitor the status of the new QuickSight account’s subscription:

aws quicksight describe-account-subscription --aws-account-id XXXXXXXXXXX

The following screenshot shows the response.

In the case of a site closing event, the following code turns off termination protection using the UpdateAccountSettings API call:

 aws quicksight update-account-settings --aws-account-id XXXXXXXXXXXX 
 --default-namespace default --no-termination-protection-enabled 
 --region us-west-2

The following screenshot shows the response.

The following code deletes the QuickSight account using the DeleteAccountSubscription API call:

aws quicksight delete-account-subscription --aws-account-id XXXXXXXXXXXX 
--region us-west-2

The following screenshot shows the response.

Conclusion

Provisioning and deleting QuickSight accounts enables IT teams to automate their deployments of QuickSight instances across their organization. You can scale and keep up with the rapidly growing business, and minimize human error such as choosing the wrong authentication method. For more information, refer to Amazon QuickSight and What’s New in the Amazon QuickSight User Guide.

Try out QuickSight account provisioning and deletion APIs to automate your QuickSight deployments, and share your feedback and questions in the comments.


About the authors

Srikanth Baheti is a Specialized World Wide Sr. Solution Architect for Amazon QuickSight. He started his career as a consultant and worked for multiple private and government organizations. Later he worked for PerkinElmer Health and Sciences & eResearch Technology Inc, where he was responsible for designing and developing high-traffic web applications and highly scalable, maintainable data pipelines for reporting platforms using AWS services and serverless computing.

Raji Sivasubramaniam is a Sr. Solutions Architect at AWS, focusing on Analytics. Raji specializes in architecting end-to-end enterprise data management, business intelligence, and analytics solutions for Fortune 500 and Fortune 100 companies across the globe. She has in-depth experience in integrated healthcare data and analytics with a wide variety of healthcare datasets, including managed market, physician targeting, and patient analytics.

Mayank Agarwal is a product manager for Amazon QuickSight, AWS’ cloud-native, fully managed BI service. He focuses on account administration, governance, and developer experience. He started his career as an embedded software engineer developing handheld devices. Prior to QuickSight, he led engineering teams at Credence ID, developing custom mobile embedded devices and web solutions using AWS services that make biometric enrollment and identification fast, intuitive, and cost-effective for government, healthcare, and transaction security applications.

Fulfillment by Amazon uses Amazon QuickSight to deliver key reporting insights to Amazon Marketplace sellers

Post Syndicated from Ravi Kiran Nidadavolu original https://aws.amazon.com/blogs/big-data/fulfillment-by-amazon-uses-amazon-quicksight-to-deliver-key-reporting-insights-to-amazon-marketplace-sellers/

Fulfillment by Amazon (FBA) was launched in 2006, allowing businesses to outsource shipping to Amazon. With this fulfillment option, Amazon stores, picks, packs, ships, and delivers the products to customers, as well as handling customer service and returns for those orders.

Within Seller Central, a website where sellers can monitor their Amazon sales activity, the FBA Reporting team manages Report Central and the associated reporting APIs accessed by FBA sellers. Report Central offers 60+ downloadable reports to help sellers make data-driven decisions across inventory, sales, payments, customer concessions, removals and global shipping services.

The FBA Reporting team receives millions of report download requests from hundreds of thousands of users, in a multi-tenant fashion, every month. Tasked with ensuring speed, efficiency, and user-friendliness in the reports that sellers need to track their business operations, we turned to Amazon QuickSight.

In this post, we discuss what influenced the decision to implement QuickSight Embedded, as well as some of the benefits to FBA sellers.

Customer obsession drives innovation

Prior to implementing QuickSight Embedded, sellers had to create visuals within the limitations of their preferred spreadsheet program, which meant additional work and manual step-by-step processes to get the data out of Report Central and into whichever tools would be used. In some cases, it also meant additional cost to FBA sellers, depending on the analytics tools they were using.

To address this pain point for FBA sellers, we began researching, comparing, and contrasting our options. Initially, we considered building our own analytics tool, but decided against it due to concerns with the effort needed for long-term maintenance, upgrades, and the time investment necessary before we’d be able to bring innovative insights to customers. Once the decision was made to move forward with implementing an existing analytics solution, we began researching QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all users to build visualizations, perform ad hoc analysis, and quickly get business insights from their data, any time, on any device.

The following screenshot shows examples of visuals created from Sales Trend data.

QuickSight enabled us to provide data access to sellers via a QuickSight dashboard embedded directly into Seller Central.
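This post doesn’t detail the FBA team’s exact implementation, but in general, a host application embeds a dashboard by generating a short-lived embed URL server-side and loading it in an iframe. The following boto3 sketch shows the pattern using the GenerateEmbedUrlForRegisteredUser API; all IDs and ARNs below are placeholders.

import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Generate a short-lived embed URL for an already-registered QuickSight reader.
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="111122223333",  # placeholder account ID
    UserArn="arn:aws:quicksight:us-east-1:111122223333:user/default/example-reader",
    SessionLifetimeInMinutes=60,
    ExperienceConfiguration={
        "Dashboard": {"InitialDashboardId": "example-dashboard-id"}
    },
)
embed_url = response["EmbedUrl"]  # load this URL in an iframe in the host application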

The FBA Sales Analytics dashboard provides sellers with a summarized view of how their sales are trending over a specified time period. Sellers can filter the content on the page by SKU (or multiple SKUs), Marketplace, and time period. They can also define the granularity of the data: daily, weekly, or monthly. The analytics page includes 1 year of historical data. There are three major components:

  • Sales Fulfilled Chart – A chart showing the trend of fulfilled sales over the specified time period, with year-over-year comparisons. The granularity of the data is determined by the filter selection (daily, weekly, monthly). Sellers can hover over any data point on the chart to see a pop-up with details. They can also download the data to a CSV file.
  • Summary Table – This table displays a comparison of Sales Fulfilled ($) and Units Fulfilled reports, based on the selected filters. It also displays the year-over-year variance.
  • Top Selling SKUs – This shows details about the top-selling SKUs (based on total sales) and the year-over-year variance per SKU, and also includes a link to drill down per SKU.

With QuickSight, we are able to provide sellers with useful visualizations of sales-related data, allowing them to quickly and easily identify trends and track the impact of their actions, without needing to spend extra time and resources creating data visualizations via more manual processes.

Partnering with QuickSight has also enabled us to offer more detail and flexibility to sellers, allowing them to change the period of time for the data they’d like to review, filter based on useful controls, and drill down into whichever areas are most helpful for them.

Analytics at the speed of business

The primary motivation for our choosing QuickSight was that it not only provided a faster, more efficient, and less expensive experience for FBA sellers, but it also did most of the heavy lifting, freeing our team up to focus more of our time on product and dashboard design. One of the benefits of QuickSight being an Amazon product is that accessing data stored in Amazon Redshift, Amazon Relational Database Service (Amazon RDS), Amazon Athena, and other AWS services is a seamless, automatic experience.

The visual editor and toolkit provided by QuickSight allows us to make updates to established dashboards, and even create entirely new dashboards, in a matter of hours. Agility fosters experimentation with changes to visual elements by quickly copying a dashboard, tweaking it, soliciting feedback from stakeholders, and implementing it. The effort reduction from embedding QuickSight’s analytical capabilities allows our team to launch updates and features much faster.

Improving the status quo with QuickSight

By embedding QuickSight into Seller Central, we have been able to offer hundreds of thousands of FBA sellers year-over-year comparison information. QuickSight has allowed our team to iterate and launch updates more quickly, without having to worry about scaling or maintaining infrastructure resources.

Currently underway is an initiative to launch more dashboards to provide insights into inventory trends.

To learn more about how you can embed customized data visuals, interactive dashboards and natural language querying into any application, visit Amazon QuickSight Embedded.


About the Author

Ravi Kiran Nidadavolu is a Software Development Manager with Fulfillment by Amazon.

What’s new with Amazon QuickSight at AWS re:Invent 2022

Post Syndicated from Mia Heard original https://aws.amazon.com/blogs/big-data/whats-new-with-amazon-quicksight-at-aws-reinvent-2022/

AWS re:Invent is a learning conference hosted by AWS for the global cloud computing community. This year’s re:Invent will be held in Las Vegas, Nevada, from November 28 to December 2.

Amazon QuickSight is the most popular cloud-native serverless BI service. This post walks you through the details of all QuickSight-related sessions and activities to help you plan your conference week accordingly. These sessions should appeal to data and analytics teams, product and engineering teams, and line of business and technology leaders interested in modernizing their BI capabilities to transform data into actionable insights for all.

To access the session catalog and reserve your seat for one of our BI sessions, you must be registered for re:Invent. Register now!

Keynotes

Adam Selipsky, Chief Executive Officer of Amazon Web Services – Keynote

Tuesday November 29 | 8:30 AM – 10:30 AM PST | The Venetian

Join Adam Selipsky, Chief Executive Officer of Amazon Web Services, as he looks at the ways that forward-thinking builders are transforming industries and even our future, powered by AWS. He highlights innovations in data, infrastructure, and more that are helping customers achieve their goals faster, take advantage of untapped potential, and create a better future with AWS.

Swami Sivasubramanian, Vice President of AWS Data and Machine Learning – Keynote

Wednesday November 30 | 8:30 AM – 10:30 AM PST | The Venetian

Join Swami Sivasubramanian, Vice President of AWS Data and Machine Learning, as he reveals the latest AWS innovations that can help you transform your company’s data into meaningful insights and actions for your business. In this keynote, several speakers discuss the key components of a future-proof data strategy and how to empower your organization to drive the next wave of modern invention with data. Hear from leading AWS customers who are using data to bring new experiences to life for their customers.

Leadership sessions

ANT203-L (LVL 200) Unlock the value of your data with AWS analytics

Wednesday November 30 | 2:30 – 3:30 PM PST | The Venetian

Data fuels digital transformation and drives effective business decisions. To survive in an ever-changing world, organizations are turning to data to derive insights, create new experiences, and reinvent themselves so they can remain relevant today and in the future. AWS offers analytics services that allow organizations to gain faster and deeper insights from all their data. In this session, G2 Krishnamoorthy, VP of AWS Analytics, addresses the current state of analytics on AWS, covers the latest service innovations around data, and highlights customer successes with AWS analytics. Also, learn from organizations such as FINRA that have turned to AWS for their digital transformation journeys.
Reserve your seat now!

BSI201 (LVL 200) Reinvent how you derive value from your data with Amazon QuickSight

Tuesday November 29 | 2:00 PM – 3:00 PM PST | Mandalay Bay

In this session, learn how you can use AWS-native business analytics to provide your users with machine learning-powered interactive dashboards, natural language query (NLQ), and embedded analytics that deliver insights at scale, when and where users need them. Join this session to also learn more about how Amazon uses QuickSight internally.
Reserve your seat now!

Breakout sessions

BSI202 (LVL 200) Migrate to cloud-native business analytics with Amazon QuickSight

Wednesday November 30 | 2:30 PM – 3:30 PM PST | Encore

Legacy BI systems can hurt agile decision-making in the modern organization, with expensive licensing, outdated capabilities, and costly infrastructure management. In this session, discover how migrating your BI to the cloud with cloud-native, fully managed business analytics capabilities from QuickSight can help you overcome these challenges. Learn how you can use QuickSight’s interactive dashboards and reporting capabilities to provide insights to every user in the organization, lowering your costs and enabling better decision-making. Join this session to also learn more about the Siemens QuickSight use case.
Reserve your seat now!

BSI207 (LVL 200) Get clarity on your data in seconds with Amazon QuickSight Q

Wednesday November 30 | 4:45 PM – 5:45 PM PST | MGM Grand

Amazon QuickSight Q is a machine learning–powered natural language capability that empowers business users to ask questions about all of their data using everyday business language and get answers in seconds. Q interprets questions to understand their intent and generates an answer instantly in the form of a visual without requiring authors to create graphics, dashboards, or analyses. In this session, the QuickSight Q team provides an overview and demonstration of Q in action. Join this session to also learn more about NASDAQ’s QuickSight use case.
Reserve your seat now!

BSI203 (LVL 200) Differentiate your apps with Amazon QuickSight embedded analytics

Thursday December 1 | 12:30 PM – 1:30 PM PST | Caesars Forum

In this session, learn how to enable new monetization opportunities and grow your business with QuickSight embedded analytics. Discover how you can differentiate your end-user experience by embedding data visualizations, dashboards, and ML-powered natural language query into your applications at scale with no infrastructure to manage. Join this session to also learn more about the Guardian Life and Showpad QuickSight use cases.
Reserve your seat now!

BSI304 (LVL 300) Optimize your AWS cost and usage with Cloud Intelligence Dashboards

Thursday December 1 | 3:30 PM – 4:30 PM PST | Encore

Do your engineers know how much they’re spending? Do you have insight into the details of your cost and usage on AWS? Are you taking advantage of all your cost optimization opportunities? Attend this session to learn how organizations are using the Cloud Intelligence Dashboards to start their FinOps journeys and create cost-aware cultures within their organizations. Dive deep into specific use cases and learn how you can use these insights to drive and measure your cost optimization efforts. Discover how unit economics, resource-level visibility, and periodic spend updates make it possible for FinOps practitioners, developers, and business executives to come together to make smarter decisions. Join this session to also learn more about the Dolby Laboratories QuickSight use case.
Reserve your seat now!

Chalk talks

BSI302 (LVL 300) Deploy your BI assets at scale to thousands with Amazon QuickSight

Tuesday November 29 | 11:45 AM – 12:45 PM PST | Wynn
As your user bases grow to hundreds or thousands of users, managing assets and user permissions at scale becomes increasingly important. In this chalk talk, learn about best practices for content development, promotion, authorization, organization, and cleanup to help ensure that your users are developing and sharing content in a safe and scalable manner.
Reserve your seat now!

BSI301 (LVL 300) Architecting multi-tenancy for your apps with Amazon QuickSight

Tuesday November 29 | 2:45 PM – 3:45 PM PST | Caesars Forum

Whether you are deploying QuickSight internally in a centrally managed single account or developing a SaaS application with multiple external tenants, it is paramount to focus on security and governance and to isolate tenants from each other. In this chalk talk, learn about different architectures and security frameworks that you can use to deploy QuickSight to thousands of departments or clients in a scalable and controlled manner.
Reserve your seat now!

*This session will also be repeated Wednesday November 30 | 7:45 PM – 8:45 PM PST | Wynn

BSI401 (LVL 400) Insightful dashboards through advanced calculations with QuickSight

Monday November 28 | 12:15 PM – 1:15 PM PST | MGM Grand
Loading data into various charting types is very rarely the end goal for your users. When they find interesting patterns or trends, they tend to dig deeper into their data and use calculations to surface more underlying insights. In this chalk talk, learn about various ways to build insightful and creative dashboards using QuickSight’s new advanced calculation capabilities, such as level-aware calculation and period functions.
Reserve your seat now!

Workshops

BSI205 (LVL 200) Build stunning customized dashboards with Amazon QuickSight

Monday November 28 | 10:45 AM – 12:45 PM PST | Wynn

Want to grow your dashboard building skills? In this workshop, the QuickSight team demonstrates the latest authoring functionality designed to empower you to build beautiful layouts and robust interactive experiences with other applications, right from within your dashboard. You must bring your laptop to participate.
Reserve your seat now!

*This session will also be repeated Thursday December 1 | 11:45 AM – 1:45 PM PST | Caesars Forum

BSI303 (LVL 300) Seamlessly embed analytics into your apps with Amazon QuickSight
Wednesday November 30 | 5:30 PM – 7:30 PM PST | Wynn

In this workshop, learn how you can bring data insights to your internal teams and end customers by simply and seamlessly embedding rich, interactive data visualizations and dashboards into your web applications and portals. You must bring your laptop to participate.
Reserve your seat now!

Partner session

PEX307 (LVL 300) Migrating business intelligence systems to Amazon QuickSight

Wednesday November 30 | 9:15 AM – 10:15 AM PST | Encore

QuickSight is a scalable, serverless, embeddable, machine learning–powered BI tool built for the cloud. If you’re building a cloud-native BI solution and are unsure how to approach the migration on AWS, this session is for you. Dive deep into BI best practices, tools, and methodologies for migrating BI dashboards to QuickSight, and learn how to use APIs and the AWS CLI to automate common migration tasks required to perform BI dashboard migration. This session is intended for AWS Partners.
Reserve your seat now!

Additional activities

Business Intelligence kiosk in the AWS Village

Visit the Business Intelligence kiosk within the AWS Village to meet with experts to dive deeper into QuickSight capabilities such as Q and embedded analytics. You will be able to ask our experts questions and experience live demos for our newly launched capabilities.

Free QuickSight swag

Make sure to stop by the swag distribution table to grab free QuickSight swag if you have visited the Business Intelligence kiosk or attended one of our breakout sessions, chalk talks, or workshops.

Useful resources

Whether you plan to attend re:Invent in person or view available content virtually, you can always learn more about QuickSight through these helpful resources:

QuickSight Community Hub – Ask, answer, and learn with others in the QuickSight Community.

QuickSight YouTube channel – Subscribe to stay up to date on the latest QuickSight workshops, how-tos, and demo videos.

QuickSight DemoCentral – Experience QuickSight first-hand through interactive dashboards and demos.


About the authors

Mia Heard is a Product Marketing Manager for Amazon QuickSight, AWS’ cloud-native, fully managed BI service.

California State University Chancellor’s Office reduces cost and improves efficiency using Amazon QuickSight for streamlined HR reporting in higher education

Post Syndicated from Madi Hsieh original https://aws.amazon.com/blogs/big-data/california-state-university-chancellors-office-reduces-cost-and-improves-efficiency-using-amazon-quicksight-for-streamlined-hr-reporting-in-higher-education/

The California State University Chancellor’s Office (CSUCO) sits at the center of one of America’s largest and most diverse four-year university systems. The California State University (CSU) serves approximately 477,000 students and employs more than 55,000 staff and faculty members across 23 universities and 7 off-campus centers. The CSU provides students with opportunities to develop intellectually and personally, and to contribute back to communities throughout California. For this large organization, managing a wide system of campuses while maintaining the decentralized autonomy of each is crucial. In 2019, they needed a highly secure tool to streamline the process of pulling HR data. The CSU had been using a legacy central data warehouse based on data from their financial system, but it lacked the robustness to keep up with modern technology. This wasn’t going to work for their HR reporting needs.

Looking for a tool to match the cloud-based infrastructure of their other operations, the Business Intelligence and Data Operations (BI/DO) team within the Chancellor’s Office chose Amazon QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all employees within an organization to build visualizations, perform ad hoc analysis, and quickly get business insights from their data, any time, on any device. The team uses QuickSight to organize HR information across the CSU, implementing a centralized security system.

“It’s easy to use, very straightforward, and relatively intuitive. When you couple the experience of using QuickSight, with a huge cost difference to [the BI platform we had been using], to me, it’s a simple choice,”

– Andy Sydnor, Director Business Intelligence and Data Operations at the CSUCO.

With QuickSight, the team has the capability to harness security measures and deliver data insights efficiently across their campuses.

In this post, we share how the CSUCO uses QuickSight to reduce cost and improve efficiency in their HR reporting.

Delivering BI insights across the CSU’s 23 universities

The CSUCO serves the university system’s faculty, students, and staff by overseeing operations in several areas, including finance, HR, student information, and space and facilities. Since migrating to QuickSight in 2019, the team has built dashboards to support these operations. Dashboards include COVID-related leaves of absence, historical financial reports, and employee training data, along with a large selection of dashboards to track employee data at an individual campus level or from a system-wide perspective.

The team created a process for reading security roles from the ERP system and then translating them using QuickSight groups for internal HR reporting. QuickSight allowed them to match security measures with the benefits of low maintenance and familiarity to their end-users.

With QuickSight, the CSUCO is able to run a decentralized security process where campus security teams can provision access directly and users can get to their data faster. Before transitioning to QuickSight, the BI/DO team spent hours trying to get to specific individual-level data, but with QuickSight, the retrieval time was shortened to just minutes. For the first time, Sydnor and his team were able to pinpoint a specific employee’s work history without having to take additional actions to find the exact data they needed.

Cost savings compared to other BI tools

Sydnor shares that, for a public organization, one of the most attractive qualities of QuickSight is the immense cost savings. The BI/DO team at the Chancellor’s Office estimates that they’re saving roughly 40% on costs since switching from their previous BI platform, which is a huge benefit for a public organization of this scale. Their previous BI tool was costing them a significant amount of money on licensing for features they didn’t require; the CSUCO felt they weren’t getting the best use of their investment.

The functionality of QuickSight to meet their reporting needs at an affordable price point is what makes QuickSight the CSUCO’s preferred BI reporting tool. Sydnor likes that with QuickSight, “we don’t have to go out and buy a subscription or a license for somebody, we can just provision access. It’s much easier to distribute the product.” QuickSight allows the CSUCO to focus their budget in other areas rather than having to pay for charges by infrequent users.

Simple and intuitive interface

Getting started with QuickSight was a no-brainer for Sydnor and his team. For a public organization, the procurement process can be cumbersome, delaying the time it takes to put data into action. As an existing AWS customer, the CSUCO could seamlessly integrate QuickSight into their package of AWS services. With other BI tools, they ran into roadblocks setting up the system; this wasn’t an issue with QuickSight, because it’s a fully managed service that doesn’t require deploying any servers.

The following screenshot shows an example of the CSUCO security audit dashboard.


Sydnor tells us, “Our previous BI tool had a huge library of visualizations, but we don’t need 95% of those. Our presentations look great with the breadth of visuals QuickSight provides. Most people just want the data and ultimately need a robust vehicle to get data out of a database and onto a table or visualization.”

Converting from their original BI tool to QuickSight was painless for his team. Sydnor tells us that he has “yet to see something we can’t do with QuickSight.” One of Sydnor’s employees who was a user of the previous tool learned QuickSight in just 30 minutes. Now, they conduct QuickSight demos all the time.

Looking to the future: Expanding BI integration and adopting Amazon QuickSight Q

With QuickSight, the Chancellor’s Office aims to roll out more HR dashboards across its campuses and extend the tool for faculty use in the classroom. In the upcoming year, two campuses are joining CSUCO in building their own HR reporting dashboards through QuickSight. The organization is also making plans to use QuickSight to report on student data and implement external-facing dashboards. Some of the data points they’re excited to explore are insights into at-risk students and classroom scheduling on campus.

Thinking ahead, CSUCO is considering Amazon QuickSight Q, a machine learning-powered natural language capability that gives anyone in an organization the ability to ask business questions in natural language and receive accurate answers with relevant visualizations. Sydnor says, “How cool would that be if professors could go in and ask simple, straightforward questions like, ‘How many of my department’s students are taking full course loads this semester?’ It has a lot of potential.”

Summary

The CSUCO is excited to be a champion of QuickSight in the CSU and is looking for ways to increase its implementation across the organization in the future.

To learn more, visit the website for the California State University Chancellor’s Office. For more on QuickSight, visit the Amazon QuickSight product page, or browse other Big Data Blog posts featuring QuickSight.


About the authors

Madi Hsieh, AWS 2022 Summer Intern, UCLA.

Tina Kelleher, Program Manager at AWS.

Measure the adoption of your Amazon QuickSight dashboards and view your BI portfolio in a single pane of glass

Post Syndicated from Maitri Brahmbhatt original https://aws.amazon.com/blogs/big-data/measure-the-adoption-of-your-amazon-quicksight-dashboards-and-view-your-bi-portfolio-in-a-single-pane-of-glass/

Amazon QuickSight is a fully managed, cloud-native business intelligence (BI) service. If you plan to deploy enterprise-grade QuickSight dashboards, measuring user adoption and usage patterns is an important ingredient for the success of your BI investment. For example, knowing usage patterns such as geographic location, department, and job role can help you fine-tune your dashboards for the right audience. Furthermore, to improve the return on your BI investment, you can use dashboard usage data to reduce license costs by identifying inactive QuickSight authors.

In this post, we introduce the latest Admin Console, an AWS packaged solution that you can easily deploy and use to create a usage and inventory dashboard for your QuickSight assets. The Admin Console helps identify the usage patterns of individual users and dashboards. It can also help you track which dashboards and groups you have (or need) access to, and what you can do with that access, by providing details on QuickSight group and user permissions and activities, as well as QuickSight asset (dashboard, analysis, and dataset) permissions. With timely access to interactive usage metrics, the Admin Console can help BI leaders and administrators make a cost-efficient plan for dashboard improvements. Another common use case of this dashboard is to provide a centralized repository of QuickSight assets. QuickSight artifacts consist of multiple types of assets (dashboards, analyses, datasets, and more) with dependencies between them. Having a single repository to view all assets and their dependencies can be an important element in your enterprise data dictionary.

This post demonstrates how to build the Admin Console using a serverless data pipeline. With basic AWS knowledge, you can create this solution in your own environment within an hour. Alternatively, you can dive deep into the source code to meet your specific needs.

Admin Console dashboard

The following animation displays the contents of our demo dashboard.

The Admin Console dashboard includes six sheets:

  • Landing Page – Provides drill-downs into each detailed tab.
  • User Analysis – Provides detailed analysis of the user behavior and identifies active and inactive users and authors.
  • Dashboard Analysis – Shows the most commonly viewed dashboards.
  • Assets Access Permissions – Provides information on permissions applied to each asset, such as dashboard, analysis, datasets, data source, and themes.
  • Data Dictionary – Provides information on the relationships between each of your assets, such as which analysis was used to build each dashboard, and which datasets and data sources are being used in each analysis. It also provides details on each dataset, including schema name, table name, columns, and more.
  • Overview – Provides instructions on how to use the dashboard.

You can interactively play with the sample dashboard in the following Interactive Dashboard Demo.

Let’s look at Forwood Safety, an innovative, values-driven company with a laser focus on fatality prevention. An early adopter of QuickSight, they collaborated with AWS to deploy this solution to collect BI application usage insights.

“Our engineers love this admin console solution,” says Faye Crompton, Leader of Analytics and Benchmarking at Forwood. “It helps us to understand how users analyze critical control learnings by helping us to quickly identify the most frequently visited dashboards in Forwood’s self-service analytics and reporting tool, FAST.”

Solution overview

The following diagram illustrates the workflow of the solution.

The workflow involves the following steps:

  1. The AWS Lambda function Data_Prepare is scheduled to run hourly. This function calls QuickSight APIs to get the QuickSight namespace, group, user, and asset access permissions information.
  2. The Lambda function Dataset_Info is scheduled to run hourly. This function calls QuickSight APIs to get dashboard, analysis, dataset, and data source information.
  3. Both functions save the results to an Amazon Simple Storage Service (Amazon S3) bucket.
  4. AWS CloudTrail logs are stored in an S3 bucket.
  5. Based on the files in Amazon S3 that contain user-group information, dataset information, and QuickSight asset access permissions information, as well as dashboard views and user login events from the CloudTrail logs, five Amazon Athena tables are created. Optionally, the BI engineer can combine these tables with employee information tables to display human resource information of the users.
  6. Four QuickSight datasets fetch the data from the Athena tables created in Step 5 and import them into SPICE. Then, based on these datasets, a QuickSight dashboard is created.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create solution resources

We can create all the resources needed for this dashboard using three CloudFormation templates: one for Lambda functions, one for Athena tables, and one for QuickSight objects.

CloudFormation template for Lambda functions

This template creates the Lambda functions data_prepare and dataset_info.

  • Choose Launch Stack and follow the steps to create these resources.

After the stack creation is successful, you have two Lambda functions, data_prepare and dataset_info, and one S3 bucket named admin-console[AWS-account-ID]. You can verify that the Lambda functions run successfully and that the group_membership, object_access, datasets_info, and data_dictionary folders are created in the S3 bucket under admin-console[AWS-account-ID]/monitoring/quicksight/, as shown in the following screenshots.

The Data_Prepare Lambda function is scheduled to run hourly with the CloudWatch Events rule admin-console-every-hour. This function calls the QuickSight Assets APIs to get QuickSight users, assets, and access permissions information. Finally, this function creates two files, group_membership.csv and object_access.csv, and saves them to the S3 bucket.

The Dataset_Info Lambda function is scheduled to run hourly and calls the QuickSight Assets APIs to get datasets, schemas, tables, and fields (columns) information. Then this function creates two files, datasets_info.csv and data_dictionary.csv, and saves these files to an S3 bucket.
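
To make the flow more concrete, the following is a minimal sketch (in Python with boto3) of the kind of logic the Data_Prepare function runs: list QuickSight users, look up their group memberships, and write the result as a CSV file to the monitoring prefix. This is illustrative only, not the packaged solution's actual code; the account ID, CSV layout, and omission of pagination are simplifying assumptions.

import csv
import io

import boto3

ACCOUNT_ID = "123456789012"  # assumption: replace with your AWS account ID
BUCKET = f"admin-console{ACCOUNT_ID}"  # bucket created by the CloudFormation stack

quicksight = boto3.client("quicksight")
s3 = boto3.client("s3")

def lambda_handler(event, context):
    rows = [["namespace", "group", "user"]]  # assumed CSV layout
    users = quicksight.list_users(AwsAccountId=ACCOUNT_ID, Namespace="default")["UserList"]
    for user in users:  # pagination via NextToken omitted for brevity
        groups = quicksight.list_user_groups(
            UserName=user["UserName"],
            AwsAccountId=ACCOUNT_ID,
            Namespace="default",
        )["GroupList"]
        for group in groups or [{"GroupName": "None"}]:
            rows.append(["default", group["GroupName"], user["UserName"]])

    # Write the CSV to the monitoring prefix that the Athena table points at
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    s3.put_object(
        Bucket=BUCKET,
        Key="monitoring/quicksight/group_membership/group_membership.csv",
        Body=buffer.getvalue(),
    )

The Dataset_Info function follows the same pattern, using the list_data_sets and describe_data_set APIs to produce datasets_info.csv and data_dictionary.csv.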

  •  Create a CloudTrail log if you don’t already have one and note down the S3 bucket name of the log files for future use.
  •  Note down all the resources created from the previous steps. If the S3 bucket name for the CloudTrail log from step 2 is different from the one in step 1’s output, use the S3 bucket from step 2.

The following table summarizes the keys and values you use when creating the Athena tables with the next CloudFormation stack.

Key | Value | Description
cloudtraillog | s3://cloudtrail-awslogs-[aws-account-id]-do-not-delete/AWSLogs/[aws-account-id]/CloudTrail | The Amazon S3 location of the CloudTrail log
cloudtraillogtablename | cloudtrail_logs | The table name of the CloudTrail log
groupmembership | s3://admin-console[aws-account-id]/monitoring/quicksight/group_membership | The Amazon S3 location of group_membership.csv
objectaccess | s3://admin-console[aws-account-id]/monitoring/quicksight/object_access | The Amazon S3 location of object_access.csv
dataset info | s3://admin-console[aws-account-id]/monitoring/quicksight/datasets_info | The Amazon S3 location of datasets_info.csv
datadict | s3://admin-console[aws-account-id]/monitoring/quicksight/data_dictionary | The Amazon S3 location of data_dictionary.csv

CloudFormation template for Athena tables

To create your Athena tables, complete the following steps:

  • Download the following JSON file.
  • Edit the file and replace the corresponding fields with the keys and values you noted in the previous section.

For example, search for the groupmembership keyword.

Then replace the location value with the Amazon S3 location for the groupmembership folder.

  • Create Athena tables by deploying this edited file as a CloudFormation template. For instructions, refer to Get started.

After a successful deployment, you have a database called admin-console created in AwsDataCatalog in Athena and five tables in the database: cloudtrail_logs, group_membership, object_access, datasets_info, and data_dict.

  • Confirm the tables via the Athena console.

The following screenshot shows sample data of the group_membership table.

The following screenshot shows sample data of the object_access table.

For instructions on building an Athena table with CloudTrail events, see Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail. For this post, we create the table cloudtrail_logs in the default database.

  • After all five tables are created in Athena, go to the security permissions on the QuickSight console to enable bucket access for s3://admin-console[AWS-account-ID] and s3://cloudtrail-awslogs-[aws-account-id]-do-not-delete.
  • Enable Athena access under Security & Permissions.

Now QuickSight can access all five tables through Athena.
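
As a quick sanity check before building the QuickSight datasets, you can query the new tables directly. The following is a small sketch using the AWS SDK for pandas (awswrangler); the database and table names are the stack defaults described above, and running it assumes your environment already has Athena query permissions and a query results location.

import awswrangler as wr

# Pull a few rows from the group_membership table to confirm the data landed
df = wr.athena.read_sql_query(
    sql='SELECT * FROM "admin-console"."group_membership" LIMIT 10',
    database="admin-console",
)
print(df.head())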

CloudFormation template for QuickSight objects

To create the QuickSight objects, complete the following steps:

  • Get the QuickSight admin user’s ARN by running the following command in the AWS Command Line Interface (AWS CLI):
    aws quicksight describe-user --aws-account-id [aws-account-id] --namespace default --user-name [admin-user-name]

    For example: arn:aws:quicksight:us-east-1:12345678910:user/default/admin/xyz.

  • Choose Launch Stack to create the QuickSight datasets and dashboard:

  • Provide the ARN you noted earlier.

After a successful deployment, four datasets named Admin-Console-Group-Membership, Admin-Console-dataset-info, Admin-Console-Object-Access, and Admin-Console-CFN-Main are created, and you have a dashboard named admin-console-dashboard. If you prefer to modify the dashboard, use the dashboard save-as option, then recreate the analysis, make your modifications, and publish a new dashboard.

  • Set your preferred SPICE refresh schedule for the four SPICE datasets, and share the dashboard in your organization as needed.
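
If you also want to refresh SPICE on demand, for example from the same hourly schedule that drives the Lambda functions, the CreateIngestion API can trigger a refresh. The following is a hedged sketch; the account ID is a placeholder, and the dataset ID must be looked up in your account (it is not necessarily the same as the dataset name).

import time

import boto3

quicksight = boto3.client("quicksight")

quicksight.create_ingestion(
    AwsAccountId="123456789012",           # placeholder account ID
    DataSetId="<your-dataset-id>",         # look up with list_data_sets
    IngestionId=f"admin-console-refresh-{int(time.time())}",  # any unique ID per run
)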

Dashboard demo

The following screenshot shows the Admin Console Landing page.

The following screenshot shows the User Analysis sheet.

The following screenshot shows the Dashboards Analysis sheet.

The following screenshot shows the Access Permissions sheet.

The following screenshot shows the Data Dictionary sheet.

The following screenshot shows the Overview sheet.

You can interactively play with the sample dashboard in the following Interactive Dashboard Demo.

You can reference the public template of the preceding dashboard in create-template, create-analysis, and create-dashboard API calls to create this dashboard and analysis in your account. The ARN of this dashboard’s public template is 'arn:aws:quicksight:us-east-1:889399602426:template/admin-console'.

Tips and tricks

Here are some advanced tips and tricks for building the Admin Console dashboard to analyze usage metrics. The following steps are based on the dataset admin_console. You can apply the same logic to create calculated fields that analyze user login activities.

  • Create parameters – For example, we can create a parameter called InActivityMonths, as in the following screenshot. Similarly, we can create other parameters such as InActivityDays, Start Date, and End Date.

  • Create controls based on the parameters – In the following screenshot, we create controls based on the start and end date.

  • Create calculated fields – For instance, we can create a calculated field to detect the active or inactive status of QuickSight authors. If the time span between the most recent dashboard view activity and now is greater than or equal to the number defined in the Inactivity Months control, the author status is Inactive. The following screenshot shows the relevant code. According to the end-user’s requirements, we can define several calculated fields to perform the analysis.

  • Create visuals – For example, we create an insight to display the top three dashboard views by reader and a visual to display the authors of these dashboards.

  • Add URL actions – You can add a URL action to define extra features, such as emailing inactive authors or checking user details.

The following sample code defines the action to email inactive authors:

mailto:<<email>>?subject=Alert to inactive author! &body=Hi, <<username>>, any author without activity for more than a month will be deleted. Please log in to your QuickSight account to continue accessing and building analyses and dashboards!

Clean up

To avoid incurring future charges, delete all the resources you created with the CloudFormation templates.

Conclusion

This post discussed how BI administrators can use QuickSight, CloudTrail, and other AWS services to create a centralized view to analyze QuickSight usage metrics. We also presented a serverless data pipeline to support the Admin Console dashboard.

If you would like to have a demo, please email us.

Appendix

We can perform some additional sophisticated analysis to collect advanced usage metrics. For example, Forwood Safety raised a unique request to analyze the readers who log in but don’t view any dashboards (see the following code). This helps their clients identify and prevent wasted reader session fees. Leadership teams value the ability to minimize uneconomical user activity.

CREATE OR REPLACE VIEW "loginwithoutviewdashboard" AS
WITH login AS (
  SELECT COALESCE("useridentity"."username", "split_part"("useridentity"."arn", '/', 3)) AS user_name,
    awsregion,
    date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
  FROM cloudtrail_logs
  WHERE eventname = 'AssumeRoleWithSAML'
  GROUP BY 1, 2, 3
),
dashboard AS (
  SELECT COALESCE("useridentity"."username", "split_part"("useridentity"."arn", '/', 3)) AS user_name,
    awsregion,
    date_parse(eventtime, '%Y-%m-%dT%H:%i:%sZ') AS event_time
  FROM cloudtrail_logs
  WHERE eventsource = 'quicksight.amazonaws.com'
    AND eventname = 'GetDashboard'
  GROUP BY 1, 2, 3
),
users AS (
  SELECT namespace,
    "group",
    "user" AS user_name,
    (CASE
      WHEN "group" IN ('quicksight-fed-bi-developer', 'quicksight-fed-bi-admin')
      THEN 'Author'
      ELSE 'Reader'
    END) AS author_status
  FROM "group_membership"
)
SELECT l.*
FROM login AS l
JOIN users AS u
  ON l.user_name = u.user_name
LEFT JOIN dashboard AS d
  ON l.user_name = d.user_name
  AND l.awsregion = d.awsregion
  AND d.event_time BETWEEN l.event_time AND (l.event_time + interval '30' minute)
WHERE u.author_status = 'Reader'
  AND d.user_name IS NULL

About the Authors

Ying Wang is a Software Development Manager at AWS. She has 12 years of expertise in data analytics and science. During her time as a data architect, she assisted customers with enterprise data architecture solutions to scale their data analytics in the cloud. Currently, she helps customers unlock the power of their data with QuickSight by delivering new features from the engineering side.

Ian Liao is a Senior Data Visualization Architect at AWS Professional Services. Before AWS, Ian spent years building startups in data and analytics. Now he enjoys helping customers scale their data applications on the cloud.

Maitri Brahmbhatt is a Business Intelligence Engineer at AWS. She helps customers and partners leverage their data to gain insights into their business and make data-driven decisions by developing QuickSight dashboards.

How The Mill Adventure enabled data-driven decision-making in iGaming using Amazon QuickSight

Post Syndicated from Deepak Singh original https://aws.amazon.com/blogs/big-data/how-the-mill-adventure-enabled-data-driven-decision-making-in-igaming-using-amazon-quicksight/

This post is co-written with Darren Demicoli from The Mill Adventure.

The Mill Adventure is an iGaming industry enabler offering customizable turnkey solutions to B2B partners and custom branding enablement for its B2C partners. They provide a complete gaming platform, including licenses and operations, for rapid deployment and success in iGaming, and are committed to improving the iGaming experience by being a differentiator through innovation. The Mill Adventure already provides its services to a number of iGaming brands and seeks to continuously grow through the ranks of the industry.

In this post, we show how The Mill Adventure is helping its partners answer business-critical iGaming questions by building a data analytics application with a modern data strategy on AWS. This modern data strategy approach has led to high-velocity innovation while lowering the total operating cost.

With a gross market revenue exceeding $70 billion and a global player base of around 3 billion players (per a recent imarc Market Overview 2022-2027), the iGaming industry has, without a doubt, been booming over the past few years. This presents a lucrative opportunity to an ever-growing list of businesses seeking to tap into the market and attract a bigger share of its audience. Needless to say, staying competitive in this somewhat saturated market is extremely challenging. Making data-driven decisions is critical to the growth and success of iGaming businesses.

Business challenges

Gaming companies typically generate a massive amount of data, which could potentially enable meaningful insights and answer business-critical questions. Some of the critical and common business challenges in the iGaming industry are:

  • What impacts the brand’s turnover—its new players, retained players, or a mix of both?
  • How to assess the effectiveness of a marketing campaign? Should a campaign be reinstated? Which games to promote via campaigns?
  • Which affiliates drive quality players that have better conversion rates? Which paid traffic channels should be discontinued?
  • For how long does the typical player stay active within a brand? What is the lifetime deposit from a player?
  • How to improve the registration to first deposit processes? What are the most pressing issues impacting player conversion?

Though sufficient data was captured, The Mill Adventure found two key challenges in their ability to generate actionable insights:

  • Lack of analysis-ready datasets (data was available only in raw, hard-to-use formats)
  • Lack of timely access to business-critical data

For example, The Mill Adventure generates over 50 GB of data daily. Its partners have access to this data. However, due to the data being in a raw form, they find it of little value in answering their business-critical questions. This affects their decision-making processes.

To address these challenges, The Mill Adventure chose to build a modern data platform on AWS that was not only capable of providing timely and meaningful business insights for the iGaming industry, but also efficiently manageable, low-cost, scalable, and secure.

Modern data architecture

The Mill Adventure wanted to build a data analytics platform using a modern data strategy that would grow as the company grows. Key tenets of this modern data strategy are:

  • Build a modern business application and store data in the cloud
  • Unify data from different application sources into a common data lake, preferably in its native format or in an open file format
  • Innovate using analytics and machine learning, with an overarching need to meet security and governance compliance requirements

A modern data architecture on AWS applies these tenets. Two key features that form the basic foundation of a modern data architecture on AWS are serverless and microservices.

The Mill Adventure solution

The Mill Adventure built a serverless iGaming data analytics platform that allows its partners to have quick and easy access to a dashboard with data visualizations driven by the varied sources of gaming data, including real-time streaming data. With this platform, stakeholders can use data to devise strategies and plan for future growth based on past performance, evaluate outcomes, and respond to market events with more agility. Having the capability to access insightful information in a timely manner and respond promptly has substantial impact on the turnover and revenue of the business.

A serverless iGaming platform on AWS

In building the iGaming platform, The Mill Adventure was quick to recognize the benefits of having a serverless microservice infrastructure. We wanted to spend time on innovating and building new applications, not managing infrastructure. AWS services such as Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Simple Storage Service (Amazon S3), Amazon Athena, and Amazon QuickSight are at the core of this data platform solution. Moving to AWS serverless services has saved time, reduced cost, and improved productivity. A microservice architecture has enabled us to accelerate time to value, increase innovation speed, and reduce the need to re-platform, refactor, and rearchitect in the future.

The following diagram illustrates the data flow from the gaming platform to QuickSight.

The data flow includes the following steps:

  1. As players access the gaming portal, associated business functions such as gaming activity, payment, bonus, accounts management, and session management capture the relevant player actions.
  2. Each business function has a corresponding Lambda-based microservice that handles the ingestion of the data from that business function. For example, the Session service handles player session management. The Payment service handles player funds, including deposits and withdrawals from player wallets. Each microservice stores data locally in DynamoDB and manages the create, read, update, and delete (CRUD) tasks for the data. For event sourcing implementation details, see How The Mill Adventure Implemented Event Sourcing at Scale Using DynamoDB.
  3. Data records resulting from the CRUD outputs are written in real time to Kinesis Data Streams, which forms the primary data source for the analytics dashboards of the platform.
  4. Amazon S3 provides the underlying storage for the data delivered from Kinesis Data Streams and forms the internal real-time data lake containing the raw data.
  5. The raw data is transformed and optimized through custom-built extract, transform, and load (ETL) pipelines and stored in a different S3 bucket in the data lake.
  6. Both raw and processed data are immediately available for querying via Athena and QuickSight.

Raw data is transformed, optimized, and stored as processed data using an hourly data pipeline to meet analytics and business intelligence needs. The following figure shows an example of record counts and the size of the data being written into Kinesis Data Streams, which eventually needs to be processed from the data lake.

These data pipeline jobs can be broadly classified into six main stages:

  • Cleanup – Filtering out invalid records
  • Deduplicate – Removing duplicate data records
  • Aggregate at various levels – Grouping data at various aggregation levels of interest (such as per player, per session, or per hour or day)
  • Optimize – Writing files to Amazon S3 in optimized Parquet format
  • Report – Triggering connectors with updated data (such as updates to affiliate providers and compliance)
  • Ingest – Triggering an event to ingest data in QuickSight for analytics and visualizations

The output of this data pipeline is two-fold:

  • A transformed data lake that is designed and optimized for fast query performance
  • A refreshed view of data for all QuickSight dashboards and analyses
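
To make the Optimize stage more tangible, the following is a simplified sketch (not The Mill Adventure's actual pipeline code) that reads an hour of raw JSON Lines events from the raw data lake prefix and rewrites them as partitioned Parquet, using the AWS SDK for pandas (awswrangler). The bucket names, prefixes, and column names are illustrative assumptions.

import awswrangler as wr

RAW_PATH = "s3://example-igaming-raw/game-activity/2022/11/30/10/"  # assumed raw prefix
CURATED_PATH = "s3://example-igaming-curated/game_activity/"        # assumed curated prefix

# Read one hour of raw JSON Lines events from the raw data lake
df = wr.s3.read_json(path=RAW_PATH, lines=True)

# Deduplicate on an assumed event identifier and derive a date partition column
df = df.drop_duplicates(subset=["event_id"])
df["event_date"] = df["event_time"].str[:10]

# Write the hour back to the curated area as partitioned Parquet
wr.s3.to_parquet(
    df=df,
    path=CURATED_PATH,
    dataset=True,
    mode="append",
    partition_cols=["event_date"],
)

Athena and QuickSight then query the curated Parquet data rather than the raw JSON, which is what keeps the dashboard queries fast.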

Cultivating a data-driven mindset with QuickSight

The Mill Adventure’s partners access their data securely via QuickSight datasets. These datasets are purposefully curated views on top of the transformed data lake. Each partner can access and visualize their data immediately. With QuickSight, partners can build useful dashboards without having deep technical knowledge or familiarity with the internal structure of the data. This approach significantly reduces the time and effort required and speeds up access to valuable gaming insights for business decision-making.

The Mill Adventure also provides each partner with a set of readily available dashboards. These dashboards are built on the years of experience that The Mill Adventure has in the iGaming industry, cover the most common business intelligence requirements, and jumpstart a data-driven mindset.

In the following sections, we provide a high-level overview of some of The Mill Adventure iGaming dashboard features and how these are used to meet the iGaming business analytics needs.

Key performance indicators

This analysis provides a comprehensive set of iGaming key performance indicators (KPIs) across different functional areas, including but not limited to payment activity (deposits and withdrawals), game activity (bets, gross game wins, return to player) and conversion metrics (active customers, active players, unique depositing customers, newly registered customers, new depositing customers, first-time depositors). These are presented concisely in both a quantitative view and in more visual forms.

In the following example KPI report, we can see how by presenting different iGaming metrics for key periods and lifetime, we can identify the overall performance of the brand.

Affiliates analysis

This analysis presents metrics related to the activity generated by players acquired through affiliates. Affiliates usually account for a large share of the traffic driven to gaming sites, and such a report helps identify the most effective affiliates. It informs performance trends per affiliate and compares across different affiliates. By combining data from multiple sources via QuickSight cross-data source joins, affiliate provider-related data such as earnings and clicks can be presented together with other key gaming platform metrics. By having these metrics broken down by affiliate, we can determine which affiliates contribute the most to the brand, as shown in the following example figure.

Cohort analysis

Cohort analyses track the progression of KPIs (such as average deposits) over a period of time for groups of players after their first deposit day. In the following figure, the average deposits per user (ADPU) is presented for players registering in different quarters within the last 2 years. By moving horizontally along each row on the graph, we can see how the ADPU changes for successive quarters for the same group of players. In the following example, the ADPU decreases substantially, indicating higher player churn.

We can use cohort analyses to calculate the churn rate (rate of players who become inactive). Additionally, by averaging the ADPU figures from this analysis, you can extract the lifetime value (LTV) of the ADPU. This shows the average deposit that can be expected to be deposited by players over their lifetime with the brand.

Player onboarding journey

Player onboarding is not a single-step process. In particular, jurisdictional requirements impose a number of compliance checks that need to be fulfilled along various stages during registration flow. All these, plus other steps along the registration (such as email verification), could pose potential pitfalls for players, leading them to fail to complete registration. Showing these steps in QuickSight funnel visuals helps identify such issues and pinpoint any bottlenecks in such flows, as shown in the following example. Additionally, Sankey visuals are used to monitor player movement across registration steps, identifying steps that need to be optimized.

Campaign outcome analysis

Bonus campaigns are a valuable promotional technique used to reward players and boost engagement. Campaigns can drive turnover and revenue, but there is always an inherent cost associated. It’s critical to assess the performance of campaigns and determine the net outcome. We have built a specific analysis to simplify the task of evaluating these promotions. A number of key metrics related to players activated by campaigns are available. These include both monetary incentives for game activity and deposits and other details related to player demographics (such as country, age group, gender, and channel). Individual campaign performance is analyzed and high-performance ones are identified.

In the following example, the figure on the left shows a time series distribution of deposits coming from campaigns in comparison to the global ones. The figure on the right shows a geographic plot of players activated from selected campaigns.

Demographics distribution analysis

Brands may seek to improve player engagement and retention by tailoring their content for their player base. They need to collect and understand information about their players’ demographics. Players’ demographic distribution varies from brand to brand, and the outcome of actions taken on different brands will vary due to this distribution. Keeping an eye on this demographic (age, country, gender) distribution helps shape a brand strategy in the best way that suits the player base and helps choose the right promotions that appeal most to its audience.

Through visuals such as the following example, it’s possible to quickly analyze the distribution of the selected metric along different demographic categories.

In addition, grouping players by the number of days since registration indicates which players are making a higher contribution to revenue, whether it is existing players or newly registered players. In the following figure, we can see that players who registered in the last 3 months continually account for the highest contribution to deposits. In addition, the proportion of deposits coming from the other two bands of players isn’t increasing, indicating an issue with player retention.

Compliance and responsible gaming

The Mill Adventure treats player protection with the utmost priority. Each iGaming regulated market has its own rules that need to be followed by the gaming operators. These include a number of compliance reports that need to be regularly sent to authorities in the respective jurisdictions. This process was simplified for new brands by creating a common reports template and automating the report creation in QuickSight. This helps new B2B brands meet these reporting requirements quickly and with minimal effort.

In addition, a number of control reports highlighting different areas of player protection are in place. As shown in the following example, responsible gaming reports such as those outlining player behavior deviations help identify accounts with problematic gambling patterns.

Players whose gaming pattern varies from the identified norm are flagged for inspection. This is useful to identify players who may need intervention.

Assessing game play and releases

It’s important to measure the performance and popularity of new games post release. Metrics such as unique player participation and player stakes are monitored during the initial days after the release, as shown in the following figures.

Not only does this help evaluate the overall player engagement, but it can also give a clear indication of how these games will perform in the future. By identifying popular games, a brand may choose to focus marketing campaigns on those games, and therefore ensure that it’s promoting games that appeal to its player base.

As shown in these example dashboards, we can use QuickSight to design and create business analytics insights of the iGaming data. This helps us answer real-life business-critical questions and take measurable actions using these insights.

Conclusion

In the iGaming industry, decisions not backed up by data are like an attempt to hit the bullseye blindfolded. With QuickSight, The Mill Adventure empowers its B2B partners and customers to harness data in a timely and convenient manner and support decision-making with winning strategies. Ultimately, in addition to gaining a competitive edge in maximizing revenue opportunities, improved decision-making will also lead to enhanced player experiences.

Reach out to The Mill Adventure and kick-start your iGaming journey today.

Explore the rich set of out-of-the-box Amazon QuickSight ML Insights. Amazon QuickSight Q enables dashboards with natural language querying capabilities. For more information and resources on how to get started with a free trial, visit Amazon QuickSight.


About the authors

Darren Demicoli is a Senior DevOps and Business Intelligence Engineer at The Mill Adventure. He has worked in different roles across technical infrastructure, software development, and database administration, and has been building solutions for the iGaming sector for the past few years. Outside work, he enjoys travelling, exploring good food, and spending time with his family.

Padmaja Suren is a Technical Business Development Manager serving the Public Sector Field Community in Market Intelligence on Analytics. She has 20+ years of experience in building scalable data platforms using a variety of technologies. At AWS, she has served as a Specialist Solutions Architect on services such as Database, Analytics, and QuickSight. Prior to AWS, she implemented successful data and BI initiatives for diverse industry sectors in her capacity as Data Warehouse and BI Architect. She dedicates her free time to her passion project SanghWE, which delivers psychosocial education for sexual trauma survivors to heal and recover.

Deepak Singh is a Solution Architect at AWS with specialization in business intelligence and analytics. Deepak has worked across a number of industry verticals such as Finance, Healthcare, Utilities, Retail, and High Tech. Throughout his career, he has focused on solving complex business problems to help customers achieve impactful business outcomes using applied intelligence solutions and services.

Simplify semi-structured nested JSON data analysis with AWS Glue DataBrew and Amazon QuickSight

Post Syndicated from Sriharsh Adari original https://aws.amazon.com/blogs/big-data/simplify-semi-structured-nested-json-data-analysis-with-aws-glue-databrew-and-amazon-quicksight/

As data volumes across the industry grow, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Data comes from many different sources in structured, semi-structured, and unstructured formats. For semi-structured data, one of the most common lightweight file formats is JSON. However, due to the complex nature of the data, JSON often includes nested key-value structures. Analysts may want a simpler graphical user interface to conduct data analysis and profiling.

To support these requirements, AWS Glue DataBrew offers an easy-to-use visual data preparation tool with over 350 pre-built transformations. You can use DataBrew to analyze complex nested JSON files that would otherwise require days or weeks of writing hand-coded transformations. You can then use Amazon QuickSight for data analysis and visualization.

In this post, we demonstrate how to configure DataBrew to work with nested JSON objects and use QuickSight for data visualization.

Solution overview

To implement our solution, we create a DataBrew project and a DataBrew job for unnesting data. We profile the unnested data in DataBrew and analyze the data in QuickSight. The following diagram illustrates the architecture of this solution.

Prerequisites

Before you get started, make sure you have the following prerequisites:

Prepare the data

To illustrate the DataBrew functionality to support data analysis for nested JSON files, we use a publicly available sample customer order details nested JSON dataset.

Complete the following steps to prepare your data:

  1. Sign in to the AWS Management Console.
  2. Browse to the publicly available datasets on the Amazon S3 console.
  3. Select the first dataset (customer_1.json) and choose Download to save the file on your local machine.
  4. Repeat this step to download all three JSON files.

    You can view the sample data from your local machine using any text editor, as shown in the following screenshot.
  5. Create input and output S3 buckets with subfolders nestedjson and outputjson to capture data.
  6. Choose Upload and upload the three JSON files to the nestedjson folder.

Create a DataBrew project

To create your DataBrew project and connect it to the data in Amazon S3, complete the following steps:

  1. On the DataBrew console, choose Projects in the navigation pane.
  2. Choose Create project.
  3. For Project name, enter Glue-DataBew-NestedJSON-Blog.
  4. Select New dataset.
  5. For Dataset name, enter Glue-DataBew-NestedJSON-Dataset.
  6. For Enter your source from S3, enter the path to the nestedjson folder.
  7. Choose Select the entire folder to select all the files.
  8. Under Additional configurations, select JSON as the file type, then select JSON document.
  9. In the Permissions section, choose Choose existing IAM role if you have one available, or choose Create new IAM role.
  10. Choose Create project.
  11. Skip the preview steps and wait for the project to be ready.
    As shown in the following screenshot, the three JSON files were uploaded to the S3 bucket, so three rows of customer order details are loaded.
    The orders column contains nested data. We can use the DataBrew nest-unnest transforms to flatten the columns and rows.
  12. Choose the menu icon (three dots) and choose Nest-unnest.
  13. Depending on the nesting, either choose Unnest to columns or Unnest to rows. In this blog post, we choose Unnest to columns to flatten the example JSON file.

    Repeat this step until you get flattened JSON for all the nested JSON data. This creates the AWS Glue DataBrew recipe, as shown below.
  14. Choose Apply.

    DataBrew automatically creates the required recipe steps with updated column values.
  15. Choose Create job.
  16. For Job name, enter Glue-DataBew-NestedJSON-job.
  17. For S3 location, enter the path to the outputjson folder.
  18. In the Permissions section, for Role name, choose the role you created earlier.
  19. Choose Create and run job.

On the Jobs page, you can choose the job to view its run history, details, and data lineage.
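
The console steps above can also be scripted. Here is a hedged boto3 sketch that creates and starts an equivalent recipe job; the recipe name (DataBrew typically names a project's recipe <project-name>-recipe), role ARN, and output bucket are assumptions you should replace with the values from your own account.

import boto3

databrew = boto3.client("databrew")

databrew.create_recipe_job(
    Name="Glue-DataBew-NestedJSON-job",
    DatasetName="Glue-DataBew-NestedJSON-Dataset",
    RecipeReference={
        "Name": "Glue-DataBew-NestedJSON-Blog-recipe",  # assumption: the project's recipe name
        "RecipeVersion": "LATEST_WORKING",
    },
    RoleArn="arn:aws:iam::123456789012:role/DataBrewRole",  # placeholder role ARN
    Outputs=[
        {
            "Location": {"Bucket": "<output bucket>", "Key": "outputjson/"},
            "Format": "CSV",
        }
    ],
)

databrew.start_job_run(Name="Glue-DataBew-NestedJSON-job")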

Profile the metadata with DataBrew

After you have a flattened file in the S3 output bucket, you can use DataBrew to carry out the data analysis and profiling for the flattened file. Complete the following steps:

  1. On the Datasets page, choose Connect new datasets.
  2. Provide your dataset details and choose Create dataset.
  3. Choose the newly added data source, then choose the Data profile overview tab.
  4. Enter the name of the job and the S3 path to save the output.
  5. Choose Create and run job.

The job takes around two minutes to complete and displays all the updated information. You can explore the data further on the Data profile overview and Column statistics tabs.
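
As with the recipe job, the profile job can be created and run programmatically. A minimal sketch follows; the dataset name, role ARN, and output location are assumptions matching the values chosen in the preceding steps.

import boto3

databrew = boto3.client("databrew")

databrew.create_profile_job(
    Name="Glue-DataBew-NestedJSON-profile-job",
    DatasetName="Glue-DataBew-FlattenedJSON-Dataset",        # assumption: the dataset registered on the flattened output
    RoleArn="arn:aws:iam::123456789012:role/DataBrewRole",   # placeholder role ARN
    OutputLocation={"Bucket": "<output bucket>", "Key": "profile/"},
)

databrew.start_job_run(Name="Glue-DataBew-NestedJSON-profile-job")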

Visualize the data in QuickSight

After you have the output file generated by DataBrew in the S3 output bucket, you can use QuickSight to query the JSON data. QuickSight is a scalable, serverless, embeddable, ML-powered business intelligence (BI) service built for the cloud. QuickSight lets you easily create and publish interactive BI dashboards that include ML-powered insights. QuickSight dashboards can be accessed from any device, and seamlessly embedded into your applications, portals, and websites.

Launch QuickSight

On the console, enter quicksight into the search bar and choose QuickSight.

You’re presented with the QuickSight welcome page. If you haven’t signed up for QuickSight, you may have to complete the signup wizard. For more information, refer to Signing up for an Amazon QuickSight subscription.

After you have signed up, QuickSight presents a “Welcome wizard.” You can view the short tutorial, or you can close it.

Grant Amazon S3 access

To grant Amazon S3 access, complete the following steps:

  1. On the QuickSight console, choose your user name, choose Manage QuickSight, then choose Security & permissions.
  2. Choose Add or remove.
  3. Locate Amazon S3 in the list. Choose one of the following:
    1. If the check box is clear, select Amazon S3.
    2. If the check box is already selected, choose Details, then choose Select S3 buckets.
  4. Choose the buckets that you want to access from QuickSight, then choose Select.
  5. Choose Update.
  6. If you changed your Region during the first step of this process, change it back to the Region that you want to use.

Create a dataset

Now that you have QuickSight up and running, you can create your dataset. Complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.

    QuickSight supports several data sources. For a complete list, refer to Supported data sources.
  3. For your data source, choose S3.

    The S3 import requires a data source name and a manifest file.
  4. On your machine, use a text editor to create a manifest file called BlogGlueDataBrew.manifest using the following structure (provide the name of your output bucket):
    {
        "fileLocations": [
            {
                "URIPrefixes": [
                "https://s3.amazonaws.com/ s3://<output bucket>/outputjson/"
                ]
            }
        ],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ","
        }
    }

    The manifest file points to the folder that you created earlier as part of your DataBrew project. For more information, refer to Supported formats for Amazon S3 manifest files.

  5. Select Upload and navigate to the manifest file to upload it.
  6. Choose Connect to upload data into SPICE, which is an in-memory database built into QuickSight to achieve fast performance.
  7. Choose Visualize.

You can now create visuals by adding different fields.

To learn more about authoring dashboards in QuickSight, check out the QuickSight Author Workshop.
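
If you prefer to script the data source creation instead of using the console, the following boto3 sketch registers the same manifest with QuickSight. It assumes you have uploaded BlogGlueDataBrew.manifest to your output bucket, because the API reads the manifest from Amazon S3 rather than from your local machine; the data source ID and display name are arbitrary values chosen for this example.

import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
quicksight = boto3.client("quicksight")

# Register the S3 manifest as a QuickSight data source
response = quicksight.create_data_source(
    AwsAccountId=account_id,
    DataSourceId="glue-databrew-nestedjson",       # arbitrary ID
    Name="Glue-DataBrew-NestedJSON",               # arbitrary display name
    Type="S3",
    DataSourceParameters={
        "S3Parameters": {
            "ManifestFileLocation": {
                "Bucket": "<output bucket>",        # your output bucket
                "Key": "BlogGlueDataBrew.manifest"  # manifest uploaded to S3
            }
        }
    })

print(response["CreationStatus"])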

Clean up

Complete the following steps to avoid incurring future charges:

  1. On the DataBrew console, choose Projects in the navigation pane.
  2. Select the project you created and on the Actions menu, choose Delete.
  3. Choose Jobs in the navigation pane.
  4. Select the job you created and on the Actions menu, choose Delete.
  5. Choose Recipes in the navigation pane.
  6. Select the recipe you created and on the Actions menu, choose Delete.
  7. On the QuickSight dashboard, choose your user name on the application bar, then choose Manage QuickSight.
  8. Choose Account settings, then choose Delete account.
  9. Choose Delete account.
  10. Enter confirm and choose Delete account.

Conclusion

This post walked you through the steps to configure DataBrew to work with nested JSON objects and use QuickSight for data visualization. We used Glue DataBrew to unnest our JSON file and profile the data, and then used QuickSight to create dashboards and visualizations for further analysis.

You can use this solution for your own use cases when you need to unnest complex semi-structured JSON files without writing code. If you have comments or feedback, please leave them in the comments section.


About the authors

Sriharsh Adari is a Senior Solutions Architect at Amazon Web Services (AWS), where he helps customers work backwards from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers on data platform transformations across industry verticals. His core areas of expertise include Technology Strategy, Data Analytics, and Data Science. In his spare time, he enjoys playing sports, binge-watching TV shows, and playing Tabla.

Rahul Sonawane is a Principal Analytics Solutions Architect at AWS with AI/ML and Analytics as his area of specialty.

Amogh Gaikwad is a Solutions Developer at Amazon Web Services. He helps global customers build and deploy AI/ML solutions. His work is mainly focused on computer vision and NLP use cases, and on helping customers optimize their AI/ML workloads for sustainability. Amogh received his master's degree in Computer Science, specializing in Machine Learning.

AWS Week in Review – October 17, 2022

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/aws-week-in-review-october-17-2020/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Monday means it’s time for another Week in Review post, so, without further ado, let’s dive right in!

Last Week’s Launches
Here are some launch announcements from last week that you may have missed.

AWS Directory Service for Microsoft Active Directory is now available on Windows Server 2019, and all new directories will run on this server platform. Those of you with existing directories can choose to update with either a few clicks on the AWS Managed Microsoft AD console, or you can update programmatically using an API. With either approach, you can update at a time convenient to you and your organization between now and March 2023. After March 2023, directories will be updated automatically.

Users of SAP Solution Manager can now use automated deployments to provision it, in accordance with AWS and SAP best practices, to both single-node and distributed architectures using AWS Launch Wizard.

AWS Activate is a program that offers free tools, resources, and the opportunity to apply for credits to smaller early stage businesses and also more advanced digital businesses, helping them get started quickly on AWS. The program is now open to any self-identified startup.

Amazon QuickSight users who employ row-level security (RLS) to control access to restricted datasets will be interested in a new feature that enables you to ask questions against topics in these datasets. User-based rules control the answers received to questions and any auto-complete suggestions provided when the questions are being framed. This ensures that users only ever receive answer data that they are granted permission to access.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
This interesting blog post focuses on the startup Pieces Technologies, who are putting predictive artificial intelligence (AI) and machine learning (ML) tools to work on AWS to predict and offer clinical insights on patient outcomes, such as projected discharge dates, anticipated clinical and non-clinical barriers to discharge, and risk of readmission. To help healthcare teams work more efficiently, the insights are provided in natural language and seek to optimize overall clarity of a patient’s clinical issues.

As usual, there’s another AWS open-source and updates newsletter. The newsletter is published weekly to bring you up to date on the latest news on open-source projects, posts, and events.

Upcoming Events
Speaking of upcoming events, the following are some you may be interested in joining, especially if you work with .NET:

Looking to modernize .NET workloads using Windows containers on AWS? There’s a free webinar, with follow-along lab, in just a couple of days on October 20. You can find more details and register here.

My .NET colleagues are also hosting another webinar on November 2 related to building modern .NET applications on AWS. If you’re curious about the hosting and development capabilities of AWS for .NET applications, this is a webinar you should definitely check out. You’ll find further information and registration here.

And finally, a reminder that reserved seating for sessions at AWS re:Invent 2022 is now open. We’re now just 6 weeks away from the event! There are lots of great sessions for your attention, but those of particular interest to me are the ones related to .NET, and at this year’s event we have seven breakouts, three chalk talks, and a workshop for you. You can find all the details using the .NET filter in the session catalog (the sessions all start with the prefix XNT, by the way).

That’s all for this week. Check back next Monday for another AWS Week in Review!

— Steve

Manage your Amazon QuickSight datasets more efficiently with the new user interface

Post Syndicated from Arturo Duarte original https://aws.amazon.com/blogs/big-data/manage-your-amazon-quicksight-datasets-more-efficiently-with-the-new-user-interface/

Amazon QuickSight has launched a new user interface for dataset management. Previously, the dataset management experience was a popup dialog modal with limited space, and all functionality was displayed in this one small modal. The new dataset management experience replaces the existing popup dialog with a full-page experience, providing a clearer breakdown of a dataset’s properties.

In this post, we walk through the end-to-end dataset management user experience.

Access the new UI

To get started, choose Datasets in the navigation pane on the QuickSight console, and choose any dataset that you want to manage.

When you choose a dataset, you see the full-page dataset management UI. This new UI is divided into four main tabs: Summary, Refresh, Permissions, and Usage.

Use case overview

Let’s consider a fictional company, AnyCompany. They have used QuickSight for a long time and now have a large number of datasets that have to be managed. Among the datasets they use, they have a combination of Direct Query and SPICE modes. They need a unified view of each dataset, with details related to permissions, refreshes, and usage. Additionally, they need to be able to schedule when they want to refresh the data and have a history of all the successful and failed attempts of these updates.

The Summary tab

As a data analyst at AnyCompany, you need to review details about your datasets. You can find several options by navigating to the Summary tab.

The About section shows whether the dataset is stored in SPICE or uses Direct Query. If the dataset is stored in SPICE, you can also see the size of the dataset. If the dataset uses Direct Query, you can choose Set alert schedule to set up a schedule for when alerts on dashboards should be evaluated.

Specify the time zone, if you want to repeat it daily or hourly, and the start time.

To continue exploring the dataset, choose a new dataset that is stored in SPICE. In the Refresh section, you can verify the status of the SPICE dataset and the last successful refresh date.

Under Access Settings, you can see details about how many owners and viewers this dataset has and also the options to enable row-level and column-level security.

To add row-level security to this dataset, choose Set up under Row-level security.

Under User-based rules, select the permissions dataset with rules to restrict access for each user or group.

To apply column-level security, choose Set up under Column-level security.

Select the columns to be restricted and choose Next.

Choose who can access the restricted columns and choose Apply.

In the Sources section on the Summary tab, a list of data sources is displayed to show the ones used in this dataset. In the following example, we can see the sources SaaS Sales 2022.csv and SaaS-Sales-MonthlySummary.

You also need to identify where (analysis, dashboards, or other datasets) the different datasets are being used, to determine if you can eliminate some unused ones.

To verify this, you just have to look at the Usage section (more details are on the Usage tab).

It’s also possible to go to the data prep interface by choosing Edit dataset or duplicate it by opening the drop-down menu.

You can also directly create a new analysis with this dataset, or choose Use in dataset to take advantage of the dataset-as-a-source capability. When you use this option, any data preparation that the parent dataset contains, such as joins or calculated fields, is kept. You can add additional preparation to the data in the new child datasets, such as joining new data and filtering data. You can also set up your own data refresh schedule for the child dataset and track the dashboards and analyses that use it. Some of the advantages are central management of datasets, reduced dataset maintenance, predefined key metrics, and the flexibility to customize data.

The Refresh tab

At AnyCompany, you also need to refresh the latest data for your datasets. To achieve this, you have two different options.

You can choose Refresh now to manually get the latest records in the dataset.

You can also choose Add new schedule to create a refresh schedule and not worry about running it manually in the future. You can set the time zone, start time, and frequency.

There are two types of scheduled refresh: full refresh and incremental refresh. Full refresh will completely reload the whole dataset, while incremental refresh only updates a specified small portion of your dataset. Using incremental refresh enables you to access the most recent insights much sooner.

To set up incremental refresh, perform the following actions:

  1. Choose Refresh Now.
  2. For Refresh type, choose Incremental refresh.
  3. If this is your first incremental refresh on the dataset, choose Configure.
  4. On the Configure incremental refresh page, do the following:
    1. For Date column, choose a date column that you want to base the look-back window on.
    2. For Window size, enter a number for size, and then choose an amount of time that you want to look back for changes. You can choose to refresh changes to the data that occurred within a specified number of hours, days, or weeks from now. For example, you can choose to refresh changes to the data that occurred within two weeks of the current date.
  5. Choose Submit.

There are two main sections on the Refresh tab: Schedules and History. Under Schedules, you can see details about the scheduled refreshes of the dataset. There is also the option to edit and delete the schedule.

In the History section, you can see details about the past refreshes, such as status, duration, skipped rows, ingested rows, dataset rows, and refresh type.
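
The same refresh settings can also be managed programmatically. The following boto3 sketch is based on the QuickSight CreateRefreshSchedule and PutDataSetRefreshProperties APIs; the dataset ID, date column, and time zone are placeholders you would replace with your own values, and it's worth confirming the exact parameter shapes in the boto3 documentation.

import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
quicksight = boto3.client("quicksight")
dataset_id = "<your-dataset-id>"  # placeholder

# Define the look-back window used by incremental refreshes
quicksight.put_data_set_refresh_properties(
    AwsAccountId=account_id,
    DataSetId=dataset_id,
    DataSetRefreshProperties={
        "RefreshConfiguration": {
            "IncrementalRefresh": {
                "LookbackWindow": {
                    "ColumnName": "order_date",  # placeholder date column
                    "Size": 2,
                    "SizeUnit": "WEEK"
                }
            }
        }
    })

# Schedule a daily incremental refresh
quicksight.create_refresh_schedule(
    AwsAccountId=account_id,
    DataSetId=dataset_id,
    Schedule={
        "ScheduleId": "daily-incremental",
        "RefreshType": "INCREMENTAL_REFRESH",
        "ScheduleFrequency": {
            "Interval": "DAILY",
            "Timezone": "America/New_York",
            "TimeOfTheDay": "06:00"
        }
    })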

The Permissions tab

On the Permissions tab, you can manage the settings and permissions for users and groups that access the dataset.

As the dataset owner at AnyCompany, you need to manage access to the datasets and add users and groups. To do so, simply choose Add users & groups.

Choose the specific user or group to provide access to this dataset.

Review the list of users and groups that have access to the dataset as well as the level of permission (viewer or owner). You can also revoke access to the users or groups.

The Usage tab

It’s not always easy for AnyCompany to determine whether or not a dataset is being used by users or in other assets such as analyses or dashboards.

To answer this kind of question, you can easily review the information on the Usage tab. Here you can review the list of analyses and dashboards where the dataset is being used (choose the name of an analysis or dashboard to view the actual asset).

Under the Users column, you can get the details about who is using this analysis or dashboard.

Conclusion

In this post, we introduced the new user interface of the dataset management page on the QuickSight console. This new user interface simplifies the administration and use of datasets by having everything organized and centralized. This will primarily help authors and administrators quickly manage their datasets, while also contributing to a better QuickSight navigation experience. The new user interface is now generally available in all supported QuickSight Regions.

We look forward to your feedback and stories on how you use the new dataset management interface for your business needs.


About the Authors

Arturo Duarte is a Partner Solutions Architect focused on Amazon QuickSight at Amazon Web Services. He works with EMEA APN Partners to help develop their data and analytics practices with enterprise and mission-critical solutions for their end customers.

Emily Zhu is a Senior Product Manager at Amazon QuickSight, AWS’s cloud-native, fully managed SaaS BI service. She leads the development of the QuickSight analytics and query capability. Before joining AWS, she worked in the Amazon Prime Air drone delivery program and at the Boeing Company as a senior strategist for several years. Emily is passionate about the potential of cloud-based BI solutions and looks forward to helping customers advance in their data-driven strategy making.

Customize Amazon QuickSight dashboards with the new bookmarks functionality

Post Syndicated from Mei Qu original https://aws.amazon.com/blogs/big-data/customize-amazon-quicksight-dashboards-with-the-new-bookmarks-functionality/

Amazon QuickSight users now can add bookmarks in dashboards to save customized dashboard preferences into a list of bookmarks for easy one-click access to specific views of the dashboard without having to manually make multiple filter and parameter changes every time. Combined with the “Share this view” functionality, you can also now share your bookmark views with other readers for easy collaboration and discussion.

In this post, we introduce the bookmark functionality and its capabilities, and demonstrate typical use cases. We also discuss several scenarios extending the usage of bookmarks for sharing and presentation.

Create a bookmark

Open the published dashboard that you want to view, make changes to the filters or controls, or select the sheet that you want. For example, you can filter to the Region that interests you, or you can select a specific date range using a sheet control on the dashboard.

To create a bookmark of this view, choose the bookmark icon, then choose Add bookmark.

Enter a name, then choose Save.

The bookmark is saved, and the dashboard name on the banner updates with the bookmark name.

You can return to the original dashboard view that the author published at any time by going back to the bookmark icon and choosing Original dashboard.

Rename or delete a bookmark

To rename or delete a bookmark, on the bookmark pane, choose the context menu (three vertical dots) and choose Rename or Delete.

Although you can make any of these edits to bookmarks you have created, the Original dashboard view can’t be renamed, deleted, or updated, because it’s what the author has published.

Update a bookmark

At any time, you can change a bookmark dashboard view and update the bookmark to always reflect those changes.

After you make your edits, on the banner, you can see Modified next to the bookmark you’re currently editing.

To save your updates, on the bookmarks pane, choose the context menu and choose Update.

Make a bookmark the default view

After you create a bookmark, you can set it as the default view of the dashboard that you see when you open the dashboard in a new session. This doesn’t affect anyone else’s view of the dashboard. You can also set the Original dashboard that the author published as the default view. When a default view is set, any time that you open the dashboard, the bookmark view is presented to you, regardless of the changes you made during your last session. However, if no default bookmark has been set, QuickSight will persist your current filter selections when you leave the dashboard. This way, readers can still pick up where they left off and don’t have to re-select filters.

To do this, in the Bookmarks pane, choose the context menu (the three dots) for the bookmark that you want to set as your default view, then choose Set as default.

Share a bookmark

After you create a bookmark, you can share a URL link for the view with others who have permission to view the dashboard. They can then save that view as their own bookmark. The shared view is a snapshot of your bookmark, meaning that if you make any updates to your bookmark after generating the URL link, the shared view doesn’t change.

To do this, choose the bookmark that you want to share so that the dashboard updates to that view. Choose the share icon, then choose Share this view. You can then copy the generated URL to share it with others.

Presentation with bookmarks

One extended use case of the new bookmarks feature is the ability to create a presentation through visuals in a much more elegant fashion. Previously, you may have had to change multiple filters and controls before arriving at your next presentation. Now you can save each presentation as a bookmark beforehand and just go through each bookmark like a slideshow. By creating a snapshot for each of the configurations you want to show, you create a smoother and cleaner experience for your audience.

For example, let’s prepare a presentation through bookmarks using a Restaurant Operations dashboard. We can start with the global view of all the states and their related KPIs.

Best vs. worst performing restaurant

We’re interested in analyzing the difference between the best performing and worst performing store based on revenue. If we had to filter this dashboard on the fly, it would be much more time-consuming because we would have to manually enter both store names into the filter. However, with bookmarks, we can pre-filter to the two addresses, save the bookmark, and be automatically directed to the desired stores when we select it.

Worst performing restaurant analysis

When comparing the best and worst performing stores, we see that the worst performing store (10 Resort Plz) had a below average revenue amount in December 2021. We’re interested in seeing how we can better boost sales around the holiday season and if there is something the owner lacked in that month. We can create a bookmark that directs us to the Owner View with the filters selected to December 2021 and the address and city pre-filtered.

Pennsylvania December 2021 analysis

After looking at this bookmark, we found that in the month of December, the call for support score in the restaurant category was high. Additionally, in the week of December 19, 2021 product sales were at the monthly low. We want to see how these KPIs compare to other stores in the same state for that week. Is it just a seasonal trend, or do we need to look even deeper and take action? Again, we can create a bookmark that gives us this comparison on a state level.

From the last bookmark, we can see that the other store in the same state (Pennsylvania) also had lower than average product sales in the week of December 19, 2021. For the purpose of this demo, we can go ahead and stop here, but we could create even more bookmarks to help answer additional questions, such as if we see this trend across other cities and states.

With these three bookmarks, we have created a top-down presentation looking at the worst performing store and its relevant metrics and comparisons. If we wanted to present this, it would be as simple as cycling through the bookmarks we’ve created to put together a cohesive presentation instead of having distracting onscreen filtering.

Conclusion

In this post, we discussed how to create, modify, and delete bookmarks, set a bookmark as the default view, and share bookmarks. We also demonstrated an extended use case of presenting with bookmarks. The new bookmarks functionality is now generally available in all supported QuickSight Regions.

We’re looking forward to your feedback and stories on how you apply bookmarks to your business needs.

Did you know that you can also ask questions of your data in natural language with QuickSight Q? Watch this video to learn more.


About the Authors

Mei Qu is a Business Intelligence Engineer on the AWS QuickSight ProServe Team. She helps customers and partners leverage their data to develop key business insights and monitor ongoing initiatives. She also develops QuickSight proof of concept projects, delivers technical workshops, and supports implementation projects.

Emily Zhu is a Senior Product Manager at Amazon QuickSight, AWS’s cloud-native, fully managed SaaS BI service. She leads the development of QuickSight analytics and query capabilities. Before joining AWS, she worked in the Amazon Prime Air drone delivery program and at the Boeing Company as a senior strategist for several years. Emily is passionate about the potential of cloud-based BI solutions and looks forward to helping customers advance in their data-driven strategy-making.

Amazon Personalize customer outreach on your ecommerce platform

Post Syndicated from Sridhar Chevendra original https://aws.amazon.com/blogs/architecture/amazon-personalize-customer-outreach-on-your-ecommerce-platform/

In the past, brick-and-mortar retailers leveraged native marketing and advertisement channels to engage with consumers. They have promoted their products and services through TV commercials, and magazine and newspaper ads. Many of them have started using social media and digital advertisements. Although marketing approaches are beginning to modernize and expand to digital channels, businesses still depend on expensive marketing agencies and inefficient manual processes to measure campaign effectiveness and understand buyer behavior. The recent pandemic has forced many retailers to take their businesses online. Those who are ready to embrace these changes have embarked on a technological and digital transformation to connect to their customers. As a result, they have begun to see greater business success compared to their peers.

Digitizing a business can be a daunting task, due to lack of expertise and high infrastructure costs. By using Amazon Web Services (AWS), retailers are able to quickly deploy their products and services online with minimal overhead. They don’t have to manage their own infrastructure. With AWS, retailers have no upfront costs, have minimal operational overhead, and have access to enterprise-level capabilities that scale elastically, based on their customers’ demands. Retailers can gain a greater understanding of customers’ shopping behaviors and personal preferences. Then, they are able to conduct effective marketing and advertisement campaigns, and develop and measure customer outreach. This results in increased satisfaction, higher retention, and greater customer loyalty. With AWS you can manage your supply chain and directly influence your bottom line.

Building a personalized shopping experience

Let’s dive into the components involved in building this experience. The first step in a retailer’s digital transformation journey is to create an ecommerce platform for their customers. This platform enables the organization to capture their customers’ actions, also referred to as ‘events’. Some examples of events are clicking on the shopping site to browse product categories, searching for a particular product, adding an item to the shopping cart, and purchasing a product. Each of these events gives the organization information about their customer’s intent, which is invaluable in creating a personalized experience for that customer. For instance, if a customer is browsing the “baby products” category, it indicates their interest in that category even if a purchase is not made. These insights are typically difficult to capture in an in-store experience. Online shopping makes gaining this knowledge much more straightforward and scalable.

The proposed solution outlines the use of AWS services to create a digital experience for a retailer and consumers. The three key areas are: 1) capturing customer interactions, 2) making real-time recommendations using AWS managed Artificial Intelligence/Machine Learning (AI/ML) services, and 3) creating an analytics platform to detect patterns and adjust customer outreach campaigns. Figure 1 illustrates the solution architecture.

Digital shopping experience architecture

Figure 1. Digital shopping experience architecture

For this use case, let’s assume that you are the owner of a local pizzeria, and you manage deliveries through an ecommerce platform like Shopify or WooCommerce. We will walk you through how to best serve your customer with a personalized experience based on their preferences.

The proposed solution consists of the following components:

  1. Data collection
  2. Promotion campaigns
  3. Recommendation engine
  4. Data analytics
  5. Customer reachability

Let’s explore each of these components separately.

Data collection with Amazon Kinesis Data Streams

When a customer uses your web/mobile application to order a pizza, the application captures their activity as click-stream ‘events’. These events provide valuable insights about your customers’ behavior. You can use these insights to understand the trends and browsing pattern of prospects who visited your web/mobile app, and use the data collected for creating promotion campaigns. As your business scales, you’ll need a durable system to preserve these events against system failures, and scale based on unpredictable traffic on your platform.

Amazon Kinesis is a Multi-AZ, managed streaming service that provides resiliency, scalability, and durability to capture an unlimited number of events without any additional operational overhead. Using Kinesis producers (Kinesis Agent, Kinesis Producer Library, and the Kinesis API), you can configure applications to capture your customer activity. You can ingest these events from the frontend, and then publish them to Amazon Kinesis Data Streams.

Let us start by setting up Amazon Kinesis Data Streams to capture the real-time sales transactions from online channels like a portal or mobile app. For this blog post, we used a public dataset from Kaggle as a reference. Figure 2 illustrates a snapshot of sample data used to build personalized recommendations for a customer.

Sample sales transaction data

Figure 2. Sample sales transaction data
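
To make the producer side concrete, here is a minimal sketch of publishing a single click-stream event to a Kinesis data stream with boto3. The stream name and the event fields are illustrative assumptions, not part of the reference dataset.

import boto3
import json
from datetime import datetime, timezone

kinesis = boto3.client("kinesis")

# A hypothetical click-stream event emitted when a customer adds a pizza to the cart
event = {
    "event_type": "add_to_cart",
    "customer_id": "C1001",
    "item_id": "PZ-MARGHERITA-12",
    "price": 11.99,
    "cart_total": 23.98,
    "event_time": datetime.now(timezone.utc).isoformat(),
}

kinesis.put_record(
    StreamName="customer-activity-stream",  # assumed stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["customer_id"],       # keeps a customer's events on one shard
)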

Promotion campaigns with AWS Lambda

One way to increase customer conversion is by offering discounts. When the customer adds a pizza to their cart, you want to make sure they are receiving the best deal. Let’s assume that by adding an additional item, your customer will receive the best possible discount. Just by knowing the total cost of added items to the cart, you can provide these relevant promotions to this customer.

For this scenario, an AWS Lambda function polls the Amazon Kinesis data stream to read all the events in the stream and matches the events based on your criteria for items in the cart. The Lambda function then reads your up-to-date promotions stored in Amazon DynamoDB. As an option, caching recent or most popular promotions improves your application response time, as well as the customer experience on your platform. Amazon DynamoDB Accelerator (DAX) is an integrated caching layer for DynamoDB that caches the most recent or popular promotions or items.

For example, when the customer adds items to their shopping cart, Lambda sends them promotion details based on the purchase amount, such as free shipping or a discount of a certain percentage. Figure 3 illustrates a snapshot of sample promotions.

Promotions table in DynamoDB

Figure 3. Promotions table in DynamoDB
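
The following is a simplified sketch of what such a Lambda function could look like when triggered by the Kinesis stream. The promotions table name, its key, and the tiering logic are assumptions made for illustration; your table schema in Figure 3 may differ.

import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
promotions_table = dynamodb.Table("promotions")  # assumed table name


def tier_for(cart_total):
    # Simplified tiering logic for illustration only
    return "FREE_SHIPPING" if cart_total >= 25 else "PERCENT_OFF_5"


def lambda_handler(event, context):
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        if payload.get("event_type") != "add_to_cart":
            continue

        # Look up the promotion that matches the cart value
        response = promotions_table.get_item(
            Key={"promo_tier": tier_for(payload.get("cart_total", 0))})
        promotion = response.get("Item")
        if promotion:
            print(f"Offer {promotion['description']} to customer {payload['customer_id']}")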

Recommendations engine with Amazon Personalize

In addition to sharing these promotions with your customer, you may also want to share the recommended add-ons. In order to understand your customer preferences, you must gather historical datasets to determine patterns and generate relevant recommendations. Since web activity consists of millions of events, this would be a daunting task for humans to review, determine the patterns, and make recommendations. And since user preferences change, you need a system that can use all this volume of data and provide accurate predictions.

Amazon Personalize is a managed AI/ML service that helps you train an ML model based on your datasets and provides an inference endpoint for real-time recommendations, without requiring prior ML experience. Based on the datasets, Amazon Personalize also provides recipes to generate recommendations. As your customers interact with the ecommerce platform, your frontend application calls the Amazon Personalize inference endpoint and retrieves a set of personalized recommendations based on your customer preferences.

Here is sample Python code to list the available recommenders and retrieve the associated recommendations.

import boto3
import json

# Control-plane client; list_recommenders() returns the recommenders available to query
client = boto3.client('personalize')
recommenders = [r['name'] for r in client.list_recommenders()['recommenders']]

# Runtime client used to retrieve recommendations for a given item
recomm_endpoint = boto3.client('personalize-runtime')
response = recomm_endpoint.get_recommendations(
    itemId='79323P',
    recommenderArn='arn:aws:personalize:us-east-1::recommender/my-items',
    numResults=5)

print(json.dumps(response['itemList'], indent=2))

[
  {
    "itemId": "79323W"
  },
  {
    "itemId": "79323GR"
  },
  {
    "itemId": "79323LP"
  },
  {
  "itemId": "79323B"
  },
  {
    "itemId": "79323G"
  }
]

You can use Amazon Kinesis Data Firehose to read the data in near-real time from the Kinesis data stream that collects events from the frontend applications, and then store it in Amazon Simple Storage Service (Amazon S3). Amazon S3 provides petabyte-scale storage that helps you scale and acts as a repository and single source of truth. We use the S3 data as seed data to build a personalized recommendation engine using Amazon Personalize. As your customers interact with the ecommerce platform, call the Amazon Personalize inference endpoint to make personalized recommendations based on user preferences.

Customer reachability with Amazon Pinpoint

If a customer adds products to their cart but never checks out, you may want to send them a reminder. You can set up an email to suggest they re-order after a period of time after their first order. Or you may want to send them promotions based on their preferences. And as your customers’ behavior changes, you probably want to adapt your messaging accordingly.

Your customer may have a communication preference, such as phone, email, SMS, or in-app notifications. If an order has an issue, you can inform the customer as soon as possible using their preferred method of communication, and perhaps follow it up with a discount.

Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service. You can add users to Audience Segments, create reusable content templates integrated with Amazon Personalize, and run scheduled campaigns. With Amazon Pinpoint journeys, you can send action or time-based notifications to your users.

The following workflow shown in Figure 4, illustrates customer communication workflow for promotion. A journey is created for a cohort of college students: a “Free Drink” promotion is offered with a new order. You can send this promotion over email. If the student opens the email, you can immediately send them a push notification reminding them to place an order. But if they didn’t open this email, you could wait three days, and follow up with a text message.

Promotion workflow in Amazon Pinpoint

Figure 4. Promotion workflow in Amazon Pinpoint

Data analytics with Amazon Athena and Amazon QuickSight

To understand the effectiveness of your campaigns, you can use S3 data as a source for Amazon Athena. Athena is an interactive query service that analyzes data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

There are different ways to create visualizations in Amazon QuickSight. For instance, you can use Amazon S3 as a data lake. One option is to import your data into SPICE (Super-fast, Parallel, In-memory Calculation Engine) to provide high performance and concurrency. You can also create a direct connection to the underlying data source. For this use case, we choose to import to SPICE, which provides faster visualization in a production setup. Schedule consistent refreshes to help ensure that dashboards are referring to the most current data.

Once your data is imported to your SPICE, review QuickSight’s visualization dashboard. Here, you’ll be able to choose from a wide variety of charts and tables, while adding interactive features like drill downs and filters.

The following process illustrates how to create a customer outreach strategy using ZIP codes and allocate budgets to the marketing campaigns accordingly. First, here is a sample SQL command that we ran in Athena to query for the top 10 pizza providers. The results are shown in Figure 5.

SELECT name, count(*) as total_count FROM "analyticsdemodb"."fooddatauswest2"
group by name
order by total_count desc
limit 10

Athena query results for top 10 pizza providers

Figure 5. Athena query results for top 10 pizza providers

Second, here is the sample SQL command that we ran in Athena to find total pizza counts by postal code (ZIP code). Figure 6 shows a visualization to help create a customer outreach strategy per ZIP code and budget the marketing campaigns accordingly.

SELECT postalcode, count(*) as total_count FROM "analyticsdemodb"."fooddatauswest2"
where postalcode is not null
group by postalcode
order by total_count desc limit 50;

QuickSight visualization showing pizza orders by zip codes

Figure 6. QuickSight visualization showing pizza orders by zip codes
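
If you want to automate these checks instead of running them in the console, the following sketch runs the first query through the Athena API with boto3. The query results location is an assumption; point it at any S3 bucket you own.

import boto3
import time

athena = boto3.client("athena")

query = """
SELECT name, count(*) AS total_count
FROM "analyticsdemodb"."fooddatauswest2"
GROUP BY name
ORDER BY total_count DESC
LIMIT 10
"""

run = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "analyticsdemodb"},
    ResultConfiguration={"OutputLocation": "s3://<athena-results-bucket>/"})  # assumed bucket

# Wait for the query to finish, then print the result rows
query_id = run["QueryExecutionId"]
while True:
    state = athena.get_query_execution(
        QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # the first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])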

Conclusion

AWS enables you to build an ecommerce platform and scale your existing business with minimal operational overhead and no upfront costs. You can augment your ecommerce platform by building personalized recommendations and effective marketing campaigns based on your customer needs. The solution approach provided in this post helps organizations build a reusable architecture pattern and personalization using AWS managed services.

Enable self-service visual data integration and analysis for fund performance using AWS Glue Studio and Amazon QuickSight

Post Syndicated from Rajeshkumar Karuppaswamy original https://aws.amazon.com/blogs/big-data/enable-self-service-visual-data-integration-and-analysis-for-fund-performance-using-aws-glue-studio-and-amazon-quicksight/

IMM (Institutional Money Market) funds are mutual funds that invest in highly liquid instruments, cash, and cash equivalents. IMM funds are large financial intermediaries that are crucial to financial stability in the US. Due to their criticality, IMM funds are highly regulated under the securities laws, notably Rule 2a-7, which states that during market stress, fund managers can impose a liquidity fee of up to 2% or redemption gates (a delay in processing redemptions) if the fund’s weekly liquid assets drop below 30% of its total assets. The liquidity fees and gates allow money market funds to stop heavy redemption in times of market volatility.

Traditional banks use legacy systems and rely on monolithic architectures. Typically, data and business logic are tightly coupled on the same mainframe machines. It’s hard for analysts and fund managers to perform self-service and gather real-time analytics from these legacy systems. They work from the previous night’s report and struggle to keep up with market fluctuations. The slightest modification to the reports on these legacy systems involves vast costs, time, and significant dependency on the software development team. Due to these limitations, analysts and fund managers can’t respond effectively to market trends and face a tremendous challenge in adhering to the regulatory requirements of monitoring market volatility.

Over the last few years, many banks have adopted the cloud. Banks have migrated their legacy workloads to reduce cost, improve their competitive advantage, and address competition from FinTech and startups. As part of the cloud strategy, many mainframe applications have been re-platformed or re-architected onto more efficient database platforms. However, many opportunities remain to modernize these applications further. One such opportunity is to enable self-service, real-time analytics. AWS offers various services that help with such use cases. In this post, we demonstrate how to analyze fund performance visually using AWS Glue Studio and QuickSight in a self-service fashion.

The aim of this post is to help operations analysts and fund managers self-serve their data analysis needs without prior coding experience. This post demonstrates how AWS Glue Studio reduces the dependency on the software development team and helps analysts and fund managers perform near-real-time analytics. This post also illustrates how to build visualizations and quickly get business insights using Amazon QuickSight.

Solution overview

Most banks record their daily trading transactions activity in relational database systems. A relational database keeps the ledger of daily transactions that involves many buys and sells of IMM funds. We use the mock trades data and a simulated Morningstar data feed to demonstrate our use case.

In our sample environment, an Amazon Relational Database Service (Amazon RDS) instance records daily IMM trades, and Morningstar market data is stored in Amazon Simple Storage Service (Amazon S3). With AWS Glue Studio, analysts and fund managers can analyze the IMM trades in near-real time and compare them with market observations from Morningstar. They can then review the data in Amazon Athena and use QuickSight to visualize and further analyze the trade patterns and market trends.

This near-real-time, self-service capability enables fund managers to quickly respond to market volatility and apply fees or gates on IMM funds to comply with Rule 2a-7 regulatory requirements.

The following diagram illustrates the solution architecture.

Provision resources with AWS CloudFormation

To create your resources for this use case, we deploy an AWS CloudFormation template. Complete the following steps:

  1. Choose Launch Stack (in us-east-1):
  2. Choose Next three times to reach the Review step.
  3. Select I acknowledge that AWS CloudFormation might create IAM resources.
  4. Choose Create stack.

Create an AWS Glue connection

You create an AWS Glue connection to access the MySQL database created by the CloudFormation template. An AWS Glue crawler uses the connection in the next step.

  1. On the AWS Glue console, under Databases in the navigation pane, choose Connections.
  2. Choose Add connection.
  3. For Connection name, enter Trade-Analysis.
  4. For Connection type¸ choose JDBC.
  5. Choose Next.
  6. For JDBC URL, enter your URL.
    To connect to an Amazon RDS for MySQL data store with a DBDEV database, use the following code:

    jdbc:mysql://xxx-cluster.cluster-xxx.us-east-1.rds.amazonaws.com:3306/DBDEV

    For more details, see AWS Glue connection properties. Refer to the CloudFormation fund-analysis stack Outputs tab to get the Amazon RDS ARN.

    The next step requires you to first retrieve your MySQL database user name and password via AWS Secrets Manager.

  7. On the Secrets Manager console, choose Secrets in the navigation pane.
  8. Choose the secret rds-secret-fund-analysis.
  9. Choose Retrieve secret value to get the user name and password.
  10. Return to the connection configuration and enter the user name and password.
  11. For VPC, choose the VPC ending with fund-analysis.
  12. For Subnet and Security groups, choose the values ending with fund-analysis.
  13. Choose Next and Finish to complete the connection setup.
  14. Select the connection you created and choose Test Connection.
  15. For IAM role, choose the role AWSGlueServiceRole-Studio.

For more details about using AWS Identity and Access Management (IAM), refer to Setting up for AWS Glue Studio.
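
If you want to retrieve the credentials programmatically instead of copying them from the console, a short boto3 sketch looks like the following. The JSON keys depend on how the CloudFormation template stored the secret; username and password are the usual convention for RDS secrets but are an assumption here.

import boto3
import json

secrets = boto3.client("secretsmanager")
secret = secrets.get_secret_value(SecretId="rds-secret-fund-analysis")
credentials = json.loads(secret["SecretString"])

# "username" and "password" keys are assumed; check the secret's actual structure
print(credentials["username"])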

Create and run AWS Glue crawlers

In this step, you create two crawlers. The crawlers connect to a data store, determine the schema for your data, and then create metadata tables in your AWS Glue Data Catalog.

Crawl MySQL data stores

The first crawler creates metadata for the MySQL data stores. Complete the following steps:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Add crawler.
  3. For Crawler name, enter Trades Crawlers.
  4. Choose Next.
  5. For Crawler source type, choose Data stores.
  6. For Repeat crawls of S3 data stores, choose Crawl all folders.
  7. Choose Next.
  8. For Choose a data store, choose JDBC.
  9. For Connection, choose Trade-Analysis.
  10. For Include path, enter the MySQL database name (DBDEV).
  11. Choose Next.
  12. For Add another data store, choose No.
  13. Choose Next.
  14. For the IAM role to access the data stores, choose the role AWSGlueServiceRole-Studio.
  15. For Frequency, choose Run on demand.
  16. Choose Add database.
  17. For Database name, enter trade_analysis_db.
  18. Choose Create.
  19. Choose Next.
  20. Review all the steps and choose Finish to create your crawler.
  21. Select the Trades Crawlers crawler and choose Run crawler to get the metadata.

Crawl Amazon S3 data stores

Now you configure a crawler to create metadata for the Amazon S3 data stores.

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Add crawler.
  3. For Crawler name, enter Ratings.
  4. Choose Next.
  5. For Crawler source type, choose Data stores.
  6. For Repeat crawls of S3 data stores, choose Crawl all folders.
  7. Choose Next.
  8. For Choose a data store, choose S3.
  9. For Connection, choose Trade-Analysis.
  10. For Include path, enter s3://aws-bigdata-blog/artifacts/analyze_fund_performance_using_glue/Morningstar.csv.
  11. Choose Next.
  12. For Add another data store, choose No.
  13. Choose Next.
  14. For the IAM role to access the data stores, choose the role AWSGlueServiceRole-Studio.
  15. For Frequency, choose Run on demand.
  16. Choose Add database.
  17. For Database name, enter trade_analysis_db.
  18. Review all the steps and choose Finish to create your crawler.
  19. Select the Ratings crawler and choose Run crawler to get the metadata.

Review crawler output

To review the output of your two crawlers, navigate to the Databases page on the AWS Glue console.

You can review the database trade_analysis_db created in previous steps and the contents of the metadata tables.
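
You can also start the crawlers and inspect the resulting Data Catalog tables with boto3, which is useful when you want to re-run the crawl as part of a pipeline. This sketch assumes the crawler and database names used above.

import boto3
import time

glue = boto3.client("glue")
crawler_names = ["Trades Crawlers", "Ratings"]

# Start both crawlers and wait until they return to the READY state
for name in crawler_names:
    glue.start_crawler(Name=name)

while True:
    states = [glue.get_crawler(Name=name)["Crawler"]["State"] for name in crawler_names]
    if all(state == "READY" for state in states):
        break
    time.sleep(30)

# List the tables the crawlers created in the Data Catalog
tables = glue.get_tables(DatabaseName="trade_analysis_db")["TableList"]
print([table["Name"] for table in tables])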

Create a job using AWS Glue Studio

A job is the AWS Glue component that allows the implementation of business logic to transform data as part of the extract, transform, and load (ETL) process. For more information, see Adding jobs in AWS Glue.

To create an AWS Glue job using AWS Glue Studio, complete the following steps:

  1. On the AWS Glue console, in the navigation pane, choose AWS Glue Studio.
  2. Choose Create and manage jobs.
  3. Choose View jobs.
    AWS Glue Studio supports different sources. For this post, you use two AWS Glue tables as data sources and one S3 bucket as the destination.
  4. In the Create job section, select Visual with a blank canvas.
  5. Choose Create.

    This takes you to the visual editor to create an AWS Glue job.
  6. Change the job name from Untitled Job to Trade-Analysis-Job.

You now have an AWS Glue job ready to filter, join, and aggregate data from two different sources.

Add two data sources

For this post, you use two AWS Glue tables as data sources: Trades and Ratings, which you created earlier.

  1. On the AWS Glue Studio console, on the Source menu, choose MySQL.
  2. On the Node properties tab, for Name, enter Trades.
  3. For Node type, choose MySQL.
  4. On the Data Source properties – MySQL tab, for Database, choose trade_analysis_db.
  5. For Table, choose dbdev_mft_actvitity.
    Before adding the second data source to the analysis job, be sure that the node you just created isn’t selected.
  6. On the Source menu, choose Amazon S3.
  7. On the Node properties tab, for Name, enter Ratings.
  8. For Node type, choose Amazon S3.
  9. On the Data Source properties – S3 tab, for Database, choose trade_analysis_db.
  10. For Table, choose morning_star_csv.
    You now have two AWS Glue tables as the data sources for the AWS Glue job.The Data preview tab helps you sample your data without having to save or run the job. The preview runs each transform in your job so you can test and debug your transformations.
  11. Choose the Ratings node and on the Data preview tab, choose Start data preview session.
  12. Choose the AWSGlueServiceRole-Studio IAM role and choose Confirm to sample the data.

Data previews are available for each source, target, and transform node in the visual editor, so you can verify the results step by step for other nodes.

Join two tables

A transform is the AWS Glue Studio component where the data is modified. You have the option of using different transforms that are part of this service or custom code. To add transforms, complete the following steps:

  1. On the Transform menu, choose Join.
  2. On the Node properties tab, for Name, enter trades and ratings join.
  3. For Node type, choose Join.
  4. For Node parents, choose the Trades and Ratings data sources.
  5. On the Transform tab, for Join type, choose Outer join.
  6. Choose the common column between the tables to establish the connection.
  7. For Join conditions, choose symbol from the Trades table and mor_rating_fund_symbol from the Ratings table.

Add a target

Before adding the target to store the result, be sure that the node you just created isn’t selected. To add the target, complete the following steps (a script sketch of the equivalent transform follows these steps):

  1. On the Target menu, choose Amazon S3.
  2. On the Node properties tab, for Name, enter trades ratings merged.
  3. For Node type, choose Amazon S3 for writing outputs.
  4. For Node parents, choose trades and ratings join.
  5. On the Data target properties – S3 tab, for Format, choose Parquet.
  6. For Compression type, choose None.
  7. For S3 target location, enter s3://glue-studio-blog-{Your Account ID as a 12-digit number}/.
  8. For Data catalog update options, select Create a table in the Data Catalog and on subsequent runs, update the schema and add new partitions.
  9. For Database, choose trade-analysis-db.
  10. For Table name, enter tradesratingsmerged.
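
For reference, the following is a hand-written AWS Glue PySpark sketch that is roughly equivalent to the visual job: it reads the two catalog tables, performs the outer join on the fund symbol, and writes Parquet to the target bucket. Table and bucket names match the ones used above; treat it as an illustration rather than the exact script that AWS Glue Studio generates.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the two Data Catalog tables created by the crawlers
trades = glue_context.create_dynamic_frame.from_catalog(
    database="trade_analysis_db", table_name="dbdev_mft_actvitity")
ratings = glue_context.create_dynamic_frame.from_catalog(
    database="trade_analysis_db", table_name="morning_star_csv")

# Outer join on the fund symbol, as configured in the visual editor
trades_df, ratings_df = trades.toDF(), ratings.toDF()
joined_df = trades_df.join(
    ratings_df,
    trades_df["symbol"] == ratings_df["mor_rating_fund_symbol"],
    how="outer")
joined = DynamicFrame.fromDF(joined_df, glue_context, "tradesratingsmerged")

# Write the merged result to the S3 target as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=joined,
    connection_type="s3",
    connection_options={"path": "s3://glue-studio-blog-<account-id>/"},
    format="parquet")

job.commit()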

Configure the job

When the logic behind the job is complete, you must set the parameters for the job run. In this section, you configure the job by selecting components such as the IAM role and the AWS Glue version you use to run the job.

  1. Choose the Job details tab.
  2. For Job bookmark, choose Disable.
  3. For Number of retries, optionally enter 0.
  4. Choose Save.
  5. When the job is saved, choose Run.

Monitor the job

AWS Glue Studio offers a job monitoring dashboard that provides comprehensive information about your jobs. You can get job statistics and see detailed information about the job and the job status when running.

  1. In the AWS Glue Studio navigation pane, choose Monitoring.
  2. Change the date range to 1 hour using the Date range selector to get the recently submitted job.
    The Job runs summary section displays the current state of the job run. The status of the job could be Running, Canceled, Success, or Failed. The Job run success rate section provides the estimated DPU usage for jobs and gives you a summary of the performance of the job. Job type breakdown and Worker type breakdown contain additional information about the job.
  3. To get more details about the job run, choose View run details.

Review the results using Athena

To view the data in Athena, complete the following steps:

  1. Navigate to the Athena console, where you can see the database and tables created by your crawlers.

    If you haven’t used Athena in this account before, a message appears instructing you to set a query result location.
  2. Choose Settings, Manage, Browse S3, and select any bucket that you created.
  3. Choose Save and return to the editor to continue.
  4. In the Data section, expand Tables to see the tables you created with the AWS Glue crawlers.
  5. Choose the options menu (three dots) next to one of the tables and choose Preview Table.

The following screenshot shows an example of the data.

Create a QuickSight dashboard and visualizations

To set up QuickSight for the first time, sign up for a QuickSight subscription and allow connections to Athena.

To create a dashboard in QuickSight based on the AWS Glue Data Catalog tables you created, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Create a new QuickSight dataset called Fund-Analysis with Athena as the data source.
  4. In the Choose your table section, choose AwsDataCatalog for Catalog and choose trade_analysis_db for Database.
  5. For Tables, select the tradesratingsmerged table to visualize.
  6. Choose Select.
  7. Import the data into SPICE.
    SPICE is an in-memory engine that QuickSight uses to perform advanced calculations and improve performance. Importing the data into SPICE can save time and money. When using SPICE, you can refresh your datasets either fully or incrementally. As of this writing, you can schedule incremental refreshes as often as every 15 minutes. For more information, refer to Refreshing SPICE data. For near-real-time analysis, select Directly query your data instead.
  8. Choose Visualize.

    After you create the dataset, you can view it and edit its properties. For this post, leave the properties unchanged.
  9. To analyze the market performance from the Morningstar file, choose the clustered bar combo chart under Visual types.
  10. Drag Fund_Symbol from Fields list to X-axis.
  11. Drag Ratings to Y-axis and Lines.
  12. Choose the default title, then choose Edit title to change the title to “Market Analysis.”
    The following QuickSight dashboard was created using a custom theme, which is why the colors may appear different than yours.
  13. To display the Morningstar details in tabular form, add a visual to create additional graphs.
  14. Choose the table visual under Visual types.
  15. Drag Fund Symbol and Fund Names to Group by.
  16. Drag Ratings, Historical Earnings, and LT Earnings to Value.

    In QuickSight, up until this point, you analyzed the market performance reported by Morningstar. Let’s analyze the near-real-time daily trade activities.
  17. Add a visual to create additional graphs.
  18. Choose the clustered bar combo chart under Visual types.
  19. Drag Fund_Symbol from Fields list to X-axis and Trade Amount to Y-axis.
  20. Choose the default title, then choose Edit title to change the title to “Daily Transactions.”
  21. To display the daily trades in tabular form, add a visual to create additional graphs.
  22. Drag Trade Date, Customer Name, Fund Name, Fund Symbol, and Buy/Sell to Group by.
  23. Drag Trade Amount to Value.

The following screenshot shows a complete dashboard. This compares the market observation reported in the street against the daily trades happening in the bank.

In the Market Analysis section of the dashboard, GMFXXD funds were performing well based on the previous night’s feed from Morningstar. However, the Daily Transactions section of the dashboard shows that customers were selling their positions in those funds. Relying only on the previous nightly batch report could mislead the fund managers or operations analysts when deciding how to act.

Near-real-time analytics using AWS Glue Studio and QuickSight can enable fund managers and analysts to self-serve and impose fees or gates on those IMM funds.

Clean up

To avoid incurring future charges and to clean up unused roles and policies, delete the resources you created: the CloudFormation stack, S3 bucket, and AWS Glue job.

Conclusion

In this post, you learned how to use AWS Glue Studio to analyze data from different sources with no previous coding experience and how to build visualizations and get business insights using QuickSight. You can use AWS Glue Studio and QuickSight to speed up the analytics process and allow different personas to transform data with no development experience.

For more information about AWS Glue Studio, see the AWS Glue Studio User Guide. For information about QuickSight, refer to the Amazon QuickSight User Guide.


About the authors

Rajeshkumar Karuppaswamy is a Customer Solutions Manager at AWS. In this role, Rajeshkumar works with AWS customers to drive cloud strategy and provides thought leadership to help businesses achieve speed and agility and drive innovation. His areas of interest are AI & ML, analytics, and data engineering.

Richa Kaul is a Senior Leader in Customer Solutions serving Financial Services customers. She is based out of New York. She has extensive experience in large-scale cloud transformation, employee excellence, and next-generation digital solutions. She and her team focus on optimizing the value of the cloud by building performant, resilient, and agile solutions. Richa enjoys multisport events like triathlons, music, and learning about new technologies.

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. This summer, he enjoyed goldfish scooping with his children.

Talk to your data: Query your data lake with Amazon QuickSight Q

Post Syndicated from Ying Wang original https://aws.amazon.com/blogs/big-data/talk-to-your-data-query-your-data-lake-with-amazon-quicksight-q/

Amazon QuickSight Q uses machine learning (ML) and natural language technology to empower you to ask business questions about your data and get answers instantly. You can simply enter your questions (for example, “What is the year-over-year sales trend?”) and get the answer in seconds in the form of a QuickSight visual.

Some business questions can’t be answered through existing business intelligence (BI) dashboards. It can take days or weeks for the BI team to accommodate these needs and refine their solution. Because Q doesn’t depend on prebuilt dashboards or reports to answer questions, it removes the need for BI teams to create or update dashboards every time a new business question arises. You can ask questions and receive answers in the form of visuals in seconds directly from within QuickSight or from web applications and portals. Q empowers every business user to self-serve and get insights faster, regardless of their background or skillset.

In this post, we walk you through the steps to configure Q using an Olympic Games public dataset and demonstrate how an end-user can ask simple questions directly from Q in an interactive manner and receive answers in seconds.

You can explore the Olympic dashboard and the Q search bar in the following interactive demo.

Solution overview

We use public Olympic Games datasets to configure a Q topic, and we discuss tips and tricks for further configuring the topic so that Q can provide prompt answers. Q’s ML-powered natural language query (NLQ) capabilities empower you to ask questions about your data using everyday business language.

The video from Data Con LA provides a high-level demonstration of the capabilities covered in this post.

Additionally, we discuss the following:

  • Best practices for data modeling of a Q topic
  • How to perform data cleansing using AWS Glue DataBrew, SQL, or an Amazon SageMaker Jupyter notebook on datasets to build a Q topic

We use multiple publicly available datasets from Kaggle. The datasets have historical information about athletes, including name, ID, age, weight, country, and medals.

We use the 2020 Olympic datasets and historical data. We also use the datasets Introduction of Women Olympic Sport and Women of Olympic Games to determine the participation of women athletes in Olympics and discover trends. The QuickSight datasets created using these public data files are added to a Q topic, as shown in the following screenshot. We provide details on creating QuickSight datasets later in this post.

Prerequisites

To follow along with the solution presented in this post, you must have access to the following:

Create solution resources

The public datasets in Kaggle can’t be directly utilized to create a Q topic. We have already cleansed the raw data and provided the cleansed datasets in the GitHub repo. If you are interested in learning more about data cleansing, we discuss three different data cleansing methods at the end of this post.

To create your resources, complete the following steps:

  1. Create an S3 bucket called olympicsdata.
  2. Create a folder for each data file, as shown in the following screenshot.
  3. Upload the data files from the GitHub repo into their respective folders (steps 1-3 are also shown as a boto3 sketch after this list).
  4. Deploy the provided CloudFormation template and provide the necessary information.
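
The following is a minimal boto3 sketch of steps 1-3, assuming you work in the us-east-1 Region; the local file names and folder names in the mapping are illustrative, so adjust them to match the files you downloaded from the GitHub repo.

    import boto3

    s3 = boto3.client("s3")
    bucket_name = "olympicsdata"

    # Create the bucket (add a CreateBucketConfiguration if you work outside us-east-1).
    s3.create_bucket(Bucket=bucket_name)

    # One folder (prefix) per data file; the mapping below is illustrative.
    files_to_folders = {
        "medals_athletes.csv": "medals",
        "athletes_full_2021.csv": "athletes",
        "coach_full_2021.csv": "coaches",
        "tech_official_2021.csv": "technical_officials",
    }

    for file_name, folder in files_to_folders.items():
        s3.upload_file(file_name, bucket_name, f"{folder}/{file_name}")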

The template creates an Athena database and tables, as shown in the following screenshot.

The template also creates the QuickSight data source athena-olympics and datasets.

Create datasets in QuickSight

To build the Q topic, we need to combine the datasets, because each table contains only partial data. Joining these tables helps answer questions across all the features of the 2020 Olympics.

We create the Olympics 2021 dataset by joining the tables Medals_athletes_2021, Athletes_full_2021, Coach_full_2021, and Tech_official_2021.

The following screenshot shows the joins for our complete dataset.

Medals_athletes_2021 is the main table, with the following join conditions (an illustrative pandas equivalent is shown after the list):

  • Left outer join athletes_full_2021 on athlete_name, discipline_code, and country_code
  • Left outer join coach_full_2021 on country, discipline, and event
  • Left outer join tech_official_2021 on discipline
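
These joins are configured in the QuickSight dataset editor. Purely for illustration, the equivalent left outer joins could be expressed in pandas as follows; the file and column names are assumptions based on the table definitions above.

    import pandas as pd

    medals = pd.read_csv("medals_athletes_2021.csv")
    athletes = pd.read_csv("athletes_full_2021.csv")
    coaches = pd.read_csv("coach_full_2021.csv")
    officials = pd.read_csv("tech_official_2021.csv")

    # Medals_athletes_2021 is the left (main) table in every join.
    olympics_2021 = (
        medals
        .merge(athletes, how="left", on=["athlete_name", "discipline_code", "country_code"])
        .merge(coaches, how="left", on=["country", "discipline", "event"])
        .merge(officials, how="left", on=["discipline"])
    )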

Finally, we have the following datasets that we use for our Q topic:

  • Olympics 2021 Details
  • Medals 2021
  • Olympics History (created using the Olympics table)
  • Introduction of Women Olympics Sports
  • Women in the Olympic Movement

Create a Q topic

Topics are collections of one or more datasets that represent a subject area that your business users can ask questions about. In QuickSight, you can create and manage topics on the Topics page. When you create a topic, your business users can ask questions about it in the Q search bar.

When you create topics in Q, you can add multiple datasets to them and then configure all the fields in the datasets to make them natural language-friendly. This enables Q to provide your business users with the correct visualizations and answers to their questions.

The following are data modeling best practices for Q topics:

  • Reduce the number of datasets by consolidating the data. Any given question can only hit one dataset, so only include multiple datasets if they are related enough to be part of the same topic, but distinct enough that you can ask a question against them independently.
  • For naming conventions, provide a meaningful name or alias (synonym) of a field to allow the end-user to easily query it.
  • If a field appears in different datasets, make sure that this field has the same name across different datasets.
  • Validate data consistency. For example, the total value of a metric that aggregates from different datasets should be consistent.
  • For fields that don’t require on-the-fly calculations, for example, metrics with distributive functions (sum, max, min, and so on), push down the calculation into a data warehouse.
  • For fields that require on-the-fly calculations, create the calculated field in the QuickSight dataset or Q topic. If other topics or dashboards might reuse the same field, create it in the datasets.

To create a topic, complete the following steps:

  1. On the QuickSight console, choose Topics in the navigation pane.
  2. Choose New topic.
  3. For Topic name, enter a name.
  4. For Description, enter a description.
  5. Choose Save.
  6. On the Add data to topic page that opens, choose Datasets, and then select the datasets that we created in the previous section.
  7. Choose Add data to create the topic.

Enhance the topic

In this section, we discuss various ways that you can enhance the topic.

Add calculated fields to a topic dataset

You can add new fields to a dataset in a topic by creating calculated fields.

For example, we have the column Age in our Olympics dataset. We can create a calculated field to group age into different ranges using the ifelse function. This calculated field can help us ask a question like “How many athletes for each age group?”

  1. Choose Add calculated field.
  2. In the calculation editor, enter the following syntax:
    ifelse(
    Age<=20, '0-20',
    Age>20 and Age <=40, '21-40',
    Age>40 and Age<=60, '41-60',
    '60+'
    )

  3. Name the calculated field Age Groups.
  4. Choose Save.

The calculated field is added to the list of fields in the topic.

Add filters to a topic dataset

Let’s say a lot of analysis is expected on the dataset for the summer season. We can add a filter to allow for easy selection of this value. Furthermore, if we want to allow analysis against data for the summer season only, we can choose to always apply this filter or apply it as the default choice, but allow users to ask questions about other seasons as well.

  1. Choose Add filter.
  2. For Name, enter Summer.
  3. Choose the Women in the Olympic Movement dataset.
  4. Choose the Olympics Season field.
  5. Choose Custom filter list for Filter type and set the rule as include.
  6. Enter Summer under Values.
  7. Choose Apply always, unless a question results in an explicit filter from the dataset.
  8. Choose Save.

The filter is added to the list of fields in the topic.

Add named entities to a topic dataset

We can define named entities if we need to show users a combination of fields. For example, when someone asks for player details, it makes sense to show them player name, age, country, sport, and medal. We can make this happen by defining a named entity.

  1. Choose Add named entity.
  2. Choose the Olympics dataset.
  3. Enter Player Profile for Name.
  4. Enter Information of Player for Description.
  5. Choose Add field.
  6. Choose Player Name from the list.
  7. Choose Add field again and add the fields Age, Countries, Sport, and Medal.
    The fields are listed in the order in which they appear in answers. To move a field, choose the six dots next to the name and drag and drop the field to the order that you want.
  8. Choose Save.

The named entity is added to the list of fields in the topic.

Make Q topics natural language-friendly

To help Q interpret your data and better answer your readers’ questions, provide as much information about your datasets and their associated fields as possible.

To make the topic more natural language-friendly, use the following procedures.

Rename fields

You can make your field names more user-friendly in your topics by renaming them and adding descriptions.

Q uses field names to understand the fields and link them to terms in your readers’ questions. When your field names are user-friendly, it’s easier for Q to draw links between the data and a reader’s question. These friendly names are also presented to readers as part of the answer to their question to provide additional context.

Let’s rename the birth date field from the athlete dataset as Athlete Birth Date. Because we have multiple birth date fields in the topics for coach, athlete, and tech roles, renaming the athletes’ birth date field helps Q easily link to the data field when we ask questions regarding athletes’ birth dates.

  1. On the Fields page, choose the down arrow at the far right of the Birth Date field to expand it.
  2. Choose the pencil icon next to the field name.
  3. Rename the field to Athlete Birth Date.

Add synonyms to fields in a topic

Even if you update your field names to be user-friendly and provide a description for them, your readers might still use different names to refer to them. For example, a player name field might be referred to as player, players, or sportsman in your reader’s questions.

To help Q make sense of these terms and map them to the correct fields, you can add one or more synonyms to your fields. Doing this improves Q’s accuracy.

  1. On the Fields page, under Synonyms, choose the pencil icon for Player Name.
  2. Enter player and sportsman as synonyms.

Add synonyms to field values

Like we did for field names, we can add synonyms for category values as well.

  1. Choose the Gender field’s row to expand it.
  2. Choose Configure value synonyms, then choose Add.
  3. Choose the pencil icon next to the F value.
  4. Add the synonym Female.
  5. Repeat these steps to add the synonym Male for M.
  6. Choose Done.

Assign field roles

Every field in your dataset is either a dimension or a measure. Knowing whether a field is a dimension or a measure determines what operations Q can and can’t perform on a field.

For example, setting the field Age as a dimension means that Q doesn’t try to aggregate it as it does measures.

  1. On the Fields page, expand the Age field.
  2. For Role, choose Dimension.

Set field aggregations

Setting field aggregations tells Q which function should or shouldn’t be used when those fields are aggregated across multiple rows. You can set a default aggregation for a field, and specify aggregations that aren’t allowed.

A default aggregation is the aggregation that’s applied when there’s no explicit aggregation function mentioned or identified in a reader’s question. For example, let’s ask Q “Show total number of events.” In this case, Q uses the field Total Events, which has a default aggregation of Sum, to answer the question.

  1. On the Fields page, expand the Total Events field.
  2. For Default aggregation, choose Sum.
  3. For Not allowed aggregation, choose Average.

Specify field semantic types

Providing more details on the field context will help Q answer more natural language questions. For example, users might ask “Who won the most medals?” We haven’t set any semantic information for any fields in our dataset yet, so Q doesn’t know what fields to associate with “who.” Let’s see how we can enable Q to tackle this question.

  1. On the Fields page, expand the Player Name field.
  2. For Semantic Type, choose Person.

This enables Q to surface Player Name as an option when answering “who”-based questions.

Exclude unused or unnecessary fields

Fields from all included datasets are displayed by default. However, we have a few fields like Short name of Country, URL Coach Full 2021, and URL Tech Official 2021 that we don’t need in our topic. We can exclude unnecessary fields from the topic to prevent them from showing up in results by choosing the slider next to each field.

Ask questions with Q

After we create and configure our topic, we can now interact with Q by entering questions in the Q search bar.

For example, let’s enter show total medals by country. Q presents an answer to your question as a visual.

You can see how Q interpreted your question in the description at the visual’s upper left. Here you can see the fields, aggregations, topic filters, and datasets used to answer the question. The topic filter na is applied on the Medal attribute, which excludes na values from the aggregation. For more information on topic filters, see Adding filters to a topic dataset.

Q displays the results using the visual type best suited to convey the information. However, Q also gives you the flexibility to view results in other visual types by choosing the Visual icon.

As another example, let’s enter who is the oldest player in basketball. Q presents an answer to your question as a visual.

Sometimes Q might not interpret your question the way you wanted. When this happens, you can provide feedback on the answer or make suggestions for corrections to the answer. For more information about providing answer feedback, see Providing feedback about QuickSight Q topics. For more information about correcting answers, see Correcting wrong answers provided by Amazon QuickSight Q.

Conclusion

In this post, we showed you how to configure Q using an Olympic Games public dataset so that end-users can ask simple questions directly from Q in an interactive manner and receive answers in seconds. If you have any feedback or questions, please leave them in the comments section.

Appendix 1: Types of questions supported by Q

Let’s look at samples of each question type that Q can answer using the topic created earlier in this post.

Try the following questions or your own questions and continue enhancing the topic to improve accuracy of responses.

Question Type: Example
Dimensional Group Bys: show total medals by country
Dimensional Filters (Include): show total medals for united states
Date Group Bys: show yearly trend of women participants
Multi Metrics: number of women events compared to total events
KPI-Based Period over Periods (PoPs): how many women participants in 2018 over 2016
Relative Date Filters: show total medals for united states in the last 5 years
Time Range Filters: list of women sports introduced since 2016
Top/Bottom Filter: show me the top 3 player with gold medal
Sort Order: show top 3 countries with maximum medals
Aggregate Metrics Filter: show teams that won more than 50 medals
List Questions: list the women sports by year in which they are introduced
OR filters: Show player who got gold or silver medal
Percent of Total: Percentage of players by country
Where Questions: where are the most number of medals
When Questions: when women volleyball introduced into olympic games
Who Questions: who is the oldest player in basketball
Exclude Questions: show countries with highest medals excluding united states

Appendix 2: Data cleansing

In this section, we provide three options for data cleansing: SQL, DataBrew, and Python.

Option 1: SQL

For our first option, we discuss how to create Athena tables on the downloaded Excel or CSV files and then perform the data cleansing using SQL. This option is suitable for those who use Athena tables as a data source for QuickSight datasets and are comfortable using SQL.

The SQL queries to create Athena tables are available in the GitHub repo. In these queries, we perform data cleansing by renaming columns, changing the data types of some columns, and removing duplicate rows. Proper naming conventions and accurate data types help Q efficiently link the questions to the data fields and provide accurate answers.

Use the following sample DDL query to create an Athena table for women_introduction_to_olympics:

CREATE EXTERNAL TABLE women_introduction_to_olympics(
year string,
sport string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://<<s3 bucket name>>/womeninolympics/introduction_of_women_olympic_sports'
TBLPROPERTIES (
'has_encrypted_data'='false')

In our data files, a few columns that are common across more than one dataset have different column names. For example, gender is available as gender or sex, country is available as country, team, or team/noc, and person names have a role prefix in one dataset but not in others. We rename such columns using SQL to maintain consistent column names.

Additionally, we need to change other demographic columns like age, height, and weight to the INT data type, so that they don’t get imported as String.

The following columns from the data files have been transformed using SQL.

Data File: Original Column -> New Column

medals:
  Discipline -> Sport
  Medal_date (timestamp) -> Medal_date (date)

Athletes:
  name -> athlete_name
  gender -> athlete_gender
  birth_date -> athlete_birth_date
  birth_place -> athlete_birth_place
  birth_country -> athlete_birth_country

Coaches:
  name -> coach_name
  gender -> coach_gender
  birth_date -> coach_birth_date
  function -> coach_function

Athlete_events (history):
  Team -> country
  NOC -> country_code
  Age (String) -> Age (Integer)
  Height (String) -> Height (Integer)
  Weight (String) -> Weight (Integer)

Option 2: DataBrew

In this section, we discuss a data cleansing option using DataBrew. DataBrew is a visual data preparation tool that makes it easy to clean and prepare data with no prior coding knowledge. You can directly load the results into an S3 bucket or load the data by uploading an Excel or CSV file.

For our example, we walk you through the steps to implement data cleansing on the medals_athletes_2021 dataset. You can follow the same process to perform any necessary data cleaning on other datasets as well.

Create a new dataset in DataBrew using medals_athletes.csv, then create a DataBrew project and implement the following recipe steps to cleanse the data in the medals_athletes_2021 dataset.

  1. Delete empty rows in the athlete_name column.
  2. Delete empty rows in the medal_type column.
  3. Delete duplicate rows in the dataset.
  4. Rename discipline to Sport.
  5. Delete the column discipline_code.
  6. Split the column medal_type on a single delimiter.
  7. Delete the column medal_type_2, which was created as a result of step 6.
  8. Rename medal_type_1 to medal_type.
  9. Change the data type of column medal_date from timestamp to date.

After you create the recipe, publish it and create a job to output the results in your desired destination. You can create QuickSight SPICE datasets by importing the cleaned CSV file.
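
Publishing the recipe and running the job can also be done programmatically. The following is a hedged boto3 sketch; the recipe, job, dataset, role, and bucket names are placeholders, and the recipe version is assumed to be the first published version.

    import boto3

    databrew = boto3.client("databrew")

    # Publish the working version of the recipe.
    databrew.publish_recipe(Name="medals-athletes-cleansing")  # placeholder recipe name

    # Create a job that applies the recipe to the dataset and writes CSV output to S3.
    databrew.create_recipe_job(
        Name="medals-athletes-cleansing-job",
        DatasetName="medals_athletes_2021",
        RecipeReference={"Name": "medals-athletes-cleansing", "RecipeVersion": "1.0"},  # assumed version
        RoleArn="arn:aws:iam::123456789012:role/databrew-job-role",  # placeholder role
        Outputs=[
            {
                "Location": {"Bucket": "olympicsdata", "Key": "cleansed/medals_athletes/"},
                "Format": "CSV",
            }
        ],
    )

    # Start the job run.
    databrew.start_job_run(Name="medals-athletes-cleansing-job")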

Option 3: Python

In this section, we discuss data cleansing using the Python libraries NumPy and pandas on the medals_athletes_2021 dataset. You can follow the same process to perform any necessary data cleansing on other datasets as well. The sample Python code is available on GitHub. This option is suitable for someone who is comfortable processing the data using Python.

  1. Delete the column discipline_code:
    olympic = olympic.drop(columns='discipline_code')

  2. Rename the column discipline to sport:
    olympic = olympic.rename(columns={'discipline': 'sport'})

You can create QuickSight SPICE datasets by importing the cleansed CSV.
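
For reference, the cleansing steps from the DataBrew recipe earlier can be approximated in pandas as a single script. This is only a sketch: it assumes the Kaggle column names, a medal_type value that splits on a space (for example, "Gold Medal"), and a local copy of medals_athletes.csv; the sample code in the GitHub repo remains the reference version.

    import pandas as pd

    olympic = pd.read_csv("medals_athletes.csv")

    # Drop rows with empty athlete_name or medal_type, then remove duplicate rows.
    olympic = olympic.dropna(subset=["athlete_name", "medal_type"]).drop_duplicates()

    # Drop discipline_code and rename discipline to sport.
    olympic = olympic.drop(columns="discipline_code").rename(columns={"discipline": "sport"})

    # Keep only the first part of medal_type (assumed values like "Gold Medal").
    olympic["medal_type"] = olympic["medal_type"].str.split(" ").str[0]

    # Change medal_date from timestamp to date.
    olympic["medal_date"] = pd.to_datetime(olympic["medal_date"]).dt.date

    olympic.to_csv("medals_athletes_cleansed.csv", index=False)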

Appendix 3: Data cleansing and modeling in the QuickSight data preparation layer

In this section, we discuss one more method of data cleansing that you can perform from the QuickSight data preparation layer, in addition to the methods discussed previously. Using SQL, DataBrew, or Python has advantages because you can prepare and clean the data outside QuickSight so that other AWS services can use the cleansed results. Additionally, you can automate the scripts. However, Q authors have to learn other tools and programming languages to take advantage of these options.

Cleansing data in the QuickSight dataset preparation stage allows non-technical Q authors to build the application end to end in QuickSight with a codeless method.

The QuickSight dataset stores any data preparation done on the data, so that the prepared data can be reused in multiple analyses and topics.

We have provided a few examples for data cleansing in the QuickSight data preparation layer.

Change a field name

Let’s rename the name field in the Athletes_full_2021 dataset to athlete_name.

  1. In the data preview pane, choose the edit icon on the field that you want to change.
  2. For Name, enter a new name.
  3. Choose Apply.

Change a field data type

You can change the data type of any field from the data source in the QuickSight data preparation layer using the following procedure.

  1. In the data preview pane, choose the edit icon on the field you want to change (for example, birth_date).
  2. Choose Change data type and choose Date.

This converts the string field to a date field.

Appendix 4: Information about the tables

The following table illustrates the scope of each table in the dataset.

Table Name Link Table Data Scope
medals https://www.kaggle.com/piterfm/tokyo-2020-olympics?select=medals.csv Information about medals won by each athlete and the corresponding event and country details
athletes https://www.kaggle.com/piterfm/tokyo-2020-olympics?select=athletes.csv Details about each athlete, such as demographic and country
coaches https://www.kaggle.com/piterfm/tokyo-2020-olympics?select=coaches.csv Details about each coach, such as demographic and country
technical_officials https://www.kaggle.com/piterfm/tokyo-2020-olympics?select=technical_officials.csv Details about each technical official, such as demographic and country
athlete_events https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results Historical information of Olympic games
Introduction_of_Women_Olympics_Sports https://data.world/sports/women-in-the-olympic-games Information on when the women Olympic sports were introduced
womens_participation_in_the_olympic https://data.world/sports/women-in-the-olympic-games Information on participation of women in Olympic sports

About the authors

Ying Wang is a Manager of Software Development Engineering. She has 12 years of experience in data analytics and data science. In her earlier role as a data architect, she helped customers build enterprise data architecture solutions to scale their data analytics in the cloud. Currently, she helps customers unlock the power of their data with QuickSight by delivering new engineering and product features.

Ginni Malik is a Data & ML Engineer with AWS Professional Services. She assists customers by architecting enterprise level data lake solutions to scale their data analytics in the cloud. She is a travel enthusiast and likes to run half-marathons.

Niharika Katnapally is a QuickSight Business Intelligence Engineer with AWS Professional Services. She assists customers by developing QuickSight dashboards to help them gain insights into their data and make data driven business decisions.

Maintain visibility over the use of cloud architecture patterns

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/maintain-visibility-over-the-use-of-cloud-architecture-patterns/

Cloud platform and enterprise architecture teams use architecture patterns to provide guidance for different use cases. Cloud architecture patterns are typically aggregates of multiple Amazon Web Services (AWS) resources, such as Elastic Load Balancing with Amazon Elastic Compute Cloud, or Amazon Relational Database Service with Amazon ElastiCache. In a large organization, cloud platform teams often have limited governance over cloud deployments, and, therefore, lack control or visibility over the actual cloud pattern adoption in their organization.

While having decentralized responsibility for cloud deployments is essential to scale, a lack of visibility or controls leads to inefficiencies, such as proliferation of infrastructure templates, misconfigurations, and insufficient feedback loops to inform the cloud platform roadmap.

To address this, we present an integrated approach that allows cloud platform engineers to share and track use of cloud architecture patterns with:

  1. AWS Service Catalog to publish an IT service catalog of codified cloud architecture patterns that are pre-approved for use in the organization.
  2. Amazon QuickSight to track and visualize actual use of service catalog products across the organization.

This solution enables cloud platform teams to maintain visibility into the adoption of cloud architecture patterns in their organization and build a release management process around them.

Publish architectural patterns in your IT service catalog

We use AWS Service Catalog to create portfolios of pre-approved cloud architecture patterns and expose them as self-service to end users. This is accomplished in a shared services AWS account where cloud platform engineers manage the lifecycle of portfolios and publish new products (Figure 1). Cloud platform engineers can publish new versions of products within a portfolio and deprecate older versions, without affecting already-launched resources in end-user AWS accounts. We recommend using organizational sharing to share portfolios with multiple AWS accounts.

Application engineers launch products by referencing the AWS Service Catalog API. Access can be via infrastructure code, like AWS CloudFormation and Terraform, or an IT service management tool, such as ServiceNow. We recommend using a multi-account setup for application deployments, with an application deployment account hosting the deployment toolchain: in our case, using AWS developer tools.

Although not explicitly depicted, the toolchain can be launched as an AWS Service Catalog product and include pre-populated infrastructure code to bootstrap initial product deployments, as described in the blog post Accelerate deployments on AWS with effective governance.

Figure 1. Launching cloud architecture patterns as AWS Service Catalog products

Track the adoption of cloud architecture patterns

Track the usage of AWS Service Catalog products by analyzing the corresponding AWS CloudTrail logs. These log events can be forwarded to an Amazon EventBridge rule with a filter on the following events: CreateProduct, UpdateProduct, DeleteProduct, ProvisionProduct, and TerminateProvisionedProduct.
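
For illustration, the following boto3 sketch creates such a rule on the default event bus and points it at a Firehose delivery stream. The rule name, account ID, role, and delivery stream ARN are placeholders for your environment.

    import json
    import boto3

    events = boto3.client("events")

    # Match Service Catalog API calls recorded by CloudTrail.
    event_pattern = {
        "source": ["aws.servicecatalog"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["servicecatalog.amazonaws.com"],
            "eventName": [
                "CreateProduct",
                "UpdateProduct",
                "DeleteProduct",
                "ProvisionProduct",
                "TerminateProvisionedProduct",
            ],
        },
    }

    events.put_rule(
        Name="service-catalog-product-usage",  # placeholder rule name
        EventPattern=json.dumps(event_pattern),
    )

    # Send matched events to the Kinesis Data Firehose delivery stream that writes to S3.
    events.put_targets(
        Rule="service-catalog-product-usage",
        Targets=[
            {
                "Id": "firehose-delivery",
                "Arn": "arn:aws:firehose:us-east-1:123456789012:deliverystream/service-catalog-logs",  # placeholder
                "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-firehose-role",  # placeholder
            }
        ],
    )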

The logs are generated no matter how you interact with the AWS Service Catalog API, such as through ServiceNow or Terraform. Once in EventBridge, Amazon Kinesis Data Firehose delivers the events to Amazon Simple Storage Service (Amazon S3), where QuickSight can access them. Figure 2 depicts the end-to-end flow.

Figure 2. Tracking adoption of AWS Service Catalog products with Amazon QuickSight

Depending on your AWS landing zone setup, CloudTrail logs from all relevant AWS accounts and regions need to be forwarded to a central S3 bucket in your shared services account or, otherwise, centralized logging account. Figure 3 provides an overview of this cross-account log aggregation.

Figure 3. Aggregating AWS Service Catalog product logs across AWS accounts

If your landing zone allows, consider giving permissions to EventBridge in all accounts to write to a central event bus in your shared services AWS account. This avoids having to set up Kinesis Data Firehose delivery streams in all participating AWS accounts and further simplifies the solution (Figure 4).

Figure 4. Aggregating AWS Service Catalog product logs across AWS accounts to a central event bus

If you are already using an organization trail, you can use Amazon Athena or AWS Lambda to discover the relevant logs in your QuickSight dashboard, without the need to integrate with EventBridge and Kinesis Data Firehose.

Reporting on product adoption can be customized in QuickSight. The S3 bucket storing AWS Service Catalog logs can be defined in QuickSight as a dataset, from which you can create an analysis and publish it as a dashboard.
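
QuickSight reads files in Amazon S3 through a manifest that points at the relevant prefixes. The following sketch writes such a manifest with boto3; the bucket name, prefix, and manifest key are placeholders for your central log bucket.

    import json
    import boto3

    # Manifest describing where the Firehose-delivered JSON events live.
    manifest = {
        "fileLocations": [
            {"URIPrefixes": ["s3://central-servicecatalog-logs/events/"]}  # placeholder prefix
        ],
        "globalUploadSettings": {"format": "JSON"},
    }

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="central-servicecatalog-logs",  # placeholder bucket
        Key="manifests/servicecatalog-usage.json",  # placeholder key
        Body=json.dumps(manifest),
    )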

In the past, we have reported on the top ten products used in the organization (if relevant, also filtered by product version or time period) and the top accounts in terms of product usage. The following figure offers an example dashboard visualizing product usage by product type and number of times they were provisioned. Note: the counts of provisioned and terminated products differ slightly, as logging was activated after the first products were created and provisioned for demonstration purposes.

Figure 5. Example Amazon QuickSight dashboard tracking AWS Service Catalog product adoption

Conclusion

In this blog, we described an integrated approach to track adoption of cloud architecture patterns using AWS Service Catalog and QuickSight. The solution has a number of benefits, including:

  • Building an IT service catalog based on pre-approved architectural patterns
  • Maintaining visibility into the actual use of patterns, including which patterns and versions were deployed in the organizational units’ AWS accounts
  • Compliance with organizational standards, as architectural patterns are codified in the catalog

In our experience, the model may compromise on agility if you enforce a high level of standardization and only allow the use of a few patterns. Conversely, without central governance over the catalog, there is the potential for proliferation of products, with many templates differing only slightly. Ideally, cloud platform engineers assume responsibility for the roadmap of service catalog products, with formal intake mechanisms and feedback loops to account for builders’ localization requests.

Amazon migrates financial reporting to Amazon QuickSight

Post Syndicated from Chitradeep Barman original https://aws.amazon.com/blogs/big-data/amazon-migrates-financial-reporting-to-amazon-quicksight/

This is a guest post by Chitradeep Barman and Yaniv Ackerman from Amazon Finance Technology (FinTech).

Amazon Finance Technology (FinTech) is responsible for financial reporting on Earth’s largest transaction dataset, as the central organization supporting accounting and tax operations across Amazon. Amazon FinTech’s accounting, tax, and business finance teams close books and file taxes in different regions.

Amazon FinTech had been using a legacy business intelligence (BI) tool for over 10 years, and with its dataset growing at 20% year over year, it was beginning to face operational and performance challenges.

In 2019, Amazon FinTech decided to migrate its data visualization and BI layer to AWS to improve data analysis capabilities, reduce costs, and improve its use of AWS Cloud–native services, which reduces risk and technical complexity. By the end of 2021, Amazon FinTech had migrated to Amazon QuickSight, which organizations use to understand data by asking questions in natural language, exploring through interactive dashboards, or automatically looking for patterns and outliers powered by machine learning (ML).

In this post, we share the challenges and benefits of this migration.

Improving reporting and BI capabilities on AWS

Amazon FinTech’s customers are in accounting, tax, and business finance teams across Amazon Finance and Global Business Services, AWS, and Amazon subsidiaries. It provides these teams with authoritative data to do financial reporting and close Amazon’s books, as well as file taxes in jurisdictions and countries around the world. Amazon FinTech also provides data and tools for analysis and BI.

“Over time, with data growth, we started facing operational and maintenance challenges with the legacy BI tool, resulting in a multifold increase in engineering overhead,” said Chitradeep Barman, a senior technical program manager with Amazon FinTech who drove the technical implementation of the migration to QuickSight.

To improve security, increase scalability, and reduce costs, Amazon FinTech decided to migrate to QuickSight on AWS. This transition aligned with the organization’s goal to rely on native AWS technology and reduce dependency on other third-party tools.

Amazon FinTech was already using Amazon Redshift, which can analyze exabytes of data and run complex analytical queries. It can run and scale analytics on data in seconds without the need to manage the data warehouse infrastructure for its cloud data warehouse. As an AWS-native data visualization and BI tool, QuickSight seamlessly connects with AWS services, including Amazon Redshift. The migration was sizable: after consolidating existing reports, there were about 2,000 financial reports in the legacy tool that were used by over 2,500 users. The reports pulled data from millions of records.

Innovating while migrating

Amazon FinTech migrated complex reports and simultaneously started multiple training sessions. Additional training modules were built to complement existing QuickSight trainings and calibrated to meet the specific needs of Amazon FinTech’s customers.

Amazon FinTech deals with petabytes of data and had built up a repository of 10,000 reports used by 2,500 employees across Amazon. Collaborating with the QuickSight team, they consolidated their reports to reduce redundancy and focus on what their finance customers needed. Amazon FinTech built 450 canned and over 1,800 ad hoc reports in QuickSight, developing a reusable solution with the QuickSight API. As shown in the following figure, on average each month, more than 1,300 unique QuickSight users run almost 2,500 unique QuickSight reports, with more than 4,600 total runs.

Amazon FinTech has been able to scale to meet customer requirements using QuickSight.

“AWS services come along with scalability. The whole point of migrating to AWS is that we do not need to think about scaling our infrastructure, and we can focus on the functional part of it,” says Barman.

QuickSight is cloud based, fully managed, and serverless, meaning you don’t have to build your own infrastructure to handle peak usage. It auto scales across tens of thousands of users who work independently and simultaneously.

As of May 2022, more than 2,500 Amazon Finance employees are using QuickSight for financial and operational reporting and to prepare Amazon’s tax statements.

“The advantage of Amazon QuickSight is that it empowers nontechnical users, including accountants and tax and financial analysts. It gives them more capability to run their reporting and build their own analyses,” says Keith Weiss, principal program manager at Amazon FinTech. According to Weiss, “QuickSight has much richer data visualization than competing BI tools.”

QuickSight is constantly innovating for customers, adding new features, and recently released the AI/ML service Amazon QuickSight Q, which lets users ask questions in natural language and receive accurate answers with relevant visualizations to help gain insights from the underlying data. Barman, Weiss, and the rest of the Amazon FinTech team are excited to implement Q in the near future.

By switching to QuickSight, which uses pay-as-you-go pricing, Amazon FinTech saved 40% without sacrificing the security, governance, and compliance requirements their account needed to comply with internal and external auditors. The AWS pricing structure makes QuickSight much more cost-effective than other BI tools on the market.

Overall, Amazon FinTech saw the following benefits:

  • Performance improvements – Latency of consumer-facing reports was reduced by 30%
  • Cost reduction – FinTech reduced licensing, database, and support costs by over 40%, and with the AWS pay-as-you-go model, it’s much more cost-effective to be on QuickSight
  • Controllership – FinTech reports are global, and controlled accessibility to reporting data is a key aspect to ensure only relevant data is visible to specific teams
  • Improved governance – QuickSight APIs to track and promote changes within different environments reduced manual overhead and improved change trackability

Seamless and reliable

At the end of each month, Amazon FinTech teams must close books in 5 days, and since implementing QuickSight for this purpose, Barman says that “reports have run seamlessly, and there have been no critical situations.”

Amazon FinTech’s account on QuickSight is now the source of truth for Amazon’s financial reporting, including tax filings and preparing financial statements. It enables Amazon’s own finance team to close its books and file taxes at the unparalleled scale at which Amazon operates, with all its complexity. Most importantly, despite initial skepticism, according to Weiss, “Our finance users love it.”

Learn more about Amazon QuickSight and get started diving deeper into your data today!


About the authors

Chitradeep Barman is a Sr. Technical Program Manager at Amazon Finance Technology (FinTech). He led the Amazon-wide migration of BI reporting from Oracle BI (OBIEE) to Amazon QuickSight. Chitradeep started his career as a data engineer and over time grew into a data architect. Before joining Amazon, he led the design and implementation of the BI analytics and reporting platform for Cisco Capital (a wholly owned subsidiary of Cisco Systems).

Yaniv Ackerman is a Senior Software Development Manager in the FinTech organization. He has over 20 years of experience building business-critical, scalable, and high-performance software. Yaniv’s team builds data lake, analytics, and automation solutions for financial usage.