All posts by Luis Wang

Amazon QuickSight Announces General Availability of ML Insights

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-announces-general-availability-of-ml-insights/

At re:Invent 2018, we announced the preview of ML Insights, a set of out-of-the-box machine learning and natural language features that provide Amazon QuickSight users with business insights beyond visualization. Today, we are announcing the general availability of ML Insights.

As the volume of data that customers generate continues to grow every day, it’s becoming more challenging to harness that data for business insights. This is where machine learning comes into play. Amazon is a pioneer in using machine learning to automate and scale various aspects of business analytics.

With new ML Insights features, Amazon QuickSight can help you discover hidden data trends, identify key business drivers, forecast future results, and summarize your data in easy-to-read, natural language narratives, saving hours of manual analysis and investigation. You can build comprehensive BI solutions that integrate out-of-the-box machine learning with the analytical richness of Amazon QuickSight and distribute interactive dashboards to everyone in your organization. ML Insights makes machine learning easy, allowing anyone regardless of their technical and ML skillset to easily get insights from their data in minutes rather than weeks. ML Insights features include:

  • ML-powered anomaly detection to uncover hidden insights by continuously analyzing billions of data points.
  • ML-powered forecasting to predict growth and business trends with point-and-click simplicity.
  • Auto-narratives to tell customers the story of their dashboard using plain-language narratives.

Check out this video to get a quick overview of ML Insights:

To get you started with ML Insights, this blog post will walk you through new ML-powered capabilities.

Customer use cases

During the past three months of ML Insights preview availably, customers from a broad range of industries, including telecommunication, entertainment, marketing, retail, energy, financial services, and healthcare, have used ML Insights to harness their growing volume of data on AWS and on-premises for business insights. Here are some of the cool things customers are doing with ML Insights:

Expedia Group is the world’s travel platform, and its purpose is to bring the world within reach.

“At Expedia Group two of our key strategic imperatives are to be customer centric and locally relevant on a global basis. This is why tools such as Amazon QuickSight are so helpful in making it easier to measure, report, and act on our business metrics to help our customers find the best matches for their travel searches. Amazon QuickSight’s out-of-the-box machine learning insights help us to continuously monitor our business for anomalies, alert stakeholders when outliers occur, and help our business project future trends, which in turn allows teams to focus on other priorities instead of building out these capabilities from scratch.”

Amit Marwah, Director of Technology, Flights Data & Analytics, Expedia Group

Ricoh Company, Ltd., is a global corporation that provides imaging equipment for offices, production print solutions, document management systems, IT services, and more, for approximately 200 countries and regions throughout the world.

“Machine learning is becoming more important than ever to meet our growing data volume and BI needs. Amazon QuickSight’s ML Insights functionality makes powerful machine learning easy to use in just a few clicks. It allows us to continuously monitor for unexpected usage behaviors in our fleet of smart devices worldwide, forecast usage trends and deliver comprehensive dashboards that incorporate these machine generated insights as auto-narratives to our line of business users. With ML Insights, we can quickly pinpoint and take actions on anomalies down to the specific devices and features, to improve the experience and add value to our customers.”

Naoki Umehara, Group Leader, Ricoh Company, Ltd.

Tata Consultancy Services Limited is an Indian multinational information technology (IT) service and consulting company headquartered in Mumbai, Maharashtra, with international presence in 46 countries.

“Amazon QuickSight allows us to quickly and easily integrate our Amazon Connect contact center metrics with our client ticketing tools in order to deliver interactive and automatic dashboards that our customers love. We revolutionized our staffing, training, and outage reporting with Amazon QuickSight ML Insights, predicting where the call flow is moving and react accordingly in order to prevent call spikes and provide a better service to our customers.”

Marco David Martinez, Cloud Manager, Tata Consultancy Services

Siemens is a global powerhouse focusing on electrification, automation, and digitalization. The company is also a leading supplier of power generation and transmission systems.

“Amazon QuickSight’s out-of-the-box ML Insights and usage-based pricing make it easy and cost effective to deliver robust machine-learning-based anomaly detection for our customers to analyze performance of their manufacturing process, detect faults in the production line, monitor downtime duration across hundreds of machineries, and understand the root cause of the failures — without heavy investment machine learning and custom development. This allows line supervisors and production managers to receive automated alerts on unexpected events and take actions to optimize the manufacturing process and improve performance.”

Massimilliano Ponticelli, Product Manager, Siemens

Daiso Industries Co., Ltd. is a global retailer and franchise of 100-yen shops founded in Japan.

“With Amazon QuickSight, we were able to build out our BI environment that handles the data of 5,000 stores x 70,000 products in 2 months. Precise sales forecast and inventory optimization are the most important challenges in our business. Amazon QuickSight’s ML Insights allow us to easily and quickly identify unexpected trend changes across our products and improve sales forecast and inventory optimization.”

Kenjiro Marumoto, Section Chief, Daiso Industries Co., Ltd.

Getting started

ML Insights is only available on the Enterprise Edition of Amazon QuickSight. If you are using the Standard Edition, you can easily upgrade with 1-click on the Manage QuickSight page.

To get started with ML Insights, you’ll need to connect a data source to Amazon QuickSight. Data sets can be accessed by direct query to the SQL-compatible database source or by using SPICE.

For this walkthrough, your data must have the following properties:

  • At least one date field.
  • At least one metric, such as sales, orders, shipped units, or sign ups.
  • At least one category dimension, such as product, channel, segment, or industry.
  • More than 40 historical data points per metric.

For optimal results, make sure that your data set has enough historical data points. The built-in ML algorithm requires at least 40 historical data points to learn and train the model, and it will use up to the most recent 1,000 data points. For example, if you’re analyzing daily sales by geographic region, make sure that you have at least 40 days of data. Three months to twelve months of data is preferred, depending on the seasonality of your business

You can use your own data set, or you can download the following sample data set. We’ll use this dataset for the walkthrough:

https://s3.amazonaws.com/quicksight-ml-insights/ML-Insights-Sample-Dataset-V1.csv

Once you have created a data set in Amazon QuickSight, create a new analysis from the data set. For more information about creating data sets and analyses in Amazon QuickSight, see the Amazon QuickSight User Guide.

Suggested insights

ML Insights automatically interprets your data and provides contextual insights called suggested insights. Different visuals may result in different types of insights. For example, if you have a time-series visual, you may get insights such as period-over-period changes, anomalies, and forecasts.

Let’s walk through an example.

1. Create a line chart with a metric and a date, such as revenue over time, aggregated daily.

2. Choose Insights in the top left-hand corner of the visual.

You should then see a list of suggested insights on the left pane. Suggested insights provide you with a quick summary of the data in plain language. As you add visuals to your analyses, you will see additional suggested insights on the left pane, grouped by the visual name.

You can choose a suggested insight such as day over day change to highlight the data point or segment on the visual. Choose it again to deselect it.

ML-powered anomaly detection

With ML Insights, you can run ML-powered anomaly detection on up to a million metrics simultaneously to discover hidden trends and outliers that are often buried in aggregates. To learn more about pricing for anomaly detection, go to Amazon QuickSight pricing and choose ML Insights.

Let’s get started with anomaly detection.

1. Choose Add on the application bar, and then choose Add anomaly to sheet. This creates an insights visual for anomaly detection.

2. Expand the field wells on the top of the page and add at least one category field.

The categories represent the dimensional values by which Amazon QuickSight will split the metric. For example, let’s say you are analyzing anomalies in revenue across all product categories and product SKUs. Assuming there are 10 product categories, each with 10 product SKUs, Amazon QuickSight will split the metric by the 100 unique combinations and run anomaly detection on each of the split metrics.

3. Choose Get Started on the insights visual to configure the anomaly detection job.

4. On the anomaly detection configuration pane, configure the following options:

  • Analyze all combinations of these categories – If you select three categories, Amazon QuickSight will run anomaly detection on the following combinations, hierarchically: A, AB, ABC. If you select this option, Amazon QuickSight will analyze all combinations, including: A, AB, ABC, BC, AC. If your data is not hierarchical, you should select this option.
  • Number of anomalies to show – This setting allows you to control the number of top anomalies you want to display on the insights cards as narratives.
  • Schedule – Set the schedule to run anomaly detection on your data hourly, daily, weekly or monthly, depending on your data. Choose the start time and the time zone of the start time.
  • Contribution analysis – You can select up to four additional dimensions for Amazon QuickSight to analyze for top contributors when an anomaly is detected. For example, Amazon QuickSight can show you the top customers that contributed to a spike in sales in the USA for Home Improvement products. If you have additional dimensions in your data (dimensions not used in the anomaly detection), you can add them here for contribution analysis. For this example, choose the Geo for contribution analysis.

5. Choose OK. Amazon QuickSight will not implement the schedule until you publish the analysis as a dashboard. Within an analysis, you will have the option to run anomaly detection manually without the schedule.

6. After the configuration is set, choose Run Now to run detection manually. You will see a “Analyzing for anomalies… This may take a while…” message. Depending on the size of your data set, analysis may take anywhere from a few minutes to an hour.

Once anomaly detection is complete, you will see the top anomalies for the latest period in your data listed in the insights visual. Amazon QuickSight also computes and displays the expected value so you can better understand the significance of the anomaly.

7. To see all anomalies for this data, choose the selector in the upper right of the visual and choose Explore Anomalies.

On the detailed anomalies page, you can see all of the anomalies detected for the latest period. The title of the visual represents the metric that is applied to the unique combination of the categorical fields. The highlighted data point on the chart—on the far right of the chart—represents the most recent anomaly detected for that time series.

On the left pane, you will see the top contributors to the anomaly based on the dimensions you have predefined. When you hover over the top contributors, Amazon QuickSight displays an explanation of the significance of the contribution.

8. To see anomalies by date, choose Show Anomalies by Date from the top of the visual to expose a date picker. The cart will display the number of anomalies detected for each unit of your anomaly detection configuration. You can choose a particular date to see the anomalies for that date. For example, if you choose Nov. 1st, 2018, from the graph, then the bar chart highlights the anomalies for that date.

Important: Amazon QuickSight uses the first 40 data points in a data set for training; these data points will not be scored by the anomaly detection algorithm. You may not see any anomalies on the first 40 data points.

9. Use the filter controls at the top of the pane to change the anomaly threshold to show anomalies with high, medium or low significance or to show only anomalies that are higher than expected or lower than expected. You can also filter by the categorical values that are present in your data set to look at anomalies only for those categories.

10. To go back to your analysis, choose Back to analysis at the top of the page.

ML-powered forecasting

Using the built-in ML algorithm, you can now forecast business metrics with point-and-click simplicity without having to write code or build a complex spreadsheet.

1. On your time series chart, choose the selector in the upper right corner of the visual, and then choose Add forecast. Amazon QuickSight will analyze the historical data using ML and present a forecast for the next 14 periods.

2. On the Forecast properties pane at the left, you can customize forecast settings. For example, you can change the number of periods to forecast into the future or add “forecast” periods into the past to compare historical actuals against ML-based expectations.

You can adjust the width of the prediction band by changing the prediction internal and manually setting the seasonality (number of periods). Choose Apply to save your changes.

3. Select a forecasted data point on the chart and choose What-if analysis. With What-if analysis, you can set target value for a particular date or date range, and Amazon QuickSight will adjust the forecast gracefully to meet the target.

4. Choose Apply to see the new forecast adjusted for the target along with the original forecast. You can hover over the data points to see details.

With ML-powered forecasting, Amazon QuickSight allows you to forecast complex, real-world scenarios such as data with multiple seasonality. Outliers will be excluded automatically, and missing values will be imputed.

To export the forecasting data in CSV format, choose the selector in the upper right corner of the visual, and then choose Export to CSV.

Auto-narratives

Auto narratives allow you to create natural language summaries of your visuals. You can embed these summaries into your dashboard to highlight key insights that are important for your readers, allowing them to access the data without having to sift through the entire dashboard. When you define a template, narratives update automatically as the data in your data set refreshes, just like a visual. The following steps show you how to get started with auto narratives.

1. In your time series chart, choose Insights again to show the suggested insights.

2. An easy way to add a narrative insight to your analysis is to choose the plus sign (+) next to a suggested insight. For the purpose of this walkthrough, choose the Day Over Day Change insight.

You’ll see an insight visual on your analysis with the predefined template. Notice that the field wells have the date, metric, and category filled in. These settings are populated from the visual that you used to create the insight visual. You can customize the fields as needed.

3. To edit the narrative, choose the insight visual menu and select Customize Narrative. You’ll see the Configure narrative pane where you can edit your insights template.. You can format the content with different sizes and colors using the formatting toolbar. You can also insert expressions and conditional statements like IF and FOR statements.

In the left pane, you can add computations to your narrative. Computations are predefined calculations such as period-over-period, period-to-date, growth-rate, max, min, and top movers, that you can reference in your template to describe your data. Currently, Amazon QuickSight supports 13 different types of computations. In this example, PeriodOverPeriod is added by default since we selected the Day Over Day Change insight from the suggested insights pane.

4. To add a new computation, choose Add computation in the bottom left corner of the pane. You’ll be prompted to select from a list of computations. For the purpose of this walkthrough, select the Growth rate computation type, and select Next.

5. You can configure certain aspects of the computation. In the case of growth rate, you can change the number of periods over which that you want to compute growth. After you make your selections, choose OK.

6. Now expand Computations on the left pane. You should see both PeriodOverPeriod and GrowthRate options.

Please note that computation names must be unique. When you create a computation, assign a unique name. You can reference multiple computations of the same type in your template. For example, if you have two metrics, such as $sales and units sold, you can create a GrowthRate computation for each of the metrics, with a different name for each computation. The specific computations can then be referenced by name in the template.

Also be aware that anomaly computation is not compatible with all other computation types. For example, if you have a PeriodOverPeriod or GrowthRate computation, you will not be able to add an Anomaly computation to the same insight visual.

7. To add growth rate to your narrative, enter the phrase Compounded Growth Rate for the last on the narrative template. From the Computations pane, choose GrowthRate, and then choose timePeriods to insert the expression GrowthRate.timePeriods into your narrative. This expression references the number of periods set in the configuration.

8. Complete the sentence by entering days is. Then add another expression from the Computations pane by choosing GrowthRate, then compounded GrowthRate, and then formattedValue. The selection formattedValue returns a phrase formatted according to the format applied to the metric on the field. To see a raw value in integer or decimal format, choose value instead of formattedValue.

Now, let’s try using a conditional statement.

1. To insert an IF statement, place the cursor at the end of your narrative template. From the Insert Code menu, choose Inline IF.

2. You’ll be prompted to enter some code. On the left pane choose GrowthRate, then choose compoundedGrowthRate, and then choose value. To insert the value, enter > 3, and choose Save.

3. For the conditional content, enter Great! Select the text and use the format menu to format menu to change the color to green.

4. Repeat the previous steps, entering <3 for the growth rate value. For conditional content, enter Bad! And format the text as red.

5. Choose Apply. You should see the results similar to the following.

The template provides you with a sophisticated tool to customize your narrative. Within the template, you can also reference parameters in your analysis or dashboard and leverage a set of built-in functions to perform more calculations.

The best way to get started with auto narratives and to learn the syntax is to use the existing templates built from suggested insights.  But you can also create insight visuals from scratch by choosing Add and then choosing Add Insight.

Try it yourself! Try creating a narrative that enumerates the top selling products for the last three months.

Conclusion

As you can see from the walkthrough, ML Insights helps you perform large scale anomaly detection and create business forecast in a few simple clicks. You can build rich and user-friendly auto narratives within your dashboards in minutes, without any custom development or ML skillset necessary.

 


About the Author

Luis Wang is a principal product manager for Amazon QuickSight. He’s been with AWS for over 6 years, working on various services including Amazon EC2 and then launching Amazon QuickSight. Luis is now focused on the application of machine learning and AI to business intelligence and analytics at QuickSight. He enjoys running, watching sitcoms and spending time with his family.

Amazon QuickSight announces ML Insights in preview

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-announces-ml-insights-in-preview/

Amazon QuickSight is a fast, cloud-powered BI service that makes it easy for everyone in an organization to get business insights from their data through rich, interactive dashboards. With pay-per-session pricing and embedded dashboard, we made BI even more cost-effective and accessible to everyone.

However, as the volume of data that customers generate continues to grow every day, it’s becoming more challenging to harness their data for business insights. This is where machine learning comes into play. Amazon is a pioneer in using machine learning to automate and scale various aspects of business analytics in the supply chain, marketing, retail, and finance.

With AWS, you can use these capabilities through our extensive ML and AI services. Today, we are announcing three new features that integrate proven Amazon technologies into Amazon QuickSight to provide customers with ML-powered insights beyond visualizations:

  1. ML-powered anomaly detection to help customers uncover hidden insights by continuously analyzing across billions of data points.
  2. ML-powered forecasting and what-if analysis to predict key business metrics with point-and-click simplicity.
  3. Auto-narratives to help customers tell the story of their dashboard in a plain-language narrative.

ML-powered anomaly detection

Today, there’s no shortage of tools to visualize and report on business performance and metrics, but the real insights are often buried deep in aggregates. These insights generally don’t manifest in the existing reports and dashboard but are discovered only several days or weeks later. To add to the problemgrowing volumes of data in the cloud make manual analysis such as slice and dice and pivoting too time-consuming and not scalable. As such, customers often resolve to look only at a small subset of their data rather than the whole picture.

Amazon QuickSight uses proven Amazon technology to continuously run ML-powered anomaly detection on millions of metrics and billions of data points. This anomaly detection enables you to get deep insights that are often buried in the aggregates, not visible in plain sight, and not scalable with manual analysis. When anomalies are detected, Amazon QuickSight automatically sends you timely email alerts detailing the top contributors to each anomaly. With ML-powered anomaly detection, there’s no need for manual analysis, custom development, or ML domain expertise.

For example, say that you operate a business that sells products across various geographical regions. You can now use the ML-powered anomaly detection in Amazon QuickSight to continuously monitor your sales and orders at scale across all products, geographies, and customers. Amazon QuickSight’s ML algorithm continuously learns from your historical data patterns at the most granular level. It can alert you when it detects higher- or lower-than-expected sales for a particular product, in a specific city, or even for an individual customer.

With pay-per-session, we introduced industry-first usage-based pricing model for BI. Continuing down that path, we plan for ML-powered anomaly detection to have a usage-based pricing model based on the number of metrics processed in a month. To learn more about planned pricing and examples, see here.

Note: Customers will not be charged during preview. Pricing will apply when the feature becomes generally available.

ML-powered forecasting

We see customers wanting to perform common forecasting tasks that are too cumbersome and often out of reach for most business users. For example, suppose that you are a business manager and you want to forecast sales to see if you are going to meet your goal by the end of the year. Or suppose that you expect a large deal to come through in two weeks that is going to affect your overall forecast. Today, most business forecasting involves tedious, error-prone, and rigid processes using Excel that don’t scale. This is an area where Amazon QuickSight is using ML to simplify.

With ML-powered forecasting and what-if analyses in Amazon QuickSight, nontechnical users can now easily forecast their key business metrics with just a few clicks. No ML-expertise or Excel data modeling is required. The built-in ML algorithm in Amazon QuickSight is designed to handle complex real-world scenarios. Amazon QuickSight uses ML to provide more reliable forecasts than traditional means.

For example, you can forecast your business revenue with multiple levels of seasonality (for example, sales with both weekly and quarterly trends). Amazon QuickSight automatically excludes anomalies in the data (for example, a spike in sales due to price drop or promotion) from influencing the forecast. Additionally, you don’t have to clean and reprep the data with missing values because Amazon QuickSight automatically handles it under the hood. Finally, with ML-powered forecasting, you can perform interactive what-if analyses to determine the growth trajectory needed to meet business goals.

Auto-narratives

We’ve all been through this before. You open up a dashboard and spend hours staring at the data, sifting and drilling down the charts and deciphering rows and columns in the tables. You try to understand what the data is trying to say and engage in a lengthy discussion with your peers on your interpretations. As companies continue to expand their BI and reporting across all aspects of their business, this problem only gets worse.

Auto-narratives is a new feature that we’re announcing that provides key insights in everyday language, embedded contextually in your dashboard, saving hours on manual analysis. With auto-narratives, Amazon QuickSight automatically interprets the charts and tables in your dashboard and provides a number of Suggested Insights in natural language. Depending on the shape and form of your data, you might get suggestions such as what the day-over-day changes look like, what was the highest sales date, what the growth rate is at and what the forecast looks like for the next seven days. As the author of the dashboard, you have complete flexibility to customize the computations and business language for your unique needs. You can use auto-narratives to effectively tell the story of your data in plain language.

Sign up for the preview today!

ML Insights are available for preview starting today. You don’t need any ML or domain expertise to use these features. Simply sign up for the preview here.


About the Author

Luis Wang is a principal product manager for Amazon QuickSight. He’s been with AWS for over 6 years, working on various services including Amazon EC2 and then launching Amazon QuickSight. Luis is now focused on the application of machine learning and AI to business intelligence and analytics at QuickSight. He enjoys running, watching sitcoms and spending time with his family.

Amazon QuickSight Now Supports Search, Filter Groups, and Amazon S3 Analytics Connector

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-supports-search-filter-groups-and-amazon-s3-analytics-connector/

Today, I’m excited to share information about some new features in Amazon QuickSight. First, you can now search for datasets, analyses, and dashboards in Amazon QuickSight using the unified search box, making it faster and easier to find and access your data. Next, you can now create filter groups with multiple filter conditions that are evaluated together using the OR operation. Finally, you can now use the built-in Amazon S3 analytics connector to visualize your S3 storage access patterns across multiple S3 buckets and configurations within a single Amazon QuickSight dashboard to optimize for cost.

Search

You can now easily and quickly find and access your datasets, analyses, and dashboards using the unified search box in Amazon QuickSight. Type in what you’re looking for and you get a list of all matches in a unified view. From there, you can take actions such as creating an analysis from a dataset, modifying a dataset, or accessing an analysis or dashboard.

Filter groups

Filters are one of the most important features in Amazon QuickSight. Before this release, you could create multiple filters that were evaluated using the AND operation. With this release, you can now create multiple filters that are evaluated using the OR operation. This provides you with the flexibility to apply more complex filters to your data and visualizations. For example, you can create a chart that shows customers who have spent less than $100 OR made three or more purchases.

Amazon S3 analytics connector

In July, AWS introduced the ability to analyze and visualize your Amazon S3 storage access patterns using Amazon QuickSight in one click from the S3 console. Today, AWS released a dedicated S3 analytics connector in Amazon QuickSight. This connector allows you to import S3 analytics data for different buckets and configurations into a single Amazon QuickSight dataset. With this dataset, you can then create analyses and dashboards that tracks all of your S3 usage patterns in a single view.

Learn more

To learn more about these capabilities and start using them in your dashboards, see the Amazon QuickSight User Guide.

Stay engaged

If you have questions or suggestions, you can post them on the Amazon QuickSight discussion forum.

Not an Amazon QuickSight user?

To get started for FREE, see quicksight.aws.

 

Amazon QuickSight Now Supports Amazon Athena in EU (Ireland), Count Distinct, and Week Aggregation

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-supports-amazon-athena-in-eu-ireland-count-distinct-and-week-aggregation/

Today, I’m excited to share a couple of new features in Amazon QuickSight. First, with this release, we expanded connectivity options by adding Amazon Athena support in the EU (Ireland) Region. Additionally, you can now use Count Distinct on your dimensions and metrics in the visualizations and aggregate date fields by week for SPICE data sets.

Athena in Ireland

Athena is one of the most popular data sources used by QuickSight customers. It allows you to deploy a serverless BI and analytics architecture for your operational and business data. With this release, the Athena connector is now available in the EU (Ireland) Region. You can connect QuickSight to your Athena databases and tables in the region and start visualizing your data in a matter of seconds.

Count Distinct

You can now perform aggregations using Count Distinct in the visualizations, one of the top requests from users. To use Count Distinct, simply select Count Distinct as the aggregation on the visual axis or in the field well. Count Distinct is supported for both direct queries and SPICE data sets. You can apply it to strings and measures. It is available for all supported visualization types.

Date aggregation by week

Time series line charts are one of the most common ways for customers to report on business trends. In addition to Year, Month, Day and Hour, you can now aggregate date fields by WEEK and visualize your data at a weekly granularity.

Learn more

To learn more about these capabilities and start using them in your dashboards, see the QuickSight User Guide.

Stay engaged

If you have questions or suggestions, you can post them on the QuickSight Discussion Forum.

Not a QuickSight user?

To get started for FREE, see quicksight.aws.

 

Visualize Amazon S3 Analytics Data with Amazon QuickSight

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/visualize-amazon-s3-analytics-data-with-amazon-quicksight/

When Amazon S3 analytics was released in November 2016, it gave you the ability to analyze storage access patterns and transition the right data to the right storage class. You could also manually export the data to an S3 bucket to analyze, using the business intelligence tool of your choice, and gather deeper insights on usage and growth patterns. This helped you reduce storage costs while optimizing performance based on usage patterns.

With today’s update, you can quickly and easily gain those deeper insights and benefits by analyzing and visualizing S3 analytics data in Amazon QuickSight. It takes just a single click from the S3 console, without the need for manual exports or additional data preparation.

If you already have S3 analytics storage class analysis enabled for your buckets, choose Explore in QuickSight on the top right.

Users new to Amazon QuickSight can follow the steps to get set up. Existing QuickSight users are automatically deep linked to an analysis of their S3 analytics data.

From there, you can access the pre-built visualizations to understand the storage access pattern of your bucket. For example, you can visualize the amount of data retrieved vs. the amount of storage consumed for objects of different age groups to identify infrequently accessed data. You can also create new visualizations and perform ad hoc analysis to break down the access patterns by the filters that you have defined in S3 analytics.

 

Finally, you can set up a daily scheduled refresh of the storage class analysis data set in Amazon QuickSight to keep it up to date. Publish and share the analysis as a dashboard to other users in your organization to monitor your S3 storage access pattern.

This feature is now available in all QuickSight regions: US East (N. Virginia and Ohio), US West (Oregon), and EU (Ireland).

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the Amazon QuickSight User Guide.

Stay engaged

If you have questions and suggestions, post them on the Amazon QuickSight Discussion Forum.

Not a QuickSight user?

Go to the Amazon QuickSight website to get started for FREE.

 

Amazon QuickSight Adds Support for Amazon Redshift Spectrum

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-adds-support-for-amazon-redshift-spectrum/

In April, we announced Amazon Redshift Spectrum in the AWS Blog. Redshift Spectrum is a new feature of Amazon Redshift that allows you to run complex SQL queries against exabytes of data in Amazon without having to load and transform any data.

We’re happy to announce that Amazon QuickSight now supports Redshift Spectrum. Starting today, QuickSight customers can leverage Redshift Spectrum to visualize and analyze vast amounts of unstructured data in their Amazon S3 data lake. With QuickSight and Redshift Spectrum, customers can now visualize combined data sets that include frequently accessed data stored in Amazon Redshift and bulk data sets stored cost effectively in S3 using the familiar SQL syntax of Amazon Redshift.

With Redshift Spectrum, you can start querying your data in Amazon S3 immediately, with no loading or transformation required. You just need to register your Amazon Athena data catalog or Hive Metastore as an external schema. You can then use QuickSight to select the external schema and the Redshift Spectrum tables—just like any other Amazon Redshift tables in your cluster―and start visualizing your S3 data in seconds. You don’t have to worry about scaling your cluster. Redshift Spectrum lets you separate storage and compute, allowing you to scale each independently. You only pay for the queries that you run.

Redshift Spectrum support is now available in all QuickSight regions – US East (N. Virginia and Ohio), US West (Oregon), and EU (Ireland).

To learn more about these capabilities and start using them in your dashboards, check out the QuickSight User Guide.

If you have questions and suggestions, post them on the QuickSight Discussion Forum.

Not a QuickSight user? Get started for FREE on the QuickSight page.

 

Visualize Big Data with Amazon QuickSight, Presto, and Apache Spark on Amazon EMR

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/visualize-big-data-with-amazon-quicksight-presto-and-apache-spark-on-amazon-emr/

Last December, we introduced the Amazon Athena connector in Amazon QuickSight, in the Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight post.

The connector allows you to visualize your big data easily in Amazon S3 using Athena’s interactive query engine in a serverless fashion. This turned out to be a very popular combination, as customers benefit from the speed, agility, and cost benefit that serverless business intelligence (BI) and analytics architecture brings.

Today, we’re excited to announce two new native connectors in QuickSight for big data analytics: Presto and Spark. With the Presto and SparkSQL connector in QuickSight, you can easily create interactive visualizations over large datasets using Amazon EMR.

EMR provides a simple and cost effective way to run highly distributed processing frameworks such as Presto and Spark when compared to on-premises deployments. EMR provides you with the flexibility to define specific compute, memory, storage, and application parameters and optimize your analytic requirements.

In this post, I walk you through connecting QuickSight to an EMR cluster running Presto. If you’d like a walkthrough with Spark, let us know in the comments section!

Presto overview

Presto is an open source, distributed SQL query engine for running interactive analytic queries against data sources ranging from gigabytes to petabytes. It supports the ANSI SQL standard, including complex queries, aggregations, joins, and window functions. Presto can run on multiple data sources, including Amazon S3.

Presto’s execution framework is fundamentally different from that of Hive/MapReduce. Presto has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel and streams data from one stage to another as the data becomes available. This reduces end-to-end latency and makes Presto a great tool for ad hoc data exploration over large data sets.

Walkthrough

Use the following steps to connect QuickSight to an EMR cluster running Presto:

  1. Create an EMR cluster with the latest 5.5.0 release.
  2. Configure LDAP for user authentication in QuickSight.
  3. Configure SSL using a QuickSight supported certificate authority (CA).
  4. Create tables for Presto in the Hive metastore.
  5. Whitelist the QuickSight IP address range in your EMR master security group rules.
  6. Connect QuickSight to Presto and create some visualizations.

Prerequisites

You need run Presto version 0.167, at a minimum, which is the first release that supports LDAP authentication. LDAP authentication is a requirement for the Presto and Spark connectors and QuickSight refuses to connect if LDAP is not configured on your cluster.

Create an EMR cluster with release version 5.5.0

In the EMR console, use the Quick Create option to create a cluster.  For this post, use most of the default settings with a few exceptions. To install both Presto and Spark on your cluster (and customize other settings), create your cluster from the Advanced Options wizard instead.

Make sure that EMR release 5.5.0 is selected and under Applications, choose Presto. If you have an EC2 key pair, you can use it. Otherwise, create a key pair (.PEM file) and then return to this page to create the cluster. 

Make sure that you configure your cluster’s security group inbound rules to allow SSH from your machine’s IP address range. 

Configure LDAP for user authentication in QuickSight

After your cluster is in a running state, connect using SSH to your cluster to configure LDAP authentication.

To SSH into your EMR cluster, use the following commands in the terminal:

chmod 600 ~/YOUR_PEM_FILE.pem
ssh -i ~/YOUR_PEM_FILE.pem [email protected]_MASTER_PUBLIC_DNS_FROM_EMR_CLUSTER

After you log in, install OpenLDAP, configure it, and create users in the directory. For more about configuring LDAP, see Editing /etc/openldap/slapd.conf in the OpenLDAP documentation.

# Install LDAP Server
 sudo yum install openldap openldap-servers openldap-clients
# Create the config files
sudo cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
sudo cp /usr/share/openldap-servers/slapd.conf.obsolete /etc/openldap/slapd.conf
# Bounce LDAP
sudo service slapd restart

After LDAP is installed and restarted, you issue a couple of commands to change the LDAP password. First, generate a hash for the LDAP root password and save the output hash that looks like this:

{SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83

Issue the following command and set a root password for LDAP when prompted:

slappasswd

Now, prepare the commands to set the password for the LDAP root. Make sure to replace the hash below with the one that you generated in the previous step:

cat > /tmp/config.ldif <<EOF
dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83

dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}DmD616c3yZyKndsccebZK/vmWiaQde83
-
replace: olcRootDN
olcRootDN: cn=dev,dc=example,dc=com
-
replace: olcSuffix
olcSuffix: dc=example,dc=com
EOF

Run the following command to execute the above commands against LDAP:

sudo ldapadd -Y EXTERNAL -H ldapi:/// -f /tmp/ config.ldif

Next, create a user account with password in the LDAP directory with the following commands. When prompted for a password, use the LDAP root password that you created in the previous step.

cat > /tmp/accounts.ldif <<EOF
dn: dc=example,dc=com
objectclass: domain
objectclass: top
dc: example

dn: ou= dev,dc=example,dc=com
objectclass: organizationalUnit
ou: dev
description: Container for developer entries

dn: uid=<REPLACE_WITH_YOUR_USER_NAME>,ou=dev,dc=example,dc=com
uid: <REPLACE_WITH_YOUR_USER_NAME>
objectClass: inetOrgPerson
userPassword: <REPLACE_WITH_STRONG_PASSWORD>
sn: <REPLACE_WITH_SURNAME>
cn: dev
EOF

ldapadd -D "cn=dev,dc=example,dc=com" -W -f /tmp/accounts.ldif

You now have OpenLDAP configured on your EMR cluster running Presto and a user that you later use to authenticate against when connecting to Presto.

Configure SSL using a QuickSight supported certificate authority

To ensure that any communication between QuickSight and Presto is secured, QuickSight requires that the connection to be established with SSL enabled. You need to obtain a certificate from a certificate authority (CA) that QuickSight trusts. You can find the full list of public CAs accepted by QuickSight in the Network and Database Configuration Requirements topic.

To set up SSL on LDAP and Presto, obtain the following three SSL certificate files from your CA and store them in the /home/hadoop/ directory.

  • Certificate key file
  • Certificate file
  • CA certificate

Configure the keys in LDAP with the following commands:

cat > /tmp/ca.ldif <<EOF
dn: cn=config
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /home/hadoop/certificateKey.pem

replace: olcTLSCertificateFile
olcTLSCertificateFile: /home/hadoop/certificate.pem

replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /home/hadoop/cacertificate.pem
EOF

sudo ldapmodify -Y EXTERNAL -H ldapi:/// -f /tmp/ca.ldif

Now, enable SSL in LDAP by editing the /etc/sysconfi/ldap file and set SLAPD_LDAPS=yes:

sudo vi /etc/sysconfig/ldap

SLAPD_LDAPS=yes

sudo service slapd restart

Use the following commands to generate keystore. You will be prompted to provide a password for the keystore.

openssl pkcs12 -inkey certificatekey.pem -in certificate.pem -export -out server-key.p12

keytool -importkeystore -srckeystore server-key.p12 -srcstoretype PKCS12 -destkeystore server.keystore

Edit the configuration files for Presto in EMR.

SERVERNAME=<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>
cd /etc/presto/conf

# Enable LDAPS auth for Presto
echo http-server.authentication.type=LDAP | sudo tee -a config.properties
echo authentication.ldap.url=ldaps://${SERVERNAME}:636 | sudo tee -a config.properties
echo authentication.ldap.user-bind-pattern=uid=\${USER},OU=dev,DC=example,DC=com | sudo tee -a config.properties

# Enable SSL for the Presto server
echo http-server.https.enabled=true | sudo tee -a config.properties
echo http-server.https.port=<PORT_NUMBER> | sudo tee -a config.properties
echo http-server.https.keystore.path=/home/hadoop/server.keystore | sudo tee -a config.properties
echo http-server.https.keystore.key=<KEYSTORE_PASSWORD> | sudo tee -a config.properties

# Bounce Presto to pick up the new config
sudo pkill presto
# wait until presto is up
while [[ 1 ]]; do pgrep presto; if [ $? -eq 0 ]; then break; else echo -n .; sleep 1; fi; done

Create tables for Presto in the Hive metastore

Now that you have a running EMR cluster with Presto and LDAP set up, you can load some sample data into the cluster for analysis. Use the same CloudFront log sample data set that is available for Athena. The following SQL query creates a table in EMR and loads the sample data set into it:

# Run hive
$hive

#Create table and load data
CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
  Date Date,
  Time STRING,
  Location STRING,
  Bytes INT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  OS String,
  Browser String,
  BrowserVersion String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\()+[\()([^\;]+).*\%20([^\/]+)[\/](.*)$"
) LOCATION 's3://athena-examples/cloudfront/plaintext/'; 

# quit hive
quit

Try to query the data using the Presto CLI with the following commands:

#Run the Presto CLI
presto-cli --server https://<PUBLIC_DNS_NAME_OF_EMR_CLUSTER>:<PORT_NUMER>     --user <USERNAME> --password --catalog hive

#Issue query to Presto
SELECT * FROM cloudfront_logs limit 10;

You should see an output from Presto like the following:

Whitelist the QuickSight IP address range in your EMR master security group rules

Now you’re ready to connect QuickSight to Presto. For QuickSight to connect to Presto, you need to make sure that Presto is reachable by QuickSight’s public endpoints by adding QuickSight’s IP address ranges to your EMR master node security group.

Connect QuickSight to Presto and create some visualizations

If you have not already signed up for QuickSight, you can do so at https://quicksight.aws. QuickSight offers a 1 user and 1 GB perpetual free tier.

After you’re signed up for QuickSight, navigate to the New Analysis page and the New Data Set page. You see the new Presto and Spark connector as in the following screenshot.

Open the Presto connector, provide the connection details in the modal window, and choose Create data source.

Select the default schema and choose the cloudfront_logs table that you just created.

In QuickSight, you can choose between importing the data in SPICE for analysis or directly querying your data in Presto. SPICE is an in-memory optimized columnar engine in QuickSight that enable fast, interactive visualization as you explore your data. For this post, choose to import the data into SPICE and choose Visualize.

In the analysis view, you can see the notification that shows import is complete with 4996 rows imported. On the left, you see the list of fields available in the data set and below, the various types of visualizations from which you can choose.

QuickSight makes it easy for you to create visualizations and analyze data with AutoGraph, a feature that automatically selects the best visualization for you based on selected fields.

To create a visualization, select the fields on the left panel. In this case, look at the number of connections to CloudFront ordered by the various OS types, by selecting the OS field. Additionally, you can select the bytes fields to look at total bytes transferred by OS instead of count.

Summary

You just finished creating an EMR cluster, setting up Presto and LDAP with SSL, and using QuickSight to visualize your data. I hope this post was helpful. Feel free to reach out if you have any questions or suggestions.

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the QuickSight User Guide.

Stay engaged

If you have questions and suggestions, you can post them on the QuickSight forum. 

Not a QuickSight user

Go to the QuickSight website to get started for FREE.