Tag Archives: Amazon QuickSight

ICYMI: Serverless Q4 2019

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2019/

Welcome to the eighth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

The three months comprising the fourth quarter of 2019

AWS re:Invent

AWS re:Invent 2019

re:Invent 2019 dominated the fourth quarter at AWS. The serverless team presented a number of talks, workshops, and builder sessions to help customers increase their skills and deliver value more rapidly to their own customers.

Serverless talks from re:Invent 2019

Chris Munns presenting 'Building microservices with AWS Lambda' at re:Invent 2019

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.

Videos

Decks

You can also find decks for many of the serverless presentations and other re:Invent presentations on the AWS Events Content page.

AWS Lambda

For developers needing greater control over the performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency at re:Invent. This feature enables Lambda functions to execute with consistent start-up latency, making them ideal for building latency-sensitive applications.

As shown in the graph below, provisioned concurrency reduces tail latency, directly improving response times and providing a more responsive end-user experience.

Graph showing performance enhancements with AWS Lambda Provisioned Concurrency
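
Provisioned Concurrency is configured per function version or alias. As a minimal sketch (the function name, alias, and concurrency value below are placeholders), you can enable it from the AWS CLI:

aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier live \
  --provisioned-concurrent-executions 50

Lambda then keeps that many execution environments initialized and ready to respond ahead of invocation traffic.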

Lambda rolled out enhanced VPC networking to 14 additional Regions around the world. This change brings dramatic improvements to startup performance for Lambda functions running in VPCs due to more efficient usage of elastic network interfaces.

Illustration of AWS Lambda VPC to VPC NAT

New VPC to VPC NAT for Lambda functions

Lambda now supports three additional runtimes: Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new version-specific features and benefits, which are covered in the linked release posts. Like the Node.js 10 runtime, these new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda released a number of controls for both stream-based and asynchronous invocations:

  • You can now configure error handling for Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams. It’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and the number of retry attempts on the event. If either configured condition is met, the event can be routed to a dead-letter queue (DLQ) or a Lambda destination, or it can be discarded. A sample CLI configuration for both the stream and async controls follows this list.
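
As a sketch of both sets of controls (the event source mapping UUID, function name, queue ARN, account ID, and values are placeholders), the stream settings are applied to the event source mapping and the async settings to the function's event invoke configuration:

aws lambda update-event-source-mapping \
  --uuid 12345678-aaaa-bbbb-cccc-1234567890ab \
  --maximum-retry-attempts 2 \
  --maximum-record-age-in-seconds 3600 \
  --bisect-batch-on-function-error \
  --destination-config '{"OnFailure":{"Destination":"arn:aws:sqs:us-east-1:123456789012:stream-failures"}}'

aws lambda put-function-event-invoke-config \
  --function-name my-function \
  --maximum-event-age-in-seconds 3600 \
  --maximum-retry-attempts 1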

AWS Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set separate destinations for success and failure. This unlocks new patterns for distributed event-based applications and can replace custom code previously used to manage routing results.

Illustration depicting AWS Lambda Destinations with success and failure configurations

Lambda Destinations
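
As a hedged example (the function name and destination ARNs are placeholders), you can configure separate success and failure destinations with a single CLI call; a destination can be an SQS queue, an SNS topic, another Lambda function, or an EventBridge event bus:

aws lambda put-function-event-invoke-config \
  --function-name my-function \
  --destination-config '{"OnSuccess":{"Destination":"arn:aws:sqs:us-east-1:123456789012:successful-invocations"},"OnFailure":{"Destination":"arn:aws:sns:us-east-1:123456789012:failed-invocations"}}'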

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Kinesis Data Streams and DynamoDB Streams. This enables faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.

Illustration of multiple AWS Lambda invocations per Kinesis Data Streams shard

Lambda Parallelization Factor diagram
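
The factor is set on the event source mapping itself. A minimal sketch (the mapping UUID is a placeholder); a value of 5 allows up to five concurrent invocations per shard while preserving ordering per partition key:

aws lambda update-event-source-mapping \
  --uuid 12345678-aaaa-bbbb-cccc-1234567890ab \
  --parallelization-factor 5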

Lambda introduced Amazon SQS FIFO queues as an event source. “First in, first out” (FIFO) queues guarantee the order of record processing, unlike standard queues. FIFO queues support message batching, and the MessageGroupId attribute allows parallel Lambda consumers of a single FIFO queue, enabling high-throughput record processing by Lambda.
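
Subscribing a function to a FIFO queue uses the same event source mapping mechanism as other Lambda event sources. A minimal sketch (the function name, queue ARN, and account ID are placeholders):

aws lambda create-event-source-mapping \
  --function-name my-function \
  --event-source-arn arn:aws:sqs:us-east-1:123456789012:orders.fifo \
  --batch-size 10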

Lambda now supports Environment Variables in the AWS China (Beijing) Region and the AWS China (Ningxia) Region.

You can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics show the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, discover outliers, and find hard-to-spot situations that affect customer experience for a subset of your users.
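
For example, assuming a function named my-function, you could pull hourly p95 and p99 duration values with a query along these lines:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=my-function \
  --start-time 2019-12-01T00:00:00Z \
  --end-time 2019-12-02T00:00:00Z \
  --period 3600 \
  --extended-statistics p95 p99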

Amazon API Gateway

Screen capture of creating an Amazon API Gateway HTTP API in the AWS Management Console

Amazon API Gateway announced the preview of HTTP APIs. In addition to significant performance improvements, most customers see an average cost savings of 70% when compared with API Gateway REST APIs. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.

AWS SAM CLI

Screen capture of the new 'sam deploy' process in a terminal window

The AWS SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.
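
A typical flow now looks like the sketch below; the first guided deployment prompts for settings such as the stack name and Region and saves them to samconfig.toml, so subsequent deployments need no flags:

sam build
sam deploy --guided   # first run: prompts for stack name, Region, and capabilities, then saves them
sam deploy            # later runs: reuse the saved configuration and the managed artifact bucket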

AWS Step Functions

One powerful feature of AWS Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. In Q4, Step Functions expanded its integration with Amazon SageMaker to simplify machine learning workflows. Step Functions also added a new integration with Amazon EMR, making EMR big data processing workflows faster to build and easier to monitor.

Screen capture of an AWS Step Functions step with Amazon EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor trends and react to usage on your AWS account.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates. This increases the limits to 1,500 state transitions per second and a default start rate of 300 state machine executions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). See the announcement for the limit increases in other Regions.

Screen capture of choosing Express Workflows in the AWS Management Console

Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high-performance workloads at a reduced cost.
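
You choose the workflow type when you create the state machine. A minimal sketch with the CLI (the name, definition file, and role ARN are placeholders):

aws stepfunctions create-state-machine \
  --name order-processing-express \
  --type EXPRESS \
  --definition file://state-machine.asl.json \
  --role-arn arn:aws:iam::123456789012:role/StepFunctionsExecutionRole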

Amazon EventBridge

Illustration of the Amazon EventBridge schema registry and discovery service

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate the discovery and cataloging of event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.
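
As a sketch, you can turn on discovery for an event bus and then list what has been captured. The bus ARN and account ID below are placeholders, and in the preview, discovered schemas land in a registry named discovered-schemas:

aws schemas create-discoverer \
  --source-arn arn:aws:events:us-east-1:123456789012:event-bus/default

aws schemas list-schemas --registry-name discovered-schemas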

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQ) to help capture unhandled events. By enabling a DLQ, you can capture events that are not processed and re-submit them or analyze them to locate processing issues.
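
The DLQ is an Amazon SQS queue attached to an SNS subscription through its RedrivePolicy attribute. A minimal sketch (the subscription and queue ARNs are placeholders):

aws sns set-subscription-attributes \
  --subscription-arn arn:aws:sns:us-east-1:123456789012:orders-topic:11112222-3333-4444-5555-666677778888 \
  --attribute-name RedrivePolicy \
  --attribute-value '{"deadLetterTargetArn":"arn:aws:sqs:us-east-1:123456789012:orders-dlq"}'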

Amazon CloudWatch

Amazon CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

Screenshot of Amazon CloudWatch ServiceLens in the AWS Management Console

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

CloudWatch introduced Embedded Metric Format, which helps you ingest complex high-cardinality application data as logs and easily generate actionable metrics. You can publish these metrics from your Lambda function by using the PutLogEvents API or using an open source library for Node.js or Python applications.
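
An Embedded Metric Format log line is a JSON object whose _aws block tells CloudWatch which keys to extract as metrics, while everything else remains searchable log data. A minimal sketch, with an illustrative namespace, dimension, and metric name:

{
  "_aws": {
    "Timestamp": 1577836800000,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApplication",
        "Dimensions": [["Service"]],
        "Metrics": [{ "Name": "ProcessingLatency", "Unit": "Milliseconds" }]
      }
    ]
  },
  "Service": "order-processing",
  "RequestId": "7b2e1f1c-example",
  "ProcessingLatency": 137
}

From a Lambda function, writing this object to standard output is enough for CloudWatch to extract the ProcessingLatency metric; outside Lambda, you can send the same payload with PutLogEvents or use the open source client libraries.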

Finally, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

AWS X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. With this integration, X-Ray traces canary scripts throughout the application, providing metrics on performance and application issues.

Screen capture of AWS X-Ray Service map in the AWS Management Console

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

Amazon DynamoDB announced support for customer-managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows you to bring your own key (BYOK), giving you full control over how you encrypt and manage the security of your DynamoDB data.
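
As a sketch, switching an existing table to a customer-managed CMK is a single update call (the table name and key alias are placeholders):

aws dynamodb update-table \
  --table-name Orders \
  --sse-specification Enabled=true,SSEType=KMS,KMSMasterKeyId=alias/my-dynamodb-cmk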

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.
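
For tables on the current version of global tables (2019.11.21), adding a replica is an in-place update; the table name and Region below are placeholders:

aws dynamodb update-table \
  --table-name Orders \
  --replica-updates '[{"Create": {"RegionName": "us-west-2"}}]'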

Another new DynamoDB capability to identify frequently accessed keys and database traffic trends is currently in preview. With this, you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

Screen capture of Amazon CloudWatch Contributor Insights for DynamoDB in the AWS Management Console

CloudWatch Contributor Insights for DynamoDB

DynamoDB also released adaptive capacity. Adaptive capacity helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of over provisioning for uneven data access patterns.

Amazon RDS

Amazon Relational Database Service (RDS) announced a preview of Amazon RDS Proxy to help developers manage RDS connection strings for serverless applications.

Illustration of Amazon RDS Proxy

The RDS Proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge appears next to your name in the SAR and links to your GitHub profile.

Screen capture of SAR Verified developer badge in the AWS Management Console

SAR Verified developer badges

AWS Developer Tools

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has passed through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

Screen capture of AWS CodeBuild

CodeBuild test trends in the AWS Management Console

Amazon CodeGuru

AWS announced a preview of Amazon CodeGuru at re:Invent 2019. CodeGuru is a machine learning-based service that makes code reviews more effective and helps developers write code that is more secure, performant, and consistent.

AWS Amplify and AWS AppSync

AWS Amplify added iOS and Android as supported platforms. Now developers can build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

Screen capture of 'amplify init' for an iOS application in a terminal window

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

For a summary of Amplify and AppSync announcements before re:Invent, read: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

Illustration of AWS AppSync integrations with other AWS services

Q4 serverless content

Blog posts

October

November

December

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q4:

Twitch

October

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

AWS Serverless Application Repository (SAR) Apps

In this edition of ICYMI, we are introducing a section devoted to SAR apps written by the AWS Serverless Developer Advocacy team. You can run these applications and review their source code to learn more about serverless and to see examples of suggested practices.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. We’re also kicking off a fresh series of Tech Talks in 2020 with new content providing greater detail on everything new coming out of AWS for serverless application developers.

Throughout 2020, the AWS Serverless Developer Advocates are crossing the globe to tell you more about serverless, and to hear more about what you need. Follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

You can also follow all of us on Twitter to see latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!

Amazon QuickSight: 2019 in review

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-2019-in-review/

2019 has been an exciting year for Amazon QuickSight. We onboarded thousands of customers, expanded our global presence to 10 AWS Regions, and launched over 60 features—more than a feature a week! We are inspired by you—our customers, and all that you do with Amazon QuickSight. We are thankful for the time you spend with us across in-person meetings, conference calls, emails, discussion forums, and AWS summits. As we close this year, here’s a quick summary of the highlights.

re:Invent 2019

The Amazon QuickSight team was at re:Invent alongside customers such as Best Western, Capital One, and Club OS, who spoke about their experiences implementing and using Amazon QuickSight for their analytics needs. We also ran two hands-on workshops using the newly launched APIs.

ANT324: Deploy business analytics at enterprise scale with Amazon QuickSight

This session discussed how enterprises are rolling out Amazon QuickSight Enterprise Edition for all their users and using features such as Active Directory or Federated SSO (SAML/OpenID Connect) authentication, private connectivity to data in AWS, email reports, and embedded dashboards.

Best Western rolled out Amazon QuickSight for business reporting across tens of thousands of users, using it in conjunction with Amazon Redshift and AWS Glue. While their previous legacy reporting system took 18 months and over 7,000 staff hours to upgrade server software versions, they now benefit from Amazon QuickSight’s modern reporting infrastructure and constant feature updates, and do not need to manage any servers to report on this data.

Capital One has made Amazon QuickSight available internally to their BI community of over 25,000 users. Key benefits that the Capital One team sees from QuickSight include the ability to roll out new BI use cases without server setup or capacity planning, integrated machine learning (ML) capabilities, and the absence of a traditional software update cycle. All this is with PrivateLink connectivity to data in AWS (Snowflake, Presto, Amazon Redshift, and Amazon RDS), and pay-per-session pricing for consumption with a max charge of $5 per month, per reader.

ANT217: Embedding analytics into applications with Amazon QuickSight

This session dove into the latest features that allow you to integrate Amazon QuickSight dashboards into your software development lifecycle, including new APIs, theming capabilities, and dashboard embedding. The full session recording is available on YouTube.

As an Independent Software Vendor (ISV), Club OS uses Amazon QuickSight’s embedded capabilities to add analytics to its cloud-based customer relationship management (CRM) platform for fitness and wellness businesses. Amazon QuickSight’s serverless architecture lets Club OS focus on meeting actual customer needs without being burdened by operations and infrastructure management. This architecture also provides fast, consistent performance for end users; the analytics dashboards are rolled out to over 40,000 users across 3,500 health and fitness locations. For more information, see the section of the talk on YouTube by Nicholas Hahn, VP of Product at Club OS.

As an enterprise with tens of thousands of internal users, Capital One uses Amazon QuickSight’s embedded capabilities to add analytics to internal applications and portals. With no servers to set up, teams can set up embedded dashboards in their portals in as little as a week, whether they want to serve just a few users or thousands. Traditional BI approaches to such portals would require server setup and maintenance, and expensive licensing contracts. For more information, see the section of the talk on YouTube by Latha Govada, Senior Manager Analytics at Capital One.

ANT302 – Enhancing your applications with Amazon QuickSight dashboards

This hands-on workshop session guided users through setting up an Amazon QuickSight Enterprise Edition account, connecting to data, creating dashboards, and using APIs to add users, move assets across development and staging versions, and embed dashboards in a web portal. For more information, see Dashboard Embedding & Operationalizing with QuickSight APIs. Try it out for yourself!

Embedding, theming, and APIs

We followed up on our launch of dashboard embedding for federated AWS identities at re:Invent 2018 (for more information about how the NFL uses embedded Amazon QuickSight dashboards for hundreds of stakeholders of Next Gen Stats, see the session video on YouTube). In 2019, we launched the following features:

Machine learning integrations

We added the following native ML integrations in Amazon QuickSight to help you make the most of your data in AWS and QuickSight:

Visualizations and interactivity

We added the following support for new chart types, conditional formatting, QuickSight Actions, sheets in dashboards, and more:

Calculations and aggregations

We added the following new aggregation options, new functions, and more:

Data connectivity and security

We increased SPICE limits, provided finer control over data sources in QuickSight, enabled data source sharing, and more:

  • SPICE datasets now accommodate up to 100 million rows of data in Enterprise Edition and 25 million rows for Standard Edition. For more information, see Data Source Limits.
  • Fine-grained access control over S3 and Athena allows you to scope down access to these data sources to specific users or groups in Amazon QuickSight using IAM.
  • Cross data source joins allow business analysts to perform joins across supported data sources without relying on data engineering teams to set up complex ETL processes.
  • Shared data sources centralize credential management of data sources. This also allows shared ownership of data connections, allowing collaboration over SQL scripts that define custom SQL datasets.
  • You can connect to Presto data sources in AWS without any public routing of data, similar to Amazon Redshift, Snowflake, RDS, and others.
  • Amazon QuickSight data sources that connect to Amazon Athena can now be tagged with specific Athena workgroups, allowing cost allocation of Athena queries by workgroup.

Mobile

We launched the new QuickSight Mobile app for iOS and Android, which allows you to access your dashboards and explore your data with drill-downs and filters. For more information, see New QuickSight Mobile App on YouTube.

Geographic availability

We added the following new Regions and language support in Amazon QuickSight:

  • Amazon QuickSight is now available in 10 AWS Regions: US East (N. Virginia and Ohio), US West (Oregon), EU (Frankfurt, Ireland, and London), and Asia Pacific (Seoul, Singapore, Sydney, and Tokyo).
  • The Amazon QuickSight interface is now available in 10 languages: English, German, Spanish, French, Portuguese, Italian, Japanese, Korean, Simplified Chinese, and Traditional Chinese.

With all of our updates you can build dashboards like the ones below and integrate them into your applications and portals.

Looking ahead

To see the full list of 2019 launches, see What’s New in Amazon QuickSight or subscribe to the Amazon QuickSight YouTube channel for the latest training and feature walkthroughs. We have a packed roadmap for 2020, and continue to focus on enabling you with insights from all your data, sharing these with all your users, while not having to worry about operations and servers. Thank you for your support.

We wish you all the very best in the new year (and decade)!


About the authors

Jose Kunnackal John is a principal product manager for Amazon QuickSight.

Sahitya Pandiri is a senior technical program manager with Amazon Web Services. Sahitya has been in product/program management for 6 years now, and has built multiple products in the retail, healthcare, and analytics space.

Highlight Critical Insights with Conditional Formatting in Amazon QuickSight

Post Syndicated from Susan Fang original https://aws.amazon.com/blogs/big-data/highlight-critical-insights-with-conditional-formatting-in-amazon-quicksight/

Amazon QuickSight now makes it easier for you to spot the highlights or low-lights in data through conditional formatting. With conditional formatting, you can specify customized text or background colors based on field values in the dataset, using solid or gradient colors. You can also display data values with the supported icons. Using color coding and icons, you can visually explore and analyze data, quickly detect issues, and identify trends, as seen in the following screenshot. These dynamic visual cues also help your dashboard viewers cut through the densely populated data values and get to the attention-worthy insights much faster.

With this release, you can apply conditional formatting to tables, pivot tables, and KPI charts. This blog post walks you through how to leverage conditional formatting in each supported chart type in Amazon QuickSight.

Tables

You can apply conditional formatting to tables through the following methods.

Conditionally formatting columns based on data fields

When applying conditional formatting to tables, there are two ways to access the configuration pane. One option is to select Table in an analysis, and choose the caret in the upper right corner of the Table visual. Select Conditional formatting from the dropdown menu, as shown in the following screenshots, and the Conditional formatting configuration pane pops up to the left of the analysis canvas. Choose the + sign to select the target column to which you’d like to apply the formatting; it can be any column present in the table.

For example, select Sales from the dropdown menu to add a condition for the Sales target column.

Alternatively, from the selected table, you can expand the field well on the top of the analysis canvas and choose the Group by or Value target column directly from the columns used in the table. For example, select Sales from the Value well and select Conditional formatting from the dropdown, as seen in the following screenshots. The conditional formatting confirmation pane pops up to the left of the analysis canvas, with a condition already added for the Sales target column.

Next, choose the formatting style from the three available options.

  • Add background color
  • Add text color
  • Add icon

For each target column, you can add any formatting style or any combination of the styles.

Background and text color

Choose Add background color to expand the configuration pane for New background color. For the Sales target column, there are two Fill type options, Solid and Gradient, with Solid as the default.

You are prompted to choose the base column for the conditions, which can be any field in the dataset, regardless of whether it is present in the table or not.

Next, define the Aggregation of the base column used in the conditions, which can be any aggregation currently supported in QuickSight, such as sum, average, min, max, or standard deviation. Then you can create the conditions that drive the color coding of the cell background.

This example highlights cells based on the following rules:

  • Cells with sum of sales greater than $100, green
  • Cells with sum of sales between $10 and $100 (inclusive), yellow
  • Cells with sum of sales less than $10, red

Since the conditions are discrete, apply solid color based on Sales in the Sum aggregation.

In Conditional #1:

  1. Choose the Greater than comparison method.
  2. Enter 100 as the threshold Value.
  3. Use the color picker to pick green.
  4. Choose Add condition and configure the other two conditions accordingly.
  5. Choose Apply, and the Sales column is conditionally formatted, showing green, yellow and red colors when conditions are met.

If you’d like to re-order the conditions, you can move a chosen condition up or down. You can also delete a condition. To reset all of the conditions, choose Clear to clear the applied formatting from the visual and remove the conditions from the configuration pane.

The following screenshots demonstrate the configuration step by step:

If you are happy with the highlighting rules, choose Close to return to the summary page.

You see a brief summary of the conditions applied to background color of the Sales column. You can Edit or Delete the applied conditions, Add other formatting styles such as text color or icon to the same Sales column, Delete the target column altogether, or Add another target column up top.

Icons

This example adds icons to the same Sales column based on data in the Profits column so that a thumbs-up icon shows when profits are greater than 0.

  1. Start with Add icon. On the New icon configuration page, choose the base column and Aggregation.
  2. Under Icon set, QuickSight offers a list of canned color icon sets for quick application.
  3. For icons beyond this list, collapse the dropdown and check the Custom conditions box. Choose the pre-populated + icon for more icon selections and the option to Use custom Unicode icon. This example uses the thumbs-up icon and sets the color to orange.
  4. Choose Apply to see the icons show up in the Sales column.

The following screenshots show the configuration step by step:

Please note that you have the option to Show icon only. Once checked, the text values of the target column are hidden, showing icons only.

Conditionally formatting entire rows based on calculated fields

Next, here is a slightly more complex example that involves calculated fields. This example highlights rows in which Ship charges exceeds 20% of the total sales amount. This condition needs to be based on a flag that differentiates orders with high shipping charges from the rest.

Create a calculated field called high_shipping_flag using this function:

ifelse(sum({Ship charges}) > 0.2 * sum(Sales),1,0)

As a result, orders with shipping charges over 20% of the sales amount take a value of 1; otherwise, they take a value of 0.

With the base field ready, proceed with conditional formatting configuration. Since the target property for formatting is the entire row instead of a specific column, access the configuration pane from the visual dropdown menu instead of a particular column from the field well.

Scroll down to the bottom of the dropdown list and select the option to conditionally format the [Entire row] in addition to individual columns.

Note the options to Add background color or Add text color, without the Add icon option, because it is not applicable to entire rows. Also note the Apply on top checkbox, which allows the conditional formatting for the entire row to paint over conditional formatting applied to individual columns. For this example, color the entire row teal if the shipping charge flag is 1.

The following screenshots demonstrate the configuration step by step:

Pivot tables

Conditional formatting is enabled for pivot tables as well, and the configuration workflow is very similar to that of tables, with the following differences:

  • Conditional formatting can be applied only to measures (fields in the Values well), not to other data types.
  • Conditional formatting in solid colors can be applied to various levels of your choice: values, subtotals, and totals.
  • Conditional formatting can only be applied to individual columns, not entire rows.

Applying gradient colors

Next, explore the conditional formatting functionalities unique to pivot tables. As mentioned above, only the two measures, or numeric fields, in the Values well can be the target fields for conditional formatting.

The following screenshot shows a conditionally formatted pivot table:

As with tables, you can access Conditional formatting configuration pane from the dropdown of either the visual or the field itself in the Values well. Note that only the two measures show up in the dropdown list.

Applying gradient colors to Percent Tested is fairly straightforward. When you select Gradient, you are prompted to define a base field, which can be any field from the dataset as well as Aggregation. Amazon QuickSight offers three pre-set gradient colors for easy application: blue, red-green, and red-yellow-green.

Choose the diverging color option. You have the option to further customize the three colors as well as the corresponding min, mid, and max values by overriding the default, as shown in the following screenshots.

Applying solid colors to column values, subtotals and totals

For Average Score, apply solid color rules to the text.

Add Average Score as the target column for conditional formatting and select Add text color. As mentioned above, only solid colors can be applied to subtotals and totals.

Note the extra text Values next to Condition, right above the comparison method dropdown box, which is different from tables. By default, Amazon QuickSight applies the solid color conditions to the most fine-grained values of the target field, with the dimension fields at all levels expanded.

This example examines how the average scores at the Borough level compare with each other, which requires applying conditions to both Values and Subtotals. To do so, choose the icon to expose the options. This is a multi-select dropdown list, in which you can check one or multiple options. After checking the desired boxes, Values and Subtotals show up next to Condition. You can add another condition after you have completed the first.

Note that the next condition added inherits the level of granularity from the previous condition. In this example, Condition #2 automatically applies to Values and Subtotals, just like in the previous one. You can override from the check boxes of the multi-select dropdown.

The following screenshot shows the configuration and the effect.

For illustration purposes, add icons as well and enable Show subtotals for rows from the visual dropdown menu. With some rows collapsed at the Borough level, the solid color rules are applied to the average score for the collapsed boroughs, along with the school-level average scores on the expanded rows. The solid color rules also show up in the subtotals enabled for rows.

The following screenshots show conditional formatting applied to subtotals:

KPI Chart

This release also makes conditional formatting available for KPI charts so that you can easily tell whether the KPI is over or under a business-critical threshold, using text color and icons as quick indicators. In cases in which progress bars are shown, you also have the option to conditionally format the foreground color of the progress bar. The configuration workflow is similar to that of a table or pivot table. The difference is that only the primary value field can be conditionally formatted, not the Target or the Trend group. This example uses a simple KPI chart showing the Primary value only and adds an icon if the Average SAT Score is above 1200.

The following screenshots show the configuration and the effect:

Summary

This post illustrated how you can leverage conditional formatting to create richer data visualizations and thus more appealing dashboards in Amazon QuickSight. As of this writing, conditional formatting is enabled for tables, pivot tables, and KPI charts. You can expect to see enhanced capability to dynamically customize data colors based on numeric fields in other chart types as well. Stay tuned.

If you have any questions or feedback, please leave a comment.


About the Authors

Susan Fang is a senior product manager for QuickSight with AWS.

Evolve your analytics with Amazon QuickSight’s new APIs and theming capabilities

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/evolve-your-analytics-with-amazon-quicksights-new-apis-and-theming-capabilities/

The Amazon QuickSight team is excited to announce the availability of Themes and more APIs! Themes for dashboards let you align the look and feel of Amazon QuickSight dashboards with your application’s branding or corporate themes. The new APIs allow you to manage your Amazon QuickSight deployments programmatically, with support for dashboards, datasets, data sources, SPICE ingestion, and fine-grained access control over AWS resources. Together, they allow you to creatively tailor Amazon QuickSight to your audiences, whether you are using Amazon QuickSight to provide your users with an embedded analytics experience or for your corporate Business Intelligence (BI) needs. This post provides an overview of these new capabilities and details on getting started.

Theming your dashboards

Amazon QuickSight themes allow you to control the look and feel of your dashboards. Specifically, you can affect the following items through themes and then share the themes with other authors in the account.

  • Margins
  • Borders around visuals
  • Gutters (spacing between visuals)
  • Data colors (used within the visuals)
  • Background colors (for visuals within the dashboard and the dashboard itself)
  • Foreground colors (for text in the visuals and on the dashboard)
  • Highlight colors on visuals

The following screenshot of the dashboard highlights aspects of the dashboards you can modify via the theming options currently available:

With the available options, you can also make your QuickSight dashboards denser (e.g. no margins or gutters), as shown in the view below that uses QuickSight’s new “Seaside” theme.

You also have a dark dashboard option with QuickSight’s new “Midnight” theme as shown below.

Themes are accessible via the left hand panel in the dashboard authoring interface.

You can create your own theme by starting with one of the built-in themes and customizing it. The interactive theme editing experience makes this easy – you can edit as little or as much as you want, and get to the perfect theme for your dashboards.

Using APIs for dashboards, datasets, data sources, SPICE ingestion, and permissions

The new APIs cover dashboards, datasets, data sources, SPICE ingestion, and fine-grained access control over S3/Athena. We’ve introduced templates in QuickSight with this release: templates allow you to store the visual configuration and data schema associated with a dashboard and then easily replicate the dashboard for different sets of users or different data, within or across accounts.

The new template APIs allow you to create templates from QuickSight analyses. You can then use the dashboard APIs to create a dashboard from the template in the same or different account, version control dashboards, and connect them to different datasets as needed. The dataset APIs allow you to create custom datasets, with filters applied at the dataset level (for example, data for a specific customer only). Dataset APIs can be used to create both SPICE and direct query datasets. SPICE APIs let you refresh datasets programmatically and view SPICE ingestion status and history. Data source APIs programmatically define data source connections. Finally, fine-grained access permissions APIs let you secure access to data in Amazon S3, Amazon Athena, and other AWS resources.

As a developer using Amazon QuickSight’s embedded capabilities, or author using Amazon QuickSight to publish dashboards on different subject areas, the new APIs allow you to automate your workflows and programmatically perform repetitive activities such as rolling out identical versions of dashboards to different teams or customers. QuickSight assets can now be migrated from dev to QA to production via API, overcoming the need to manually access each environment. With versioning, you also have your previous dashboards backed up, providing you flexibility to roll back dashboard deployments if the need arises.

As an Amazon QuickSight administrator, you can use the new APIs to audit and manage access to dashboards, data sources, and datasets across your account. You can also pre-populate dashboards and datasets into individual author accounts for easier and faster onboarding. You no longer need to distribute individual credentials to data sources: you can provision centrally-managed data sources shared with all your users. You can also manage individual author access to S3, Athena, or other AWS resources on-the-fly using IAM policies through the fine-grained access control APIs. The APIs open up a lot of possibilities with Amazon QuickSight – we’ll cover a few scenarios in this blog post and then follow up with additional posts that cover more.

The Amazon QuickSight APIs can be invoked via the AWS Software Development Kit (SDK) or the AWS Command Line Interface (CLI). For our discussion, we will use the AWS CLI. To capture all the details needed for each API, this post uses the generate-cli-skeleton option available in the AWS CLI and walks you through the example of creating a Redshift data source and the datasets and dashboards associated with it. This option, when invoked, provides a skeleton file with all the required API details, which you can edit with the correct inputs and use for your API calls. The code below invokes the generate-cli-skeleton option and writes the details to the create-data-source-cli-input.json file; you then modify this file and use it as the input when invoking the create-data-source API.

aws quicksight create-data-source --generate-cli-skeleton > create-data-source-cli-input.json

Let’s say that you want to help Alice, a new hire in your company, create an Amazon QuickSight dashboard from some customer revenue data in Redshift. To set things up for Alice, you want to use the new APIs to connect to Redshift and create a QuickSight data set within her QuickSight user account.

Creating a data source

To connect to your Redshift database, you’ll need to create a data source in QuickSight. First, you describe it with the following configuration, which covers the data source connection details, credentials, and permissions around who should be able to see the data sources.

cat create-data-source-cli-input.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSourceId": "SampleRedshiftDatasouce",
    "Name": "Sample Redshift Datasouce",
    "Type": "REDSHIFT",
    "DataSourceParameters": {
        "RedshiftParameters": {
            "Host": "redshift-cluster-1.xxxxxxxxxx.us-east-1.redshift.amazonaws.com",
            "Port": 5439,
            "Database": "dev",
            "ClusterId": "redshift-cluster-1"
        }
    },
    "Credentials": {
        "CredentialPair": {
            "Username": "xxxxxxxxx",
            "Password": "xxxxxxxxxxx"
        }
    },
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSourcePermissions",
                "quicksight:DescribeDataSource",
                "quicksight:DescribeDataSourcePermissions",
                "quicksight:PassDataSource",
                "quicksight:UpdateDataSource",
                "quicksight:DeleteDataSource"
                
            ]
        }
    ],
    "VpcConnectionProperties": {
        "VpcConnectionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:vpcConnection/VPC to Redshift"
    },
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-RedShiftDemoDataSource"
        }
    ]
}

You can then invoke the create-data-source API with this input to start the creation of your data source, as shown below.

aws quicksight create-data-source --cli-input-json file://./create-data-source-cli-input.json

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
    "DataSourceId": "SampleRedshiftDatasouce",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "ac9fb8fe-71d8-4005-a7c9-d66d814e224e"
}

You can validate that the data source was successfully created by calling the describe-data-source API as shown below and checking the response’s Status:

aws quicksight describe-data-source --aws-account-id $AAI --data-source-id SampleRedshiftDatasouce

{
    "Status": 200,
    "DataSource": {
        "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
        "DataSourceId": "SampleRedshiftDatasouce",
        "Name": "Sample Redshift Datasouce",
        "Type": "REDSHIFT",
        "Status": "CREATION_SUCCESSFUL",
        "CreatedTime": 1574053515.41,
        "LastUpdatedTime": 1574053516.368,
        "DataSourceParameters": {
            "RedshiftParameters": {
                "Host": "redshift-cluster-1.c6qmmwnqxxxx.us-east-1.redshift.amazonaws.com",
                "Port": 5439,
                "Database": "dev",
                "ClusterId": "redshift-cluster-1"
            }
        },
        "VpcConnectionProperties": {
            "VpcConnectionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:vpcConnection/VPC to Redshift"
        },
        "SslProperties": {
            "DisableSsl": false
        }
    },
    "RequestId": "57d2d6d6-ebbf-4e8c-82ab-c38935cae8aa"
}

Because you granted permissions to Alice as part of the call to create-data-source, this data source will now be visible to her when she logs in to QuickSight. If you didn’t grant user permissions, this data source is only visible to programmatic users with appropriate permissions.

Creating a dataset

Before Alice can build her dashboard, you’ll need to create a dataset, which identifies the specific data in a data source you want to use. To create a dataset from this data source, you can call the create-data-set API. The create-data-set API allows you to join two tables to create a desired view of data for Alice. Below, PhysicalTableMap points to the tables in your Redshift data source you want to join, and LogicalTableMap defines the join between them:

A sample dataset definition with joins included is shown below.

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSetId": "DemoDataSet1",
    "Name": "Demo Data Set 1",
    "PhysicalTableMap": {
        "DemoDataSet1-LineOrder": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Schema": "public",
                "Name": "lineorder",
                "InputColumns": [
                        {
                            "Name": "lo_custkey",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_revenue",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_quantity",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_orderkey",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_shipmode",
                            "Type": "STRING"
                        },
                        {
                            "Name": "lo_orderdate",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_ordertotalprice",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_tax",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_discount",
                            "Type": "INTEGER"
                        }
                ]
            }
        },
        "DemoDataSet1-Customer": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Schema": "public",
                "Name": "customer",
                "InputColumns": [
                        {
                            "Name": "c_custkey",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "c_phone",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_address",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_name",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_city",
                            "Type": "STRING"
                        }
                ]
            }
        }
    },
    "LogicalTableMap": {
        "DemoDataSet1-LineOrder-Logical": {
            "Alias": "LineOrder",
            "Source": {
                "PhysicalTableId": "DemoDataSet1-LineOrder"
            }
        },
        "DemoDataSet1-Customer-Logical": { 
            "Alias": "Customer",
            "Source": {
                "PhysicalTableId": "DemoDataSet1-Customer"
            }
        },
        "DemoDataSet1-Intermediate": { 
            "Alias": "Intermediate",
            "DataTransforms": [
                {   
                    "ProjectOperation": {
                        "ProjectedColumns": [
                                "lo_revenue",
                                "c_name",
                                "c_city"
                        ]
                    }
                }
            ],
            "Source": {
                "JoinInstruction": {
                    "LeftOperand": "DemoDataSet1-LineOrder-Logical",
                    "RightOperand": "DemoDataSet1-Customer-Logical",
                    "Type": "INNER",
                    "OnClause": "{lo_custkey} = {c_custkey}"
                }
            }
        }

    },
    "ImportMode": "DIRECT_QUERY",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDataSet1"
        }
    ]
}

If you wanted to create a similar dataset with a SQL statement instead of specifying all the columns, call the create-data-set API specifying your SQL statement in PhysicalTableMap as shown below:

aws quicksight create-data-set --generate-cli-skeleton >create-data-set-cli-input-sql.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSetId": "DemoDataSet2",
    "Name": "Demo Data Set 2",
    "PhysicalTableMap": {
        "DemoDataSet2-CustomerSales": {
            "CustomSql": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Name": "Customer Sales",
                "SqlQuery":"SELECT lineorder.lo_revenue, customer.c_name, customer.c_city FROM public.lineorder INNER JOIN public.customer ON lineorder.lo_custkey = customer.c_custkey",
                    "Columns": [
                        {
                            "Name": "lo_revenue",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "c_name",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_city",
                            "Type": "STRING"
                        }
                    ]
            }
        }
    },
    "LogicalTableMap": {
        "DemoDataSet2-CustomerSales-Logical": {
            "Alias": "Customer Sales",
            "Source": {
                "PhysicalTableId": "DemoDataSet2-CustomerSales"
            }
        }

    },
    "ImportMode": "DIRECT_QUERY",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDataSet2"
        }
    ]
}

Because you granted permissions to Alice, these data sets will be visible to her under Your Data Sets when she logs into QuickSight.

This dataset can be used like any other dataset in Amazon QuickSight. Alice can log in to the QuickSight UI and create an analysis that utilizes this dataset, as shown below:

Creating a template

Once Alice creates her dashboard and it gets popular with her team, the next ask might be to provide the same view to 100 other teams within the company, using a different dataset for each team. Without APIs, this would be a challenge. However, with the APIs, you can now define a template from Alice’s analysis and then create dashboards as needed. A template is created by pointing to a source analysis in QuickSight, which creates references for all datasets used, so that new datasets can be referenced when using the template.

The following configuration describes a template created from Alice’s analysis:

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "TemplateId": "DemoDashboardTemplate",
    "Name": "Demo Dashboard Template",
    "SourceEntity": {
        "SourceAnalysis": {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:analysis/7975f7aa-261c-4e7c-b430-305d71e07a8e",
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "DS1",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet1"
                }
            ]
        }
    },
    "VersionDescription": "1"
}

Templates are not visible within the QuickSight UI and are a developer/admin managed asset that is only accessible via the APIs at this point. You can save the template configuration by calling the create-template API to start creation of your template:

aws quicksight create-template --cli-input-json file://./create-template-cli-input.json

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate",
    "VersionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/version/1",
    "TemplateId": "DemoDashboardTemplate",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "aa189aee-8ab5-4358-9bba-8f4e32450aef"
}

To confirm the template was created, call the describe-template API as shown below:

aws quicksight describe-template --aws-account-id $AAI --template-id DemoDashboardTemplate

{
    "Status": 200,
    "Template": {
        "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate",
        "Name": "Demo Dashboard Template",
        "Version": {
            "CreatedTime": 1574137165.136,
            "VersionNumber": 1,
            "Status": "CREATION_SUCCESSFUL",
            "DataSetConfigurations": [
                {
                    "Placeholder": "DS1",
                    "DataSetSchema": {
                        "ColumnSchemaList": [
                            {
                                "Name": "c_city",
                                "DataType": "STRING"
                            },
                            {
                                "Name": "lo_revenue",
                                "DataType": "INTEGER"
                            }
                        ]
                    },
                    "ColumnGroupSchemaList": []
                }
            ],
            "Description": "1",
            "SourceEntityArn": "arn:aws:quicksight:us-east-1: xxxxxxxxxxxx:analysis/7975f7aa-261c-4e7c-b430-305d71e07a8e"
        },
        "TemplateId": "DemoDashboardTemplate",
        "LastUpdatedTime": 1574137165.104,
        "CreatedTime": 1574137165.104
    }
}

Creating dashboards

You can then use this template to create a dashboard that is connected to the second, similar dataset you created earlier. When calling the create-dashboard API, you define a SourceEntity (in this case, the template you just created), permissions, and any dashboard publish options, as shown in the following configuration:

create-dashboard-cli-input.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DashboardId": "DemoDashboard1",
    "Name": "Demo Dashboard 1",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:DescribeDashboard",
                "quicksight:ListDashboardVersions",
                "quicksight:UpdateDashboardPermissions",
                "quicksight:QueryDashboard",
                "quicksight:UpdateDashboard",
                "quicksight:DeleteDashboard",
                "quicksight:DescribeDashboardPermissions",
                "quicksight:UpdateDashboardPublishedVersion"
            ]
        }
    ],
    "SourceEntity": {
        "SourceTemplate": {
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "DS1",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet2"
                }
            ],
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate"
        }
    },
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDashboard"
        }
    ],
    "VersionDescription": "1",
    "DashboardPublishOptions": {
        "AdHocFilteringOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "ExportToCSVOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "SheetControlsOption": {
            "VisibilityState": "EXPANDED"
        }
    }
}

You can confirm you created the dashboard correctly with the describe-dashboard API. See the following code:

aws quicksight describe-dashboard --aws-account-id $AAI --dashboard-id DemoDashboard1

{
    "Status": 200,
    "Dashboard": {
        "DashboardId": "DemoDashboard1",
        "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dashboard/DemoDashboard1",
        "Name": "Demo Dashboard 1",
        "Version": {
            "CreatedTime": 1574138252.449,
            "Errors": [],
            "VersionNumber": 1,
            "Status": "CREATION_SUCCESSFUL",
            "SourceEntityArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/version/1",
            "Description": "1"
        },
        "CreatedTime": 1574138252.395,
        "LastPublishedTime": 1574138252.395,
        "LastUpdatedTime": 1574138252.449
    },
    "RequestId": "62945539-bb63-4faf-b897-8e84ea5644ae"
}

Because you granted permissions to Alice during creation, this dashboard is visible to her under All dashboards when she logs in to QuickSight.
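
Returning to the earlier scenario of serving 100 other teams: because the template decouples the visuals from the underlying dataset, you can loop over each team’s dataset and call create-dashboard once per team. The following is a minimal sketch using the AWS SDK for Python (Boto3); the account ID, team names, and dataset ARNs are placeholders for illustration.

import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")
ACCOUNT_ID = "xxxxxxxxxxxx"  # placeholder
TEMPLATE_ARN = f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:template/DemoDashboardTemplate"

# Hypothetical mapping of team name to that team's dataset ARN
TEAM_DATASETS = {
    "TeamA": f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:dataset/TeamADataSet",
    "TeamB": f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:dataset/TeamBDataSet",
}

for team, dataset_arn in TEAM_DATASETS.items():
    quicksight.create_dashboard(
        AwsAccountId=ACCOUNT_ID,
        DashboardId=f"SalesDashboard-{team}",
        Name=f"Sales Dashboard - {team}",
        SourceEntity={
            "SourceTemplate": {
                "Arn": TEMPLATE_ARN,
                "DataSetReferences": [
                    # The placeholder name must match the one defined in the template (DS1)
                    {"DataSetPlaceholder": "DS1", "DataSetArn": dataset_arn}
                ],
            }
        },
    )

In practice, each call would also include a Permissions block, as in the CLI example above, granting the appropriate team access to its dashboard.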

Configuring SPICE datasets        

Alice’s datasets are direct query datasets and may cause higher load on your Amazon Redshift cluster during peak times, which might also cause Alice and her customers to wait longer for their dashboards to load. With SPICE, Amazon QuickSight’s in-memory data store, you can offload user traffic from your database while still achieving high concurrency and fast, responsive performance for your end users. The following configuration redefines the second dataset with ImportMode set to SPICE:

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSetId": "DemoDataSet3",
    "Name": "Demo Data Set 3",
    "PhysicalTableMap": {
        "DemoDataSet2-CustomerSales": {
            "CustomSql": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Name": "Customer Sales",
                "SqlQuery":"SELECT lineorder.lo_revenue, customer.c_name, customer.c_city FROM public.lineorder INNER JOIN public.customer ON lineorder.lo_custkey = customer.c_custkey",
                    "Columns": [
                        {
                            "Name": "lo_revenue",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "c_name",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_city",
                            "Type": "STRING"
                        }
                    ]
 
	    }
        }
    },
    "LogicalTableMap": {
        "DemoDataSet2-CustomerSales-Logical": {
            "Alias": "Customer Sales",
            "Source": {
                "PhysicalTableId": "DemoDataSet2-CustomerSales"
            }
        }

    },
    "ImportMode": "SPICE",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDataSet2"
        }
    ]
}

You can call the create-data-set API with that configuration to create the dataset in SPICE mode:

aws quicksight create-data-set --aws-account-id $AAI --cli-input-json file://./create-data-set-cli-input-sql-spice.json

{
    "Status": 201,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3",
    "DataSetId": "DemoDataSet3",
    "IngestionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/589eaf9d-0e21-4572-9ebe-f61b129886d9",
    "IngestionId": "589eaf9d-0e21-4572-9ebe-f61b129886d9",
    "RequestId": "7a442521-ab8f-448f-8668-1f0d6823f987"
}

When you create datasets in SPICE mode, you consume SPICE capacity. The describe-data-set API provides details on the SPICE capacity consumed; the relevant portion of the response is shown below:

aws quicksight describe-data-set --aws-account-id $AAI --data-set-id DemoDataSet3 | tail -5

        "ImportMode": "SPICE",
        "ConsumedSpiceCapacityInBytes": 107027200
    },
    "RequestId": "b0175568-94ed-40e4-85cc-b382979ca377"
}

For SPICE datasets, the APIs also provide the option of setting up ingestions or listing past ingestions. Listing past ingestions allows you to identify ingestion status and when the data in the dataset was last updated. To list past ingestions, call the list-ingestions API for your dataset:

aws quicksight list-ingestions --aws-account-id $AAI --data-set-id DemoDataSet3

{
    "Status": 200,
    "Ingestions": [
        {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionId": "589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionStatus": "COMPLETED",
            "ErrorInfo": {},
            "RowInfo": {
                "RowsIngested": 1126608,
                "RowsDropped": 0
            },
            "CreatedTime": 1574138852.564,
            "IngestionTimeInSeconds": 24,
            "IngestionSizeInBytes": 107027200,
            "RequestSource": "MANUAL",
            "RequestType": "INITIAL_INGESTION"
        }
    ],
    "RequestId": "ee1d9a6f-a290-418a-a526-8906f4689776"
}

Set up ingestions to update the data in your dataset from your data source. When you have new data in your Redshift cluster, you can call the create-ingestion API on your dataset to trigger a SPICE refresh:

aws quicksight create-ingestion --aws-account-id $AAI --data-set-id DemoDataSet3 --ingestion-id DemoDataSet3-Ingestion2

{
    "Status": 201,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/DemoDataSet3-Ingestion2",
    "IngestionId": "DemoDataSet3-Ingestion2",
    "IngestionStatus": "INITIALIZED",
    "RequestId": "fc1f7eea-1327-41d6-9af7-c12f097ed343"
}

Listing ingestion history again shows the new ingestion in the state RUNNING. See the following code:

aws quicksight list-ingestions --aws-account-id $AAI --data-set-id DemoDataSet3

{
    "Status": 200,
    "Ingestions": [
        {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/DemoDataSet3-Ingestion2",
            "IngestionId": "DemoDataSet3-Ingestion2",
            "IngestionStatus": "RUNNING",
            "RowInfo": {
                "RowsIngested": 0
            },
            "CreatedTime": 1574139332.876,
            "RequestSource": "MANUAL",
            "RequestType": "FULL_REFRESH"
        },
        {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionId": "589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionStatus": "COMPLETED",
            "ErrorInfo": {},
            "RowInfo": {
                "RowsIngested": 1126608,
                "RowsDropped": 0
            },
            "CreatedTime": 1574138852.564,
            "IngestionTimeInSeconds": 24,
            "IngestionSizeInBytes": 107027200,
            "RequestSource": "MANUAL",
            "RequestType": "INITIAL_INGESTION"
        }
    ],
    "RequestId": "322c165e-fb90-45bf-8fa5-246d3034a08e"
}
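
If you trigger refreshes from automation (for example, after a nightly load into your Redshift cluster), you usually want to wait for the ingestion to finish before notifying users. A minimal polling sketch using the AWS SDK for Python (Boto3) might look like the following; the dataset and ingestion IDs match the examples above, and the account ID is a placeholder.

import time
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")
ACCOUNT_ID = "xxxxxxxxxxxx"  # placeholder

def wait_for_ingestion(data_set_id, ingestion_id, poll_seconds=15):
    """Poll describe_ingestion until the SPICE refresh completes or fails."""
    while True:
        ingestion = quicksight.describe_ingestion(
            AwsAccountId=ACCOUNT_ID,
            DataSetId=data_set_id,
            IngestionId=ingestion_id,
        )["Ingestion"]
        status = ingestion["IngestionStatus"]
        if status == "COMPLETED":
            return ingestion
        if status in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Ingestion {ingestion_id} ended with status {status}")
        time.sleep(poll_seconds)

wait_for_ingestion("DemoDataSet3", "DemoDataSet3-Ingestion2")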

Versioning and aliasing

As you may have noticed, the template and dashboard APIs support versioning. When you create a template or dashboard, version 1 of that entity is created, and each update of a template or dashboard results in a new version of that entity with an incremented version number. Versioning can be useful when you want to republish an old version of a dashboard or ensure you’re building off a known version of the template. For example, if you published your dashboard multiple times, but want to republish the first version, you can call the update-dashboard-published-version API as follows:

aws quicksight update-dashboard-published-version --aws-account-id $AAI --dashboard-id DemoDashboard1 --dashboard-version-number 1

{
   "DashboardArn": "arn:aws:quicksight:us-east-
1:xxxxxxxxxxxx:dashboard/DemoDashboard1",
   "DashboardId": "DemoDashboard1",
   "RequestId": "c97cdbff-d954-4f5b-a887-28659ad9cc9a"
}

Versioning becomes even more powerful with aliases. A template or dashboard alias is a pointer to a specific template or dashboard version. Aliases are helpful when you want to point to a logical version of a resource rather than an explicit version number that needs to be manually updated. This feature can be helpful for development workflows where you want to make changes to a template and only want production code to use an approved template version. For example, you can create a ‘PROD’ template alias for your approved production template and point it to version 1, as shown below:

aws quicksight create-template-alias --aws-account-id $AAI --template-id DemoDashboardTemplate --template-version-number 1 --alias-name PROD

{
   "RequestId": "c112a48c-14da-439e-b4d8-538b60c1f188",
   "TemplateAlias": { 
      "AliasName": "PROD",
      "Arn": "arn:aws:quicksight:us-east-
1:xxxxxxxxxxxx:template/DemoDashboardTemplate/alias/PROD",
      "TemplateVersionNumber": 1
   }
}

When you create a dashboard from this template, you can specify the SourceEntity as the ‘PROD’ template alias to always publish the dashboard from the approved production template version. Note how the template ARN in the create-dashboard-cli-input.json configuration below contains the template alias name:

aws quicksight create-dashboard --cli-input-json file://./create-dashboard-cli-input.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DashboardId": "DemoDashboard2",
    "Name": "Demo Dashboard 2",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/user/Alice",
            "Actions": [
                "quicksight:DescribeDashboard",
                "quicksight:ListDashboardVersions",
                "quicksight:UpdateDashboardPermissions",
                "quicksight:QueryDashboard",
                "quicksight:UpdateDashboard",
                "quicksight:DeleteDashboard",
                "quicksight:DescribeDashboardPermissions",
                "quicksight:UpdateDashboardPublishedVersion"
            ]
        }
    ],
    "SourceEntity": {
        "SourceTemplate": {
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "DS1",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet2"
                }
            ],
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/alias/PROD"
        }
    },
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDashboard"
        }
    ],
    "VersionDescription": "1",
    "DashboardPublishOptions": {
        "AdHocFilteringOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "ExportToCSVOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "SheetControlsOption": {
            "VisibilityState": "EXPANDED"
        }
    }
}

If you were to create new versions of the template, you could then update the ‘PROD’ template alias to point to the new template version, without touching your dashboard creation code from above. For example, if you wanted to set template version 3 to be your approved production version, you can call the update-template-alias API as shown below:

aws quicksight update-template-alias --aws-account-id $AAI --template-id DemoDashboardTemplate --template-version-number 3 --alias-name PROD

{
   "RequestId": "5ac9ff9e-28c1-4b61-aeab-5949986f5b2b",
   "TemplateAlias": { 
      "AliasName": "PROD",
      "Arn": "arn:aws:quicksight:us-east-
1:xxxxxxxxxxxx:template/DemoDashboardTemplate/alias/PROD",
      "TemplateVersionNumber": 3
   }
}

Keep an eye out for a future blog post that will detail more examples of using versioning and aliases for large scale deployments!

Amazon QuickSight’s data source, dataset, dashboard, and SPICE APIs are available in both Standard and Enterprise editions, with edition-specific support for functionality. Fine-grained access control APIs and template support for dashboard APIs are available in Amazon QuickSight Enterprise edition. APIs for Amazon QuickSight are part of the AWS SDK and are available in a variety of programming languages. For more information, see API Reference and Tools to Build on AWS.
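
For example, the Boto3 (Python SDK) operations map one-to-one to the CLI commands used in this post. The following minimal sketch lists every dashboard in an account, handling pagination with NextToken; the account ID is a placeholder.

import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")
ACCOUNT_ID = "xxxxxxxxxxxx"  # placeholder

def list_all_dashboards():
    """Return summaries for every dashboard in the account."""
    dashboards, token = [], None
    while True:
        kwargs = {"AwsAccountId": ACCOUNT_ID, "MaxResults": 100}
        if token:
            kwargs["NextToken"] = token
        page = quicksight.list_dashboards(**kwargs)
        dashboards.extend(page.get("DashboardSummaryList", []))
        token = page.get("NextToken")
        if not token:
            return dashboards

for summary in list_all_dashboards():
    print(summary["DashboardId"], summary["Name"])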

Conclusion

With QuickSight’s new themes, APIs, and additional customization options underway, there are many creative ways to author the next chapter in your analytics story. To be a part of future API previews and connect with us, sign up for the embedding early access program.

 


About the Author

Jose Kunnackal John is a principal product manager for Amazon QuickSight.

 

 

 

Enhancing dashboard interactivity with Amazon QuickSight Actions

Post Syndicated from Sahitya Pandiri original https://aws.amazon.com/blogs/big-data/enhancing-dashboard-interactivity-with-amazon-quicksight-actions/

Amazon QuickSight now offers enhanced dashboard interactivity capabilities through QuickSight Actions. QuickSight Actions provide advanced filtering through single point-and-click actions on dashboards. With Actions, you can link visuals within a dashboard so that selecting a dimensional point on one visual provides granular insights on the selected point in other visuals within your dashboard. This lets you start with summaries and dive deep into the details of your business metrics, all within the same dashboard sheet. You define which visuals within your dashboard are interactive and how they interact with each other. As of this writing, QuickSight Actions let you define two primary types of interactivity: filter actions and URL actions. URL actions within Amazon QuickSight are not new, but the point of entry to create them is now consolidated with Actions.

You can apply QuickSight Actions to any supported chart that holds at least one dimension. This post provides examples of getting started with Actions, configuring different actions on a dashboard, and enabling different forms of interactivity for each action configured.

This post uses the following data sets:

B2B Sales
This data set holds order details for a fictitious company, ABCDO, for 2016 and 2017. The dashboard we build reports on sales metrics by industry, segment, and region as primary dimensions, and also provides granular details for each order purchased.

Product Availability
This data set holds available quantity for every product by ID.

Prerequisites

Before implementing Actions on your Amazon QuickSight dashboards, review how to create and publish dashboards.

Getting started with QuickSight Actions

This screenshot is a dashboard built from the above two data sets. It shows sales by category, industry, and region on line 1; segment sales by quarter and industry sales by segment on line 2; total profit, sales, discount, and quantity sold on line 3; an order details pivot on line 4; and a shipping details pivot on line 5.

Before getting started, note the following terminology:

  • Source visual – The visual on which an action is created. Choosing a point on the source visual triggers the action, and the dimensional values chosen are passed as filters to target visuals.
  • Target visual – The visual that is filtered by the dimensional values chosen on the source visual.
  • Activation – You can trigger an action either by  selecting it directly (left-click), or selecting from the menu options (right-click).
  • Action type – You can configure two types of actions on a visual: filter actions and URL actions. Filter actions pass select or all dimensions as filters across select or all visuals across the dashboard. URL actions allow you to navigate from the dashboard to an external site or different sheet within the same dashboard with the selected dimension passed along.

Setting up a click/select action

To set up a click/select action, complete the following steps:

  1. Select the source visual on the analysis and choose Actions. The following screenshot shows this step.
  2. Within Actions, you can either create and configure your actions or use the 1-click action setup. The 1-click setup creates a default action on the visual: when a point on the visual is selected, all dimensions of that point are passed as filters to all other visuals on the dashboard. Under Quick create, choose Filter same-sheet visuals.

This creates a default action, which you can modify at any time.

After creating the 1-click action, you can select any point on the source visual. The following screenshot shows that the selected point is highlighted and all the underlying dimensions are used as filters on all other visuals on the analysis or dashboard.

In this screenshot, selecting Copper and Diamonds for the year 2017 passes these two as filters to all other visuals.

To verify what filter actions are applied on a particular visual, choose the filter icon on the top right. This shows all the filters applied through filter actions and also regular filters.

Setting up custom filter actions

To perform further analysis on the sales metrics for any segment, configure an action on the Industry sales by Segment visual. To configure a filter action to pass segment as the filter onto the KPI charts, complete the following steps:

  1. Choose the source visual and choose Actions.
  2. Choose Define a custom action.
  3. For Action name, enter a name for your action. You can also add dynamic placeholder values to make the name more descriptive to your readers by choosing the plus sign next to the action name. You have four configuration settings; while this post uses values specific to this use case, you can choose other values according to your needs.
  4. For Activation, select Menu option.
  5. For Action type, choose Filter action.
  6. For Filter scope, select Selected fields, segment.
  7. For Target visuals, select Select visuals and check needed target visuals Total quantity, Total discount offered, and Total profit by year.
  8. Choose Save.

To view menu options and choose a filter action, right-click any segment on the visual. This post selects Strategic. The following screenshot shows that all the Strategic points are highlighted and the KPIs are updated.

Adding filter actions on pivot tables

Filter actions on pivot tables provide exceptional flexibility to pass anything from the innermost to the outermost dimensions. Additionally, when a filter action is triggered, the source pivot table is highlighted to show values that are passed as filters to the target visuals.

This post adds click/select filter actions on the Order details pivot table to analyze shipping details for each order. This post also adds two filter actions to the menu options: one to filter by the product name and the other by the customer name.

To create click/select filter actions on order_id, complete the following steps:

  1. Choose the Order details pivot table chart.
  2. Choose Actions.
  3. Choose Define custom action.
  4. For Action name, enter your action name.
  5. For Activation, select Select.
  6. For Filter scope, select Selected fields and check order_id.
  7. For Target visuals, select Select visuals, and check Shipping details.
  8. Choose Save.

The following screenshot shows that when choosing any point on the pivot table, the order pertaining to the point is highlighted and the Shipping details table shows shipping details for the selected order.

You can create only one select-activated action on a source visual, but any number of menu option actions.

You can now create additional actions in the menu. To create new actions, go to Actions, choose the plus sign, and create these two actions.

After creating the two actions, they are visible in the Actions menu. See the following screenshot.

After setting up these action filters, you can right-click on the visual and trigger these filters from the menu.

The screenshot below shows the menu that appears when you right-click a product name.

The screenshot below shows the menu that appears when you right-click a customer name.

Removing applied filter actions

There are three ways to remove a triggered filter action:

  • On source visuals with filter actions triggered via select or menu options, clear the filter action by choosing the selected point again or a different point.
  • On source visuals with empty spaces within the visual (such as bar chart family or pie chart), clicking within the empty space also clears the filter action selection. This deselection is not applicable to heat maps and tree maps. Deselect points on these charts by choosing an already selected point.
  • On target visuals with a highlighted filter icon, choosing the X on the filter actions displayed removes the filter for this visual and all other visuals the filter is applied to.

Using action filters when the source and target visuals use different data sets

If you use multiple data sets within an analysis, Amazon QuickSight automatically maps fields across these data sets by the field name. For example, product_id is a field in the B2B Sales data set and Product Availability data set. When setting a filter action on product_id from a source visual using the B2B Sales data set, the target visual (the adjacent KPI) showing product quantity availability shows results for the selected product_id from the source visual. This implicit mapping of fields is case sensitive.

You can add the new data set Product Availability to this analysis. For more information, see Adding a Data Set to an Analysis.

Add a KPI chart to show count of product_id as in the screenshot below.

To show the total quantity available for a selected product, create a filter action on the menu options, as seen in the screenshot below:

To drill down into product quantity availability, choose (right-click) product_id on the order details pivot table. The following screenshot shows the product availability details for the selected product_id.

Creating custom URL actions

Custom URL actions aren’t new to Amazon QuickSight, but the point of entry to create URL actions is now consolidated with Actions. To create a custom URL action on the Order details pivot table to see Google search results on a customer, complete the following steps:

  1. Choose the Order details pivot table.
  2. Choose Actions.
  3. Choose Add a new action.
  4. For Action name, enter an action name.
  5. For Action type, choose URL action.
  6. Enter the URL https://www.google.com/search?q=<<customer_name>>. The <<customer_name>> placeholder is replaced with the value of the customer_name dimension for the selected data point.
  7. Choose New browser tab.
  8. Choose Save.

When you right-click on the pivot table, menu options appear with your two filter actions created previously and the new custom URL action. See the following screenshot.

Conclusion

With Amazon QuickSight, you can create filter actions on any chart type with at least one dimensional value. You can also create URL actions on all chart types, including KPIs and gauge charts, which do not hold dimensional values. As of this writing, QuickSight Actions don’t support selecting multiple data points within the same source visual, cascading filter actions (in which a filter action on one visual is followed by a subsequent filter on another visual), or adding custom mappings between different data sets used in the dashboard. You can expect to see these features in future releases.

QuickSight Actions is available on both Enterprise and Standard editions in all QuickSight-supported regions.

 


About the Author

Sahitya Pandiri is a technical program manager with Amazon Web Services. Sahitya has been in product/program management for 6 years now, and has built multiple products in the retail, healthcare, and analytics space.

 

 

 

Amazon QuickSight announces the all-new QuickSight Mobile app

Post Syndicated from Tina Kelleher original https://aws.amazon.com/blogs/big-data/announcing-the-new-mobile-app-for-amazon-quicksight/

AWS is happy to announce the release of QuickSight Mobile for iOS and Android devices. This release is both a major update to the existing iOS app and the launch of a new Android application. The app enables you to securely get insights from your data from anywhere; favorite, browse, and interact with your dashboards; explore your data with drilldowns and filters; stay ahead of the curve via forecasting; get email alerts when unexpected changes happen in your data; and share those insights with colleagues.

To download the QuickSight Mobile app, visit the iOS App Store and Google Play.

The new QuickSight Mobile app optimizes the dashboard consumption experience with newly added features and enhanced interactivity. In this post, we walk you through the new mobile experience in detail.

Features and capabilities

Home page

After authenticating with the app, you land on the QuickSight home screen. From here, you have quick access to your favorite dashboards (synced from the browser application) and your most recently viewed dashboards. The recently viewed list can display up to the last 100 dashboards you previously opened.

The following screenshots show your favorited and recently viewed dashboards.

Dashboards

You can browse through all your dashboards via the Dashboards tab. You can sort the dashboards by name or updated (published) date and change to a list or a grid view. You can also search for dashboards by name. Lastly, you can easily add any dashboard to your favorites by choosing the star icon next to the dashboard’s name.

The following screenshots show dashboards in grid view, display preferences and search tool.

Dashboard view

The dashboard view is optimized for your mobile device by showing the visuals in vertical stack view, with each visual expanded to full width. A multi-sheet dashboard shows the first sheet by default. To navigate across sheets, choose the sheet title and select the sheet name you want to view.

The following screenshots show an opened dashboard with visuals in vertical stack view and how you navigate across different sheets within the dashboard.

To apply filters to visuals, choose the funnel icon to view all available filter controls and options for your dashboard sheet. After you’ve made your selection, choose Apply, and choose Done. You can see the number of filters applied via the small blue tag next to the funnel icon. You can also expand the filter control view and reset a given filter to the default.

The following screenshots show dashboard filters, expanded filter pane and how you select filter values.

Some visuals have a scroll bar, which allows you to customize the range to zoom in on a visual. A long press on a data point on most charts, such as a line or a bar, brings up the context menu. You can focus on this data point, drill up or down if the visual has a drill hierarchy, and exclude the chosen data point.

The following screenshots show interactions with zoom bars and context menu.

To go back to the full view, choose the … icon in the upper right and choose Undo. To exit the dashboard view, choose the back arrow.

If a visual is an anomaly detection widget, you can choose Explore anomalies to go to the detailed anomalies page. Here, you can see all of the anomalies detected for the latest period. To perform contribution analysis and uncover the hidden drivers, long-press the anomaly data point; the option appears in the bottom sheet. To find out more about anomaly detection and contribution analysis, see Amazon QuickSight Announces General Availability of ML Insights.

You can expect contribution analysis to extend support to all visuals beyond anomaly detection on the app in the near future.

The following screenshots show the anomaly details and contribution analysis flow.

Visual view

From the dashboard view, choose the expand icon in the upper right corner to expand the desired visual to full-screen. The app supports both portrait and landscape modes in the expanded visual, and you can use the same interactions such as zoom, drill, focus, and exclude.

The following screenshots show expanded visuals in both portrait and landscape modes.

Settings

The Settings tab allows you to change language, Region, and biometric authentication methods. You can also provide feedback by choosing Provide feedback, which initiates an email to our mobile team.

The following screenshot shows you the options on settings page.

Setup and authentication

The QuickSight Mobile app supports authentication by any provisioned user. You can authenticate by using either your Amazon QuickSight username and password (with MFA, if this option is enabled), or through your own identity provider (IdP). After authentication, you can re-authenticate using biometric authentication such as face ID or touch ID, if supported by your mobile device, or an application-specific PIN.

The following screenshots show the generic sign-in flow.

IdP-initiated login

For IdP-initiated logins, set up your account with identity federation. For more information, see Federate Amazon QuickSight access with Okta. You can federate using the standard mobile federation process that your IdP provides, either through an app or through the web. This post uses the Okta mobile app. Once federated, you receive a prompt to continue your session either in the mobile web experience or in the QuickSight Mobile app.

Account administrators can set the duration of mobile sessions to be anywhere from 1–30 days through the administrative console. The default session length for new accounts is 30 days.

The following screenshots show IdP-initiated login flow.

Microsoft Active Directory

If your account uses Microsoft Active Directory, you can log in to the app using your current AD credentials. If your account requires MFA, you also need to provide your MFA token to authenticate. For more information, see Can I use AWS Directory Service for Microsoft Active Directory to authenticate users in Amazon QuickSight?

Email Invitations

If you receive an email invitation to Amazon QuickSight, follow the instructions in the email to set your username and password through the website (mobile or desktop). Use these credentials to authenticate with the mobile applications.

IAM Users

If your account is an IAM user, you need to provision yourself as an Amazon QuickSight user before authenticating in the mobile application. For more information on provisioning users, see Provisioning Users for Amazon QuickSight.

Summary

This post provided an overview of the key features and functionalities as well as sign-in flows for the new QuickSight Mobile app. The app is available as a free download in Google Play and the iOS App Store. You can expect the app to continue to evolve its features and capabilities to give you the best experience for getting insights from your data while on the go!

If you have any questions or feedback, please leave a comment.

 


About the Authors

Susan Fang is a senior product manager for QuickSight with AWS.

 

 

 

 

Brian Stein is a software development manager at AWS.

 

 

 

 

Joining across data sources on Amazon QuickSight

Post Syndicated from Rakshith Dayananda original https://aws.amazon.com/blogs/big-data/joining-across-data-sources-on-amazon-quicksight/

Amazon QuickSight announced the launch of Cross Data Source Join, which allows you to connect to multiple data sources and join data across these sources in Amazon QuickSight directly to create data sets used to build dashboards. For example, you can join transactional data in Amazon Redshift that contains customer IDs with Salesforce tables that contain customer profile data to create an interactive dashboard with order and customer details. You can slice and dice transactional data by various customer dimensional data such as segment, geographic, or demographic without first pulling the data to a single source outside Amazon QuickSight.

With cross data source joins, you can join across all data sources supported by Amazon QuickSight, including file-to-file, file-to-database, and database-to-database joins using the built-in drag-and-drop UI, without heavily relying on the BI and data engineering teams to set up complex and time-consuming ETLs. Whether it is local CSV files, Amazon RDS databases, or JSON objects on an S3 bucket, you can now join any of these data sources together to create data sets.

Finally, you can schedule refreshes as frequently as every hour so that the joined data set always stays up to date with the latest information.

Getting started with Cross Data Source Join

The screenshot below shows all data sources you can connect to on QuickSight.

Amazon QuickSight allows you to connect to different data sources. It is common for businesses to have data spread across multiple data sources, depending on your data requirements. For example, you might have your web server logs stored in Amazon S3, customer details on Amazon Redshift tables, and order details on RDS. You may need to build reports from data combined from two or more of these different data sources.

You can accomplish this to some extent by building data pipelines that consolidate multiple data sources into a single source. However, creating these pipelines results in data duplication across AWS services and adds cost in terms of the effort and time needed to move data into one place. You would then build Amazon QuickSight data sets from this single data source. With cross data source join available directly in Amazon QuickSight, you can eliminate this problem.

There is no size restriction on your largest source, as long as the post-join table can fit into your SPICE capacity per data set. The rest of the tables together need to be within 1 GB in size. For example, if you have 20 numeric columns in your smaller table, you can fit about 5 million rows until you exceed the 1 GB memory limit.

This post demonstrates how to create data sets from two CSV files and how to join an RDS table with an S3 file. The post uses an example table with orders-related data in one data source, and returns-related data in another data source. The final goal is to create one single data set that contains both orders and returns data.

Prerequisites

Before getting started, download these CSV files to your local machine:

Also, learn how to create, edit, and delete QuickSight data sources from Working with Data Sources in Amazon QuickSight.

Joining multiple CSV files

To join two CSV files, complete the following steps:

  1. Upload the orders CSV file that you downloaded earlier to QuickSight.
  2. After selecting the orders sheet, go to the edit/preview data page, where your data set details appear.
  3. From the top menu, choose Add data. A pop-up window appears with the option to either switch data sources or upload a new CSV to join with the current data set. The window also shows the existing source.
  4. Choose Upload a file and upload the returns CSV file. After uploading the file, you can see sheet names for both CSVs. The following screenshot shows orders and returns.
  5. Choose the two circles between the files. This allows you to edit the join configuration between the two CSVs. In the Join configuration section, you can select the join type (inner, left, right, or full) and the column on which to apply the join. This post uses an inner join on the order ID columns of the two files.
  6. For Join type, choose Inner.
  7. For Join clauses, select Order ID in both drop-downs.
  8. Choose Apply.

You now have a data set that contains both orders and returns data from two different CSVs. You can save your data set and continue to create your analysis.
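
Conceptually, the join QuickSight performs here is a standard inner join on the Order ID column. If you want to sanity-check the expected result locally before building the data set, a quick sketch with pandas (assuming both sample CSVs contain an "Order ID" column) could look like this:

import pandas as pd

# Assumes both sample files have an "Order ID" column
orders = pd.read_csv("orders.csv")
returns = pd.read_csv("returns.csv")

# Inner join: only orders that also appear in the returns file are kept
joined = orders.merge(returns, on="Order ID", how="inner")
print(joined.head())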

Joining an RDS table with S3

In this example, you have orders data in RDS and returns data as a JSON file in an S3 bucket. To join this data, complete the following steps:

  1. Create a data set in QuickSight from the RDS table. Review Create a Database Data Set and an Analysis to learn how to connect to RDS to create data sets.
  2. Go to Manage data and select the RDS data set. This post uses the orders data set.
  3. Choose Edit data set. A page with your data set details appears.
  4. From the top menu, choose Add data. A pop-up window appears with the option to either switch data sources or upload a new CSV to join with the current data set. The window also shows the existing source.
  5. Choose Switch data source. A list appears of the different data sets and their sources.
  6. Choose a data set. This post chooses Returns. You can now see both data sets linked together.
  7. Choose the two spheres between the data sets.
  8. Under Join configuration, choose your desired Join type. This post chooses Inner.
  9. For Join clauses, from the drop-downs for each data set, select the column on which to apply the join. This post chooses order_id for orders and Order ID for Returns.
  10. Choose Apply.

Your new data set contains both orders and returns data from two different data sources. You can save your data set and continue to create your analysis.

Conclusion

This post showed you how to join data from two files and how to join an RDS table with data in S3. You can join data from any two supported data sources (except IoT) on Amazon QuickSight with this method. Cross data source join capability is now available in both Enterprise and Standard editions in all Amazon QuickSight Regions.

 


About the Author

Rakshith Dayananda is an IT app dev engineer at AWS.

 

 

 

Analyze Google Analytics data using Upsolver, Amazon Athena, and Amazon QuickSight

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyze-google-analytics-data-using-upsolver-amazon-athena-and-amazon-quicksight/

In this post, we present a solution for analyzing Google Analytics data using Amazon Athena. We’re including a reference architecture built on moving hit-level data from Google Analytics to Amazon S3, performing joins and enrichments, and visualizing the data using Amazon Athena and Amazon QuickSight. Upsolver is used for data lake automation and orchestration, enabling customers to get started quickly.

Google Analytics is a popular solution for organizations who want to understand the performance of their web properties and applications. Google Analytics data is collected and aggregated to help users extract insights quickly.  This works great for simple analytics. It’s less than ideal, however, when you need to enrich Google Analytics data with other datasets to produce a comprehensive view of the customer journey.

Why analyze Google Analytics data on AWS?

Google Analytics has become the de-facto standard web analytics tool. It is offered for free at lower data volumes and provides tracking, analytics, and reporting.  It enables non-technical users to understand website performance by answering questions such as: where are users coming from? Which pages have the highest conversion rates? Where are users experiencing friction and abandoning their shopping cart?

While these questions can be answered within the Google Analytics UI, there are some limitations, such as:

  • Data sampling: Google Analytics standard edition displays sampled data when running ad hoc queries on time periods that contain more than 500,000 sessions. Large websites can easily exceed this number on a weekly or even daily basis. This can create reliability issues between different reports, as each query can be fed by a different sample of the data.
  • Difficulty integrating with existing AWS stack: Many customers have built or are in the process of building their data and analytics platform on AWS. Customers want to use the AWS analytics and machine learning capabilities with their Google Analytics data to enable new and innovative use cases.
  • Joining with external data sources: Seeing the full picture of a business’ online activity might require combining web traffic data with other sources. Google Analytics does not offer a simple way to either move raw data in or out of the system. Custom dimensions in Google Analytics can be used, but they are limited to 20 for the standard edition and are difficult to use.
  • Multi-dimensional analysis: Google Analytics custom reports and APIs are limited to seven dimensions per query. This limits the depth of analysis and requires various workarounds for more granular slicing and dicing.
  • Lack of alternatives: Google Analytics 360, which allows users to export raw data to Google BigQuery, carries a hefty annual fee. This can be prohibitive for organizations. And even with this upgrade, the native integration is only with BigQuery, which means users still can’t use their existing AWS stack.

Building or buying a new web analytics solution (including cookie-based tracking) is also cost-prohibitive, and can interrupt existing workflows that rely on Google Analytics data.

Customers are looking for a solution to enable their analysts and business users to incorporate Google Analytics data into their existing workflows using familiar AWS tools.

Moving Google Analytics data to AWS: Defining the requirements

To provide an analytics solution with the same or better level of reporting as Google Analytics, we designed our solution around the following tenets:

  1. Analytics with a low technical barrier to entry: Google Analytics is built for business users, and our solution is designed to provide a similar experience. This means that beyond ingesting the data, we want to automate the data engineering work that goes into making the data ready for analysis.  This includes data retention, partitioning, and compression. All of this work must be done under the hood and remain invisible to the user querying the data.
  2. Hit-level data: Google Analytics tracks clickstream activity based on Hits – the lowest level of interaction between a user and a webpage. These hits are then grouped into Sessions – hits within a given time period, and Users – groups of sessions (more details here). The standard Google Analytics API is limited to session and user-based queries, and does not offer any simple way of extracting hit-level data. Our solution, however, does provide access to this granular data.
  3. Unsampled data: By extracting the data from Google Analytics and storing it on Amazon S3, we are able to bypass the 500K sessions limitation. We also have access to unsampled data for any query at any scale.
  4. Data privacy: If sensitive data is stored in Google Analytics, relying on third-party ETL tools can create risks around data privacy, especially in the era of GDPR. Therefore, our solution encrypts data in transit and relies exclusively on processing within the customer’s VPC.

Solution overview

The solution is built on extracting hit-level data and storing it in a data lake architecture on Amazon S3. We then use Amazon Athena and Amazon QuickSight for analytics and reporting. Upsolver, an AWS premier solution provider, is used to automate ingestion, ETL, and data management on S3. Upsolver also orchestrates the entire solution with a simple-to-use graphical user interface. The following diagram shows the high-level architecture of the solution.

Reference architecture showing the flow of data across Google Anaytics, Amazon Athena and Amazon QuickSight

Using Upsolver’s GA connector we extract unsampled, hit-level data from Google Analytics. This data is then automatically ingested according to accepted data lake best practices and stored in an optimized form on Amazon S3. The following best practices are applied to the data:

  • Store data in Apache Parquet columnar file format to improve read performance and reduce the amount of data scanned per query.
  • Partition data by event (hit) time rather than by API query time.
  • Perform periodic compaction by which small files are merged into larger ones improving performance and optimizing compression.

Once data is stored on S3, we use Upsolver’s GUI to create structured fact tables from the Google Analytics data. Users can query them using Amazon Athena and Amazon Redshift. Upsolver provides simple-to-use templates to help users quickly create tables from their Google Analytics data. Finally, we use Amazon QuickSight to create interactive dashboards to visualize the data.

The result is a complete view of our Google Analytics data. This view provides the level of self-service analytics that users have grown accustomed to, at any scale, and without the limitations outlined earlier.

Building the solution: Step by step guide

In this section, we walk through the steps to set up the environment, configure Upsolver’s Google Analytics plugin, extract the data, and begin exploring.

Step 1: Installation and permissions

  1. Sign up for Upsolver (can also be done via the AWS Marketplace).
  2. Allow Upsolver access to read data from Google Analytics and add new custom dimensions. Custom dimensions enable Upsolver to read unsampled, hit-level data directly from Google Analytics instead of creating parallel tracking mechanisms that aren’t as trustworthy.
  3. To populate the custom dimensions that were added to Google Analytics, allow Upsolver to run a small JavaScript code on your website. If you’re using GA360, this is not required.

Step 2: Review and clean the raw data

For supported data sources, Upsolver automatically discovers the schema and collects key statistics for every field in the table. Doing so gives users a glimpse into their data.

In the following screenshot, you can see schema-on-read information on the left side, stats per field and value distribution on the right side.

Screen shot of the Upsolver UI showing schema-on-read information on the left side, stats per field and value distribution on the right side

Step 3: Publishing to Amazon Athena

Upsolver comes with four templates for creating tables in your AWS based data lake according to the Google Analytics entity being analyzed:

  • Pageviews – used to analyze user flow and behavior on specific sections of the web property using metrics such as time on page and exit rate.
  • Events – user-defined interactions such as scroll depth and link clicks.
  • Sessions – monitor a specific journey in the web property (all pageviews and events).
  • Users – understand a user’s interaction with the web property or app over time.

All tables are partitioned by event time, which helps improve query performance.

Upsolver users can choose to run the templates as-is, modify them first or create new tables unique to their needs.

The following screenshot shows the schema produced by the Pageviews template:

Screen shot of the Upsolver UI showing the schema produced by the Pageviews template:

The following screenshot shows the Pageviews and Events tables as well as the Amazon Athena views for Sessions and Users generated by the Upsolver templates.

Screenshot showing the Pageviews and Events tables as well as the Athena views for Sessions and Users generated from the Upsolver templates.

The following are a couple of example queries you may want to run to extract specific insights:

-- Popular page titles 
SELECT page_title, 
       Count(*) AS num_hits 
FROM   ga_hits_pageviews 
GROUP  BY page_title 
ORDER  BY 2 DESC 
-- User aggregations from hit data 
SELECT user_id, 
       Count(*)                   AS num_hits, 
       Count(DISTINCT session_id) AS num_of_sessions, 
       Sum(session_duration)      AS total_sessions_time 
FROM   ga_hits_pageviews 
GROUP  BY user_id 
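
You can also run these queries programmatically. The following is a minimal sketch using the AWS SDK for Python (Boto3) that submits the first query to Amazon Athena and waits for it to finish; the database name and S3 output location are placeholders you would replace with your own.

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

QUERY = """
SELECT page_title, COUNT(*) AS num_hits
FROM ga_hits_pageviews
GROUP BY page_title
ORDER BY 2 DESC
"""

execution_id = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "analytics"},  # placeholder database name
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)["QueryExecutionId"]

# Poll until the query succeeds, fails, or is cancelled
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    for row in rows[:10]:  # header row plus the first few results
        print([col.get("VarCharValue") for col in row["Data"]])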

Step 4: Visualization in Amazon QuickSight

Now that the data has been ingested, cleansed, and written to S3 in a structured manner, we are ready to visualize it with Amazon QuickSight. Start by creating a dashboard to mimic the one provided by Google Analytics. But we don’t need to stop there: we can use QuickSight ML Insights to extract deeper insights from our data. We can also embed Amazon QuickSight visualizations into existing web portals and applications, making insights available to everyone.

Screenshot of QuickSight visualization showing several sections, one with a graph, several others with various statistics

Screen shot of QuickSight showing a global map with usage concentrations marked by bubbles, alongside a pie graph.

Screenshot of QuickSight showing a bar graph, alongside a table with various data values.

Conclusion

With minimal setup, we were able to extract raw hit-level Google Analytics data, prepare it, and store it in a data lake on Amazon S3. Using Upsolver, combined with Amazon Athena and Amazon QuickSight, we built a feature-complete solution for analyzing web traffic collected by Google Analytics on AWS.

Key technical benefits:

  • Schema-on-read means data consumers don’t need to model the data into a table structure, and can instantly understand what their top dimensions are. For example, 85% of my users navigate my website using the Google Chrome browser.
  • Graphical user interface that enables self-service consumption of Google Analytics data.
  • Fast implementation using pre-defined templates that map raw data from Google Analytics to tables in the data lake.
  • Ability to replay historical Google Analytics data stored on Amazon S3.
  • Ability to partition the data on Amazon S3 by hit time, reducing the complexity of handling late-arriving events.
  • Optimize data on Amazon S3 automatically for improved query performance.
  • Automatically manage tables and partitions in AWS Glue Data Catalog.
  • Fully integrated with a suite of AWS native services – Amazon S3, Amazon Athena, Amazon Redshift and Amazon QuickSight.

Now that we have feature parity, we can begin to explore integrating other data sources such as CRM, sales, and customer profile to build a true 360-degree view of the customer.  Furthermore, you can now begin using AWS Machine Learning services to optimize traffic to your websites, forecast demand and personalize the user experience.

We’d love to hear what you think. Please feel free to leave a comment with any feedback or questions you may have.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

 


About the Authors


Roy Hasson is the global business development lead of analytics and data lakes at AWS.
He works with customers around the globe to design solutions to meet their data processing, analytics, and business intelligence needs. Roy is a big Manchester United fan, cheering his team on and hanging out with his family.

 

 

 

Eran Levy is the director of marketing at Upsolver.

 

 

 

 

Visualizing Sensor Data in Amazon QuickSight

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visualizing-sensor-data-in-amazon-quicksight/

This post is courtesy of Moheeb Zara, Developer Advocate, AWS Serverless

The Internet of Things (IoT) is a term used wherever physical devices are networked in some meaningful connected way. Often, this takes the form of sensor data collection and analysis. As the number of devices and size of data scales, it can become costly and difficult to keep up with demand.

Using AWS Serverless Application Model (AWS SAM), you can reduce the cost and time to market of an IoT solution. This guide demonstrates how to collect and visualize data from a low-cost, Wi-Fi connected IoT device using a variety of AWS services. Much of this can be accomplished within the AWS Free Usage Tier, which is sufficient for the following instructions.

Services used

The following services are used in this example:

What’s covered in this post?

This post covers:

  • Connecting an Arduino MKR 1010 Wi-Fi device to AWS IoT Core.
  • Forwarding messages from an AWS IoT Core topic stream to a Lambda function.
  • Using a Kinesis Data Firehose delivery stream to store data in S3.
  • Analyzing and visualizing data stored in S3 using Amazon QuickSight.

Connect the device to AWS IoT Core using MQTT

The Arduino MKR 1010 is a low-cost, Wi-Fi enabled, IoT device, shown in the following image.

An Arduino MKR 1010 Wi-Fi microcontroller

Its analog and digital input and output pins can be used to read sensors or to write to actuators. Arduino provides a detailed guide on how to securely connect this device to AWS IoT Core. The following steps build upon it to push arbitrary sensor data to a topic stream and ultimately visualize that data using Amazon QuickSight.

  1. Start by following this comprehensive guide to using an Arduino MKR 1010 with AWS IoT Core. Upon completion, your device is connected to AWS IoT Core using MQTT (Message Queuing Telemetry Transport), a protocol for publishing and subscribing to messages using topics.
  2. In the Arduino IDE, choose File, Sketch, Include Library, and Manage Libraries.
  3. In the window that opens, search for ArduinoJson and select the library by Benoit Blanchon. Choose install.

4. Add #include <ArduinoJson.h> to the top of your sketch from the Arduino guide.

5. Modify the publishMessage() function with this code. It publishes a JSON message with two keys: time (ms) and the current value read from the first analog pin.

void publishMessage() {  
  Serial.println("Publishing message");

  // send message, the Print interface can be used to set the message contents
  mqttClient.beginMessage("arduino/outgoing");
  
  // create json message to send
  StaticJsonDocument<200> doc;
  doc["time"] = millis();
  doc["sensor_a0"] = analogRead(0);
  serializeJson(doc, mqttClient); // print to client
  
  mqttClient.endMessage();
}

6. Save and upload the sketch to your board.

Create a Kinesis Firehose delivery stream

Amazon Kinesis Data Firehose is a service that reliably loads streaming data into data stores, data lakes, and analytics tools. Amazon QuickSight requires a data store to create visualizations of the sensor data. This simple Kinesis Data Firehose delivery stream continuously uploads data to an S3 storage bucket. The next sections cover how to add records to this stream using a Lambda function.

  1. In the Kinesis Data Firehose console, create a new delivery stream, called SensorDataStream.
  2. Leave the default source as a Direct PUT or other sources and choose Next.
  3. On the next screen, leave all the default values and choose Next.
  4. Select Amazon S3 as the destination and create a new bucket with a unique name. This is where records are continuously uploaded so that they can be used by Amazon QuickSight.
  5. On the next screen, choose Create New IAM Role, Allow. This gives the Firehose delivery stream permission to upload to S3.
  6. Review and then choose Create Delivery Stream.

It can take some time to fully create the stream. In the meantime, continue on to the next section.
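
If you prefer to script this step instead of using the console, the delivery stream can also be created with the AWS SDK for JavaScript. The following is a minimal sketch; the bucket ARN and IAM role ARN are placeholders that you must replace with your own values, and the role must already grant Firehose permission to write to the bucket.

const AWS = require('aws-sdk');

const firehose = new AWS.Firehose();

const params = {
  DeliveryStreamName: 'SensorDataStream',
  DeliveryStreamType: 'DirectPut',
  ExtendedS3DestinationConfiguration: {
    BucketARN: 'arn:aws:s3:::YOUR-BUCKET-NAME',                 // placeholder bucket
    RoleARN: 'arn:aws:iam::123456789012:role/FirehoseToS3Role'  // placeholder role
  }
};

// Create the delivery stream; creation is asynchronous and takes a few minutes
firehose.createDeliveryStream(params, (err, data) => {
  if (err) console.error(err);
  else console.log('Creating delivery stream:', data.DeliveryStreamARN);
});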

Invoking Lambda using AWS IoT Core rules

Using AWS IoT Core rules, you can forward messages from devices to a Lambda function, which can perform actions such as uploading to an Amazon DynamoDB table or an S3 bucket, or running data against various Amazon Machine Learning services. In this case, the function transforms and adds a message to the Kinesis Data Firehose delivery stream, which then adds that data to S3.

AWS IoT Core rules use the MQTT topic stream to trigger interactions with other AWS services. An AWS IoT Core rule is created by using an SQL statement, a topic filter, and a rule action. The Arduino example publishes messages every five seconds on the topic arduino/outgoing. The following instructions show how to consume those messages with a Lambda function.

Create a Lambda function

Before creating an AWS IoT Core rule, you need a Lambda function to consume forwarded messages.

  1. In the AWS Lambda console, choose Create function.
  2. Name the function ArduinoConsumeMessage.
  3. For Runtime, choose Author from scratch, Node.js 10.x. For Execution role, choose Create a new role with basic Lambda permissions. Choose Create.
  4. On the Execution role card, choose View the ArduinoConsumeMessage-role-xxxx on the IAM console.
  5. Choose Attach Policies. Then, search for and select AmazonKinesisFirehoseFullAccess.
  6. Choose Attach Policy. This applies the necessary permissions to add records to the Firehose delivery stream.
  7. In the Lambda console, in the Designer card, select the function name.
  8. Paste the following in the code editor, replacing SensorDataStream with the name of your own Firehose delivery stream. Choose Save.
const AWS = require('aws-sdk')

const firehose = new AWS.Firehose()
const StreamName = "SensorDataStream"

exports.handler = async (event) => {
    
    console.log('Received IoT event:', JSON.stringify(event, null, 2))
    
    let payload = {
        time: new Date(event.time),
        sensor_value: event.sensor_a0
    }
    
    let params = {
            DeliveryStreamName: StreamName,
            Record: { 
                Data: JSON.stringify(payload)
            }
        }
        
    return await firehose.putRecord(params).promise()

}
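
To sanity check the function before wiring up the IoT rule, you can invoke the handler locally with a payload shaped like the Arduino message. This is a hypothetical smoke test: it assumes the handler is saved as index.js, that AWS credentials are available locally, and that the delivery stream already exists.

// local-test.js (hypothetical file name)
const { handler } = require('./index');

handler({ time: 1567023375013, sensor_a0: 456 })
  .then(result => console.log('PutRecord result:', result))
  .catch(console.error);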

Create an AWS IoT Core rule

To create an AWS IoT Core rule, follow these steps.

  1. In the AWS IoT console, choose Act.
  2. Choose Create.
  3. For Rule query statement, copy and paste SELECT * FROM 'arduino/outgoing'. This subscribes to the outgoing message topic used in the Arduino example.
  4. Choose Add action, Send a message to a Lambda function, Configure action.
  5. Select the function created in the last set of instructions.
  6. Choose Create rule.

At this stage, any message published to the arduino/outgoing topic forwards to the ArduinoConsumeMessage Lambda function, which transforms and puts the payload on the Kinesis Data Firehose stream and also logs the message to Amazon CloudWatch. If you’ve connected an Arduino device to AWS IoT Core, it publishes to that topic every five seconds.
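
The same rule can also be created programmatically. The following is a hedged sketch using the AWS SDK for JavaScript; the rule name and Lambda function ARN are placeholders, and the function additionally needs a resource-based permission that allows iot.amazonaws.com to invoke it (the console adds this permission for you).

const AWS = require('aws-sdk');

const iot = new AWS.Iot();

iot.createTopicRule({
  ruleName: 'ArduinoOutgoingToLambda', // hypothetical rule name
  topicRulePayload: {
    sql: "SELECT * FROM 'arduino/outgoing'",
    actions: [
      // placeholder function ARN
      { lambda: { functionArn: 'arn:aws:lambda:us-east-1:123456789012:function:ArduinoConsumeMessage' } }
    ]
  }
}, (err) => {
  if (err) console.error(err);
  else console.log('Rule created');
});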

The following steps show how to test functionality using the AWS IoT console.

  1. In the AWS IoT console, choose Test.
  2. For Publish, enter the topic arduino/outgoing.
  3. Enter the following test payload:
    {
      "time": 1567023375013,
      "sensor_a0": 456
    }
  4. Choose Publish to topic.
  5. Navigate back to your Lambda function.
  6. Choose Monitoring, View logs in CloudWatch.
  7. Select a log item to view the message contents, as shown in the following screenshot.
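
As an alternative to the console test, you can publish the same payload with the AWS IoT data plane API. This sketch assumes your account-specific AWS IoT endpoint (shown on the AWS IoT Settings page) and credentials that allow iot:Publish.

const AWS = require('aws-sdk');

// Replace with your own AWS IoT endpoint
const iotData = new AWS.IotData({ endpoint: 'YOUR-ENDPOINT.iot.us-east-1.amazonaws.com' });

iotData.publish({
  topic: 'arduino/outgoing',
  qos: 0,
  payload: JSON.stringify({ time: 1567023375013, sensor_a0: 456 })
}, (err) => {
  if (err) console.error(err);
  else console.log('Test message published');
});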

Visualizing data with Amazon QuickSight

To visualize data with Amazon QuickSight, follow these steps.

  1. In the Amazon QuickSight console, sign up.
  2. Choose Manage Data, New Data Set. Select S3 as the data source.
  3. A manifest file is necessary for Amazon QuickSight to be able to fetch data from your S3 bucket. Copy the following into a file named manifest.json. Replace YOUR-BUCKET-NAME with the name of the bucket created for the Firehose delivery stream.
    {
       "fileLocations":[
          {
             "URIPrefixes":[
                "s3://YOUR-BUCKET-NAME/"
             ]
          }
       ],
       "globalUploadSettings":{
          "format":"JSON"
       }
    }
  4. Upload the manifest.json file.
  5. Choose Connect, then Visualize. You may have to give Amazon QuickSight explicit permissions to your S3 bucket.
  6. Finally, design the Amazon QuickSight visualizations in the drag and drop editor. Drag the two available fields into the center card to generate a Sum of Sensor_value by Time visual.

Conclusion

This post demonstrated visualizing data from a securely connected remote IoT device. This was achieved by connecting an Arduino to AWS IoT Core using MQTT, forwarding messages from the topic stream to Lambda using IoT Core rules, putting records on an Amazon Kinesis Data Firehose delivery stream, and using Amazon QuickSight to visualize the data stored within an S3 bucket.

With these building blocks, it is possible to implement highly scalable and customizable IoT data collection, analysis, and visualization. With the use of other AWS services, you can build a full end-to-end platform for an IoT product that can reliably handle volume. To further explore how hardware and AWS Serverless can work together, visit the Amazon Web Services page on Hackster.

Federate Amazon QuickSight access with Okta

Post Syndicated from Loc Trinh original https://aws.amazon.com/blogs/big-data/federate-amazon-quicksight-access-with-okta/

Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization. As a fully managed service, Amazon QuickSight lets you easily create and publish interactive dashboards that can then be accessed from any device and embedded into your applications, portals, and websites.

Amazon QuickSight supports identity federation through Security Assertion Markup Language 2.0 (SAML 2.0) in both Standard and Enterprise editions. With federation, you can manage users using your enterprise identity provider (IdP) and pass them to Amazon QuickSight at log-in. Such IdPs include Microsoft Active Directory Federation Services, Ping One Federation Server, and Okta.

This post provides step-by-step guidance for how to use Okta to federate access to Amazon QuickSight.

Create an Okta application

Sign in to your Okta admin dashboard. You can create a free Okta Developer Edition account.

  1. From the Okta admin dashboard ribbon, choose Applications.
  2. If you are viewing the Developer Console, switch to Classic UI, as shown in the following screenshot.
  3. Choose Add Application.
  4. Search for Amazon Web Services and choose Add.
  5. Rename Application label to Amazon QuickSight and choose Next.
  6. For Sign-On Options, choose SAML 2.0.
  7. For Default Relay State, type https://quicksight.aws.amazon.com.
  8. Right-click on Identity Provider metadata and choose Save Link As…
  9. Save the XML file to disk and choose Done. You need to use this file in the next steps.

Create a SAML provider in AWS

Open a new window and sign in to the AWS Management Console.

  1. Open the IAM console.
  2. In the navigation pane, choose Identity Providers, Create Provider.
  3. For Provider Type, choose SAML and provide a Provider Name (for example, Okta).
  4. For Metadata Document, upload the XML file from the previous steps.
  5. Choose Next Step, Create.
  6. Locate the IdP that you just created and make note of the Provider ARN

Create a role for federated users

This section describes the steps for creating an IAM SAML 2.0 federation role. While Okta is used for a single sign-on, there are two ways to provision users in Amazon QuickSight:

  • Grant the federation role permission to create new Amazon QuickSight users when a user visits for the first time.
  • Pre-provision Amazon QuickSight users using the API and add users to the appropriate groups. This is preferred for adding users to groups within Amazon QuickSight, because you can provision the user and add them to the groups at the same time.

The following steps demonstrate how to create a federation role with permission to create new Amazon QuickSight users. If you would rather pre-provision Amazon QuickSight users, instructions for using the API are at the end of this post.

  1. Open the IAM console.
  2. In the navigation pane, choose Roles, Create Role, Select type of trusted entity as SAML 2.0 federation.
  3. For SAML provider, select the IdP that you created in the previous steps (Okta).
  4. Select Allow programmatic and AWS Management Console access.
  5. Choose Next: Permissions, Create policy.
  6. In the Create policy window, navigate to the JSON tab and type the following:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithSAML",
                "Resource": "<YOUR SAML IDENTITY PROVIDER ARN>",
                "Condition": {
                    "StringEquals": {
                        "saml:aud": "https://signin.aws.amazon.com/saml"
                    }
                }
            },
            {
                "Action": [
                    "quicksight:CreateReader"
                ],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:quicksight::<YOUR ACCOUNT ID>:user/${aws:userid}"
                ]
            }
        ]
    }

The IAM policy above grants the federation role permission to self-provision an Amazon QuickSight reader with the quicksight:CreateReader action. Best practice is to grant users in your organization reader access, and then upgrade users from within the application. Instructions for upgrading users are at the end of this post.

If you would rather pre-provision Amazon QuickSight users using the API, do not include any actions in the permission policy.

  1. Choose Review Policy.
  2. For Name, enter a value (for example, QuicksightOktaFederatedPolicy) and choose Create policy.
  3. On the Create role page, choose Refresh and select your new policy.
  4. Choose Next: Tags and Next: Review.
  5. Provide a Role name (for example, QuicksightOktaFederatedRole) and Role description.
  6. Choose Create role.

Create an AWS access key for Okta

To create an access key for Okta, follow these steps.

  1. Open the IAM console.
  2. In the navigation pane, choose Users, Add user.
  3. For User name, enter a value (for example, OktaSSOUser).
  4. For Access type, choose Programmatic access.
  5. Choose Next: Permissions, Attach existing policies directly, and Create policy.
  6. On the Create policy page, navigate to the JSON tab and type the following:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:ListRoles",
                    "iam:ListAccountAliases"
                ],
                "Resource": "*"
            }
        ]
    }

  7. Choose Review Policy.
  8. For a Name, enter a value (for example, OktaListRolesPolicy) and choose Create policy.
  9. On the Create user page, choose Refresh and select your new policy.
  10. Choose Next: Tags, Next: Review, and Create user.
  11. To save your access key and secret key, choose Download .csv.

Configure the Okta application

Return to the window with your Okta admin dashboard.

  1. For Identity Provider ARN (Required only for SAML SSO), provide the ARN (for example, arn:aws:iam::<YOUR ACCOUNT ID>:saml-provider/Okta) of the IdP that you created in previous steps.
  2. Choose Done.
  3. From the Applications dashboard, choose Provisioning.
  4. Choose Configure API Integration.
  5. Select Enable API Integration.
  6. For Access Key and Secret Key, provide the access key and secret key that you downloaded in previous steps.
  7. Choose Test API Credentials, then choose Save.
  8. From the SETTINGS pane, navigate to To App.
  9. Choose Edit.
  10. Enable Create Users and choose Save.
  11. Choose Assignments, Assign and then select the users or groups to which to grant federated access.
  12. Select the Roles and SAML User Roles to grant to the users, as shown in the following screenshot.
  13. Choose Save and Go Back, Done.

Launch Amazon QuickSight

Log in to your Okta Applications dashboard with a user (if you are using an admin account, switch to user mode) that has been granted federated access. You should see a new application with your label (for example, Amazon QuickSight). Choose the application icon to launch Amazon QuickSight.

You can now manage your users and groups using Okta as your IdP and federate access to Amazon QuickSight.

Pre-provisioning Amazon QuickSight users

The outlined steps demonstrate how to grant users permission to self-provision Amazon QuickSight users when they visit Amazon QuickSight for the first time. If you would rather pre-provision Amazon QuickSight users, you can use the API to create users and groups and then add users to those groups.

  1. To create an Amazon QuickSight user, run the following AWS CLI command. Link the Amazon QuickSight user to your federated Okta username by providing the user-name parameter in the format <role name>\<email> (for example, QuicksightOktaFederatedRole\[email protected]).
    aws quicksight register-user \
        --aws-account-id=<YOUR ACCOUNT ID> \
        --namespace=default \
        --email=<[email protected]> \
        --user-name=<ROLE NAME>\<[email protected]> \
        --identity-type=QUICKSIGHT \
        --user-role=READER

  2. Optionally, create an Amazon QuickSight group.
    aws quicksight create-group \
        --aws-account-id=<YOUR ACCOUNT ID> \
        --namespace=default \
        --group-name="<YOUR GROUP NAME>" \
        --description="<YOUR GROUP DESCRIPTION>"

  3. Add users to groups.
    aws quicksight create-group-membership \
        --aws-account-id=<YOUR ACCOUNT ID> \
        --namespace=default \
        --group-name="<YOUR GROUP NAME>" \
        --member-name="<YOUR MEMBER USER NAME>"

By using the Amazon QuickSight API, you can manage users, groups, and group membership. After they’re created, groups automatically become available for use when modifying permissions to data sets, analyses, or dashboards by typing in the group name instead of a specific user. For other supported group and user management functions, see List of Actions by Function.
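
If you script provisioning in Node.js rather than the AWS CLI, the equivalent calls are available in the AWS SDK for JavaScript. The following is a minimal sketch; the account ID, group name, and email address are placeholders, and the user name must follow the same <role name>\<email> convention described above.

const AWS = require('aws-sdk');

const quicksight = new AWS.QuickSight({ region: 'us-east-1' }); // adjust the Region for your QuickSight account

async function provisionUser() {
  // Equivalent of: aws quicksight register-user
  await quicksight.registerUser({
    AwsAccountId: '123456789012',                                 // placeholder
    Namespace: 'default',
    IdentityType: 'QUICKSIGHT',
    Email: 'user@example.com',                                    // placeholder
    UserRole: 'READER',
    UserName: 'QuicksightOktaFederatedRole\\user@example.com'     // placeholder
  }).promise();

  // Equivalent of: aws quicksight create-group-membership
  await quicksight.createGroupMembership({
    AwsAccountId: '123456789012',
    Namespace: 'default',
    GroupName: 'YOUR-GROUP-NAME',
    MemberName: 'QuicksightOktaFederatedRole\\user@example.com'
  }).promise();
}

provisionUser().catch(console.error);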

Managing users

You can upgrade users between reader and author or admin in the Manage users tab of the Manage QuickSight screen.

  1. In the Amazon QuickSight console, choose your user name in the upper-right corner and choose Manage QuickSight.
  2. In the navigation pane, choose Manage users.
  3. Locate the user to upgrade, and select the role to grant from the Role drop-down.

Deep-linking dashboards

Amazon QuickSight dashboards can be shared using the Okta application’s single sign-on URL so that users can be federated directly to specific dashboards.

To deep link to a specific Amazon QuickSight dashboard with single sign-on, first locate the Okta application’s single sign-on URL. This can be found by opening the metadata XML file that you downloaded in the Create an Okta application steps above. The URL is the value of the Location attribute in the md:SingleSignOnService element and ends with /sso/saml.

After you have the Okta application’s single sign-on URL, append ?RelayState= to the end of the URL followed by the URL to your Amazon QuickSight dashboard. For example, your deep link URL might look as follows:

https://my-test-org.okta.com/app/amazon_aws/abcdefg12345XYZ678/sso/saml?RelayState=https://us-east-1.quicksight.aws.amazon.com/sn/dashboards/11111111-abcd-1234-efghi-111111111111

By deep-linking dashboards, you can provide users with a way to use single sign-on and directly access specific dashboards.
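
If you generate these links in code, a small helper keeps the pieces straight. The example above appends the dashboard URL directly; URL-encoding the RelayState value, as shown below, is a defensive choice in case the dashboard URL ever carries query parameters of its own. Both URLs here are the placeholder values used above.

// Build a single sign-on deep link to a QuickSight dashboard
const ssoUrl = 'https://my-test-org.okta.com/app/amazon_aws/abcdefg12345XYZ678/sso/saml';
const dashboardUrl =
  'https://us-east-1.quicksight.aws.amazon.com/sn/dashboards/11111111-abcd-1234-efghi-111111111111';

const deepLink = `${ssoUrl}?RelayState=${encodeURIComponent(dashboardUrl)}`;
console.log(deepLink);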

Summary

This post provided a step-by-step guide for configuring Okta as your IdP, and using IAM roles to enable single sign-on to Amazon QuickSight. It also showed how users and groups can be managed using the Amazon QuickSight API.

Although this post demonstrated the integration of IAM and Okta, you can replicate this solution using your choice of SAML 2.0 IdPs. For other supported federation options, see Enabling Single Sign-On Access to Amazon QuickSight Using SAML 2.0.

If you have any questions or feedback, please leave a comment.

About the Authors

Loc Trinh is a solutions architect at Amazon Web Services.

Naresh Gautam is a senior solutions architect at Amazon Web Services.

Create advanced insights using Level Aware Aggregations in Amazon QuickSight

Post Syndicated from Arun Baskar original https://aws.amazon.com/blogs/big-data/create-advanced-insights-using-level-aware-aggregations-in-amazon-quicksight/

Amazon QuickSight recently launched Level Aware Aggregations (LAA), which enables you to perform calculations on your data to derive advanced and meaningful insights. In this blog post, we go through examples of applying these calculations to a sample sales dataset so that you can start using these for your own needs.

What are Level Aware Aggregations?

Level aware aggregations are aggregation calculations that can be computed at a desired level in the overall query evaluation order of QuickSight. Please check this link for details on QuickSight’s Order of Evaluation. Up until now, the only types of aggregations possible in QuickSight were Display-level and Table calculation aggregation types.

  • Display-level aggregations are aggregations that are defined by the dimensions and metrics present in the field wells of a QuickSight visual.
  • Table calculations are computed by windowing/rolling-up over the display-level aggregated values of the visual. Hence, by definition, these are calculated after the Display-level aggregations are computed.

With Level Aware Aggregations, QuickSight now allows you to aggregate values before the Display-level aggregation. For more information, please visit the Level Aware Aggregations documentation.

Customer use cases

Distribution of customers by lifetime orders

Customer question: How many customers have made one order, two orders, three orders, and so forth?

In this case, we first want to aggregate the total number of orders made by each customer, then use the output of that as a visual dimension. This isn’t feasible to compute without LAA.

Solution using LAA

1.) Compute the number of orders per customer.

Calculated field name : NumberOrdersPerCustomer

Calculated field expression : countOver({order_id}, [{Customer Id}], PRE_AGG)

This computes the number of orders per customer, before the display-level aggregation of the visual.

2.) Create the visual.

Create the visual with the above field NumberOrdersPerCustomer in the “X-Axis” well of the Field Wells. Add “Count Distinct” of “Customer Id” in the “Value” section of the Field Wells to create a histogram of the number of orders made by customers.

As we can see, there are around 5000 unique customers with one order, around 3500 customers with two orders, and so on.
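
To make the evaluation order concrete, here is a small JavaScript sketch of what this use case does conceptually. It is not how QuickSight evaluates queries internally; it only illustrates that the per-customer count happens first (PRE_AGG) and then becomes the dimension of a second, display-level aggregation.

// Raw, unaggregated rows (sample data)
const rows = [
  { customer_id: 'A', order_id: 1 }, { customer_id: 'A', order_id: 2 },
  { customer_id: 'B', order_id: 3 }, { customer_id: 'C', order_id: 4 },
  { customer_id: 'C', order_id: 5 }, { customer_id: 'C', order_id: 6 },
];

// Step 1 (PRE_AGG): countOver(order_id, [customer_id]) -- orders per customer
const ordersPerCustomer = {};
for (const r of rows) {
  ordersPerCustomer[r.customer_id] = (ordersPerCustomer[r.customer_id] || 0) + 1;
}

// Step 2 (display-level): count distinct customers for each order count
const histogram = {};
for (const count of Object.values(ordersPerCustomer)) {
  histogram[count] = (histogram[count] || 0) + 1;
}

console.log(histogram); // { '1': 1, '2': 1, '3': 1 }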

Filter out customers based on lifetime spends

Customer Question: How do I filter out customers with life-time spend less than $100,000? My visual’s dimension (group by) and metric definitions are independent of total spend per customer.

If the grouping dimensions of the aforementioned aggregation (spend) are exactly the same as the group dimensions in the field well, the customer can achieve this using the aggregated filters feature. But that’s not always the case. As mentioned in the customer question, the visual’s definition can be different from the filter’s aggregation.

Solution using LAA

1.) Compute sum of sales per customer.

Calculated field name :  salesPerCustomer

Calculated field expression : sumOver(sales,[{customer_id}],PRE_AGG)

PRE_AGG indicates that the computation must occur before display-level aggregation.

2.) Create the visuals.

The visual on the left shows sum of sales per segment and the visual on the right shows the total number of customers. Note that there are no filters applied at this point.

3.) Create the filter on salesPerCustomer. 

Create a filter on top of the above field salesPerCustomer to select items greater than $100,000.

4.) Apply the filter.

The above image shows applying the filter on “salesPerCustomer” greater than $100,000.

With the filter applied, we have excluded the customers whose total spend is less than $100,000, regardless of what we choose to display in the visuals.

Fixed percent of total sales even with filters applied

Customer Question: How much is the contribution of each industry to the entire company’s profit (percent of total)? I don’t want the total to recompute when filters are applied.

The existing table calculation function percentOfTotal isn’t able to solve this problem, since filters on categories are applied before computing the total. Using percentOfTotal would recalculate the total every time filters are applied. We need a solution that doesn’t consider the filtering when computing the total.

Solution using LAA

1.) Compute total sales before filters through a calculated field.

Calculated field name : totalSalesBeforeFilters

Calculated field expression : sumOver(sales,[],PRE_FILTER)

PRE_FILTER indicates that this computation must be done prior to applying filters.

The partition dimension list (second argument) is empty since we want to compute the overall total.

2.) Compute the fixed percent of total sales.

Calculated field name : fixedPercentOfTotal

Calculated field expression : sum(sales) / min(totalSalesBeforeFilters)

Note: totalSalesBeforeFilters is the same for every row of the unaggregated data. Since we want to use it post-aggregation, we are using the aggregation min on top of it. If all values are the same, max or avg aggregations can be used instead, since they serve the same purpose.

3.) Create the visual.

Add “industry” field to “Rows” well. Add “sales (SUM)” and “fixedPercentOfTotal“ to the ”values“ section. Now, the percent of total metric would remain fixed even if we filter out the data based on any underlying dimension or measure.

The visual shows sales per industry along with percent of total, computed using the table calculation percentOfTotal and using Level Aware Aggregation as described above. Both the percent of total values are currently the same since there aren’t any filters applied.

The visual shows the same metrics but with industries filtered only to 5 of them. As we can see “Percent of total sales” got re-adjusted to represent only the filtered data, whereas “Fixed Percent of total sales” remains the same even after filtering. Both the metrics are valuable customer use cases now feasible through QuickSight.

Compare sales in a category to industry average

Customer question: How do I compare sales in a category to the industry average? I want the industry average to include all categories even after filtering.

Since we want the industry average to stay fixed even with filtering, we need PRE_FILTER aggregation to achieve this.

Solution using LAA

1.) Compute the industry average.

Calculated field name : IndustryAverage

Calculated field expression : avgOver(sumOver(sales,[{category}],PRE_FILTER),[],PRE_FILTER)

We first compute the sum of sales per category and then average it across all categories. It’s important to note here that we first computed a finer level aggregation and fed that into a coarser level aggregation.

2.) Compute the difference from IndustryAverage.

Calculated field name : FixedDifferenceFromIndustryAverage

Calculated field expression : sum(sales) - min(IndustryAverage)

As mentioned in one of the examples above, we use the min aggregation to carry the per-row value through the display-level aggregation.

3.) Create the visual.

Create the visual by adding “Category” in “X axis” field well and SUM(Sales), IndustryAverage and FixedDifferenceFromIndustryAverage as the values in a bar chart.

Visual shows total sales per category, the average across all industries and each category’s difference from average.

This visual shows the same metrics, but with categories filtered to include only 6 of them. As we can see, the industry average remained the same before and after filtering, keeping the difference the same whether you choose to show all categories, some of them, or just one.

Categorize customers based on lifetime spend

Customer question: How do I classify customers based on cumulative sales contribution? I then want to use that classification as my visual’s grouping.

The objective here is to create custom-sized bins to classify the customers. Even though we could do this classification post display-level aggregation, we wouldn’t be able to use it as a dimension/group by in the visual.

Solution using LAA

1.) Compute sales per customer before display-level aggregation.

Calculated field name : salesPerCustomer

Calculated field expression : sumOver({sales amount},[{customer id}],PRE_AGG)

2.) Categorize Customers.

Calculated field name : Customer Category

Calculated field expression : ifelse(salesPerCustomer < 1000, "VERY_LOW", salesPerCustomer < 10000, "LOW", salesPerCustomer < 100000, "MEDIUM", "HIGH")

3.) Create the visual.

Create the visual by adding “Customer Category” to the “Y-axis” field well, “Count Distinct” of “customer id” to the value field well.

The above image shows the number of unique customers per Customer Category.

Filtering can be done on top of these categories as well to build other relevant visuals, since the categories are tagged before aggregation.

The above image shows the number of unique customers per custom category, split by gender.

Availability

Level aware aggregations are available in both Standard and Enterprise editions, in all supported AWS Regions. For more information, see the Amazon QuickSight documentation.

About the Author

Arun Baskar is a software development engineer for QuickSight at Amazon Web Services.

Creating custom Pinpoint dashboards using Amazon QuickSight, part 2

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/creating-custom-pinpoint-dashboards-using-amazon-quicksight-part-2/

Note: This post was written by Manan Nayar and Aprajita Arora, Software Development Engineers on the AWS Digital User Engagement team.


In our previous post, we discussed the process of visualizing specific, pre-aggregated Amazon Pinpoint metrics—such as delivery rate or open rate—using the Amazon Pinpoint Metrics APIs. In that example, we showed how to create a Lambda function that retrieves your metrics, and then make those metrics available for creating visualizations in Amazon QuickSight.

This post shows a different approach to exporting data from Amazon Pinpoint and using it to build visualizations. Rather than retrieve specific metrics, you can use the event stream feature in Amazon Pinpoint to export raw event data. You can use this data in Amazon QuickSight to create in-depth analyses of your data, as opposed to visualizing pre-calculated metrics. As an added benefit, when you use this solution, you don’t have to modify any code, and the underlying data is updated every few minutes.

Step 1: Configure the event stream in Amazon Pinpoint

The Amazon Pinpoint event stream includes information about campaign events (such as campaign.send) and application events (such as session.start). It also includes response information related to all of the emails and SMS messages that you send from your Amazon Pinpoint project, regardless of whether they were sent from campaigns or on a transactional basis. When you enable event streams, Amazon Pinpoint automatically sends this data to your S3 bucket (via Amazon Kinesis Data Firehose) every few minutes.

To set up the event stream

  1. Sign in to the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint. In the list of projects, choose the project that you want to enable event streaming for.
  2. Under Settings, choose Event stream.
  3. Choose Edit, and then configure the event stream to use Amazon Kinesis Data Firehose. If you don’t already have a Kinesis Data Firehose stream, follow the link to create one in the Kinesis console. Configure the stream to send data to an S3 bucket. For more information about creating streams, see Creating an Amazon Kinesis Data Firehose Delivery Stream.
  4. Under IAM role, choose Automatically create a role. Choose Save.
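
If you enable event streaming across many projects, the same configuration can be applied with the AWS SDK for JavaScript instead of the console. The following is a hedged sketch; the project ID, delivery stream ARN, and IAM role ARN are placeholders, and the role must allow Amazon Pinpoint to write to the stream.

const AWS = require('aws-sdk');

const pinpoint = new AWS.Pinpoint();

pinpoint.putEventStream({
  ApplicationId: 'YOUR-PINPOINT-PROJECT-ID', // placeholder
  WriteEventStream: {
    DestinationStreamArn: 'arn:aws:firehose:us-east-1:123456789012:deliverystream/YOUR-STREAM', // placeholder
    RoleArn: 'arn:aws:iam::123456789012:role/PinpointEventStreamRole'                           // placeholder
  }
}, (err, data) => {
  if (err) console.error(err);
  else console.log('Event stream enabled:', data.EventStream.DestinationStreamArn);
});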

Step 2: Add a data set in Amazon QuickSight

Now that you’ve started streaming your Amazon Pinpoint data to S3, you can set Amazon QuickSight to look for data in the S3 bucket. You connect QuickSight to sources of data by creating data sets.

To create a data set

    1. In a text editor, create a new file. Paste the following code:
      {
          "fileLocations": [
              {
                  "URIPrefixes": [ 
                      "s3://<bucketName>/"          
                  ]
              }
          ],
          "globalUploadSettings": {
              "format": "JSON"
          }
      }

      In the preceding code, replace <bucketName> with the name of the S3 bucket that you’re using to store the event stream data. Save the file as manifest.json.

    2. Sign in to the QuickSight console at https://quicksight.aws.amazon.com.
    3. Create a new S3 data set. When prompted, choose the manifest file that you created in step 1. For more information about creating S3 data sets, see Creating a Data Set Using Amazon S3 Files in the Amazon QuickSight User Guide.
    4. Create a new analysis. From here, you can start creating visualizations of your data. To learn more, see Creating an Analysis in the Amazon QuickSight User Guide.

Step 3: Set the refresh rate for the data set

You can configure your data sets in Amazon QuickSight to automatically refresh on a certain schedule. In this section, you configure the data set to refresh every day, one minute before midnight.

To set the refresh schedule

  1. Go to the QuickSight start page at https://quicksight.aws.amazon.com/sn/start. Choose Manage data.
  2. Choose the data set that you created in the previous section.
  3. Choose Schedule refresh. Follow the prompts to set up a daily refresh schedule.

Step 4: Create your visualizations

From this point, you can start creating visualizations of your data. To learn more about creating visualizations, see Creating an Analysis in the Amazon QuickSight User Guide.

Analyzing AWS WAF logs with Amazon ES, Amazon Athena, and Amazon QuickSight

Post Syndicated from Aaron Franco original https://aws.amazon.com/blogs/big-data/analyzing-aws-waf-logs-with-amazon-es-amazon-athena-and-amazon-quicksight/

AWS WAF now includes the ability to log all web requests inspected by the service. AWS WAF can store these logs in an Amazon S3 bucket in the same Region, but most customers deploy AWS WAF across multiple Regions—wherever they also deploy applications. When analyzing web application security, organizations need the ability to gain a holistic view across all their deployed AWS WAF Regions.

This post presents a simple approach to aggregating AWS WAF logs into a central data lake repository, which lets teams better analyze and understand their organization’s security posture. I walk through the steps to aggregate regional AWS WAF logs into a dedicated S3 bucket. I follow that up by demonstrating how you can use Amazon ES to visualize the log data. I also present an option to offload and process historical data using AWS Glue ETL. With the data collected in one place, I finally show you how you can use Amazon Athena and Amazon QuickSight to query historical data and extract business insights.

Architecture overview

The case I highlight in this post is the forensic use of the AWS WAF access logs to identify distributed denial of service (DDoS) attacks by a client IP address. This solution provides your security teams with a view of all incoming requests hitting every AWS WAF in your infrastructure.

I investigate what the IP access patterns look like over time and assess which IP addresses access the site multiple times in a short period of time. This pattern suggests that the IP address could be an attacker. With this solution, you can identify DDoS attackers for a single application, and detect DDoS patterns across your entire global IT infrastructure.

Walkthrough

This solution requires separate tasks for architecture setup, which allows you to begin receiving log files in a centralized repository, and analytics, which processes your log data into useful results.

Prerequisites

To follow along, you must have the following resources:

  • Two AWS accounts. Following AWS multi-account best practices, create two accounts:
    • A logging account
    • A resource account that hosts the web applications using AWS WAF. For more information about multi-account setup, see AWS Landing Zone. Using multiple accounts isolates your logs from your resource environments. This helps maintain the integrity of your log files and provides a central access point for auditing all application, network, and security logs.
  • The ability to launch new resources into your account. The resources might not be eligible for Free Tier usage and so might incur costs.
  • An application running with an Application Load Balancer, preferably in multiple Regions. If you do not already have one, you can launch any AWS web application reference architecture to test and implement this solution.

For this walkthrough, you can launch an Amazon ECS example from the ecs-refarch-cloudformation GitHub repo. This is a “one click to deploy” example that automatically sets up a web application with an Application Load Balancer. Launch this in two different Regions to simulate a global infrastructure. You ultimately set up a centralized bucket that both Regions log into, which your forensic analysis tools then draw from. Choose Launch Stack to launch the sample application in your Region of choice.

Setup

Architecture setup allows you to begin receiving log files in a centralized repository.

Step 1: Provide permissions

Begin this process by providing appropriate permissions for one account to access resources in another. Your resource account needs cross-account permission to access the bucket in the logging account.

  1. Create your central logging S3 bucket in the logging account and attach the following bucket policy to it under the Permissions tab. Make a note of the bucket’s ARN. You need this information for future steps.
  2. Change RESOURCE-ACCOUNT-ID and CENTRAL-LOGGING-BUCKET-ARN to the correct values based on the actual values in your accounts:
     // JSON Document
     {
       "Version": "2012-10-17",
       "Statement": [
          {
             "Sid": "Cross Account AWS WAF Account 1",
             "Effect": "Allow",
             "Principal": {
                "AWS": "arn:aws:iam::RESOURCE-ACCOUNT-ID:root"
             },
             "Action": [
                "s3:GetObject",
                "s3:PutObject"
             ],
             "Resource": [
                "CENTRAL-LOGGING-BUCKET-ARN/*"
             ]
          }
       ]
    }

Step 2: Manage Lambda permissions

Next, the Lambda function that you create in your resource account needs permissions to access the S3 bucket in your central logging account so it can write files to that location. You already provided basic cross-account access in the previous step, but Lambda still needs the granular permissions at the resources level. Remember to grant these permissions in both Regions where you launched the application that you intend to monitor with AWS WAF.

  1. Log in to your resource account.
  2. To create an IAM role for the Lambda function, in the IAM console, choose Policies, Create Policy.
  3. Choose JSON, and enter the following policy document. Replace YOUR-SOURCE-BUCKET and YOUR-DESTINATION-BUCKET with the relative ARNs of the buckets that you are using for this walkthrough.
    // JSON document
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSourceAndDestinationBuckets",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:ListBucketVersions"
                ],
                "Resource": [
                    "YOUR-SOURCE-BUCKET",
                    "YOUR-DESTINATION-BUCKET"
                ]
            },
            {
                "Sid": "SourceBucketGetObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion"
                ],
                "Resource": "YOUR-SOURCE-BUCKET/*"
            },
            {
                "Sid": "DestinationBucketPutObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": "YOUR-DESTINATION-BUCKET/*"
            }
        ]
     }

  4. Choose Review policy, enter your policy name, and save it.
  5. With the policy created, create a new role for your Lambda function and attach the custom policy to that role. To do this, navigate back to the IAM dashboard.
  6. Select Create role and choose Lambda as the service that uses the role. Select the custom policy that you created earlier in this step and choose Next. You can add tags if required and then name and create this new role.
  7. You must also add S3 as a trusted entity in the Trust Relationship section of the role. Choose Edit trust relationship and add s3.amazonaws.com to the policy, as shown in the following example.

Lambda and S3 now appear as trusted entities under the Trust relationships tab, as shown in the following screenshot.

Step 3: Create a Lambda function and copy log files

Create a Lambda function in the same Region as your resource account’s S3 bucket. This function reads log files from the resource account bucket and then copies that content to the logging account’s bucket. Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF.

  1. Log in to your resource account.
  2. Navigate to Lambda in your console and choose Create Function.
  3. Choose the Author from scratch function and name it. Choose the IAM role you created in the previous step and attach it to the Lambda function.
  4. Choose Create function.
  5. This Lambda function receives a document from S3 that contains nested JSON string data. To handle this data, you must extract the JSON from this string to retrieve key names of both the document and the bucket. Your function then uses this information to copy the data to your central logging account bucket in the next step. To create this function, copy and paste this code into the Lambda function that you created. Replace the bucket names with the names of the buckets that you created earlier. After you decide on a partitioning strategy, modify this script later.
    // Load the AWS SDK
    const aws = require('aws-sdk');
    
    // Construct the AWS S3 Object 
    const s3 = new aws.S3();
    
    // Destination (central logging) bucket -- replace with your own bucket name
    const destBucket = "YOUR-DESTINATION-BUCKET";
    
    //Main function
    exports.handler = (event, context, callback) => {
        console.log("Got WAF Item Event")
        var _srcBucket = event.Records[0].s3.bucket.name;
        let _key = event.Records[0].s3.object.key;
        let _keySplit = _key.split("/")
        let _objName = _keySplit[ (_keySplit.length - 1) ];
        // Destination key; adjust this path later to match your partitioning strategy
        let _destPath = _keySplit[0]+"/"+_keySplit[1]+"/YOUR-DESTINATION-BUCKET/"+_objName;
        let _sourcePath = _srcBucket + "/" + _key;
        console.log(_destPath)
        let params = { Bucket: destBucket, ACL: "bucket-owner-full-control", CopySource: _sourcePath, Key: _destPath };
        s3.copyObject(params, function(err, data) {
            if (err) {
                console.log(err, err.stack);
            } else {
                console.log("SUCCESS!");
            }
        });
        callback(null, 'All done!');
    };

Step 4: Set S3 to Lambda event triggers

This step sets up event triggers in your resource account’s S3 buckets. These triggers send the file name and location logged by AWS WAF logs to the Lambda function. The triggers also notify the Lambda function that it must move the newly arrived file into your central logging bucket. Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF.

  1. Go to the S3 dashboard and choose your S3 bucket, then choose the Properties tab. Under Advanced settings, choose Events.
  2. Give your event a name and select PUT from the Events check boxes.
  3. Choose Lambda from the Send To option and select your Lambda function as the destination for the event.
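
The same notification can be wired up with the AWS SDK for JavaScript, which is handy when you repeat this step in several Regions. This is a sketch with placeholder names; note that the Lambda function also needs a resource-based permission allowing s3.amazonaws.com to invoke it, which the console normally adds for you.

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

s3.putBucketNotificationConfiguration({
  Bucket: 'YOUR-SOURCE-BUCKET', // placeholder: the regional AWS WAF log bucket
  NotificationConfiguration: {
    LambdaFunctionConfigurations: [
      {
        Events: ['s3:ObjectCreated:Put'],
        // placeholder function ARN
        LambdaFunctionArn: 'arn:aws:lambda:us-east-1:123456789012:function:WafLogCopier'
      }
    ]
  }
}, (err) => {
  if (err) console.error(err);
  else console.log('S3 event notification configured');
});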

Step 5: Add AWS WAF to the Application Load Balancer

Add an AWS WAF to the Application Load Balancer so that you can start logging events. You can optionally delete the original log file after Lambda copies it. This reduces costs, but your business and security needs might err on the side of retaining that data.

Create a separate prefix for each Region in your central logging account bucket waf-central-logs so that AWS Glue can properly partition them. For best practices of partitioning with AWS Glue, see Working with partitioned data in AWS Glue. AWS Glue ingests your data and stores it in a columnar format optimized for querying in Amazon Athena. This helps you visualize the data and investigate the potential attacks.

Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF. The procedure assumes that you already have an AWS WAF enabled that you can use for this exercise. To move forward with the next step, you need AWS WAF enabled and connected to Amazon Kinesis Data Firehose for log delivery.

Setting up and configuring AWS WAF

If you don’t already have a web ACL in place, set up and configure AWS WAF at this point. This solution handles logging data from multiple AWS WAF logs in multiple Regions from more than one account.

To do this efficiently, you should consider your partitioning strategy for the data. You can grant your security teams a comprehensive view of the network. Create each partition based on the Kinesis Data Firehose delivery stream for the specific AWS WAF associated with the Application Load Balancer. This partitioning strategy also allows the security team to view the logs by Region and by account. As a result, your S3 bucket name and prefix look similar to the following example:

s3://central-waf-logs/<account_id>/<region_name>/<kinesis_firehose_name>/...filename...

Step 6: Copying logs with Lambda code

This step updates the Lambda function to start copying log files. Keep your partitioning strategy in mind as you update the Lambda function. Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF.

To accommodate the partitioning, modify your Lambda code to match the examples in the GitHub repo.

Replace <kinesis_firehose_name> in the example code with the name of the Kinesis Data Firehose delivery stream attached to the AWS WAF. Replace <central logging bucket name> with the S3 bucket name from your central logging account.
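
As a rough sketch of what that partitioning looks like inside the copy function, the destination key can be assembled from the account, Region, and delivery stream name. The helper below is illustrative only; the GitHub example may structure the key derivation differently, and the account ID and Region would typically come from the incoming object key or the Lambda environment.

// Build a destination key of the form <account_id>/<region_name>/<kinesis_firehose_name>/<file>
function buildDestinationKey(accountId, regionName, firehoseName, fileName) {
  return `${accountId}/${regionName}/${firehoseName}/${fileName}`;
}

// Example:
// buildDestinationKey('123456789012', 'us-east-1', 'aws-waf-logs-stream', 'log-file.gz')
// => '123456789012/us-east-1/aws-waf-logs-stream/log-file.gz'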

Kinesis Data Firehose should now begin writing files to your central S3 logging bucket with the correct partitioning. To generate logs, access your web application.

Analytics

Now that Kinesis Data Firehose can write collected files into your logging account’s S3 bucket, create an Elasticsearch cluster in your logging account in the same Region as the central logging bucket. You also must create a Lambda function to handle S3 events as the central logging bucket receives new log files. This creates a connection between your central log files and your search engine. Amazon ES gives you the ability to query your logs quickly to look for potential security threats. The Lambda function loads the data into your Amazon ES cluster. Amazon ES also includes a tool named Kibana, which helps with managing data and creating visualizations.

Step 7: Create an Elasticsearch cluster

  1. In your central Logging Account, navigate to the Elasticsearch Service in the AWS Console.
  2. Select Create Cluster, enter a domain name for your cluster, and choose version 3 from the Elasticsearch version dropdown. Choose Next. In this example, don’t implement any security policies for your cluster and only use one instance. For any real-world production tasks, keep your Elasticsearch cluster inside your VPC.
  3. For network configuration, choose Public access and choose Next.
  4. For the access policy in this tutorial, only allow access to the domain from a specified Account ID or ARN. In this case, use your Account ID to gain access.
  5. Choose Next, and on the final screen, confirm. You generally want to create strict access policies for your domain and not allow public access. This example only uses these settings to quickly demonstrate the capabilities of AWS services. I would never recommend this in a production environment.

AWS takes a few minutes to finish and activate your Amazon ES. Once it goes live, you can see two endpoints. The Endpoint URL is the URL you use to send data to the cluster.

Step 8: Create a Lambda function to copy log files

Add an event trigger to your central logs bucket. This trigger tells your Lambda function to write the data from the log file to Amazon ES. Before you create the S3 trigger, create a Lambda function in your logging account to handle the events.

For this Lambda function, we use code from the aws-samples GitHub repository that streams data from an S3 file line by line into Amazon ES. This example uses code taken from amazon-elasticsearch-lambda-samples. Name your new Lambda function myS3toES.

  1. Copy and paste the following code into a text file named index.js:
    // Load the AWS SDK and construct the S3 client
    const aws = require('aws-sdk');
    const s3 = new aws.S3();
    
    // Central logging bucket that receives the copied files -- replace with your own
    const destBucket = "<central logging bucket name>";
    
    exports.handler = (event, context, callback) => {
        // get the source bucket name
        var _srcBucket = event.Records[0].s3.bucket.name;
        // get the object key of the file that landed on S3
        let _key = event.Records[0].s3.object.key;
        
        // split the key by "/"
        let _keySplit = _key.split("/")
        // get the object name
        let _objName = _keySplit[ (_keySplit.length - 1) ];
        // reset the destination path
        let _destPath = _keySplit[0]+"/"+_keySplit[1]+"/<kinesis_firehose_name>/"+_objName;
        // setup the source path
        let _sourcePath = _srcBucket + "/" + _key;
        // build the params for the copyObject request to S3
        let params = { Bucket: destBucket, ACL: "bucket-owner-full-control", CopySource: _sourcePath, Key: _destPath };
        // execute the copyObject request
        s3.copyObject(params, function(err, data) {
            if (err) {
                console.log(err, err.stack);
            } else {
                console.log("SUCCESS!");
            }
        });
        callback(null, 'All done!');
    };

  2. Copy and paste this code into a text file and name it package.json:
    //JSON Document
    {
      "name": "s3toesfunction",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "scripts": {},
      "author": "",
      "dependencies": {
        "byline": "^5.0.0",
        "clf-parser": "0.0.2",
        "path": "^0.12.7",    "stream": "0.0.2"
      }
    }

  3. Execute the following command in the folder containing these files: npm install
  4. After the installation completes, create a .zip file that includes the index.js file and the node_modules folder.
  5. Log in to your logging account.
  6. Upload your .zip file to the Lambda function. For Code entry type, choose Upload a .zip file.
  7. This Lambda function needs an appropriate service role with a trust relationship to S3. Choose Edit trust relationships and add s3.amazonaws.com and lambda.amazonaws.com as trusted entities.
  8. Set up your IAM role with the following permissions: S3 Read Only permissions and Lambda Basic Execution. To grant the role the appropriate access, assign it to the Lambda function from the Lambda Execution Role section in the console.
  9. Set Environment variables for your Lambda function so it knows where to send the data. Add an endpoint and use the endpoint URL you created in Step 7. Add an index and enter your index name. Add a value for region and detail the Region where you deployed your application.

Step 9: Create an S3 trigger

After creating the Lambda function, create the event triggers on your S3 bucket to execute that function. This completes your log delivery pipeline to Amazon ES. This is a common pipeline architecture for streaming data from S3 into Amazon ES.

  1. Log in to your central logging account.
  2. Navigate to the S3 console, select your bucket, then open the Properties pane and scroll down to Events.
  3. Choose Add notification and name your new event s3toLambdaToEs.
  4. Under Events, select the check box for PUT. Leave Prefix and Suffix blank.
  5. Under Send to, select Lambda Function, and enter the name of the Lambda function that you created in the previous step—in this example, myS3toES.
  6. Choose Save.

With this complete, Lambda should start sending data to your Elasticsearch index whenever you access your web application.

Step 10: Configure Amazon ES

Your pipeline now automatically adds data to your Elasticsearch cluster. Next, use Kibana to visualize the AWS WAF logs in the central logging account’s S3 bucket. This is the final step in assembling your forensic investigation architecture.

Kibana provides tools to create visualizations and dashboards that help your security teams view log data. Using the log data, you can filter by IP address to see how many times an IP address has hit your firewall each month. This helps you track usage anomalies and isolate potentially malicious IP addresses. You can use this information to add web ACL rules to your firewall that adds extra protection against those IP addresses.

Kibana produces visualizations like the following screenshot.

In addition to the Number of IPs over Time visualization, you can also correlate the IP address to its country of origin. Correlation provides even more precise filtering for potential web ACL rules to protect against attackers. The visualization for that data looks like the following image.

Elasticsearch setup

To set up and visualize your AWS WAF data, follow this How to analyze AWS WAF logs using Amazon Elasticsearch Service post. With this solution, you can investigate your global dataset instead of isolated Regions.

An alternative to Amazon ES

Amazon ES is an excellent tool for forensic work because it provides high-performance search capability for large datasets. However, Amazon ES requires cluster management and complex capacity planning for future growth. To get top-notch performance from Amazon ES, you must adequately scale it. With the more straightforward data of these investigations, you could instead work with more traditional SQL queries.

Forensic data grows quickly, so using a relational database means you might quickly outgrow your capacity. Instead, take advantage of AWS serverless technologies like AWS Glue, Athena, and Amazon QuickSight. These technologies enable forensic analysis without the operational overhead you would experience with Elasticsearch or a relational database. To learn more about this option, consult posts like How to extract, transform, and load data from analytic processing using AWS Glue and Work with partitioned data in AWS Glue.

Athena query

With your forensic tools now in place, you can use Athena to query your data and analyze the results. This lets you refine the data for your Kibana visualizations, or directly load it into Amazon QuickSight for additional visualization. Use the Athena console to experiment until you have the best query for your visual needs. Having the database in your AWS Glue Catalog means you can make ad hoc queries in Athena to inspect your data.

In the Athena console, create a new Query tab and enter the following query:

# SQL Query
SELECT date_format(from_unixtime("timestamp"/1000), '%Y-%m-%d %h:%i:%s') as event_date, client_ip, country, account_id, waf_name, region FROM "<your-database-name>"."<your-table-name>" where year='2018' and month='12';

Replace <your-database-name> and <your-table-name> with the appropriate values for your environment. This query converts the numerical timestamp to a readable date format using the date functions described in the Presto 0.176 documentation. It should return the following results.

You can see which IP addresses hit your environment the most over any period of time. In a production environment, you would run an ETL job to re-partition this data and transform it into a columnar format optimized for queries. If you would like more information about doing that, see the Best Practices When Using Athena with AWS Glue post.
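
Once the query shape is settled, you can also run it programmatically and feed the results into other tooling. The following is a hedged sketch using the AWS SDK for JavaScript; the database, table, and output location are placeholders that must match your AWS Glue Catalog and Athena settings.

const AWS = require('aws-sdk');

const athena = new AWS.Athena();

athena.startQueryExecution({
  QueryString: `SELECT date_format(from_unixtime("timestamp"/1000), '%Y-%m-%d %h:%i:%s') AS event_date,
                       client_ip, country, account_id, waf_name, region
                FROM "<your-database-name>"."<your-table-name>"
                WHERE year='2018' AND month='12'`,
  QueryExecutionContext: { Database: '<your-database-name>' },
  ResultConfiguration: { OutputLocation: 's3://aws-athena-query-results-EXAMPLE/' } // placeholder
}, (err, data) => {
  if (err) console.error(err);
  else console.log('Query started:', data.QueryExecutionId);
});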

Amazon QuickSight visualization

Now that you can query your data in Athena, you can visualize the results using Amazon QuickSight. First, grant Amazon QuickSight access to the S3 bucket where your Athena query results live.

  1. In the Amazon QuickSight console, log in.
  2. Choose Admin/username, Manage QuickSight.
  3. Choose Account settings, Security & permissions.
  4. Under QuickSight access to AWS services, choose Add or remove.
  5. Choose Amazon S3, then choose Select S3 buckets.
  6. Choose the output bucket for your central AWS WAF logs. Also, choose your Athena query results bucket. The query results bucket begins with aws-athena-query-results-*.

Amazon QuickSight can now access the data sources. To set up your visualizations, follow these steps:

  1. In the QuickSight console, choose Manage data, New data set.
  2. For Source, choose Athena.
  3. Give your new dataset a name and choose Validate connection.
  4. After you validate the connection, choose Create data source.
  5. Select Use custom SQL and give your SQL query a name.
  6. Input the same query that you used earlier in Athena, and choose Confirm query.
  7. Choose Import to SPICE for quicker analytics, Visualize.

Allow Amazon QuickSight several minutes. It alerts you after completing the import.

Now that you have imported your data into your analysis, you can apply a visualization:

  1. In Amazon QuickSight, choose New analysis.
  2. Select the last dataset that you created earlier and choose Create analysis.
  3. At the bottom left of the screen, choose Line Chart.
  4. Drag and drop event_date to the X-Axis field well.
  5. Drag and drop client_ip to the Value field well. This should create a visualization similar to the following image.
  6. Choose the right arrow at the top left of the visualization and choose Hide “other” categories. This should modify your visualization to look like the following image.

You can also map the countries from which the requests originate, allowing you to track global access anomalies. You can do this in QuickSight by selecting the “Points on map” visualization type and choosing the country as the data point to visualize.

You can also add a count of IP addresses to see if you have any unusual access patterns originating from specific IP addresses.

Conclusion

Although Amazon ES and Amazon QuickSight offer similar final results, there are trade-offs to the technical approaches that I highlighted. If your use case requires the analysis of data in real time, then Amazon ES is more suitable for your needs. If you prefer a serverless approach that doesn’t require capacity planning or cluster management, then the solution with AWS Glue, Athena, and Amazon QuickSight is more suitable.

In this post, I described an easy way to build operational dashboards that track key metrics over time. Doing this with AWS Glue, Athena, and Amazon QuickSight relieves the heavy lifting of managing servers and infrastructure. To monitor metrics in real time instead, the Amazon ES solution provides a way to do this with little operational overhead. The key here is the adaptability of the solution: putting different services together can provide different solutions to your problems to fit your exact needs.

For more information and use cases, see the following resources:

Hopefully, you have found this post informative and the proposed solutions intriguing. As always, AWS welcomes all feedback or comment.

About the Authors

Aaron Franco is a solutions architect at Amazon Web Services.

Detect fraudulent calls using Amazon QuickSight ML insights

Post Syndicated from Guy Ben Baruch original https://aws.amazon.com/blogs/big-data/detect-fraudulent-calls-using-amazon-quicksight-ml-insights/

The financial impact of fraud in any industry is massive. According to the Financial Times article Fraud Costs Telecoms Industry $17bn a Year (paid subscription required), fraud costs the telecommunications industry $17 billion in lost revenues every year.

Fraudsters constantly look for new technologies and devise new techniques, which changes fraud patterns and makes detection difficult. Companies commonly combat this with a rules-based fraud detection system. However, once fraudsters realize that their current techniques or tools are being detected, they quickly find a way around them. Also, rules-based detection systems tend to struggle and slow down as data volumes grow. This makes it difficult to detect fraud and act quickly, resulting in lost revenue.

Overview

There are several AWS services that implement anomaly detection and could be used to combat fraud, but let's focus on the following three:

When trying to detect fraud, there are two high-level challenges:

  • Scale – The amount of data to be analyzed. For example, each call generates a call detail record (CDR) event. These CDRs include many pieces of information, such as the originating and terminating phone numbers and the duration of the call. Multiply these CDR events by the number of telephone calls placed each day, and you get an idea of the scale that operators must manage.
  • Machine learning knowledge and skill – The right set of skills to help solve business problems with machine learning. Developing these skills or hiring qualified data scientists with adequate domain knowledge is not simple.

Introducing Amazon QuickSight ML Insights

Amazon QuickSight is a fast, cloud-powered BI service that makes it easy for everyone in an organization to get business insights from their data through rich, interactive dashboards. With pay-per-session pricing and a dashboard that can be embedded into your applications, BI is now even more cost-effective and accessible to everyone.

However, as the volume of data that customers generate grows daily, it’s becoming more challenging to harness their data for business insights. This is where machine learning comes in. Amazon is a pioneer in using machine learning to automate and scale various aspects of business analytics in the supply chain, marketing, retail, and finance.

ML Insights integrates proven Amazon technologies into Amazon QuickSight to provide customers with ML-powered insights beyond visualizations.

  • ML-powered anomaly detection to help customers uncover hidden insights by continuously analyzing across billions of data points.
  • ML-powered forecasting and what-if analysis to predict key business metrics with point-and-click simplicity.
  • Auto-narratives to help customers tell the story of their dashboard in a plain-language narrative.

In this post, I demonstrate how a Telecom provider with little to no ML expertise can use Amazon QuickSight ML capabilities to detect fraudulent calls.

Prerequisites

To implement this solution, you need the following resources:

  • Amazon S3 to stage a Ribbon call detail record (CDR) sample in CSV format.
  • AWS Glue running an ETL job in PySpark.
  • AWS Glue crawlers to discover the schema of the tables and update the AWS Glue Data Catalog.
  • Amazon Athena to query the Amazon QuickSight dataset.
  • Amazon QuickSight to build visualizations and perform anomaly detection using ML Insights.

Diagram of fraudulent call-detecting architecture, using a PySpark script to prepare the data and transform it into Parquet and an AWS Glue crawler to build the AWS Glue Data Catalog.

The dataset

For this post, I use a synthetic dataset, thanks to Ribbon Communications. The data was generated by call test generators, and is not customer or sensitive data.

Inspecting the data

The following example is a typical CDR. This STOP CDR is generated after a call has been terminated.


As you can see, there are a lot of values here. Most of them are not relevant for fraud identification or prevention.

Revenue share fraud

Revenue share fraud is one of the most common fraud schemes threatening the telecom industry today. It involves using fraudulent or stolen numbers to repeatedly call a premium-rate B-number, whose owner then shares the generated revenue with the fraudster.

Say that you’d like to detect national and international revenue share fraud using Amazon QuickSight ML. Consider the typical traits of a revenue share fraud phone call. The pattern for revenue share fraud is multiple A-numbers calling the same B-number or a range of B-numbers with the same prefix. The call duration is usually higher than average and could be up to two hours, which is the maximum length of time international switches allow. Generally, the calls originate from one cell or a group of cells.

One SIM may make short test calls to a variety of B-numbers as a precursor to the fraud itself, which most often happens when the risk of detection is lowest, for example, Friday night, weekends, or holidays. Conference calling may be used to make several concurrent calls from one A-number.

Often, SIMs used for this type of fraud are sold or activated in bulk from the same distributor or group of distributors. SIMs could be topped up using fraudulent online or IVR payments, such as using stolen credit card numbers. Both PAYG credit and bundles may be used.

Based on this use case, the following pieces of information are most relevant to detecting fraud:

  • Call duration
  • Calling number (A number)
  • Called number (B number)
  • Start time of the call
  • Accounting ID

You can use this reference to help identify those fields in a CDR.
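To make the pattern described above concrete, here is a rough pandas sketch of the kind of rule you might start from before moving to ML-based detection. The file and column names (a_number, b_number, duration_seconds) are illustrative, not the actual CDR field names.

```python
# Illustrative rules-based check for the revenue share pattern described above.
# File and column names are hypothetical.
import pandas as pd

cdrs = pd.read_csv("cdr_sample.csv")  # one row per call

per_b_number = cdrs.groupby("b_number").agg(
    distinct_callers=("a_number", "nunique"),
    total_duration=("duration_seconds", "sum"),
    avg_duration=("duration_seconds", "mean"),
)

# Flag B-numbers called by many distinct A-numbers with unusually long calls.
suspicious = per_b_number[
    (per_b_number["distinct_callers"] >= 10)
    & (per_b_number["avg_duration"] > cdrs["duration_seconds"].mean() * 3)
]
print(suspicious.sort_values("total_duration", ascending=False).head())
```

Hand-written rules like this are exactly what fraudsters learn to work around, which is the motivation for the ML-based approach in the rest of this post.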

Figure 2: Decoded CDR data, highlighting the relevant fields.

I identified the columns that I need out of 235 columns in the CDR.

Inspecting the raw sample data, I quickly see that it’s missing a header.

To make life easier, I took the raw CSV data, added the column names, and converted it to Parquet.

Discovering the data

In the AWS Glue console, set up a crawler and name it CDR_CRAWLER.

Point the crawler to s3://telco-dest-bucket/blog where the Parquet CDR data resides.

Next, create a new IAM role to be used by the AWS Glue crawler.

For Frequency, leave the default definition of Run on Demand.

Next, choose Add database and define the name of the database. This database contains the table discovered by the AWS Glue crawler.

Choose Next and review the crawler settings. When you’re satisfied, choose Finish.

Next, choose Crawlers, select the crawler that you just created (CDR_CRAWLER), and choose Run crawler.

The AWS Glue crawler starts crawling the database. This can take one minute or more to complete.

When it’s complete, under Data catalog, choose Databases.  You should be able to see the new database created by the AWS Glue crawler. In this case, the name of the database is blog.

To view the tables created under this database, select the relevant database and choose Tables. The crawler’s table also points to the location of the Parquet format CDRs.

To see the table’s schema, select the table created by the crawler.
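If you prefer to automate the same setup, the console steps above map directly onto the AWS Glue API. Here is a minimal boto3 sketch; the IAM role ARN is a placeholder, while the crawler name, database, and S3 path follow the walkthrough.

```python
# Minimal sketch: create and run the CDR crawler with boto3.
# The role ARN is a placeholder; the crawler name, database, and path follow the walkthrough.
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="CDR_CRAWLER",
    Role="arn:aws:iam::123456789012:role/AWSGlueServiceRole-cdr",  # placeholder
    DatabaseName="blog",
    Targets={"S3Targets": [{"Path": "s3://telco-dest-bucket/blog"}]},
)

glue.start_crawler(Name="CDR_CRAWLER")

# Once the crawler finishes, the discovered tables appear in the database.
for table in glue.get_tables(DatabaseName="blog")["TableList"]:
    print(table["Name"])
```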

Data preparation

You have defined the relevant dimensions to use in the ML model to detect fraud. Now, you can use a PySpark script that I built earlier using an Amazon SageMaker notebook and an AWS Glue endpoint. The script covers the following tasks:

  • Reduce the dataset and focus only on the relevant columns.
  • Create a timestamp column, which you need for creating an analysis using Amazon QuickSight.
  • Transform files from CSV to Parquet for improved performance.

You can run the PySpark script on the raw CSV format of the CDRs that you are using. Here is the location of the raw CSV format:

s3://telco-source-bucket/machine-learning-for-all/v1.0.0/data/cdr-stop/cdr_stop.csv

Here is the PySpark script that I created.

```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import pyspark.sql.functions as fn
from awsglue.dynamicframe import DynamicFrame


sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "demo_ml", table_name = "cdr_stop_csv", transformation_ctx = "datasource0")
# Apply mapping from source table to destination; pick only the relevant columns
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("col2", "string", "Accounting_ID", "string"), ("col13", "long", "Call_service_duration", "long"), ("col5", "string", "Start_Time_(MM/DD/YYYY)", "string"), ("col6", "string", "Start_Time_(HH/MM/SSs)", "string"), ("col19", "long", "Calling number", "string"), ("col20", "long", "Called number", "string")], transformation_ctx = "applymapping1")

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")
resolvechoice2.printSchema()

resolvechoice3 = ResolveChoice.apply(frame = resolvechoice2, choice = "MATCH_CATALOG", database = "demo_ml", table_name = "cdr_stop_csv", transformation_ctx = "resolvechoice3")
resolvechoice3.printSchema()

customDF = resolvechoice3.toDF()
# Create a timestamp column by concatenating the start date and start time
customDF = customDF.withColumn('timestamp', fn.concat(fn.col("Start_Time_(MM/DD/YYYY)"),fn.lit(" "),fn.col("Start_Time_(HH/MM/SSs)")))

# Create a timestamp2 column, which is a substring of the timestamp column
customDF = customDF.withColumn('timestamp2',fn.substring(fn.col("timestamp"),1,19))
# Create the Date column
customDF = customDF.withColumn("Date",fn.unix_timestamp(fn.col("timestamp2"),"MM/dd/yyyy HH:mm:ss").cast("timestamp"))

# Remove the temporary fields
customDF = customDF.drop('timestamp','timestamp2')

customDynamicFrame = DynamicFrame.fromDF(customDF, glueContext, "customDF_df")
# Transform to Parquet format and write to the S3 path
datasink4 = glueContext.write_dynamic_frame.from_options(frame = customDynamicFrame, connection_type = "s3", connection_options = {"path": "s3://telco-dest-bucket/blog/"}, format = "parquet", transformation_ctx = "datasink4")
```
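The script was built with a SageMaker notebook and an AWS Glue endpoint, but you could also register and run it as a standalone AWS Glue job. A rough boto3 sketch of that approach follows; the job name, script location, and role ARN are placeholders.

```python
# Rough sketch: register the PySpark script above as a Glue job and run it.
# Job name, script location, and role ARN are placeholders.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="cdr-prep-job",
    Role="arn:aws:iam::123456789012:role/AWSGlueServiceRole-cdr",       # placeholder
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://telco-dest-bucket/scripts/cdr_prep.py",  # placeholder
    },
    GlueVersion="1.0",
)

run = glue.start_job_run(JobName="cdr-prep-job")
print(run["JobRunId"])
```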

The dataset has been cataloged in AWS Glue Data Catalog and is queryable using Athena.

Amazon QuickSight and anomaly detection

Next, build out anomaly detection using Amazon QuickSight. To get started, follow these steps.

  1. In the Amazon QuickSight console, choose New analysis.
  2. Choose Create new data set.
  3. Select Athena.
  4. Enter a data source name.
  5. Choose Create data source.
  6. From the drop-down lists, select the database and table that were created by the AWS Glue crawler, and then choose Select.
  7. Select Directly query your data and choose Visualize.

Visualizing the data using Amazon QuickSight

  1. Under visual types, choose Line chart.
  2. Drag call_service_duration to the Value field well.
  3. Drag timestamp_new to the X axis field well.

Amazon QuickSight generates a dashboard, as in the following screenshot.

The x-axis shows the timestamp. By default, it aggregates by day; you can change this by choosing a different granularity.

Because the timestamp is currently aggregated by day, the call duration shown is the sum of all call durations across the call records within that day. I can begin the search by looking for days where the total call duration is unusually high.

Anomaly detection

Now look at how to start using the ML insights anomaly detection feature.

  1. On the top of the Insights panel, choose Add anomaly to sheet. This creates an insights visual for anomaly detection.
  2. On the top of the screen, choose Field Wells and add at least one field to the Categories, as in the following example. I added the calling/called number, as those become relevant for fraud use cases; for example, one A-number calling multiple B-numbers or multiple A-numbers calling B-numbers.
    The categories represent the dimensional values by which Amazon QuickSight splits the metric. For example, you can analyze anomalies on sales across all product categories and product SKUs—assuming there are 10 product categories, each with 10 product SKUs. Amazon QuickSight splits the metric by the 100 unique combinations and runs anomaly detection on each of the split metric.
  3. To configure the anomaly detection job, choose Get Started.
  4. On the anomaly detection configuration screen, set up the following options:
  • Analyze all combinations of these categories—By default, if you have selected three categories, Amazon QuickSight runs anomaly detection on the following combinations hierarchically: A, AB, ABC. If you select this option, QuickSight analyzes all combinations: A, AB, ABC, BC, AC. If your data is not hierarchical, check this option.
  • Schedule—Set this option to run anomaly detection on your data hourly, daily, weekly, or monthly, depending on your data and needs. For Start schedule on and Timezone, enter values and choose OK. Important: The schedule does not take effect until you publish the analysis as a dashboard. Within the analysis, you have the option to run the anomaly detection manually (without the schedule).
  • Contribution analysis on anomaly—You can select up to four additional dimensions for Amazon QuickSight to analyze the top contributors when an anomaly is detected. For example, Amazon QuickSight can show you the top customers that contributed to a spike in sales. In my current example, I added one additional dimension: the accounting ID. If you think about a telecom fraud case, you can also consider fields like charging time or cell ID as additional dimensions.
  5. After setting the configuration, choose Run Now to run the job manually. You will see a “Detecting anomalies… This may take a while…” message. Depending on the size of your dataset, this may take a few minutes or up to an hour.
  6. When the anomaly detection job is complete, any anomalies are called out in the insights visual. By default, only the top anomalies for the latest time period in the data are shown in the insights visuals.

    Anomaly detection reveals several B numbers being called from multiple A numbers with a high call service duration on August 29, 2018. That looks interesting!
  7. To explore all anomalies for this insight, select the menu on the top-right corner of the visual and choose Explore Anomalies.
  8. On the Anomalies detailed page, you can see all the anomalies for the latest period.
    In this view, you can see that two anomalies were detected, showing two time series. The title of each visual represents the metric that is run on the unique combination of the categorical fields. In this case:
  • [All] | 9645000024
  • 3512000024 | [ALL]
    So the system detected anomalies for multiple A-numbers calling 9645000024, and 3512000024 calling multiple B-numbers. In both cases, it observed a high call duration. The labeled data point on the chart represents the most recent anomaly that is detected for that time series.
  9. To expose a date picker, choose Show anomalies by date at the top-right corner. This chart shows the number of anomalies that were detected for each day (or hour, depending on your anomaly detection configuration). You can select a particular day to see the anomalies detected for that day. For example, selecting August 10, 2018 on the top chart shows the anomalies for that day:

    Important:
    The first 32 points in the dataset are used for training and are not scored by the anomaly detection algorithm. You may not see any anomalies on the first 32 data points.
    You can expand the filter controls on the top of the screen. With the filter controls, you can change the anomaly threshold to show high, medium, or low significance anomalies. You can choose to show only anomalies that are higher than expected or lower than expected. You can also filter by the categorical values that are present in your dataset to look at anomalies only for those categories.
  10. Look at the contributors columns. When you configured the anomaly detection, you defined the accounting ID as another dimension. If this were real call traffic instead of practice data, you would be able to single out specific accounting IDs that contribute to the anomaly.
  11. When you’re done, choose Back to analysis.

Summary

In this post, I explored a common fraud pattern called revenue share fraud. I looked at how to extract the relevant data for training the anomaly detection model in Amazon QuickSight. I then used this data to detect anomalies based on call duration, calling party, and called party, looking at additional contributors like the accounting ID. The entire process used serverless technologies and required little to no machine learning experience.

For more information about options and strategies, see Amazon QuickSight Announces General Availability of ML Insights.

If you have questions or suggestions, please comment below.

 


About the Author

Guy Ben Baruch is a solutions architect with Amazon Web Services.


Introducing Amazon QuickSight fine-grained access control over Amazon S3 and Amazon Athena

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/introducing-amazon-quicksight-fine-grained-access-control-over-amazon-s3-and-amazon-athena/

Today, AWS is excited to announce the availability of fine-grained access control for AWS Identity and Access Management (IAM)-permissioned resources in Amazon QuickSight. Fine-grained access control allows Amazon QuickSight account administrators to control authors’ default access to connected AWS resources. Fine-grained access control enables administrators to use IAM policies to scope down access permissions, limiting specific authors’ access to specific items within the AWS resources. Administrators can now apply this new level of access control to Amazon S3, Amazon Athena, and Amazon RDS/Redshift database discovery.

Fine-Grained Access Control Setup

Here’s how fine-grained access control works in Amazon QuickSight:

Imagine an AWS customer, Acme. Acme’s account contains three S3 buckets, called b1, b2, and b3. Amazon QuickSight is configured to read data via Athena. Assume that Acme’s administrators have configured Amazon QuickSight service role permissions with access to all three buckets. The new default access setting introduced today enables administrators to limit access to Acme’s data for all users by default. Administrators grant access to specific buckets (b1, b2, b3) to individual users or groups within Acme via IAM policies.

In the following example, the policies assigned to Acme users A and B, and Group X grant them access to buckets 1, 2, and 3. Group Y is not assigned, as shown in the following diagram.

When User A attempts to read data via Athena from bucket 1, AWS evaluates the IAM policy associated with the user. Since the policy assigned to User A grants access to bucket 1, the query succeeds. User A can access data from bucket 1. Similarly, User B and users in group X can access data in buckets 2 and 3, respectively.

However, when a user from group Y tries to access bucket 2, QuickSight doesn’t allow any access to data. Remember, group Y has no user-level assignments. Users from group Y are denied access to bucket 2 because Amazon QuickSight requires explicit permissions to access data, as shown in the following diagram.

In an S3 data lake, Amazon QuickSight enables administrators to restrict each author’s data access using IAM policies. (Other AWS mechanisms, outside of Amazon QuickSight, provide for the modification of the policies themselves.) This fine-grained access control relies on the underlying IAM policy evaluation infrastructure. Currently, there is a ten-policy limit on policies that may be linked together at runtime to evaluate the user’s permissions.
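As a concrete illustration of the scoping described above, a policy assigned to User A might allow reads only from bucket b1. The following boto3 sketch creates such a policy; the policy name is a placeholder, and your actual policies would reflect your own buckets and permission requirements. The resulting policy ARN is what you would pick later in the QuickSight IAM policy assignment flow.

```python
# Sketch: an IAM policy that limits an author's S3 reads to bucket b1 only.
# The policy name is a placeholder.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::b1", "arn:aws:s3:::b1/*"],
        }
    ],
}

response = iam.create_policy(
    PolicyName="quicksight-author-b1-only",  # placeholder
    PolicyDocument=json.dumps(policy_document),
)
print(response["Policy"]["Arn"])  # use this ARN in the QuickSight IAM policy assignment
```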

Amazon QuickSight also offers an updated UI for AWS permissions management. Administrators can now access an account’s default resource setting, as shown in the following screenshot:

You can set Default resource access to Allow or Deny, based on the administration model. Choose the appropriate option and press Update. (Or, if you decide not to proceed, press Cancel.)

As before, you can specify AWS resource permissions for the Amazon QuickSight account through the new UI. Select resources from the checkboxes on the right side of the screen, as shown in the following screenshot:

And, as shown in the next screenshot, S3 access can be granted to buckets within the AWS account or other AWS accounts:

To control access for specific users or groups, use the newly introduced fine-grained access control feature, illustrated in this screenshot:

The IAM policy assignments button leads you to a page displaying all assignments in the account and lets you create a new assignment, as shown in the following screenshot.

Creating a new policy assignment involves only two steps:

  1. Pick an IAM policy from the list of policies on the AWS account.
  2. Assign to specific users or groups.

If you haven’t yet configured groups in Amazon QuickSight, you can do so using AWS APIs for accounts using SSO or Amazon QuickSight-native credentials. Groups are also natively available in an AD-connected account.

Fine-grained access control in Amazon QuickSight, when combined with data in S3 and Athena, allows you to set up a secure environment for data exploration across the organization. This feature is available in Amazon QuickSight Enterprise Edition in all supported AWS Regions starting today.

 


About the Author

Jose Kunnackal is a principal product manager for Amazon QuickSight.


Amazon QuickSight updates: Multiple sheets in dashboards, axis label orientation options, and more

Post Syndicated from Sahitya Pandiri original https://aws.amazon.com/blogs/big-data/amazon-quicksight-updates-multiple-sheets-in-dashboards-axis-label-orientation-options-and-more/

Today, we are pleased to announce a set of updates to Amazon QuickSight:

  • Richer dashboards with multiple sheets in your regular and embedded dashboards
  • Multiple axis label orientation options for better readability of dashboards
  • More calculations such as standard deviation, variance and conditional string functions on SPICE
  • Enhanced URL actions for supporting a broader set of interaction scenarios
  • One-click duplication of visuals for faster authoring

Multiple sheets in dashboards

First, let’s look at the update to dashboards that allows multiple sheets, accessed via a new tab control. This allows better organization of information within the dashboard, where visuals related to specific subject areas or topics can be organized in separate sheets and distinctly identified through the tab name. As a reader, you can then navigate to a single dashboard to get a comprehensive view of all the insights related to a topic.

Each sheet within a dashboard has a distinct URL, which allows you to bookmark the specific sheet or share a link to it with others. Additionally, dashboards can also be set up to filter content at a dashboard level, or at a per-sheet level, with the filters being passed either via on-screen controls or using the URL.

Filtering through on-screen controls

Navigating to another dashboard and passing filters via URL

If you are authoring a dashboard, you will find that every dashboard now starts off as a single sheet. You can use the “add sheet” icon in the tabs section to add a new sheet to your dashboard. Note that the tab control only shows up on a dashboard if there is more than one sheet. When authoring, you will also be able to rename a sheet, change the order of sheets or delete a sheet. The first sheet will always be the one presented to your readers by default, so this would be where you want to provide a summary of the dashboard with KPIs, ML-Insights, or high level trends. Email reports will send the first sheet of the dashboard to all your viewers; we are working on additional rendering options for email reports.

Previously, on the Big Data Blog, we covered ways to add parameters and on-screen controls to dashboards. Parameter values are global across a dashboard, which allow you to maintain user inputs and filter content according to user selections across the dashboard. On-screen controls are specific to each sheet, since the context of these controls is often confined to the sheet itself. If you want to discreetly filter data based on user or group-level attributes, you may also use row-level security to do so.

Multi-sheet dashboards can be shared through the regular website, or can be embedded in your custom application. To embed a dashboard in your application, visit Embed interactive dashboards in your application with Amazon QuickSight.

Axis label orientation

When authoring a dashboard, you will see that Amazon QuickSight now automatically selects a horizontal or angled layout to provide better readability. You can also manually navigate to a visual’s options menu and choose “Format visual”, and then choose the axis label orientation that is most suited to your dashboard.

Duplicate visual option

When authoring a dashboard, there are often situations where you want to quickly clone a visual and make changes. With the new “Duplicate visual” option, Amazon QuickSight allows you to achieve this via a one-click action from the visual’s options menu. This new option creates a copy of the visual, along with its filters and other settings such as visual size and format, within the same sheet or a different sheet within the dashboard, thus removing the overhead of creating visuals from scratch each time.

URL action open options

URL actions allow you to configure actions outside of QuickSight that can be triggered by a user viewing a QuickSight dashboard. With today’s release we will allow you to choose how these URLs are opened, with options of overriding the viewer’s QuickSight browser tab, loading in a new tab, or a new browser window altogether.

In addition to these features that improve author and reader experience, QuickSight now supports additional statistical capabilities through new functions in calculated fields. These statistical functions include standard deviation and variance calculations and conditional functions on string fields. With standard deviation and variance, you can now determine dispersion of a metric relative to the mean. You can do this across your entire data set or partition by a select dimension. For more information, see Functions by Category in the Amazon QuickSight User Guide.

With conditional string functions on SPICE, you can now use coalesce, ifelse, isNotNull, isNull, nullif while building reports on SPICE datasets. For more information, see Calculated Field Function and Operator Reference for Amazon QuickSight in the Amazon QuickSight user guide.

All of these new features are now available in both Standard and Enterprise Editions in all supported Amazon QuickSight regions – US East (N. Virginia and Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Singapore, Sydney and Tokyo).


About the Author

Sahitya Pandiri is a technical program manager with Amazon Web Services. Sahitya has been in product and program management for five years and has built multiple products in the retail, healthcare, and analytics spaces. She enjoys problem solving and leveraging technology to simplify processes.


Enabling serverless security analytics using AWS WAF full logs, Amazon Athena, and Amazon QuickSight

Post Syndicated from Umesh Ramesh original https://aws.amazon.com/blogs/security/enabling-serverless-security-analytics-using-aws-waf-full-logs/

Traditionally, analyzing data logs required you to extract, transform, and load your data before using a number of data warehouse and business intelligence tools to derive business intelligence from that data—on top of maintaining the servers that ran behind these tools.

This blog post will show you how to analyze AWS Web Application Firewall (AWS WAF) logs and quickly build multiple dashboards, without booting up any servers. With the new AWS WAF full logs feature, you can now log all traffic inspected by AWS WAF into Amazon Simple Storage Service (Amazon S3) buckets by configuring Amazon Kinesis Data Firehose. In this walkthrough, you’ll create an Amazon Kinesis Data Firehose delivery stream to which AWS WAF full logs can be sent, and you’ll enable AWS WAF logging for a specific web ACL. Then you’ll set up an AWS Glue crawler job and an Amazon Athena table. Finally, you’ll set up Amazon QuickSight dashboards to help you visualize your web application security. You can use these same steps to build additional visualizations to draw insights from AWS WAF rules and the web traffic traversing the AWS WAF layer. Security and operations teams can monitor these dashboards directly, without needing to depend on other teams to analyze the logs.

The following architecture diagram highlights the AWS services used in the solution:

Figure 1: Architecture diagram

AWS WAF is a web application firewall that lets you monitor HTTP and HTTPS requests that are forwarded to an Amazon API Gateway API, to Amazon CloudFront or to an Application Load Balancer. AWS WAF also lets you control access to your content. Based on conditions that you specify—such as the IP addresses from which requests originate, or the values of query strings—API Gateway, CloudFront, or the Application Load Balancer responds to requests either with the requested content or with an HTTP 403 status code (Forbidden). You can also configure CloudFront to return a custom error page when a request is blocked.

Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk. With Kinesis Data Firehose, you don’t need to write applications or manage resources. You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified. You can also configure Kinesis Data Firehose to transform your data before delivering it.

AWS Glue can be used to run serverless queries against your Amazon S3 data lake. AWS Glue can catalog your S3 data, making it available for querying with Amazon Athena and Amazon Redshift Spectrum. With crawlers, your metadata stays in sync with the underlying data (more details about crawlers later in this post). Amazon Athena and Amazon Redshift Spectrum can directly query your Amazon S3 data lake by using the AWS Glue Data Catalog. With AWS Glue, you access and analyze data through one unified interface without loading it into multiple data silos.

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Amazon QuickSight is a business analytics service you can use to build visualizations, perform one-off analysis, and get business insights from your data. It can automatically discover AWS data sources and also works with your data sources. Amazon QuickSight enables organizations to scale to hundreds of thousands of users and delivers responsive performance by using a robust in-memory engine called SPICE.

SPICE stands for Super-fast, Parallel, In-memory Calculation Engine. SPICE supports rich calculations to help you derive insights from your analysis without worrying about provisioning or managing infrastructure. Data in SPICE is persisted until it is explicitly deleted by the user. SPICE also automatically replicates data for high availability and enables Amazon QuickSight to scale to hundreds of thousands of users who can all simultaneously perform fast interactive analysis across a wide variety of AWS data sources.

Step one: Set up a new Amazon Kinesis Data Firehose delivery stream

  1. In the AWS Management Console, open the Amazon Kinesis Data Firehose service and choose the button to create a new stream.
    1. In the Delivery stream name field, enter a name for your new stream that starts with aws-waf-logs- as shown in the screenshot below. AWS WAF filters all streams starting with the keyword aws-waf-logs when it displays the delivery streams. Note the name of your stream since you’ll need it again later in the walkthrough.
    2. For Source, choose Direct PUT, since AWS WAF logs will be the source in this walkthrough.

      Figure 2: Select the delivery stream name and source

  2. Next, you have the option to enable AWS Lambda if you need to transform your data before transferring it to your destination. (You can learn more about data transformation in the Amazon Kinesis Data Firehose documentation.) In this walkthrough, there are no transformations that need to be performed, so for Record transformation, choose Disabled.
    Figure 3: Select “Disabled” for record transformations

    1. You’ll have the option to convert the JSON object to Apache Parquet or Apache ORC format for better query performance. In this example, you’ll be reading the AWS WAF logs in JSON format, so for Record format conversion, choose Disabled.

      Figure 4: Choose “Disabled” to not convert the JSON object

  3. On the Select destination screen, for Destination, choose Amazon S3.
    Figure 5: Choose the destination

    1. For the S3 destination, you can either enter the name of an existing S3 bucket or create a new S3 bucket. Note the name of the S3 bucket since you’ll need the bucket name in a later step in this walkthrough.
    2. For Source record S3 backup, choose Disabled, because the destination in this walkthrough is an S3 bucket.

      Figure 6: Enter the S3 bucket name, and select “Disabled” for the source record S3 backup

  4. On the next screen, leave the default conditions for Buffer size, Buffer interval, S3 compression and S3 encryption as they are. However, we recommend that you set Error logging to Enabled initially, for troubleshooting purposes.
    1. For IAM role, select Create new or choose. This opens up a new window that will prompt you to create firehose_delivery_role, as shown in the following screenshot. Choose Allow in this window to accept the role creation. This grants the Kinesis Data Firehose service access to the S3 bucket.

      Figure 7: Select "Create new or choose" for IAM Role

      Figure 7: Select “Allow” to create the IAM role “firehose_delivery_role”

  5. On the last step of configuration, review all the options you’ve chosen, and then select Create delivery stream. This will cause the delivery stream to display as “Creating” under Status. In a couple of minutes, the status will change to “Active,” as shown in the below screenshot.

    Figure 8: Review the options you selected
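If you prefer to script this step instead of using the console, the delivery stream can also be created through the Kinesis Data Firehose API. Here is a minimal boto3 sketch; the bucket ARN and role ARN are placeholders that should match the bucket and IAM role you configured above, and DirectPut corresponds to the Direct PUT source chosen earlier.

```python
# Minimal sketch: create the aws-waf-logs- delivery stream with boto3.
# The bucket ARN and role ARN are placeholders.
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

firehose.create_delivery_stream(
    DeliveryStreamName="aws-waf-logs-us-east-1",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "BucketARN": "arn:aws:s3:::my-waf-logs-bucket",                       # placeholder
        "RoleARN": "arn:aws:iam::123456789012:role/firehose_delivery_role",  # placeholder
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "UNCOMPRESSED",
    },
)
```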

Step two: Enable AWS WAF logging for a specific Web ACL

  1. From the AWS Management Console, open the AWS WAF service and choose Web ACLs. Open your Web ACL resource, which can either be deployed on a CloudFront distribution or on an Application Load Balancer.
    1. Choose the Web ACL for which you want to enable logging. (In the below screenshot, we’ve selected a Web ACL in the US East Region.)
    2. On the Logging tab, choose Enable Logging.

      Figure 9: Choose “Enable Logging”

  2. The next page displays all the delivery streams that start with aws-waf-logs. Choose the Amazon Kinesis Data Firehose delivery stream that you created for AWS WAF logs at the start of this walkthrough. (In the screenshot below, our example stream name is “aws-waf-logs-us-east-1”.)
    1. You can also choose to redact certain fields that you wish to exclude from being captured in the logs. In this walkthrough, you don’t need to choose any fields to redact.
    2. Select Create.

      Figure 10: Choose your delivery stream, and select “Create”

After a couple of minutes, you’ll be able to inspect the S3 bucket that you defined in the Kinesis Data Firehose delivery stream. The log files are created in directories by year, month, day, and hour.
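The logging configuration can also be applied programmatically. For a web ACL on a CloudFront distribution (classic AWS WAF, global scope), a sketch looks like the following; both ARNs are placeholders, and a web ACL on an Application Load Balancer would use the waf-regional client instead.

```python
# Sketch: enable AWS WAF (classic) full logging to the Firehose stream created earlier.
# Both ARNs below are placeholders.
import boto3

waf = boto3.client("waf")  # global scope; use "waf-regional" for an ALB web ACL

waf.put_logging_configuration(
    LoggingConfiguration={
        "ResourceArn": "arn:aws:waf::123456789012:webacl/EXAMPLE-WEBACL-ID",  # placeholder
        "LogDestinationConfigs": [
            "arn:aws:firehose:us-east-1:123456789012:deliverystream/aws-waf-logs-us-east-1"  # placeholder
        ],
    }
)
```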

Step three: Set up an AWS Glue crawler job and Amazon Athena table

The purpose of a crawler within your Data Catalog is to traverse your data stores (such as S3) and extract the metadata fields of the files. The output of the crawler consists of one or more metadata tables that are defined in your Data Catalog. When the crawler runs, the first classifier in your list to successfully recognize your data store is used to create a schema for your table. AWS Glue provides built-in classifiers to infer schemas from common files with formats that include JSON, CSV, and Apache Avro.

  1. In the AWS Management Console, open the AWS Glue service and choose Crawlers to set up a crawler job.
  2. Choose Add crawler to launch a wizard to set up the crawler job. For Crawler name, enter a relevant name. Then select Next.

    Figure 11: Enter “Crawler name,” and select “Next”

  3. For Choose a data store, select S3 and include the path of the S3 bucket that stores your AWS WAF logs, which you made note of in step 1.3. Then choose Next.

    Figure 12: Choose a data store

  4. When you’re given the option to add another data store, choose No.
  5. Then, choose Create an IAM role and enter a name. The role grants the AWS Glue service access to the S3 bucket that contains the log files.

    Figure 13: Choose “Create an IAM role,” and enter a name

  6. Next, set the frequency to Run on demand. You can also schedule the crawler to run periodically to make sure any changes in the file structure are reflected in your data catalog.

    Figure 14: Set the “Frequency” to “Run on demand”

  7. For output, choose the database in which the Athena table is to be created and add a prefix to identify your table name easily. Select Next.

    Figure 15: Choose the database, and enter a prefix

  8. Review all the options you’ve selected for the crawler job and complete the wizard by selecting the Finish button.
  9. Now that the crawler job parameters are set up, on the left panel of the console, choose Crawlers to select your job and then choose Run crawler. The job creates an Amazon Athena table. The duration depends on the size of the log files.

    Figure 16: Choose “Run crawler” to create an Amazon Athena table

  10. To see the Amazon Athena table created by the AWS Glue crawler job, from the AWS Management Console, open the Amazon Athena service. You can filter by your table name prefix.
      1. To view the data, choose Preview table. This displays the table data with certain fields showing data in JSON object structure.
    Figure 17: Choose “Preview table” to view the data

Step four: Create visualizations using Amazon QuickSight

  1. From the AWS Management Console, open Amazon QuickSight.
  2. In the Amazon QuickSight window, in the top left, choose New Analysis. Choose New Data set, and for the data source choose Athena. Enter an appropriate data source name and choose Create data source.

    Figure 18: Enter the “Data source name,” and choose “Create data source”

  3. Next, choose Use custom SQL to extract all the fields in the JSON object using the following SQL query:
    
        ```
        with d as (select
        waf.timestamp,
            waf.formatversion,
            waf.webaclid,
            waf.terminatingruleid,
            waf.terminatingruletype,
            waf.action,
            waf.httpsourcename,
            waf.httpsourceid,
            waf.HTTPREQUEST.clientip as clientip,
            waf.HTTPREQUEST.country as country,
            waf.HTTPREQUEST.httpMethod as httpMethod,
            map_agg(f.name,f.value) as kv
        from sampledb.jsonwaflogs_useast1 waf,
        UNNEST(waf.httprequest.headers) as t(f)
        group by 1,2,3,4,5,6,7,8,9,10,11)
        select d.timestamp,
            d.formatversion,
            d.webaclid,
            d.terminatingruleid,
            d.terminatingruletype,
            d.action,
            d.httpsourcename,
            d.httpsourceid,
            d.clientip,
            d.country,
            d.httpMethod,
            d.kv['Host'] as host,
            d.kv['User-Agent'] as UA,
            d.kv['Accept'] as Acc,
            d.kv['Accept-Language'] as AccL,
            d.kv['Accept-Encoding'] as AccE,
            d.kv['Upgrade-Insecure-Requests'] as UIR,
            d.kv['Cookie'] as Cookie,
            d.kv['X-IMForwards'] as XIMF,
            d.kv['Referer'] as Referer
        from d;
        ```        
        

  4. To extract individual fields, copy the previous SQL query and paste it in the New custom SQL box, then choose Edit/Preview data.
    Figure 19: Paste the SQL query in “New custom SQL query”

    1. In the Edit/Preview data view, for Data source, choose SPICE, then choose Finish.

      Figure 20: Choose “Spice” and then “Finish”

  5. Back in the Amazon QuickSight console, under the Fields section, select the drop-down menu and change the data type to Date.

    Figure 21: In the Amazon QuickSight console, change the data type to “Date”

  6. After you see the Date column appear, enter an appropriate name for the visualizations at the top of the page, then choose Save.

    Figure 22: Enter the name for the visualizations, and choose “Save”

  7. You can now create various visualization dashboards with multiple visual types by using the drag-and-drop feature. You can drag and drop combinations of fields such as Action, Client IP, Country, Httpmethod, and User Agents. You can also add filters on Date to view dashboards for a specific timeline. Here are some sample screenshots:
    Figure 23a: Visualization dashboard samples

    Figure 23b: Visualization dashboard samples

    Figure 23c: Visualization dashboard samples

    Figure 23d: Visualization dashboard samples

Conclusion

You can enable AWS WAF logs to Amazon S3 buckets and analyze the logs while they are being streamed by configuring Amazon Kinesis Data Firehose. You can further enhance this solution by automating the streaming of data and using AWS Lambda for any data transformations based on your specific requirements. Using Amazon Athena and Amazon QuickSight makes it easy to analyze logs and build visualizations and dashboards for executive leadership teams. Using these solutions, you can go serverless and let AWS do the heavy lifting for you.

Umesh Kumar Ramesh

Umesh is a Cloud Infrastructure Architect with Amazon Web Services. He delivers proof-of-concept projects and topical workshops, and leads implementation projects for various AWS customers. He holds a Bachelor’s degree in Computer Science & Engineering from the National Institute of Technology, Jamshedpur (India). Outside of work, Umesh enjoys watching documentaries, biking, and practicing meditation.

Muralidhar Ramarao

Muralidhar is a Data Engineer with the Amazon Payment Products Machine Learning Team. He has a Bachelor’s degree in Industrial and Production Engineering from the National Institute of Engineering, Mysore, India. Outside of work, he loves to hike. You will find him with his camera or snapping pictures with his phone, and always looking for his next travel destination.

Amazon QuickSight Announces General Availability of ML Insights

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-announces-general-availability-of-ml-insights/

At re:Invent 2018, we announced the preview of ML Insights, a set of out-of-the-box machine learning and natural language features that provide Amazon QuickSight users with business insights beyond visualization. Today, we are announcing the general availability of ML Insights.

As the volume of data that customers generate continues to grow every day, it’s becoming more challenging to harness that data for business insights. This is where machine learning comes into play. Amazon is a pioneer in using machine learning to automate and scale various aspects of business analytics.

With new ML Insights features, Amazon QuickSight can help you discover hidden data trends, identify key business drivers, forecast future results, and summarize your data in easy-to-read, natural language narratives, saving hours of manual analysis and investigation. You can build comprehensive BI solutions that integrate out-of-the-box machine learning with the analytical richness of Amazon QuickSight and distribute interactive dashboards to everyone in your organization. ML Insights makes machine learning easy, allowing anyone regardless of their technical and ML skillset to easily get insights from their data in minutes rather than weeks. ML Insights features include:

  • ML-powered anomaly detection to uncover hidden insights by continuously analyzing billions of data points.
  • ML-powered forecasting to predict growth and business trends with point-and-click simplicity.
  • Auto-narratives to tell customers the story of their dashboard using plain-language narratives.

Check out this video to get a quick overview of ML Insights:

To get you started with ML Insights, this blog post will walk you through new ML-powered capabilities.

Customer use cases

During the past three months of ML Insights preview availability, customers from a broad range of industries, including telecommunications, entertainment, marketing, retail, energy, financial services, and healthcare, have used ML Insights to harness their growing volume of data on AWS and on-premises for business insights. Here are some of the cool things customers are doing with ML Insights:

Expedia Group is the world’s travel platform, and its purpose is to bring the world within reach.

“At Expedia Group two of our key strategic imperatives are to be customer centric and locally relevant on a global basis. This is why tools such as Amazon QuickSight are so helpful in making it easier to measure, report, and act on our business metrics to help our customers find the best matches for their travel searches. Amazon QuickSight’s out-of-the-box machine learning insights help us to continuously monitor our business for anomalies, alert stakeholders when outliers occur, and help our business project future trends, which in turn allows teams to focus on other priorities instead of building out these capabilities from scratch.”

Amit Marwah, Director of Technology, Flights Data & Analytics, Expedia Group

Ricoh Company, Ltd., is a global corporation that provides imaging equipment for offices, production print solutions, document management systems, IT services, and more, for approximately 200 countries and regions throughout the world.

“Machine learning is becoming more important than ever to meet our growing data volume and BI needs. Amazon QuickSight’s ML Insights functionality makes powerful machine learning easy to use in just a few clicks. It allows us to continuously monitor for unexpected usage behaviors in our fleet of smart devices worldwide, forecast usage trends and deliver comprehensive dashboards that incorporate these machine generated insights as auto-narratives to our line of business users. With ML Insights, we can quickly pinpoint and take actions on anomalies down to the specific devices and features, to improve the experience and add value to our customers.”

Naoki Umehara, Group Leader, Ricoh Company, Ltd.

Tata Consultancy Services Limited is an Indian multinational information technology (IT) service and consulting company headquartered in Mumbai, Maharashtra, with international presence in 46 countries.

“Amazon QuickSight allows us to quickly and easily integrate our Amazon Connect contact center metrics with our client ticketing tools in order to deliver interactive and automatic dashboards that our customers love. We revolutionized our staffing, training, and outage reporting with Amazon QuickSight ML Insights, predicting where the call flow is moving and react accordingly in order to prevent call spikes and provide a better service to our customers.”

Marco David Martinez, Cloud Manager, Tata Consultancy Services

Siemens is a global powerhouse focusing on electrification, automation, and digitalization. The company is also a leading supplier of power generation and transmission systems.

“Amazon QuickSight’s out-of-the-box ML Insights and usage-based pricing make it easy and cost effective to deliver robust machine-learning-based anomaly detection for our customers to analyze performance of their manufacturing process, detect faults in the production line, monitor downtime duration across hundreds of machineries, and understand the root cause of the failures — without heavy investment machine learning and custom development. This allows line supervisors and production managers to receive automated alerts on unexpected events and take actions to optimize the manufacturing process and improve performance.”

Massimilliano Ponticelli, Product Manager, Siemens

Daiso Industries Co., Ltd. is a global retailer and franchise of 100-yen shops founded in Japan.

“With Amazon QuickSight, we were able to build out our BI environment that handles the data of 5,000 stores x 70,000 products in 2 months. Precise sales forecast and inventory optimization are the most important challenges in our business. Amazon QuickSight’s ML Insights allow us to easily and quickly identify unexpected trend changes across our products and improve sales forecast and inventory optimization.”

Kenjiro Marumoto, Section Chief, Daiso Industries Co., Ltd.

Getting started

ML Insights is only available on the Enterprise Edition of Amazon QuickSight. If you are using the Standard Edition, you can easily upgrade with 1-click on the Manage QuickSight page.

To get started with ML Insights, you’ll need to connect a data source to Amazon QuickSight. Data sets can be accessed by direct query to the SQL-compatible database source or by using SPICE.

For this walkthrough, your data must have the following properties:

  • At least one date field.
  • At least one metric, such as sales, orders, shipped units, or sign ups.
  • At least one category dimension, such as product, channel, segment, or industry.
  • More than 40 historical data points per metric.

For optimal results, make sure that your data set has enough historical data points. The built-in ML algorithm requires at least 40 historical data points to learn and train the model, and it will use up to the most recent 1,000 data points. For example, if you’re analyzing daily sales by geographic region, make sure that you have at least 40 days of data. Three to twelve months of data is preferred, depending on the seasonality of your business.
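Before importing a data set, you can sanity-check these requirements with a few lines of pandas. This is just a convenience sketch; the column names passed in are whatever your own data uses.

```python
# Convenience sketch: check that a data set meets the ML Insights requirements above.
import pandas as pd

def check_ml_insights_ready(df: pd.DataFrame, date_col: str, metric_col: str, category_col: str) -> None:
    # Need a parseable date field and a numeric metric.
    assert pd.api.types.is_datetime64_any_dtype(pd.to_datetime(df[date_col])), "need a date field"
    assert pd.api.types.is_numeric_dtype(df[metric_col]), "need a numeric metric"
    # The built-in algorithm needs at least 40 historical points per metric.
    points_per_category = df.groupby(category_col)[date_col].nunique()
    too_short = points_per_category[points_per_category < 40]
    if too_short.empty:
        print("All categories have at least 40 historical data points.")
    else:
        print("Categories with fewer than 40 points:", too_short.to_dict())

# Example usage with hypothetical column names:
# df = pd.read_csv("my-data.csv")
# check_ml_insights_ready(df, date_col="order_date", metric_col="revenue", category_col="segment")
```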

You can use your own data set, or you can download the following sample data set. We’ll use this dataset for the walkthrough:

https://s3.amazonaws.com/quicksight-ml-insights/ML-Insights-Sample-Dataset-V1.csv

Once you have created a data set in Amazon QuickSight, create a new analysis from the data set. For more information about creating data sets and analyses in Amazon QuickSight, see the Amazon QuickSight User Guide.

Suggested insights

ML Insights automatically interprets your data and provides contextual insights called suggested insights. Different visuals may result in different types of insights. For example, if you have a time-series visual, you may get insights such as period-over-period changes, anomalies, and forecasts.

Let’s walk through an example.

1. Create a line chart with a metric and a date, such as revenue over time, aggregated daily.

2. Choose Insights in the top left-hand corner of the visual.

You should then see a list of suggested insights on the left pane. Suggested insights provide you with a quick summary of the data in plain language. As you add visuals to your analyses, you will see additional suggested insights on the left pane, grouped by the visual name.

You can choose a suggested insight such as day over day change to highlight the data point or segment on the visual. Choose it again to deselect it.

ML-powered anomaly detection

With ML Insights, you can run ML-powered anomaly detection on up to a million metrics simultaneously to discover hidden trends and outliers that are often buried in aggregates. To learn more about pricing for anomaly detection, go to Amazon QuickSight pricing and choose ML Insights.

Let’s get started with anomaly detection.

1. Choose Add on the application bar, and then choose Add anomaly to sheet. This creates an insights visual for anomaly detection.

2. Expand the field wells on the top of the page and add at least one category field.

The categories represent the dimensional values by which Amazon QuickSight will split the metric. For example, let’s say you are analyzing anomalies in revenue across all product categories and product SKUs. Assuming there are 10 product categories, each with 10 product SKUs, Amazon QuickSight will split the metric by the 100 unique combinations and run anomaly detection on each of the split metrics.
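To picture what splitting the metric means, here is a small pandas sketch of the same idea with hypothetical file and column names: each unique category/SKU combination becomes its own time series, and anomaly detection runs on each one independently.

```python
# Illustration only: how a metric is split into one series per category combination.
# File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("sales.csv")  # columns: date, category, sku, revenue

# One daily revenue series per (category, sku) pair: 10 categories x 10 SKUs gives 100 series.
split_metrics = {
    keys: grp.groupby("date")["revenue"].sum()
    for keys, grp in df.groupby(["category", "sku"])
}

print(f"{len(split_metrics)} series to analyze for anomalies")
```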

3. Choose Get Started on the insights visual to configure the anomaly detection job.

4. On the anomaly detection configuration pane, configure the following options:

  • Analyze all combinations of these categories – If you select three categories, Amazon QuickSight will run anomaly detection on the following combinations, hierarchically: A, AB, ABC. If you select this option, Amazon QuickSight will analyze all combinations, including: A, AB, ABC, BC, AC. If your data is not hierarchical, you should select this option.
  • Number of anomalies to show – This setting allows you to control the number of top anomalies you want to display on the insights cards as narratives.
  • Schedule – Set the schedule to run anomaly detection on your data hourly, daily, weekly or monthly, depending on your data. Choose the start time and the time zone of the start time.
  • Contribution analysis – You can select up to four additional dimensions for Amazon QuickSight to analyze for top contributors when an anomaly is detected. For example, Amazon QuickSight can show you the top customers that contributed to a spike in sales in the USA for Home Improvement products. If you have additional dimensions in your data (dimensions not used in the anomaly detection), you can add them here for contribution analysis. For this example, choose the Geo for contribution analysis.

5. Choose OK. Amazon QuickSight will not implement the schedule until you publish the analysis as a dashboard. Within an analysis, you will have the option to run anomaly detection manually without the schedule.

6. After the configuration is set, choose Run Now to run detection manually. You will see an “Analyzing for anomalies… This may take a while…” message. Depending on the size of your data set, analysis may take anywhere from a few minutes to an hour.

Once anomaly detection is complete, you will see the top anomalies for the latest period in your data listed in the insights visual. Amazon QuickSight also computes and displays the expected value so you can better understand the significance of the anomaly.

7. To see all anomalies for this data, choose the selector in the upper right of the visual and choose Explore Anomalies.

On the detailed anomalies page, you can see all of the anomalies detected for the latest period. The title of the visual represents the metric that is applied to the unique combination of the categorical fields. The highlighted data point on the chart—on the far right of the chart—represents the most recent anomaly detected for that time series.

On the left pane, you will see the top contributors to the anomaly based on the dimensions you have predefined. When you hover over the top contributors, Amazon QuickSight displays an explanation of the significance of the contribution.

8. To see anomalies by date, choose Show Anomalies by Date from the top of the visual to expose a date picker. The chart displays the number of anomalies detected for each unit of your anomaly detection configuration. You can choose a particular date to see the anomalies for that date. For example, if you choose Nov. 1st, 2018, from the graph, the bar chart highlights the anomalies for that date.

Important: Amazon QuickSight uses the first 40 data points in a data set for training; these data points will not be scored by the anomaly detection algorithm. You may not see any anomalies on the first 40 data points.

9. Use the filter controls at the top of the pane to change the anomaly threshold to show anomalies with high, medium or low significance or to show only anomalies that are higher than expected or lower than expected. You can also filter by the categorical values that are present in your data set to look at anomalies only for those categories.

10. To go back to your analysis, choose Back to analysis at the top of the page.

ML-powered forecasting

Using the built-in ML algorithm, you can now forecast business metrics with point-and-click simplicity without having to write code or build a complex spreadsheet.

1. On your time series chart, choose the selector in the upper right corner of the visual, and then choose Add forecast. Amazon QuickSight will analyze the historical data using ML and present a forecast for the next 14 periods.

2. On the Forecast properties pane at the left, you can customize forecast settings. For example, you can change the number of periods to forecast into the future or add “forecast” periods into the past to compare historical actuals against ML-based expectations.

You can adjust the width of the prediction band by changing the prediction interval and manually setting the seasonality (number of periods). Choose Apply to save your changes.

3. Select a forecasted data point on the chart and choose What-if analysis. With What-if analysis, you can set a target value for a particular date or date range, and Amazon QuickSight adjusts the forecast gracefully to meet the target.

4. Choose Apply to see the new forecast adjusted for the target along with the original forecast. You can hover over the data points to see details.

With ML-powered forecasting, Amazon QuickSight allows you to forecast complex, real-world scenarios such as data with multiple seasonality. Outliers will be excluded automatically, and missing values will be imputed.

To export the forecasting data in CSV format, choose the selector in the upper right corner of the visual, and then choose Export to CSV.

Auto-narratives

Auto narratives allow you to create natural language summaries of your visuals. You can embed these summaries into your dashboard to highlight key insights that are important for your readers, allowing them to access the data without having to sift through the entire dashboard. When you define a template, narratives update automatically as the data in your data set refreshes, just like a visual. The following steps show you how to get started with auto narratives.

1. In your time series chart, choose Insights again to show the suggested insights.

2. An easy way to add a narrative insight to your analysis is to choose the plus sign (+) next to a suggested insight. For the purpose of this walkthrough, choose the Day Over Day Change insight.

You’ll see an insight visual on your analysis with the predefined template. Notice that the field wells have the date, metric, and category filled in. These settings are populated from the visual that you used to create the insight visual. You can customize the fields as needed.

3. To edit the narrative, choose the insight visual menu and select Customize Narrative. You’ll see the Configure narrative pane, where you can edit your insights template. You can format the content with different sizes and colors using the formatting toolbar. You can also insert expressions and conditional statements like IF and FOR statements.

In the left pane, you can add computations to your narrative. Computations are predefined calculations, such as period-over-period, period-to-date, growth rate, max, min, and top movers, that you can reference in your template to describe your data. Currently, Amazon QuickSight supports 13 different types of computations. In this example, PeriodOverPeriod is added by default because we selected the Day Over Day Change insight from the suggested insights pane.

4. To add a new computation, choose Add computation in the bottom left corner of the pane. You’ll be prompted to select from a list of computations. For the purpose of this walkthrough, select the Growth rate computation type, and select Next.

5. You can configure certain aspects of the computation. In the case of growth rate, you can change the number of periods over which you want to compute growth. After you make your selections, choose OK.

6. Now expand Computations on the left pane. You should see both PeriodOverPeriod and GrowthRate options.

Note that computation names must be unique, so assign each computation a distinct name when you create it. You can reference multiple computations of the same type in your template. For example, if you have two metrics, such as $sales and units sold, you can create a GrowthRate computation for each metric, each with its own name. The specific computations can then be referenced by name in the template.

Also be aware that anomaly computation is not compatible with all other computation types. For example, if you have a PeriodOverPeriod or GrowthRate computation, you will not be able to add an Anomaly computation to the same insight visual.

7. To add growth rate to your narrative, enter the phrase Compounded Growth Rate for the last in the narrative template. From the Computations pane, choose GrowthRate, and then choose timePeriods to insert the expression GrowthRate.timePeriods into your narrative. This expression references the number of periods set in the configuration.

8. Complete the sentence by entering days is. Then add another expression from the Computations pane by choosing GrowthRate, then compoundedGrowthRate, and then formattedValue. The selection formattedValue returns a phrase formatted according to the format applied to the metric on the field. To see a raw value in integer or decimal format, choose value instead of formattedValue.

Now, let’s try using a conditional statement.

1. To insert an IF statement, place the cursor at the end of your narrative template. From the Insert Code menu, choose Inline IF.

2. You’ll be prompted to enter some code. On the left pane, choose GrowthRate, then choose compoundedGrowthRate, and then choose value. After the expression is inserted, enter > 3, and choose Save.

3. For the conditional content, enter Great! Select the text and use the format menu to change the color to green.

4. Repeat the previous steps, entering < 3 for the growth rate value. For the conditional content, enter Bad! and format the text as red.

5. Choose Apply. You should see the results similar to the following.

The template provides you with a sophisticated tool to customize your narrative. Within the template, you can also reference parameters in your analysis or dashboard and leverage a set of built-in functions to perform more calculations.

The best way to get started with auto narratives and to learn the syntax is to use the existing templates built from suggested insights.  But you can also create insight visuals from scratch by choosing Add and then choosing Add Insight.

Try it yourself! Try creating a narrative that enumerates the top selling products for the last three months.

Conclusion

As you can see from the walkthrough, ML Insights helps you perform large-scale anomaly detection and create business forecasts in a few clicks. You can build rich, user-friendly auto narratives within your dashboards in minutes, without any custom development or ML expertise.

 


About the Author

Luis Wang is a principal product manager for Amazon QuickSight. He’s been with AWS for over 6 years, working on various services including Amazon EC2 and then launching Amazon QuickSight. Luis is now focused on the application of machine learning and AI to business intelligence and analytics at QuickSight. He enjoys running, watching sitcoms and spending time with his family.

How to visualize Amazon GuardDuty findings: serverless edition

Post Syndicated from Ben Romano original https://aws.amazon.com/blogs/security/how-to-visualize-amazon-guardduty-findings-serverless-edition/

Note: This blog provides an alternate solution to Visualizing Amazon GuardDuty Findings, in which the authors describe how to build an Amazon Elasticsearch Service-powered Kibana dashboard to ingest and visualize Amazon GuardDuty findings.

Amazon GuardDuty is a managed threat detection service powered by machine learning that can monitor your AWS environment with just a few clicks. GuardDuty can identify threats such as unusual API calls or potentially unauthorized users attempting to access your servers. Many customers also like to visualize their findings in order to generate additional meaningful insights. For example, you might track resources affected by security threats to see how they evolve over time.

In this post, we provide a solution to ingest, process, and visualize your GuardDuty finding logs in a completely serverless fashion. Serverless applications automatically run and scale in response to events you define, rather than requiring you to provision, scale, and manage servers. Our solution covers how to build a pipeline that ingests findings into Amazon Simple Storage Service (Amazon S3), transforms their nested JSON structure into tabular form using Amazon Athena and AWS Glue, and creates visualizations using Amazon QuickSight. We aim to provide both an easy-to-implement and cost-effective solution for consuming and analyzing your GuardDuty findings, and to more generally showcase a repeatable example for processing and visualizing many types of complex JSON logs.

Many customers already maintain centralized logging solutions using Amazon Elasticsearch Service (Amazon ES). If you want to incorporate GuardDuty findings with an existing solution, we recommend referencing this blog post to get started. If you don’t have an existing solution or previous experience with Amazon ES, if you prefer to use serverless technologies, or if you’re familiar with more traditional business intelligence tools, read on!

Before you get started

To follow along with this post, you’ll need to enable GuardDuty in order to start generating findings. See Setting Up Amazon GuardDuty for details if you haven’t already done so. Once enabled, GuardDuty will automatically generate findings as events occur. If you have public-facing compute resources in the same region in which you’ve enabled GuardDuty, you may soon find that they are being scanned quite often. All the more reason to continue reading!

You’ll also need Amazon QuickSight enabled in your account for the visualization sections of this post. You can find instructions in Setting Up Amazon QuickSight.

Architecture from end to end

 

Figure 1: Complete architecture from findings to visualization

Figure 1 highlights the solution architecture, from finding generation all the way through final visualization. The steps are as follows:

  1. Deliver GuardDuty findings to Amazon CloudWatch Events
  2. Push GuardDuty Events to S3 using Amazon Kinesis Data Firehose
  3. Use AWS Lambda to reorganize S3 folder structure
  4. Catalog your GuardDuty findings using AWS Glue
  5. Configure Views with Amazon Athena
  6. Build a GuardDuty findings dashboard in Amazon QuickSight

Below, we’ve included an AWS CloudFormation template to launch a complete ingest pipeline (Steps 1-4) so that we can focus this post on the steps dedicated to building the actual visualizations (Steps 5-6). We cover steps 1-4 briefly in the next section to provide context, and we provide links to the pertinent pages in the documentation for those of you interested in building your own pipeline.
 
Select this image to open a link that starts building the CloudFormation stack

Ingest (Steps 1-4): Get Amazon GuardDuty findings into Amazon S3 and AWS Glue Data Catalog

 

Figure 2: In this section, we’ll cover the services highlighted in blue

Step 1: Deliver GuardDuty findings to Amazon CloudWatch Events

GuardDuty integrates with Amazon CloudWatch Events and can deliver findings to it automatically. To set this up manually, follow the instructions in Creating a CloudWatch Events Rule and Target for GuardDuty.

Step 2: Push GuardDuty events to Amazon S3 using Kinesis Data Firehose

Amazon CloudWatch Events can write to a Kinesis Data Firehose delivery stream to store your GuardDuty events in S3, where you can use AWS Lambda, AWS Glue, and Amazon Athena to build the queries you’ll need to visualize the data. You can create your own delivery stream by following the instructions in Creating a Kinesis Data Firehose Delivery Stream and then adding it as a target for CloudWatch Events.

Step 3: Use AWS Lambda to reorganize Amazon S3 folder structure

Kinesis Data Firehose will automatically create a datetime-based file hierarchy to organize the findings as they come in. Due to the variability of the GuardDuty finding types, we recommend reorganizing the file hierarchy with a folder for each finding type, with separate datetime subfolders for each. This will make it easier to target findings that you want to focus on in your visualization. The provided AWS CloudFormation template utilizes an AWS Lambda function to rewrite the files in a new hierarchy as new files are written to S3. You can use the code provided in it along with Using AWS Lambda with S3 to trigger your own function that reorganizes the data. Once the Lambda function has run, the S3 bucket structure should look similar to the structure we show in figure 3.
 

Figure 3: Sample S3 bucket structure

Step 4: Catalog the GuardDuty findings using AWS Glue

With the reorganized findings stored in S3, use an AWS Glue crawler to scan and catalog each finding type. The CloudFormation template we provided schedules the crawler to run once a day. You can also run it on demand as needed. To build your own crawler, refer to Cataloging Tables with a Crawler. Assuming GuardDuty has generated findings in your account, you can navigate to the GuardDuty findings database in the AWS Glue Data Catalog. It should look something like figure 4:
 

Figure 4: List of finding type tables in the AWS Glue Catalog

Note: Because AWS Glue crawlers will attempt to combine similar data into one table, you might need to generate sample findings to ensure enough variability for each finding type to have its own table. If you only intend to build your dashboard from a small subset of finding types, you can opt to just edit the crawler to have multiple data sources and specify the folder path for each desired finding type.

Explore the table structure

Before moving on to the next step, take some time to explore the schema structure of the tables. Selecting one of the tables will bring you to a page that looks like what’s shown in figure 5.
 

Figure 5: Schema information for a single finding table

You should see that most of the columns contain basic information about each finding, but there’s a column named detail that is of type struct. Select it to expand, as shown in figure 6.
 

Figure 6: The "detail" column expanded

Figure 6: The “detail” column expanded

Ah, this is where the interesting information is tucked away! The tables for each finding may differ slightly, but in all cases the detail column will hold the bulk of the information you’ll want to visualize. See GuardDuty Active Finding Types for information on what you should expect to find in the logs for each finding type. In the next step, we’ll focus on unpacking detail to prepare it for visualization!

Process (Step 5): Unpack nested JSON and configure views with Amazon Athena

 

Figure 7: In this section, we’ll cover the services highlighted in blue

Note: This step picks up where the CloudFormation template finishes

Explore the table structure (again) in the Amazon Athena console

Begin by navigating to Athena from the AWS Management Console. Once there, you should see a drop-down menu with a list of databases. These are the same databases that are available in the AWS Glue Data Catalog. Choose the database with your GuardDuty findings and expand a table.
 

Figure 8: Expanded table in the Athena console

This should look very familiar to the table information you explored in step 4, including the detail struct!

You’ll need a method to unpack the struct in order to effectively visualize the data. There are many methods and tools to approach this problem. One that we recommend (and will show) is to use SQL queries within Athena to construct tabular views. This approach will allow you to push the bulk of the processing work to Athena. It will also allow you to simplify building visualizations when using Amazon QuickSight by providing a more conventional tabular format.

Extract details for use in visualization using SQL

The following examples contain SQL statements that extract the fields you need from the detail struct of the Recon:EC2/PortProbeUnprotectedPort finding to build the Amazon QuickSight dashboard we showcase in the next section. The examples also cover most of the operations you’ll need to work with the elements found in GuardDuty findings (such as deeply nested data with lists), and they serve as a good starting point for constructing your own custom queries. In general, you’ll want to traverse the nested layers (for example, root.detail.service.count) and use the UNNEST function to create a new record for each item in an embedded list that you want to target. See this blog for even more examples of constructing queries on complex JSON data using Amazon Athena.

Simply copy the SQL statements that you want into the Athena query field to build the port_probe_geo and affected_instances views.

Note: If your account has yet to generate Recon:EC2/PortProbeUnprotectedPort findings, you can generate sample findings to follow along.


CREATE OR REPLACE VIEW "port_probe_geo" AS

WITH getportdetails AS (
    SELECT id, portdetails
    FROM by_finding_type
    CROSS JOIN UNNEST(detail.service.action.portProbeAction.portProbeDetails) WITH ORDINALITY AS p (portdetails, portdetailsindex)
)

SELECT 
    root.id AS id,
    root.region AS region,
    root.time AS time,
    root.detail.type AS type,
    root.detail.service.count AS count, 
    portdetails.localportdetails.port AS localport, 
    portdetails.localportdetails.portname AS localportname, 
    portdetails.remoteipdetails.geolocation.lon AS longitude, 
    portdetails.remoteipdetails.geolocation.lat AS latitude, 
    portdetails.remoteipdetails.country.countryname AS country, 
    portdetails.remoteipdetails.city.cityname AS city 

FROM 
    by_finding_type  as root, getPortDetails
    
WHERE 
    root.id = getportdetails.id

CREATE OR REPLACE VIEW "affected_instances" AS

SELECT 
    max(root.detail.service.count) AS count,
    date_parse(root.time,'%Y-%m-%dT%H:%i:%sZ') as time,
    root.detail.resource.instancedetails.instanceid

FROM 
    recon_ec2_portprobeunprotectedport  AS root

GROUP BY  
    root.detail.resource.instancedetails.instanceid, 
    time
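
After you run the statements, you can optionally sanity-check a view with a quick query before moving on to visualization, for example (assuming the views were created in the database that holds your GuardDuty findings):

SELECT country, city, localportname
FROM port_probe_geo
LIMIT 10;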

Visualize (Step 6): Build a GuardDuty findings dashboard in Amazon QuickSight

 

Figure 9: In this section we will cover the services highlighted in blue

Now that you’ve created tabular views using Athena, you can jump into Amazon QuickSight from the AWS Management Console and begin visualizing! If you haven’t already done so, enable Amazon QuickSight in your account by following the instructions for Setting Up Amazon QuickSight.

For this example, we’ll leverage the port_probe_geo view to build a geographic visualization and see the locations from which nefarious actors are launching port probes.

Creating an analysis

In the upper left-hand corner of the Amazon QuickSight console select New analysis and then New data set.
 

Figure 10: Create a new analysis

To utilize the views you built in the previous step, select Athena as the data source. Give your data source a name (in our example, we use “port probe geo”), and select the database that contains the views you created in the previous section. Then select Visualize.
 

Figure 11: Available data sources in Amazon QuickSight. Be sure to choose Athena!

 

Figure 12: Select the "port prob geo view" you created in step 5

Figure 12: Select the “port prob geo view” you created in step 5

Viz time!

From the Visual types menu in the bottom left corner, select the globe icon to create a map. Then select the latitude and longitude geospatial coordinates. Choose count (with a max aggregation) for size. Finally, select localportname to break the data down by color.
 

Figure 13: A visual containing a map of port probe scans in Amazon QuickSight

Voila! A detailed map of your environment’s attackers!

Build out a dashboard

Once you like how everything looks, you can move on to adding more visuals to create a full monitoring dashboard.

To add another visual to the analysis, select Add and then Add visual.
 

Figure 14: Add another visual using the ‘Add’ option from the Amazon QuickSight menu bar

If the new visual will use the same dataset, then you can immediately start selecting fields to build it. If you want to create a visual from a different data set (our example dashboard below adds the affected_instances view), follow the Creating Data Sets guide to add a new data set. Then return to the current analysis and associate the data set with the analysis by selecting the pencil icon shown below and selecting Add data set.
 

Figure 15: Adding a new data set to your Amazon QuickSight analysis

Repeat this process until you’ve built out everything you need in your monitoring dashboard. Once it’s completed, you can publish the dashboard by selecting Share and then Publish dashboard.
 

Figure 16: Publish your dashboard using the “Share” option of the Amazon QuickSight menu

Here’s an example of a dashboard we created using the port_probe_geo and affected_instances views:
 

Figure 17: An example dashboard created using the “port_probe_geo” and “affected_instances” views

What does something like this cost?

To get an idea of the scale of the cost, we’ve provided a small pricing example (accurate as of the writing of this blog) that assumes 10,000 GuardDuty findings per month with an average payload size of 5KB.

Service | Pricing structure | Amount consumed | Total cost
Amazon CloudWatch Events | $1 per million events | 10,000 events | $0.01
Amazon Kinesis Data Firehose | $0.029 per GB ingested | 0.05 GB ingested | $0.00145
Amazon S3 | $0.029 per GB stored per month | 0.1 GB stored | $0.00230
AWS Lambda | First million invocations free | ~200 invocations | $0
Amazon Athena | $5 per TB scanned | 0.003 TB scanned (assumes 2 full data scans per day to refresh views) | $0.015
AWS Glue | $0.44 per DPU-hour (2 DPU minimum and 10 minute minimum) = $0.15 per crawler run | 30 crawler runs | $4.50
Total processing cost | | | $4.53

Oh, the joys of a consumption-based model: Less than five dollars per month for all of that processing!

From here, all that remains are your visualization costs using Amazon QuickSight. This pricing is highly dependent upon your number of users and their respective usage patterns. See the Amazon QuickSight pricing page for more specific details.

Summary

In this post, we demonstrated how you can ingest your GuardDuty findings into S3, process them with AWS Glue and Amazon Athena, and visualize with Amazon QuickSight. All serverless! Each portion of what we showed can be used in tandem or on its own for this or many other data sets. Go launch the template and get started monitoring your AWS environment!

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ben Romano

Ben is a Solutions Architect in AWS supporting customers in their journey to the cloud with a focus on big data solutions. Ben loves to delight customers by diving deep on AWS technologies and helping them achieve their business and technology objectives.

Author

Jimmy Boyle

Jimmy is a Solutions Architect in AWS with a background in software development. He enjoys working with all things serverless because he doesn’t have to maintain infrastructure. Jimmy enjoys delighting customers to drive their business forward and design solutions that will scale as their business grows.

Visualize over 200 years of global climate data using Amazon Athena and Amazon QuickSight

Post Syndicated from Joe Flasher original https://aws.amazon.com/blogs/big-data/visualize-over-200-years-of-global-climate-data-using-amazon-athena-and-amazon-quicksight/

Climate Change continues to have a profound effect on our quality of life. As a result, the investigation into sustainability is growing. Researchers in both the public and private sector are planning for the future by studying recorded climate history and using climate forecast models.

To help explain these concepts, this post introduces the Global Historical Climatology Network Daily (GHCN-D). This registry is used by the global climate change research community.

This post also provides a step-by-step demonstration of how Amazon Web Services (AWS) services improve access to this data for climate change research. Data scientists and engineers previously had to access hundreds of nodes on high-performance computers to query this data. Now they can get the same data by using a few steps on AWS.

Background

Global climate analysis is essential for researchers to assess the implications of climate change on the Earth’s natural capital and ecosystem resources. This activity requires high-quality climate datasets, which can be challenging to work with because of their scale and complexity. To have confidence in their findings, researchers must be confident about the provenance of the climate datasets that they work with. For example, researchers may be trying to answer questions like: has the climate of a particular food producing area changed in a way that impacts food security? They must be able to easily query authoritative and curated datasets.

The National Centers for Environmental Information (NCEI) in the U.S. maintains a dataset of climate data that is based on observations from weather stations around the globe. It’s the Global Historical Climatology Network Daily (GHCN-D), a central repository for daily weather summaries from ground-based stations. It comprises millions of quality-assured observations that are updated daily.

The most common parameters recorded are daily temperatures, rainfall, and snowfall. These are useful parameters for assessing risks for drought, flooding, and extreme weather.

The challenge

The NCEI makes the GHCN_D data available in CSV format through an FTP server, organized by year. Organizing the data by year means that a complete copy of the archive requires over 255 files (the first year in the archive is 1763). Traditionally, if a researcher wants to work on this dataset they must download it and work on it locally. For a researcher to be sure of using the latest data for their analysis, they must repeat this download every day.

For researchers, deriving insight from this data can be a challenge. Fully engaging with the data requires technical skill, computing resources, and subject matter expertise.

A new efficient approach

Through AWS’s collaboration with the NOAA Big Data Project, a daily snapshot of the GHCN_D dataset is now available on AWS. The data is publicly accessible through an Amazon S3 bucket. For more information, see the Registry of Open Data on AWS.

Having the data available in this way offers several advantages:

  • The data is globally available to a community of users. Users no longer must download data to work on it. Everyone can work with the same, authoritative copy.
  • Time to insight is reduced. By taking advantage of AWS services, researchers can immediately start to perform analysis.
  • The cost of research is reduced. Researchers can switch off resources as soon as their analysis is finished.

This blog post illustrates a workflow using Amazon S3, Amazon Athena, AWS Glue, and Amazon QuickSight that demonstrates how quickly one can derive insights from this dataset.

The workflow presented in this post follows these general steps:

  • Extract data files from the NOAA bucket and make the data available as tables.
  • Use SQL to query the data contained in the tables.
  • Show how to speed up analysis by creating tables from queries and storing those tables in a private Amazon S3 bucket.
  • Visualize the data to gain insight.

Overview of the GHCN_D dataset

The GHCN-D is a quality-assured dataset that contains daily weather summaries from weather stations across global land areas. It has the following properties:

  • Data is integrated from approximately 30 sources that provide weather observations from various national and international networks.
  • It is a comprehensive dataset for the US, with good coverage for many parts of the world.
  • There are many types of daily weather observations in this dataset, but the majority are maximum temperature, minimum temperature, precipitation, snowfall, and snow depth. These observations include:
    • Over 35,000 temperature stations.
    • Over 100,000 precipitation stations.
    • Over 50,000 snowfall or snow depth stations.
  • The source of each datum, the term used for a single record, is contained in the dataset. Each datum has a quality control flag associated with it.
  • The dataset is updated daily. The historic sources are reprocessed weekly.

You can see in the graphic below how the data volume has grown in recent decades.

Figure 1: 1763 to 2018. For 1763 there are fewer than a thousand observations. For 2017 there are over 34 million observations.

Organization of the data on Amazon S3

As previously mentioned, the GHCN-D dataset is accessible through an Amazon S3 bucket. The details of the dataset are on the Registry of Open Data on AWS (RODA). The landing page for the dataset on RODA contains a link to a comprehensive readme file for the dataset. This readme contains all of the lookup tables and variable definitions.

This section shows the pertinent information required to start working with the dataset.

The data is in a text, or comma-separated values (CSV), format and is contained in the Amazon S3 bucket called noaa-ghcn-pds.

The noaa-ghcn-pds bucket contains virtual folders, and is structured like this:

  • noaa-ghcn-pds. This is the root of the bucket with two subdirectories and a number of useful files. For the purposes of this exercise, we use only the ghcnd-stations.txt file. This file contains information about the observation stations that produced the data for the GHCN_D dataset. You must download the ghcnd-stations.txt file.
  • noaa-ghcn-pds/csv/. This virtual folder contains all of the observations from 1763 to the present organized in .csv files, one file for every year. For this exercise, we’ll collate this data into a single table.

Also for the purpose of this exercise, the data from ‘ghcnd-stations.txt’ and the data contained in noaa-ghcn-pds/csv/ are extracted and added to two separate tables. These tables are the basis of the analysis.

The tables are labeled as:

  • tblallyears. This table contains all the records stored in the yearly .csv files from 1763 to present.
  • tblghcnd_stations. This table contains information for over 106,430 weather stations.

Point of interest: the .csv file from the year 1763 contains the data for one weather station. That station was located in the center of Milan, Italy.

The tools

To implement the general workflow in this exercise, we’re using the following tools:

  • Amazon Simple Storage Service (Amazon S3) to stage the data for analysis. The GHCN_D dataset is stored in a bucket on Amazon S3. We also use a private bucket to store new tables created from queries.
  • Amazon Athena to query data stored on Amazon S3 using standard SQL.
  • AWS Glue to extract and load data into Athena from the Amazon S3 buckets in which it is stored. AWS Glue is a fully managed extract, transform, and load (ETL) service.
  • AWS Glue Data Catalog to catalog the data that we query with Athena.
  • Amazon QuickSight to build visualizations, perform ad hoc analyses, and get insights from the dataset. Queries and tables from Athena can be read directly from Amazon QuickSight. Amazon QuickSight can also run queries against tables in Athena.

To implement the processes outlined in this post, you need an AWS Account. For more information about creating an AWS account, see Getting Started with AWS. You also must create a private Amazon S3 bucket located in the N. Virginia AWS Region. For more information, see Create a Bucket.

When you create the bucket, it must contain the following empty directories:

  1. [your_bucket_name]/stations_raw/
  2. [your_bucket_name]/ghcnblog/
  3. [your_bucket_name]/ghcnblog/stations/
  4. [your_bucket_name]/ghcnblog/allyears/
  5. [your_bucket_name]/ghcnblog/1836usa/

The following is an overview of how the various AWS services interact in this workflow.

Note

The AWS services are in the same AWS Region. One of the Amazon S3 buckets is the existing one that stores the GHCN_D data. The other Amazon S3 bucket is the bucket that you use for storing tables.

Figure 2: How the AWS services work together to compose this workflow.

The workflow

Now that we have the tools and the data, we are ready to:

  1. Extract the yearly .csv files and add them to a table in Amazon Athena.
  2. Extract the stations text file and add it to a separate table in Amazon Athena.

Extract the yearly .csv files and add it to a table in Amazon Athena

The complete set of daily weather observations is organized by year in one folder of the Amazon S3 bucket in .csv format. The path to the data is s3://noaa-ghcn-pds/csv/.

Each file is named by year beginning with 1763.csv and progressing one year at a time up to the present.

From the AWS console, click on AWS Athena. This takes you to the main dashboard for Athena. From here, click on AWS Glue Data Catalog. This brings you to AWS Glue.

In AWS Glue, choose the Tables section on the left side. Then, in the Add table drop-down menu, choose Add table manually. A series of forms displays for you to add the following information:

  • Set up your table’s properties:
    • Give the new table a name, for example, tblallyears
    • Create a database and name it ghcnblog.

The database then appears in the Athena dashboard.

  • Add a data store:
    • Choose the Specified path in another account option, and enter the following path in the text box: s3://noaa-ghcn-pds/csv/
  • Choose a data format:
    • Select CSV, then select Comma as the delimiter.
  • Define a schema:
    • Add the following columns as string variables:
      • id
      • year_date
      • element
      • data_value
      • m_flag
      • q_flag
      • s_flag
      • obs_time

For a full description of the variables and data structures, see the readme file.

  • Choose OK, then Finish.

Now return to the Athena dashboard, and choose the database that you created. The table will appear in the list of tables on the left. You can now preview the data by choosing the ‘Preview table’ option to the right of the table.
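
Behind the scenes, Preview table issues a small query of roughly this form, which you can also run yourself (assuming the table and database names used above):

SELECT *
FROM ghcnblog.tblallyears
LIMIT 10;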

Use CTAS to speed up queries

As a final step, create a table using the SQL statement called CREATE TABLE AS SELECT (CTAS). Store the table in a private Amazon S3 bucket.

This step dramatically speeds up queries because the data is extracted once and stored in a columnar format (Parquet) in the private Amazon S3 bucket.

To illustrate the improvement in speed, here are two examples:

  • A query that counts all of the distinct IDs, meaning unique weather stations, takes around 55 seconds and scans around 88 GB of data.
  • The same query on the converted data takes around 13 seconds and scans about 5 GB of data.

To create the table for this final step:

  1. Open the Athena console.
  2. In the dashboard, select New query, then enter the query shown in the following example; a sketch of a comparable query appears after this list. Make sure to enter the information that’s applicable to your particular situation, such as your bucket name.
    Figure 3: Query to create a table, converting the data into Parquet and storing it in your S3 bucket.
  3. Make sure that the data format is Parquet.
  4. Name your table tblallyears_qa.
  5. Add this path to this folder in the private Amazon S3 bucket: [your_bucket_name]/ghcnblog/allyearsqa/. Replace your_bucket_name with your specific bucket name.
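
The query in Figure 3 appears only as an image in the original post. The following is a minimal sketch of the kind of CTAS statement it represents; the column list simply mirrors the raw table, and the bucket path is a placeholder that you must replace with your own.

CREATE TABLE ghcnblog.tblallyears_qa
WITH (
  format = 'PARQUET',
  external_location = 's3://[your_bucket_name]/ghcnblog/allyearsqa/'
) AS
SELECT id, year_date, element, data_value, m_flag, q_flag, s_flag, obs_time
FROM ghcnblog.tblallyears;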

The new table appears in your database, listed on the left side of the Athena dashboard. This is the table that we work with going forward.

Extract the stations text file and add it to a separate table in Amazon Athena

The stations text file contains information about the weather stations, such as location, nationality, and ID. This data is kept in a separate file from the yearly observations. We need to import this data to look at the geographical spread of weather observations. While dealing with this file is a bit more complicated, the steps to importing this data into Athena are similar to what we have already done.

To import this data into Athena:

  1. Download the ghcnd-stations text file.
  2. Open the file in a spreadsheet program and use the fixed width-delimited data import function. The fixed widths of the columns are described in the readme file in the section called FORMAT OF “ghcnd-stations.txt” file.
  3. After you successfully import the data, save the spreadsheet as a .csv text file.
  4. Copy the new .csv file to [your_bucket_name]/stations_raw/. Replace your_bucket_name with your specific bucket name.
  5. Using this new .csv file, follow the Add table process steps in AWS Glue, as described earlier in this post.
    • Use the following field names:
      • id
      • latitude
      • longitude
      • elevation
      • state
      • name
      • gsn_flag
      • hcn_flag
      • wmo_id
    • Name the table tblghcnd_stations.
  6. After the table is created, follow the CREATE TABLE AS SELECT (CTAS) steps for this table as described earlier in this post.
    • Name the new table tblghcnd_stations_qa.
    • Store the new table in [your_bucket_name]/ghcnblog/stations/. Replace your_bucket_name with your specific bucket name.

The two most important datasets of GHCN_D are now in Athena.

In the next section, we run queries against these tables and visualize the results using Amazon QuickSight.

Exploratory data analysis and visualization

With our two tables created, we are now ready to query and visualize to gain insight.

Exploratory data analysis

In the Athena query window, run the following queries to get an idea of the size of the dataset.

Query #1: the total number of observations since 1763:

Figure 4: Query for the total number of observations. This query was run in autumn 2018; the dataset continues to grow over time.
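
Because the figure is an image, here is a minimal sketch of a query of this kind, assuming the Parquet-backed table created earlier:

SELECT count(*) AS total_observations
FROM ghcnblog.tblallyears_qa;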

Query #2: the number of stations since 1763:

Figure 5: Query for total number of stations that have made observations since 1763. Deactivated stations are included.
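
A comparable sketch for the station count, under the same assumptions about table names:

SELECT count(DISTINCT id) AS number_of_stations
FROM ghcnblog.tblallyears_qa;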

Average weather parameters for the Earth

The following figure shows a query that calculates the average maximum temperature (Celsius), average minimum temperature (Celsius), and average rainfall (mm) for the Earth since 1763.

In the query, we must convert the data_value from a String variable to a Real variable. We also must divide by 10, because the temperature and precipitation measurements are recorded in tenths of their respective units. For more information about these details and the element codes (TMIN, TMAX, and PRCP), see the readme file.

Figure 6: Querying for global averages to take the Earth’s pulse.
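
The query itself is shown only in the figure. A minimal sketch of one way to express it, using conditional aggregation and assuming the element codes and tenths-of-unit scaling described in the readme, might look like this:

SELECT
    avg(CASE WHEN element = 'TMAX' THEN CAST(data_value AS real) / 10 END) AS avg_max_temp_c,
    avg(CASE WHEN element = 'TMIN' THEN CAST(data_value AS real) / 10 END) AS avg_min_temp_c,
    avg(CASE WHEN element = 'PRCP' THEN CAST(data_value AS real) / 10 END) AS avg_rainfall_mm
FROM ghcnblog.tblallyears_qa;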

It would be convenient if we could just run simple queries, such as this one, on this dataset and accept that the results are correct.

The previous query assumes an even and equal spread of weather stations around the world since 1763. In fact, the number and spread of weather stations has varied over time.

Visualizing the growth in numbers of weather stations over time

The following steps use Amazon QuickSight to visualize the number of weather stations reporting for each year since 1763.

Note: You must be signed up for Amazon QuickSight to complete these steps. During the sign-up process, you are prompted to manage your Amazon QuickSight data source permissions. At this time, use step 3 in the following procedure to grant access to the Amazon S3 buckets and to Athena.

The steps are as follows:

  1. Open the Amazon QuickSight console.
  2. On the far right of the dashboard, choose Manage QuickSight.
  3. Choose Account Settings, then Manage QuickSight permissions. Give Amazon QuickSight permission to access Athena and to read the Amazon S3 bucket that contains the new tables.
  4. Return to Amazon QuickSight by choosing the logo on the top left side of the screen.
  5. From the Amazon QuickSight dashboard, choose New analysis, then New data set.
  6. From the Create a Data Set tiles, choose Athena. Name the data source, for example ghcnblog, then choose Create data source.
  7. Choose the option to add a custom SQL, then add the SQL, as shown in the following example:

Figure 7: Location to add a custom SQL query.
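
The custom SQL appears only in the figure. A minimal sketch of a query that produces the year and number_of_stations fields used below, assuming year_date begins with a four-digit year, is:

SELECT substr(year_date, 1, 4) AS year,
       count(DISTINCT id) AS number_of_stations
FROM ghcnblog.tblallyears_qa
GROUP BY substr(year_date, 1, 4)
ORDER BY year;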

  8. Choose Confirm query.
  9. Choose Directly query your data.
  10. Choose Visualize.
  11. To make the graph, choose the line chart graph. Add year to the X-axis and number_of_stations to the Value field wells. The options appear to the left of the visualization dashboard.

Figure 8: The number of global weather stations used by GHCN_D over time.

The resulting graph shows that the number and spread of stations around the world has varied over time.

A look at the development of observations in the US

1836 is the year of the first US observation station in the data set. To get an insight into the development of observations in the US, we extracted a subset of US data from the main data source (tblallyears_qa). This dataset features annual data every 30th year from 1836 to 2016.

This query generates a large dataset. To improve performance, save the query as a table stored in an Amazon S3 bucket using the previously described procedure.

The query to do this in one step is shown in the following figure.

Figure 9: The SQL to create a table from a query and store it in Parquet format in a user-specified Amazon S3 bucket.
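
The SQL in Figure 9 is shown only as an image. The following is a minimal sketch written under several assumptions that are not spelled out in the text: US stations are identified by the US prefix of the station ID, station attributes come from a join with the stations table, and the selected columns are simply those needed for the visuals that follow.

CREATE TABLE ghcnblog.tbl1836every30thyear
WITH (
  format = 'PARQUET',
  external_location = 's3://[your_bucket_name]/ghcnblog/1836usa/'
) AS
SELECT a.id,
       a.year_date,
       a.element,
       a.data_value,
       s.state,
       s.name,
       s.latitude,
       s.longitude
FROM ghcnblog.tblallyears_qa a
JOIN ghcnblog.tblghcnd_stations_qa s ON a.id = s.id
WHERE substr(a.id, 1, 2) = 'US'
  AND substr(a.year_date, 1, 4) IN ('1836', '1866', '1896', '1926', '1956', '1986', '2016');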

The table appears in the Amazon Athena dashboard as tbl1836every30thyear and it forms the basis for our analysis.

In the Amazon QuickSight console, use the following SQL to generate a new dataset.

Figure 10: The SQL to create a dataset for viewing USA data in Amazon QuickSight.
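
Again, the SQL appears only in the figure. A minimal sketch that yields the year, state, and number_of_stations fields used in the map, assuming the table created above includes the state column from the stations join, is:

SELECT substr(year_date, 1, 4) AS year,
       state,
       count(DISTINCT id) AS number_of_stations
FROM ghcnblog.tbl1836every30thyear
GROUP BY substr(year_date, 1, 4), state;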

  1. Choose Confirm query.
  2. Choose Directly query your data.
  3. Choose Visualize.

This brings you back to the visualization dashboard. From this dashboard, choose the Points on a map visualization, and set up the fields as follows:

  • Geospatial: state
  • Size: number_of_stations, aggregate by count.
  • Color: year

The results should be the following map of the US showing the growth of weather stations used by GHCN_D from 1836 to 2016 at 30-year increments. In 1836, there was one station. By 2016, there were thousands.

Figure 11: The growth of the number of observation stations in the US.

Interestingly, some states had more stations in 1956 than they did in 1986. This is also illustrated in the following figure. The data for the figure was derived from the previous dataset.

Figure 12: This heat map illustrates the number of stations per state over time. This is a 30th year snapshot.

A look at more data

We now have a data lake of GHCN_D data. By using the tools that we have assembled, we are free to experiment with the data. It is now possible to construct queries and visualizations on this huge dataset to gain insights.

The following figure shows a heat map that we created of the average minimum temperature in US states over time. As before, we are looking at 30-year snapshots; that is to say, every 30th year we take a yearly average.

Figure 13: This heat map illustrates the minimum temperature for each state over time, showing a yearly average every 30th year starting in 1836.

Summary

Our headlines are full of climate change and sustainability stories, and research and analysis have become more crucial than ever.

We showed researchers, analysts, and scientists how AWS services have reduced the level of technical skills required to fully use the GHCN_D dataset.

The GHCN-D is available on AWS. The details can be found on the Registry of Open Data on AWS. This data is available to researchers studying climate change and weather impacts.

This blog post demonstrated a typical workflow that a researcher could use to engage with and analyze this important data by using Amazon Athena, AWS Glue, and Amazon S3, and how they can visualize insights by using Amazon QuickSight.

By making this data available, Amazon has removed the heavy lifting that was traditionally required to work with the GHCN_D, thus expanding the opportunity for new research and new discoveries.

 


About the Authors

Joe Flasher is the Open Geospatial Data Lead at Amazon Web Services, helping organizations most effectively make data available for analysis in the cloud. Joe has been working with geospatial data and open source projects for the past decade, both as a contributor and maintainer. He has been a member of the Landsat Advisory Group, and has worked on projects, ranging from building GIS software to making the space shuttle fly. Joe’s background is in astrophysics, but he kindly requests you don’t ask him any questions about constellations.

 

 

Conor Delaney, PhD, is an environmental data scientist.