Tag Archives: Amazon QuickSight

Visualize user behavior with Auth0 and Amazon EventBridge

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/visualize-user-behavior-with-auth0-and-amazon-eventbridge/

In this post, I show how to capture user events and monitor user behavior by using the Amazon EventBridge partner integration with Auth0. This enables you to gain insights to help deliver a more customized application experience for your users.

Auth0 is a flexible, drop-in solution that adds authentication and authorization services to your applications. The EventBridge integration automatically and continuously pushes Auth0 log events to your AWS account via a custom SaaS event bus.

The examples used in this post are implemented in a custom-built serverless application called FreshTracks. This is a demo application built in Vue.js, which I will use to demonstrate multiple SaaS integrations into AWS with EventBridge in this and future blog posts.

FreshTracks – a demo serverless web application with multiple SaaS Integrations.


The components for this EventBridge integration with Auth0 have been extracted into a separate example application in this GitHub repo.

How the application works

Routing Auth0 Events with Amazon EventBridge.


  1. Events are emitted from Auth0 when a user interacts with the login service on the front-end application.
  2. These events are streamed into a custom SaaS event bus in EventBridge.
  3. Event rules match events and send them downstream to a Lambda function target.
  4. The receiving Lambda function performs some data transformation before writing an object to S3.
  5. These objects are made available by a QuickSight data source manifest file and used as datapoints for QuickSight visuals.

Configuring the Auth0 EventBridge integration

To capture Auth0 emitted events in EventBridge, you must first configure Auth0 for use as the Event Source on your Auth0 Dashboard.

  1. Log in to the Auth0 Dashboard.
  2. Choose Logs > Streams.
  3. Choose + Create Stream.
  4. Choose Amazon EventBridge and enter a unique name for the new Amazon EventBridge Event Stream.
  5. Create the Event Source by providing your AWS Account ID and AWS Region. The Region you select must match the Region of the Amazon EventBridge bus.
  6. Choose Save.
Event Source Configuration on Auth0 dashboard


Auth0 provides you with an Event Source Name. Make sure to save the Event Source Name value, because you need it later to complete the integration.

Creating a custom event bus

  1. Go to the EventBridge partners tab in your AWS Management Console. Ensure the AWS Region matches where the Event Source was created.
  2. Paste the Event Source Name in the Partner event sources search box to find and choose the new Auth0 event source. Note: The Event Source remains in a pending state until it is associated with an event bus.

    Partner event source

  3. Choose the event source, then choose Associate with Event Bus.
  4. Choose Associate.

Deploying the application

Once you have associated the Event Source with a new partner event bus, you are ready to deploy backend services to receive and respond to these events.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

When deploying the application stack, make sure to provide the custom event bus name with --parameter-overrides.

sam deploy --parameter-overrides Auth0EventBusName=aws.partner/auth0.com/auth0username-0123344567-e5d2-4514-84f2-97dd4ff8aad0/auth0.logs

You can find the name of the new Auth0 custom event bus in the custom event bus section of the EventBridge console:

Custom event bus name


Routing events with rules

The AWS Serverless Application Model (SAM) template in the example application creates four event rules:

  1. Successful sign-in
  2. Successful signup
  3. Successful log-out
  4. Unsuccessful signup

These are defined with the `AWS::Events::Rule` resource type. Each of these rules is routed to a single target Lambda function. For a successful sign-in, the rule event pattern matches on detail:data:type:s, which is the Auth0 event type code for a successful sign-in. Every Auth0 event code is listed here.

SuccessfullSignIn: 
    Type: AWS::Events::Rule
    Properties: 
      Description: "Auth0 User Successfully signed in"
      EventBusName: 
         Ref: Auth0EventBusName
      EventPattern: 
        account:
        - !Sub '${AWS::AccountId}'
        detail:
          data:
            type:
            - s
      Targets: 
        - 
          Arn:
            Fn::GetAtt:
              - "SaveAuth0EventToS3" 
              - "Arn"
          Id: "SignInSuccessV1"

To respond to additional events, copy this event rule pattern and change the event code string for the event you want to match.
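
For example, a rule that matches successful signups could use an event pattern like the one below. The account ID is a placeholder, and the event type code shown (ss) is the Auth0 code for a successful signup at the time of writing; confirm it against the event code list referenced above before relying on it:

{
  "account": ["<your AWS account ID>"],
  "detail": {
    "data": {
      "type": ["ss"]
    }
  }
}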

Writing events to S3 with Lambda

The application routes events to a Lambda function, which performs some data transformation before writing an object to S3. The function code uses an environment variable named AuthLogBucket to store the S3 bucket name. The permissions to write to S3 are granted by a policy defined within the SAM template:

  SaveAuth0EventToS3:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: src/
      Handler: saveAuth0EventToS3.handler
      Runtime: nodejs12.x
      MemorySize: 128
      Environment:
        Variables:
          AuthLogBucket: !Ref AuthZeroToEventBridgeUserActivitylogs
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref AuthZeroToEventBridgeUserActivitylogs

The S3 object is a CSV file with context about each event. Each of the Auth0 event schemas is different. To maintain a consistent CSV file structure across different event types, this Lambda function explicitly defines each of the header and row values. An output string is constructed from the Auth0 event:

Lambda function output string


This string is placed into a new buffer and written to S3 with the AWS SDK for JavaScript, as referenced in GitHub here.
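
The function in the repo is written in Node.js; purely as an illustrative sketch (not the repo code), equivalent logic in Python with boto3 might look like the following. The header fields are assumptions for illustration, and the bucket name comes from the AuthLogBucket environment variable defined in the SAM template:

import csv
import io
import os
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ["AuthLogBucket"]  # injected by the SAM template

def handler(event, context):
    # The Auth0 log entry arrives under detail.data on the partner event
    data = event.get("detail", {}).get("data", {})

    # A fixed header keeps the CSV structure consistent across event types
    header = ["date", "type", "description", "client_name", "ip"]
    row = [str(data.get(field, "")) for field in header]

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerow(row)

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
    key = "auth0/{}-{}.csv".format(timestamp, data.get("type", "unknown"))
    s3.put_object(Bucket=BUCKET, Key=key, Body=buffer.getvalue().encode("utf-8"))
    return {"written": key}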

Sending events to the application

There is a test event in the /event directory of the example application. This contains an example of a successful sign-in event emitted from Auth0.

Sending a test Auth0 event to Lambda

Send a test event to the Lambda function using the AWS Command Line Interface.

Run the following command in the root directory of the example application, replacing {function-name} with the full name of your Lambda function.

aws lambda invoke --function-name {function-name} --invocation-type Event --payload file://events/event.json  events/response.json --log-type Tail

Response:

{  
 "StatusCode": 202
}

The response output appears in the output terminal window. To confirm that an object is stored in S3, navigate to the S3 Console.  Choose the AuthZeroToEventBridgeUserActivityLogs bucket. You see a new auth0 directory and can open the CSV file that holds context about the event.

Object written to S3


Sending real Auth0 events from a front-end application

Follow the instructions in the Fresh Tracks repo on GitHub to deploy the front-end application. This application includes Auth0’s authentication flow. You can connect to your Auth0 application by entering your credentials in the `auth0_config.json` file:

{
  "domain": "<YOUR AUTH0 DOMAIN>",
  "clientId": "<YOUR AUTH0 CLIENT ID>"
}

The example backend application starts receiving Auth0 emitted events immediately.

To see the full Fresh Tracks application, continue to the backend deployment instructions. This is not required for the examples in this blog post.

Building a QuickSight dashboard

You can visualize these Auth0 user events with an Amazon QuickSight dashboard. This provides a snapshot analysis that you can share with other QuickSight users for reporting purposes.

To use Auth0 events as metrics, create a separate calculated field for each event type (for example, successful signup and successful sign-in). An analysis could then include multiple visuals, calculated fields, conditional formatting, and events. This gives a snapshot of user interaction with the front-end application at any given time.
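
As a rough illustration, assuming the event type code is stored in a CSV column named type, a calculated field that counts successful sign-ins could use QuickSight's ifelse function; the sum of that field then becomes the metric for a visual:

ifelse({type} = 's', 1, 0)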

FreshTracks final dashboard example


The example application in the GitHub repo provides instructions on how to create a dashboard.

Conclusion

This post explains how to set up EventBridge’s third-party integration with Auth0 to capture events. The example backend application demonstrates how to filter these events, perform computations on them, save them as S3 objects, and send them to a downstream service.

The ability to build QuickSight storyboards from these events and share visuals with key business stakeholders can provide a narrative about the analysis data. This is implemented with minimal code to provide near real-time streaming of events and without adding latency to your application.

The possibilities are vast. I am excited to see how builders use this serverless pattern to create their own visuals to build a better, more customized application experience for their users.

Start here to learn about other SaaS integrations with Amazon EventBridge.

Ingest Excel data automatically into Amazon QuickSight

Post Syndicated from Ying Wang original https://aws.amazon.com/blogs/big-data/ingest-excel-data-automatically-into-amazon-quicksight/

Amazon QuickSight is a fast, cloud-powered, business intelligence (BI) service that makes it easy to deliver insights to everyone in your organization. This post demonstrates how to build a serverless data ingestion pipeline to automatically import frequently changed data into a SPICE (Super-fast, Parallel, In-memory Calculation Engine) dataset of Amazon QuickSight dashboards.

It is sometimes quite difficult to be agile in BI development. For example, end-users that perform self-service analytics may want to add their additional ad hoc data into an existing dataset and have a view of the corresponding updated dashboards and reports in a timely fashion. However, dashboards and reports are usually built on top of a single online analytic processing (OLAP) data warehouse with a rigid schema. Therefore, an end-user (who doesn’t have permission to update the dataset directly) has to go through a complicated and time-consuming procedure to have their data updated in the warehouse. Alternatively, they could open a ticket for you to edit the dataset manually, but it is still a very inconvenient solution that involves a significant amount of repetitive manual effort, especially if they frequently need to update the data.

Therefore, an automated data processing tool that can perform real-time data ingestion is very useful. This post discusses a tool that, when an end-user uploads Excel files into Amazon S3 or any other data file sharing location, performs the following end-to-end process:

  • Cleans the raw data from the Excel files, which might contain a lot of formatting and redundant information.
  • Ingests the cleaned data.
  • Performs a status check to monitor the data cleaning and ingestion process.
  • Sends a notification of the results to the end-user and BI development team.

With the recently launched feature cross data source joins, you can join across all data sources that Amazon QuickSight supports, including file-to-file, file-to-database, and database-to-database joins. For more information, see Joining across data sources on Amazon QuickSight.

In addition to cross data source joins, Amazon QuickSight has also launched new APIs for SPICE ingestion. For more information, see Importing Data into SPICE and Evolve your analytics with Amazon QuickSight’s new APIs and theming capabilities.

This post shows how you can combine these features to build an agile solution that cleans and ingests an Excel file into a SPICE dataset of Amazon QuickSight automatically. In SPICE, the real-time data from Excel joins with the Amazon Redshift OLAP data warehouse, and end-users receive Amazon SNS messages about its status throughout the process.

Solution overview

The following diagram illustrates the workflow of the solution.

The workflow includes the following steps:

  1. An end-user uploads an Excel file into an on-premises shared folder.
  2. The Excel files upload to the Amazon S3 bucket excel-raw-data. Alternatively, the end-user can skip this step and upload the Excel file into this Amazon S3 bucket directly.
  3. This upload event triggers the SNS message Excel-Raw-Data-Uploaded.
  4. Both the end-user and the BI team receive a message about the new upload event.
  5. The upload event also triggers the AWS Lambda function DataClean to process the Excel data.
  6. The Lambda function removes the formatting and redundant information of the Excel file, saves the cleaned data as a CSV file into the S3 bucket autoingestionqs, and publishes an SNS message to notify end-users about the data cleansing status (see the sketch after this list).
  7. This cleaned CSV file is mapped as an Amazon Athena table.
  8. In the Amazon QuickSight SPICE dataset, this table joins with the Amazon Redshift table through the cross data source join functionality.
  9. The CSV file creation event in the S3 bucket autoingestionqs triggers the Lambda function qsAutoIngestion.
  10. This function calls the data ingestion API of Amazon QuickSight and checks the data ingestion status.
  11. When the data ingestion is complete, end-users receive the Ingestion-Finished SNS message.
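
The actual cleaning logic lives in DataClean.py in the GitHub repo; the sketch below is not that code, just a minimal Python illustration of steps 5 and 6. It assumes pandas and openpyxl are available to the function (for example, through a Lambda layer) and that the Cleaning-is-Done topic ARN is passed in through a hypothetical environment variable:

import io
import json
import os

import boto3
import pandas as pd  # assumes a Lambda layer provides pandas + openpyxl

s3 = boto3.client("s3")
sns = boto3.client("sns")

CLEAN_BUCKET = "autoingestionqs"                       # bucket from the walkthrough
TOPIC_ARN = os.environ.get("CLEANING_DONE_TOPIC_ARN")  # hypothetical env var

def handler(event, context):
    # The S3 upload event arrives wrapped in the SNS message body
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    record = s3_event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Read the raw Excel file and drop empty, formatting-only rows and columns
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_excel(io.BytesIO(obj["Body"].read()))
    df = df.dropna(how="all").dropna(axis=1, how="all")

    # Save the cleaned data as CSV for Athena and QuickSight
    clean_key = key.rsplit(".", 1)[0] + ".csv"
    s3.put_object(Bucket=CLEAN_BUCKET, Key=clean_key,
                  Body=df.to_csv(index=False).encode("utf-8"))

    # Notify end-users that cleaning finished
    if TOPIC_ARN:
        sns.publish(TopicArn=TOPIC_ARN,
                    Message=f"Cleaned {key} and wrote {clean_key} to {CLEAN_BUCKET}")
    return {"cleaned_object": clean_key}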

Prerequisites

For this walkthrough, you should have the following prerequisites:

Creating resources

Create your resources by launching the following AWS CloudFormation stack:

During the stack creation process, you have to provide a valid email address as the endpoint for the Amazon SNS topics. After the stack creation is successful, you have three SNS topics, two S3 buckets, and the corresponding IAM policies.

Walkthrough

To implement this solution, complete the following steps:

  1. Enable SNS notification of new object creation events in the S3 bucket excel-raw-data. For more information, see How Do I Enable and Configure Event Notifications for an S3 Bucket? When an end-user uploads an Excel file into the excel-raw-data S3 bucket, the event triggers an Amazon SNS message. The following screenshot shows the example Excel file that this post uses. The following screenshot shows the SNS message Excel-Raw-Data-Uploaded, which includes details of the upload event.
  2. Download the sample code DataClean.py in Python 3.7 from the GitHub repo.
  3. Create a Lambda function named DataClean.
  4. Configure the function to be a subscriber of the SNS topic Excel-Raw-Data-Uploaded.
  5. Edit the SNS topic Cleaning-is-Done, and add the following code to the access policy:
    "Sid": "example-statement-ID",
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": "SNS:Publish",
          "Resource": " arn:aws:sns:us-east-1:AWS Account ID: SNS Topic Name",
          "Condition": {
            "ArnLike": {
              "aws:SourceArn": "arn:aws:lambda:us-east-1:AWSAccountID:function:DataClean"
            }
          }

    The policy allows the Lambda function DataClean to trigger the SNS message Cleaning-is-Done.

    The function DataClean saves the CSV file of the cleaned data into the S3 bucket autoingestionqs. You should see the new CSV file in this bucket. See the following screenshot.

    When the Lambda function ends, it triggers the SNS message Cleaning-is-Done. The following screenshot shows the text of the notification message.

  6. Add an event notification to the S3 bucket autoingestionqs to trigger a Lambda function named qsAutoIngestion (see the sketch after this list). This function calls the Amazon QuickSight data API to ingest data into the SPICE dataset. The cleaned CSV file in the S3 bucket autoingestionqs is mapped as an Athena table. The following screenshot shows the sample data in the CSV file.

    In the Amazon QuickSight SPICE dataset, the Athena table joins with the Amazon Redshift table through the cross data source join functionality.

  7. Create the SPICE dataset. For more information, see Joining across data sources on Amazon QuickSight. The following screenshot shows the Data page in Amazon QuickSight where your data set details appear. The Athena table joins a Redshift table. The new object creation event in the Amazon S3 bucket autoingestionqs triggers another Lambda function named qsAutoIngestion. This function calls the data ingestion API of Amazon QuickSight and checks the data ingestion status. If the data ingestion is completed successfully, end-users receive the SNS message Ingestion-Finished. You can download the sample code of qsAutoIngestion from the GitHub repo.
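
The qsAutoIngestion sample in the repo has the full implementation; the following is only a minimal boto3 sketch of the SPICE ingestion calls it depends on, with the account and dataset IDs as placeholders and the SNS notification omitted:

import time
import uuid

import boto3

quicksight = boto3.client("quicksight")

ACCOUNT_ID = "<AWS account ID>"          # placeholder
DATASET_ID = "<QuickSight dataset ID>"   # placeholder for the SPICE dataset

def handler(event, context):
    ingestion_id = str(uuid.uuid4())

    # Kick off a SPICE refresh of the dataset
    quicksight.create_ingestion(
        AwsAccountId=ACCOUNT_ID,
        DataSetId=DATASET_ID,
        IngestionId=ingestion_id,
    )

    # Poll until the ingestion finishes, then report the final status
    while True:
        response = quicksight.describe_ingestion(
            AwsAccountId=ACCOUNT_ID,
            DataSetId=DATASET_ID,
            IngestionId=ingestion_id,
        )
        status = response["Ingestion"]["IngestionStatus"]
        if status in ("COMPLETED", "FAILED", "CANCELLED"):
            return {"ingestion_id": ingestion_id, "status": status}
        time.sleep(10)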

Cleaning up

To avoid incurring future charges, delete the resources you created: the two Lambda functions, three SNS topics, two S3 buckets, and corresponding IAM policies.

Conclusion

This post discussed how BI developers and architects can use data API, Lambda functions, and other AWS services to complete an end-to-end automation process. End-users can have their real-time data ingested and joined with OLAP data warehouse tables and visualize their data in a timely fashion without the need to wait for nightly or hourly ETL or the need to understand the complex technical development steps. You should now be fully equipped to construct a solution in a development environment and demo it to non-technical business end-users.

 


About the Author

Ying Wang is a Data Visualization Engineer with the Data & Analytics Global Specialty Practice in AWS Professional Services.

Build machine learning-powered business intelligence analyses using Amazon QuickSight

Post Syndicated from Osemeke Isibor original https://aws.amazon.com/blogs/big-data/build-machine-learning-powered-business-intelligence-analyses-using-amazon-quicksight/

Imagine you can see the future—to know how many customers will order your product months ahead of time so you can make adequate provisions, or to know how many of your employees will leave your organization several months in advance so you can take preemptive actions to encourage staff retention. For an organization that sees the future, the possibilities are limitless. Machine learning (ML) makes it possible to predict the future with a higher degree of accuracy.

Amazon SageMaker provides every developer and data scientist the ability to build, train, and deploy ML models quickly, but for business users who usually work on creating business intelligence dashboards and reports rather than ML models, Amazon QuickSight is the service of choice. With Amazon QuickSight, you can still use ML to forecast the future. This post goes through how to create business intelligence analyses that use ML to forecast future data points and detect anomalies in data, with no technical expertise or ML experience needed.

Overview of solution

Amazon QuickSight ML Insights uses AWS-proven ML and natural language capabilities to help you gain deeper insights from your data. These powerful, out-of-the-box features make it easy to discover hidden trends and outliers, identify key business drivers, and perform powerful what-if analysis and forecasting with no technical or ML experience. You can use ML insights in sales reporting, web analytics, financial planning, and more. You can detect insights buried in aggregates, perform interactive what-if analysis, and discover what activities you need to meet business goals.

This post imports data from Amazon S3 into Amazon QuickSight and creates ML-powered analyses with the imported data. The following diagram illustrates this architecture.

Walkthrough

In this walkthrough, you create an Amazon QuickSight analysis that contains ML-powered visuals that forecast the future demand for taxis in New York City. You also generate ML-powered insights to detect anomalies in your data. This post uses the New York City Taxi and Limousine Commission (TLC) Trip Record Data on the Registry of Open Data on AWS.

The walkthrough includes the following steps:

  1. Set up and import data into Amazon QuickSight
  2. Create an ML-powered visual to forecast the future demand for taxis
  3. Generate an ML-powered insight to detect anomalies in the data set

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • Amazon QuickSight Enterprise Edition
  • Basic knowledge of AWS

Setting up and importing data into Amazon QuickSight

Set up Amazon QuickSight as an individual user. Complete the following steps:

  1. On the AWS Management Console, in the Region list, select US East (N. Virginia) or any other Region of your choice that Amazon QuickSight supports.
  2. Under Analytics, for Services, choose Amazon QuickSight. If you already have an existing Amazon QuickSight account, make sure it is the Enterprise edition; if it is not, upgrade to Enterprise edition. For more information, see Upgrading your Amazon QuickSight Subscription from Standard Edition to Enterprise Edition. If you do not have an existing Amazon QuickSight account, proceed with the setup and make sure you choose Enterprise Edition when setting up the account. For more information, see Setup a Free Standalone User Account in Amazon QuickSight. After you complete the setup, a Welcome Wizard screen appears.
  3. Choose Next on each of the Welcome Wizard screens.
  4. Choose Get Started.

Before you import the data set, make sure that you have at least 3GB of SPICE capacity. For more information, see Managing SPICE Capacity.

Importing the NYC Taxi data set into Amazon QuickSight

The NYC Taxi data set is in an S3 bucket. To import S3 data into Amazon QuickSight, use a manifest file. For more information, see Supported Formats for Amazon S3 Manifest Files. To import your data, complete the following steps:

  1. In a new text file, copy and paste the following code:
    "fileLocations": [
            {
                "URIs": [
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-01.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-02.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-03.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-04.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-05.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-06.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-07.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-08.csv", 
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-09.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-10.csv",
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-11.csv", 
              "https://nyc-tlc.s3.amazonaws.com/trip data/green_tripdata_2018-12.csv"
    
                ]
            }
        ],
        "globalUploadSettings": {
            "textqualifier": "\""
        }
    }

  2. Save the text file as nyc-taxi.json.
  3. On the Amazon QuickSight console, choose New analysis.
  4. Choose New data set.
  5. For data source, choose S3.
  6. Under New S3 data source, for Data source name, enter a name of your choice.
  7. For Upload a manifest file, select Upload.
  8. Choose the nyc-taxi.json file you created earlier.
  9. Choose Connect.
    The S3 bucket this post uses is a public bucket that contains a public data set. When using S3 buckets in your account with Amazon QuickSight, it is highly recommended that the buckets are not open to the public; you need to configure authentication to access your S3 bucket from Amazon QuickSight. For more information about troubleshooting, see I Can’t Connect to Amazon S3. After you choose Connect, the Finish data set creation screen appears.
  10. Choose Visualize.
  11. Wait for the import to complete.

You can see the progress on the top right corner of the screen. When the import is complete, the result shows the number of rows imported successfully and the number of rows skipped.

Creating an ML-powered visual

After you import the data set into Amazon QuickSight SPICE, you can start creating analyses and visuals. Your goal is to create an ML-powered visual to forecast the future demand for taxis. For more information, see Forecasting and Creating What-If Scenarios with Amazon QuickSight.

To create your visual, complete the following steps:

  1. From the Data source details pop screen, choose Visualize.
  2. From the field list, select Lpep_pickup_datetime.
  3. Under Visual types, select the first visual. Amazon QuickSight automatically uses the best visual based on the number and data type of fields you selected. From your selection, Amazon QuickSight displays a line chart visual. From the preceding graph, you can see that the bulk of your data clusters between December 31, 2017, and January 1, 2019, for the Lpep_pickup_datetime field. There are a few data points with date ranges up to June 2080. These values are incorrect and can impact your ML forecasts. To clean up your data set, filter out the incorrect data using the Lpep_pickup_datetime field. This post only uses data in which Lpep_pickup_datetime falls between January 1, 2018, and December 18, 2018, because there is a more consistent amount of data within this date range.
  4. Use the filter menu to create a filter on the Lpep_pickup_datetime field.
  5. Under Filter type, choose Time range and Between.
  6. For Start date, enter 2018-01-01 00:00.
  7. Select Include start date.
  8. For End date, enter 2018-12-18 00:00.
  9. Select Include end date.
  10. Choose Apply.

The line chart should now contain only data with Lpep_pickup_datetime between January 1, 2018, and December 18, 2018. You can now add the forecast for the next 31 days to the line chart visual.

Adding the forecast to your visual

To add your forecast, complete the following steps:

  1. On the visual, choose the arrow.
  2. From the drop-down menu, choose Add forecast.
  3. Under Forecast properties, for Periods forward, enter 31.
  4. For Periods backward, enter 0.
  5. For Prediction interval, leave at the default value 90.
  6. For Seasonality, leave at the default selection Automatic.
  7. Choose Apply.

You now see an orange line on your graph, which is the forecasted pickup quantity per day for the next 31 days after December 18, 2018. You can explore the different dates by hovering your cursor over different points on the forecasted pickup line. For example, hovering your cursor over January 10, 2019, shows that the expected forecasted number of pickups for that day is approximately 22,000. The forecast also provides an upper bound (maximum number of pickups forecasted) of about 26,000 and a lower bound (minimum number of pickups forecasted) of about 18,000.

You can create multiple visuals with forecasts and combine them into a sharable Amazon QuickSight dashboard. For more information, see Working with Dashboards.

Generating an ML-powered insight to detect anomalies

In Amazon QuickSight, you can add insights, autonarratives, and ML-powered anomaly detection to your analyses without ML expertise or knowledge. Amazon QuickSight generates suggested insights and autonarratives automatically, but for ML-powered anomaly detection, you need to perform additional steps. For more information, see Using ML-Powered Anomaly Detection.

This post checks if there are any anomalies in the total fare amount over time from select locations. For example, if the total fare charged for taxi rides is about $1,000 and above from the first pickup location (for example, the airport) for most dates in your data set, an anomaly is when the total fare charged deviates from the standard pattern. Anomalies are not necessarily negative, but rather abnormalities that you can choose to investigate further.

To create an anomaly insight, complete the following steps:

  1. From the top right corner of the analysis creation screen, click the Add drop-down menu and choose Add insight.
  2. On the Computation screen, for Computation type, select Anomaly detection.
  3. Under Fields list, choose the following fields:
  • fare_amount
  • lpep_pickup_datetime
  • PULocationID
  4. Choose Get started.
  5. The Configure anomaly detection screen appears.
  6. Choose Analyze all combinations of these categories.
  7. Leave the other settings at their defaults. You can now perform a contribution analysis and discover how the drop-off location contributed to the anomalies. For more information, see Viewing Top Contributors.
  8. Under Contribution analysis, choose DOLocationID.
  9. Choose Save.
  10. Choose Run Now. The anomaly detection can take up to 10 minutes to complete. If it is still running after about 10 minutes, your browser may have timed out. Refresh your browser and you should see the anomalies displayed in the visual.
  11. Choose Explore anomalies.

By default, the anomalies you see are for the last date in your data set. You can explore the anomalies across the entire date range of your data set by choosing SHOW ANOMALIES BY DATE and dragging the slider at the bottom of the visual to display the entire date range from January 1, 2018, to December 30, 2018.

This graph shows that March 21, 2018, has the highest number of anomalies of the fare charged in the entire data set. For example, the total fare amount charged by taxis that picked up passengers from location 74 on March 21, 2018, was $7,181. This is a drop of about 64% from the roughly $19,728.50 in total fares charged for the same pickup location on March 20, 2018. When you explore the anomalies of other pickup locations for that same date, you can see that they all have similar drops in the total fare charged. You can also see the top DOLocationID contributors to these anomalies.

What happened in New York City on March 21, 2018, to cause this drop? A quick online search reveals that New York City experienced a severe weather condition on March 21, 2018.

Publishing your analyses to a dashboard, sharing the dashboard, and setting up email alerts

You can add more visuals to your analyses, publish the analyses as a dashboard, and share the dashboard with other users. QuickSight anomaly detection allows you to uncover hidden insights in your data by continuously analyzing billions of data points. You can subscribe to receive alerts to your inbox if an anomaly occurs in your business metrics. The email alert also indicates the factors that contribute to these anomalies. This allows you to act immediately on the business metrics that need attention.

From the QuickSight dashboard, you can configure an anomaly alert to be sent to your email with Severity set to High and above and Direction set to Lower than expected. Also make sure to schedule a data refresh so that the anomaly detection runs on your most recent data. For more information, see Refreshing Data.

Cleaning up

To avoid incurring future charges, you need to cancel your Amazon QuickSight subscription.

Conclusion

This post walked you through how to use ML-powered insights with Amazon QuickSight to forecast future data points, detect anomalies, and derive valuable insights from your data without needing prior experience or knowledge of ML. If you want to do more forecasting without ML experience, check out Amazon Forecast.

If you have questions or suggestions, please leave a comment.

 


About the Author

Osemeke Isibor is a partner solutions architect at AWS. He works with AWS Partner Network (APN) partners to design secure, highly available, scalable and cost optimized solutions on AWS. He is a science-fiction enthusiast and a fan of anime.

 

 

 

ICYMI: Serverless Q4 2019

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2019/

Welcome to the eighth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

The three months comprising the fourth quarter of 2019

AWS re:Invent

AWS re:Invent 2019

re:Invent 2019 dominated the fourth quarter at AWS. The serverless team presented a number of talks, workshops, and builder sessions to help customers increase their skills and deliver value more rapidly to their own customers.

Serverless talks from re:Invent 2019

Chris Munns presenting 'Building microservices with AWS Lambda' at re:Invent 2019

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.

Videos

Decks

You can also find decks for many of the serverless presentations and other re:Invent presentations on our AWS Events Content page.

AWS Lambda

For developers needing greater control over performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency at re:Invent. This feature enables Lambda functions to execute with consistent start-up latency, making them ideal for building latency-sensitive applications.

As shown in the graph below, provisioned concurrency reduces tail latency, directly impacting response times and providing a more responsive end user experience.

Graph showing performance enhancements with AWS Lambda Provisioned Concurrency

Lambda rolled out enhanced VPC networking to 14 additional Regions around the world. This change brings dramatic improvements to startup performance for Lambda functions running in VPCs due to more efficient usage of elastic network interfaces.

Illustration of AWS Lambda VPC to VPC NAT

New VPC to VPC NAT for Lambda functions

Lambda now supports three additional runtimes: Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new version-specific features and benefits, which are covered in the linked release posts. Like the Node.js 10 runtime, these new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda released a number of controls for both stream and async-based invocations:

  • You can now configure error handling for Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams. It’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead letter queue (DLQ) or a Lambda destination, or it can be discarded.

AWS Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set separate destinations for success and failure. This unlocks new patterns for distributed event-based applications and can replace custom code previously used to manage routing results.
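
As a quick illustration, you can attach destinations and the asynchronous retry controls described above to an existing function with boto3; the function name and destination ARNs below are placeholders:

import boto3

lambda_client = boto3.client("lambda")

# Route async invocation results for a function (names and ARNs are placeholders)
lambda_client.put_function_event_invoke_config(
    FunctionName="my-function",
    MaximumEventAgeInSeconds=3600,   # drop events older than one hour
    MaximumRetryAttempts=1,          # retry failed async invocations once
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:success-queue"},
        "OnFailure": {"Destination": "arn:aws:sns:us-east-1:123456789012:failure-topic"},
    },
)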

Illustration depicting AWS Lambda Destinations with success and failure configurations

Lambda Destinations

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Kinesis Data Streams and DynamoDB Streams. This enables faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.
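
A sketch of creating a Kinesis event source mapping with boto3 that combines the parallelization factor with the stream error-handling controls mentioned earlier (the stream, function, and queue names are placeholders):

import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    FunctionName="my-stream-processor",
    StartingPosition="LATEST",
    BatchSize=100,
    ParallelizationFactor=10,          # up to 10 concurrent invocations per shard
    MaximumRetryAttempts=2,            # limit retries for failing batches
    MaximumRecordAgeInSeconds=3600,    # skip records older than one hour
    BisectBatchOnFunctionError=True,   # split batches to isolate a bad record
    DestinationConfig={
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:stream-failures"}
    },
)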

Illustration of multiple AWS Lambda invocations per Kinesis Data Streams shard

Lambda Parallelization Factor diagram

Lambda introduced Amazon SQS FIFO queues as an event source. “First in, first out” (FIFO) queues guarantee the order of record processing, unlike standard queues. FIFO queues support message batching via a MessageGroupID attribute that supports parallel Lambda consumers of a single FIFO queue, enabling high throughput of record processing by Lambda.

Lambda now supports Environment Variables in the AWS China (Beijing) Region and the AWS China (Ningxia) Region.

You can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics show the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, discover outliers, and find hard-to-spot situations that affect customer experience for a subset of your users.
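
For example, you could retrieve p95 and p99 duration for a function over the last day with the CloudWatch API (the function name is a placeholder):

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=3600,                          # one data point per hour
    ExtendedStatistics=["p95", "p99"],    # percentile statistics
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["ExtendedStatistics"])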

Amazon API Gateway

Screen capture of creating an Amazon API Gateway HTTP API in the AWS Management Console

Amazon API Gateway announced the preview of HTTP APIs. In addition to significant performance improvements, most customers see an average cost savings of 70% when compared with API Gateway REST APIs. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.

AWS SAM CLI

Screen capture of the new 'sam deploy' process in a terminal window

The AWS SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.

AWS Step Functions

One powerful feature of AWS Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. In Q4, Step Functions expanded its integration with Amazon SageMaker to simplify machine learning workflows. Step Functions also added a new integration with Amazon EMR, making EMR big data processing workflows faster to build and easier to monitor.

Screen capture of an AWS Step Functions step with Amazon EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor trends and react to usage on your AWS account.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates. This increases the limits to 1,500 state transitions per second and a default start rate of 300 state machine executions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). Click the above link to learn more about the limit increases in other Regions.

Screen capture of choosing Express Workflows in the AWS Management Console

Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high-performance workloads at a reduced cost.

Amazon EventBridge

Illustration of the Amazon EventBridge schema registry and discovery service

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate the discovery and cataloging of event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQ) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed and re-submit them or analyze them to locate processing issues.
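
Enabling a DLQ amounts to setting the subscription's RedrivePolicy attribute to point at an SQS queue; a minimal boto3 sketch, using placeholder ARNs:

import json

import boto3

sns = boto3.client("sns")

sns.set_subscription_attributes(
    SubscriptionArn="arn:aws:sns:us-east-1:123456789012:my-topic:subscription-id",
    AttributeName="RedrivePolicy",
    AttributeValue=json.dumps({
        # Unhandled messages for this subscription land in this SQS queue
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-topic-dlq"
    }),
)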

Amazon CloudWatch

Amazon CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

Screenshot of Amazon CloudWatch ServiceLens in the AWS Management Console

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

CloudWatch introduced Embedded Metric Format, which helps you ingest complex high-cardinality application data as logs and easily generate actionable metrics. You can publish these metrics from your Lambda function by using the PutLogEvents API or using an open source library for Node.js or Python applications.
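
A minimal example of the format (the namespace, dimension, and metric names below are illustrative) looks like the following; when this JSON is written to CloudWatch Logs, CloudWatch extracts a Latency metric dimensioned by Service:

{
  "_aws": {
    "Timestamp": 1577836800000,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApplication",
        "Dimensions": [["Service"]],
        "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}]
      }
    ]
  },
  "Service": "checkout",
  "Latency": 127,
  "requestId": "4f1e-example"
}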

Finally, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

AWS X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. With this support, X-Ray traces canary scripts throughout the application, providing metrics on performance or application issues.

Screen capture of AWS X-Ray Service map in the AWS Management Console

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

Amazon DynamoDB announced support for customer-managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows you to bring your own key (BYOK), giving you full control over how you encrypt and manage the security of your DynamoDB data.

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Another new DynamoDB capability to identify frequently accessed keys and database traffic trends is currently in preview. With this, you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

Screen capture of Amazon CloudWatch Contributor Insights for DynamoDB in the AWS Management Console

CloudWatch Contributor Insights for DynamoDB

DynamoDB also released adaptive capacity. Adaptive capacity helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of over provisioning for uneven data access patterns.

Amazon RDS

Amazon Relational Database Service (RDS) announced a preview of Amazon RDS Proxy to help developers manage RDS connection strings for serverless applications.

Illustration of Amazon RDS Proxy

The RDS Proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge appears next to your name in the SAR and links to your GitHub profile.

Screen capture of SAR Verified developer badge in the AWS Management Console

SAR Verified developer badges

AWS Developer Tools

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has passed through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

Screen capture of AWS CodeBuild

CodeBuild test trends in the AWS Management Console

Amazon CodeGuru

AWS announced a preview of Amazon CodeGuru at re:Invent 2019. CodeGuru is a machine learning based service that makes code reviews more effective and aids developers in writing code that is more secure, performant, and consistent.

AWS Amplify and AWS AppSync

AWS Amplify added iOS and Android as supported platforms. Now developers can build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

Screen capture of 'amplify init' for an iOS application in a terminal window

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

For a summary of Amplify and AppSync announcements before re:Invent, read: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

Illustration of AWS AppSync integrations with other AWS services

Q4 serverless content

Blog posts

October

November

December

Tech talks

We hold several AWS Online Tech Talks covering serverless tech talks throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q4:

Twitch

October

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

AWS Serverless Application Repository (SAR) Apps

In this edition of ICYMI, we are introducing a section devoted to SAR apps written by the AWS Serverless Developer Advocacy team. You can run these applications and review their source code to learn more about serverless and to see examples of suggested practices.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. We’re also kicking off a fresh series of Tech Talks in 2020 with new content providing greater detail on everything new coming out of AWS for serverless application developers.

Throughout 2020, the AWS Serverless Developer Advocates are crossing the globe to tell you more about serverless, and to hear more about what you need. Follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!

Amazon QuickSight: 2019 in review

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-2019-in-review/

2019 has been an exciting year for Amazon QuickSight. We onboarded thousands of customers, expanded our global presence to 10 AWS Regions, and launched over 60 features—more than a feature a week! We are inspired by you—our customers, and all that you do with Amazon QuickSight. We are thankful for the time you spend with us across in-person meetings, conference calls, emails, discussion forums, and AWS summits. As we close this year, here’s a quick summary of the highlights.

re:Invent 2019

The Amazon QuickSight team was at re:Invent alongside customers such as Best Western, Capital One, and Club OS, who spoke about their experiences implementing and using Amazon QuickSight for their analytics needs. We also ran two hands-on workshops using the newly launched APIs.

ANT324: Deploy business analytics at enterprise scale with Amazon QuickSight

This session discussed how enterprises are rolling out Amazon QuickSight Enterprise Edition for all their users and using features such as Active Directory or Federated SSO (SAML/OpenID Connect) authentication, private connectivity to data in AWS, email reports, and embedded dashboards.

Best Western rolled out Amazon QuickSight for business reporting across tens of thousands of users, using it in conjunction with Amazon Redshift and AWS Glue. While their previous legacy reporting system took 18 months and over 7,000 staff hours to upgrade server software versions, they now benefit from Amazon QuickSight’s modern reporting infrastructure and constant feature updates, and do not need to manage any servers to report on this data.

Capital One has made Amazon QuickSight available internally to their BI community of over 25,000 users. Key benefits that the Capital One team sees from QuickSight include the ability to roll out new BI use cases without server setup or capacity planning, integrated machine learning (ML) capabilities, and the absence of a traditional software update cycle. All this is with PrivateLink connectivity to data in AWS (Snowflake, Presto, Amazon Redshift, and Amazon RDS), and pay-per-session pricing for consumption with a max charge of $5 per month, per reader.

ANT217: Embedding analytics into applications with Amazon QuickSight

This session dove into the latest features that allow you to integrate Amazon QuickSight dashboards into your software development lifecycle, including new APIs, theming capabilities, and dashboard embedding. The full session recording is available on YouTube.

As an Independent Software Vendor (ISV), Club OS uses Amazon QuickSight’s embedded capabilities to add analytics to their cloud-based Customer Relationship Management platform for fitness and wellness businesses. Amazon QuickSight’s serverless architecture lets Club OS focus on meeting actual customer needs without being burdened by operations and infrastructure management. This architecture also provides fast, consistent performance for end-users; the analytics dashboards are rolled out to over 40,000 users across 3,500 health and fitness locations. For more information, see the section of the talk on YouTube by Nicholas Hahn, VP of Product at Club OS.

As an enterprise with tens of thousands of internal users, Capital One uses Amazon QuickSight’s embedded capabilities to add analytics to internal applications and portals. With no servers to set up, teams can set up embedded dashboards in their portals in as quickly as a week, whether they want to serve this to just a few users or thousands. Traditional BI approaches to such portals would require server setup and maintenance, and expensive licensing contracts. For more information, see the section of the talk on YouTube by Latha Govada, Senior Manager Analytics at Capital One.

ANT302: Enhancing your applications with Amazon QuickSight dashboards

This hands-on workshop session guided users through setting up an Amazon QuickSight Enterprise Edition account, connecting to data, creating dashboards, and using APIs to add users, move assets across development and staging versions, and embedding dashboards in a web portal. For more information, see Dashboard Embedding & Operationalizing with Quicksight APIs. Try it out for yourself!

Embedding, theming, and APIs

We followed up on our launch of dashboard embedding for federated AWS identities at re:Invent 2018 (for more information about how the NFL uses embedded Amazon QuickSight dashboards for hundreds of stakeholders of Next Gen Stats, see the session video on YouTube). In 2019, we launched the following features:

Machine learning integrations

We added the following native ML integrations in Amazon QuickSight to help you make the most of your data in AWS and QuickSight:

Visualizations and interactivity

We added the following support for new chart types, conditional formatting, QuickSight Actions, sheets in dashboards, and more:

Calculations and aggregations

We added the following new aggregation options, new functions, and more:

Data connectivity and security

We increased SPICE limits, provided finer control over data sources in QuickSight, enabled data source sharing, and more:

  • SPICE datasets now accommodate up to 100 million rows of data in Enterprise Edition and 25 million rows for Standard Edition. For more information, see Data Source Limits.
  • Fine-grained access control over S3 and Athena allows you to scope down access to these data sources to specific users or groups in Amazon QuickSight using IAM.
  • Cross data source joins allow business analysts to perform joins across supported data sources without relying on data engineering teams to set up complex ETL processes.
  • Shared data sources centralize credential management of data sources. This also allows shared ownership of data connections, allowing collaboration over SQL scripts that define custom SQL datasets.
  • Connecting to Presto data sources in AWS without any public routing of data, similar to Amazon Redshift, Snowflake, RDS, and others.
  • Amazon QuickSight data sources that connect to Amazon Athena can now be tagged against specific workgroups. This allows cost allocation of Athena queries by workgroup.

Mobile

We launched the new QuickSight Mobile app for iOS and Android, which allows you to access your dashboards and explore your data with drill-downs and filters. For more information, see New QuickSight Mobile App on YouTube.

Geographic availability

We added the following new Regions and language support in Amazon QuickSight:

  • Amazon QuickSight is now available in 10 AWS regions: US East (N. Virginia and Ohio), US West (Oregon), EU (Frankfurt, Ireland, and London), and Asia Pacific (Seoul, Singapore, Sydney, and Tokyo).
  • The Amazon QuickSight interface is now available in 10 languages: English, German, Spanish, French, Portuguese, Italian, Japanese, Korean, Simplified Chinese, and Traditional Chinese.

With all of our updates you can build dashboards like the ones below and integrate them into your applications and portals.

Looking ahead

To see the full list of 2019 launches, see What’s New in Amazon QuickSight or subscribe to the Amazon QuickSight YouTube channel for the latest training and feature walkthroughs. We have a packed roadmap for 2020, and continue to focus on enabling you with insights from all your data, sharing these with all your users, while not having to worry about operations and servers. Thank you for your support.

We wish you all the very best in the new year (and decade)!


About the authors

Jose Kunnackal John is a principal product manager for Amazon QuickSight.

 

 

 

 

Sahitya Pandiri is a senior technical program manager with Amazon Web Services. Sahitya has been in product/program management for 6 years now, and has built multiple products in the retail, healthcare, and analytics space.

 

 

 

Highlight Critical Insights with Conditional Formatting in Amazon QuickSight

Post Syndicated from Susan Fang original https://aws.amazon.com/blogs/big-data/highlight-critical-insights-with-conditional-formatting-in-amazon-quicksight/

Amazon QuickSight now makes it easier for you to spot the highlights or low-lights in data through conditional formatting. With conditional formatting, you can specify customized text or background colors based on field values in the dataset, using solid or gradient colors. You can also display data values with the supported icons. Using color coding and icons, you can visually explore and analyze data, quickly detect issues, and identify trends, as seen in the following screenshot. These dynamic visual cues also help your dashboard viewers cut through the densely populated data values and get to the attention-worthy insights much faster.

With this release, you can apply conditional formatting to tables, pivot tables, and KPI charts. This blog post walks you through how to leverage conditional formatting in each supported chart type in Amazon QuickSight.

Tables

You can apply conditional formatting to tables through the following methods.

Conditionally formatting columns based on data fields

When applying conditional formatting to tables, there are two ways to access the configuration pane. One option is to select Table in an analysis, and choose the caret in the upper right corner of the Table visual. Select Conditional formatting from the dropdown menu, as shown in the following screenshots, and the Conditional formatting configuration pane pops up to the left of the analysis canvas. Choose the + sign to select the target column to which you’d like to apply the formatting; it can be any column present in the table.

For example, select Sales from the dropdown menu to add a condition for the Sales target column.

Alternatively, from the selected table, you can expand the field well on the top of the analysis canvas and choose the Group by or Value target column directly from the columns used in the table. For example, select Sales from the Value well and select Conditional formatting from the dropdown, as seen in the following screenshots. The conditional formatting configuration pane pops up to the left of the analysis canvas, with a condition already added for the Sales target column.

Next, choose the formatting style from the three available options.

  • Add background color
  • Add text color
  • Add icon

For each target column, you can add any formatting style or any combination of the styles.

Background and text color

Choose Add background color to expand the configuration pane for New background color. For the Sales target column, there are two Fill type options, Solid and Gradient, with Solid as the default.

You are prompted to choose the base column for the conditions, which can be any field in the dataset, regardless of whether it is present in the Table or not.

Next, define the Aggregation of the base column used in the conditions, which can be any aggregation currently supported in QuickSight, such as sum, average, min, max, or standard deviation. Then you can create the conditions that drive the color coding of the cell background.

This example highlights cells based on the following rules:

  • Cells with sum of sales greater than $100, green
  • Cells with sum of sales between $10 and $100 (inclusive), yellow
  • Cells with sum of sales less than $10, red

Since the conditions are discrete, apply solid color based on Sales in the Sum aggregation.

In Conditional #1:

  1. Choose the Greater than comparison method.
  2. Enter 100 as the threshold Value.
  3. Use the color picker to pick green.
  4. Choose Add condition and configure the other two conditions accordingly.
  5. Choose Apply, and the Sales column is conditionally formatted, showing green, yellow, and red colors when the conditions are met.

If you’d like to re-order the conditions, choose a condition and move it up or down. You can also delete a condition. To reset all of the conditions, choose Clear to clear the applied formatting from the visual and remove the conditions from the configuration pane.

The following screenshots demonstrate the configuration step by step:

If you are happy with the highlighting rules, choose Close to return to the summary page.

You see a brief summary of the conditions applied to the background color of the Sales column. You can Edit or Delete the applied conditions, Add other formatting styles such as text color or icon to the same Sales column, Delete the target column altogether, or Add another target column at the top.

Icons

This example adds icons to the same Sales column based on data in the Profits column so that a thumbs-up icon shows when profits are greater than 0.

  1. Start with Add icon. On the New icon configuration page, choose the base column and Aggregation.
  2. Under Icon set, QuickSight offers a list of canned color icon sets for quick application.
  3. For icons beyond this list, collapse the dropdown and check the Custom conditions box. Choose the pre-populated + icon for more icon selections and the option to Use custom Unicode icon. This example uses the thumbs-up and sets the color as orange.
  4. Choose Apply to see icons show up in the Sales column accordingly.

The following screenshots show the configuration step by step:

Please note that you have the option to Show icon only. Once checked, the text values of the target column are hidden, showing icons only.

Conditionally formatting entire rows based on calculated fields

Next, here is a slightly more complex example that involves calculated fields. This example highlights rows in which Ship charges exceeds 20% of the total sales amount. This condition needs to be based on a flag that differentiates orders with high shipping charges from the rest.

Create a calculated field called high_shipping_flag using this function:

ifelse(sum({Ship charges}) > 0.2 * sum(Sales),1,0)

As a result, orders with shipping charges over 20% of sales amount take a value of 1, otherwise, they take a value of 0.
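
If you need more than two tiers, ifelse accepts additional condition/value pairs, so the same flag can be extended. The following sketch (the 30% threshold is illustrative and not part of the original example) returns 2 for very high shipping charges, 1 for high, and 0 otherwise:

ifelse(sum({Ship charges}) > 0.3 * sum(Sales), 2, sum({Ship charges}) > 0.2 * sum(Sales), 1, 0)

You can then add one row-level condition per flag value, each with its own color.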

With the base field ready, proceed with conditional formatting configuration. Since the target property for formatting is the entire row instead of a specific column, access the configuration pane from the visual dropdown menu instead of a particular column from the field well.

Scroll down to the bottom of the dropdown list and select the option to conditionally format the [Entire row] in addition to individual columns.

Note the options to Add background color or Add text color, without the Add icon option, because it is not applicable to entire rows. Also note the Apply on top checkbox, which allows the conditional formatting for the entire row to paint over conditional formatting applied to individual columns. For this example, color the entire row teal if the shipping charge flag is 1.

The following screenshots demonstrate the configuration step by step:

Pivot tables

Conditional formatting is enabled for pivot tables as well, and the configuration workflow is very similar to that of tables, with the following differences:

  • Conditional formatting can be applied only to measures (fields in the Values well), not to other field types.
  • Conditional formatting in solid colors can be applied to various levels of your choice: values, subtotals, and totals.
  • Conditional formatting can only be applied to individual columns, not entire rows.

Applying gradient colors

Next, explore the conditional formatting functionalities unique to pivot tables. As mentioned above, only the two measures, or numeric fields, in the Values well can be the target fields for conditional formatting.

The following screenshot shows a conditionally formatted pivot table:

As with tables, you can access Conditional formatting configuration pane from the dropdown of either the visual or the field itself in the Values well. Note that only the two measures show up in the dropdown list.

Applying gradient colors to Percent Tested is fairly straightforward. When you select Gradient, you are prompted to define a base field, which can be any field from the dataset, as well as an Aggregation. Amazon QuickSight offers three preset gradient colors for easy application: blue, red-green, and red-yellow-green.

Choose the diverging color option. You have the option to further customize the three colors as well as the corresponding min, mid, and max values by overriding the default, as shown in the following screenshots.

Applying solid colors to column values, subtotals and totals

For Average Score, apply solid color rules to the text.

Add Average Score as the target column for conditional formatting and select Add text color. As mentioned above, only solid colors can be applied to subtotals and totals.

Note the extra Values label next to Condition, right above the comparison method dropdown box, which does not appear for tables. By default, Amazon QuickSight applies the solid color conditions to the most fine-grained values of the target field, with the dimension fields at all levels expanded.

This example examines how the average scores at the Borough level compare with each other, which requires applying conditions to both Values and Subtotals. To do so, choose the icon to expose the options. This is a multi-select dropdown list, in which you can check one or more options. After checking the desired boxes, Values and Subtotals show up next to Condition. You can choose Add condition after you complete the first one.

Note that the next condition added inherits the level of granularity from the previous condition. In this example, Condition #2 automatically applies to Values and Subtotals, just like in the previous one. You can override from the check boxes of the multi-select dropdown.

The following screenshot shows the configuration and the effect.

For illustration purposes, add icons as well and enable Show subtotals for rows from the visual dropdown menu. With some rows collapsed at the Borough level, the solid color rules are applied to the average score for the collapsed boroughs, along with the school-level average scores on the expanded rows. The solid color rules also show up in the subtotals enabled for rows.

The following screenshots show conditional formatting applied to subtotals:

KPI Chart

This release also makes conditional formatting available for KPI charts so that you can easily tell whether the KPI is over or under business-critical thresholds, using text color and icons as quick indicators. In cases in which progress bars are shown, you also have the option to conditionally format the foreground color of the progress bar. The configuration workflow is similar to that of a table or pivot table. The difference is that only the primary value field can be conditionally formatted, not the Target or the Trend group. This example uses a simple KPI chart showing Primary value only and adds an icon if the Average SAT Score is above 1200.

The following screenshots show the configuration and the effect:

Summary

This post illustrated how you can leverage conditional formatting to create richer data visualizations and thus more appealing dashboards in Amazon QuickSight. As of this writing, conditional formatting is enabled for tables, pivot tables, and KPI charts. You can expect to see enhanced capability to dynamically customize data colors based on numeric fields in other chart types as well. Stay tuned.

If you have any questions or feedback, please leave a comment.

About the Authors

Susan Fang is a senior product manager for QuickSight with AWS.

Evolve your analytics with Amazon QuickSight’s new APIs and theming capabilities

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/evolve-your-analytics-with-amazon-quicksights-new-apis-and-theming-capabilities/

The Amazon QuickSight team is excited to announce the availability of Themes and more APIs! Themes for dashboards let you align the look and feel of Amazon QuickSight dashboards with your application’s branding or corporate themes. The newly added APIs allow you to manage your Amazon QuickSight deployments programmatically, with support for dashboards, datasets, data sources, SPICE ingestion, and fine-grained access control over AWS resources. Together, they allow you to creatively tailor Amazon QuickSight to your audiences, whether you are using Amazon QuickSight to provide your users with an embedded analytics experience or for your corporate Business Intelligence (BI) needs. This post provides an overview of these new capabilities and details on getting started.

Theming your dashboards

Amazon QuickSight themes allow you to control the look and feel of your dashboards. Specifically, you can affect the following items through themes, and then share your themes with other authors in the account.

  • Margins
  • Borders around visuals
  • Gutters (spacing between visuals)
  • Data colors (used within the visuals)
  • Background colors (for visuals within the dashboard and the dashboard itself)
  • Foreground colors (for text in the visuals and on the dashboard)
  • Highlight colors on visuals

The following screenshot of the dashboard highlights aspects of the dashboards you can modify via the theming options currently available:

With the available options, you can also make your QuickSight dashboards denser (e.g. no margins or gutters), as shown in the view below that uses QuickSight’s new “Seaside” theme.

You also have a dark dashboard option with QuickSight’s new “Midnight” theme as shown below.

Themes are accessible via the left hand panel in the dashboard authoring interface.

You can create your own theme by starting with one of the built-in themes and customizing it. The interactive theme editing experience makes this easy – you can edit as little or as much as you want, and get to the perfect theme for your dashboards.

Using APIs for dashboards, datasets, data sources, SPICE ingestion, and permissions

The new APIs cover dashboards, datasets, data sources, SPICE ingestion, and fine-grained access control over S3/Athena. We’ve introduced templates in QuickSight with this release: templates allow you to store the visual configuration and data schema associated with a dashboard and then easily replicate the dashboard for different sets of users or different data, within or across accounts.

The new template APIs allow you to create templates from QuickSight analyses. You can then use the dashboard APIs to create a dashboard from the template in the same or different account, version control dashboards, and connect them to different datasets as needed. The dataset APIs allow you to create custom datasets, with filters applied at the dataset level (for example, data for a specific customer only). Dataset APIs can be used to create both SPICE and direct query datasets. SPICE APIs let you refresh datasets programmatically and view SPICE ingestion status and history. Data source APIs programmatically define data source connections. Finally, fine-grained access permissions APIs let you secure access to data in Amazon S3, Amazon Athena, and other AWS resources.

As a developer using Amazon QuickSight’s embedded capabilities, or author using Amazon QuickSight to publish dashboards on different subject areas, the new APIs allow you to automate your workflows and programmatically perform repetitive activities such as rolling out identical versions of dashboards to different teams or customers. QuickSight assets can now be migrated from dev to QA to production via API, overcoming the need to manually access each environment. With versioning, you also have your previous dashboards backed up, providing you flexibility to roll back dashboard deployments if the need arises.

As an Amazon QuickSight administrator, you can use the new APIs to audit and manage access to dashboards, data sources, and datasets across your account. You can also pre-populate dashboards and datasets into individual author accounts for easier and faster onboarding. You no longer need to distribute individual credentials to data sources: you can provision centrally-managed data sources shared with all your users. You can also manage individual author access to S3, Athena, or other AWS resources on-the-fly using IAM policies through the fine-grained access control APIs. The APIs open up a lot of possibilities with Amazon QuickSight – we’ll cover a few scenarios in this blog post and then follow up with additional posts that cover more.

The Amazon QuickSight APIs can be invoked via the AWS Software Development Kit (SDK) or the AWS Command Line Interface (CLI). For our discussion, we will use the AWS CLI. To capture all the details needed for each API, this post uses the generate-cli-skeleton option available in the AWS CLI and walks you through the example of creating a Redshift data source and the datasets and dashboards associated with it. This option, when invoked, provides a skeleton file with all the required API details, which you can edit to the correct inputs and use for your API calls. The code below invokes the generate-cli-skeleton option and writes the details to the create-data-source-cli-input.json file; we would then modify this file and use it as the input to the create-data-source API when invoking it.

aws quicksight create-data-source --generate-cli-skeleton > create-data-source-cli-input.json

Let’s say that you want to help Alice, a new hire in your company, create an Amazon QuickSight dashboard from some customer revenue data in Redshift. To set things up for Alice, you want to use the new APIs to connect to Redshift and create a QuickSight data set within her QuickSight user account.

Creating a data source

To connect to your Redshift database, you’ll need to create a data source in QuickSight. First, you describe it with the following configuration, which covers the data source connection details, credentials, and permissions around who should be able to see the data source.

cat create-data-source-cli-input.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSourceId": "SampleRedshiftDatasouce",
    "Name": "Sample Redshift Datasouce",
    "Type": "REDSHIFT",
    "DataSourceParameters": {
        "RedshiftParameters": {
            "Host": "redshift-cluster-1.xxxxxxxxxx.us-east-1.redshift.amazonaws.com",
            "Port": 5439,
            "Database": "dev",
            "ClusterId": "redshift-cluster-1"
        }
    },
    "Credentials": {
        "CredentialPair": {
            "Username": "xxxxxxxxx",
            "Password": "xxxxxxxxxxx"
        }
    },
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSourcePermissions",
                "quicksight:DescribeDataSource",
                "quicksight:DescribeDataSourcePermissions",
                "quicksight:PassDataSource",
                "quicksight:UpdateDataSource",
                "quicksight:DeleteDataSource"
                
            ]
        }
    ],
    "VpcConnectionProperties": {
        "VpcConnectionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:vpcConnection/VPC to Redshift"
    },
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-RedShiftDemoDataSource"
        }
    ]
}

You can then invoke create-data-source API with this input to start the creation of your data source as indicated below.

aws quicksight create-data-source --cli-input-json file://./create-data-source-cli-input.json

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
    "DataSourceId": "SampleRedshiftDatasouce",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "ac9fb8fe-71d8-4005-a7c9-d66d814e224e"
}

You can validate that the data source was successfully created by calling the describe-data-source API as shown below and checking the response’s Status:

aws quicksight describe-data-source --aws-account-id $AAI --data-source-id SampleRedshiftDatasouce

{
    "Status": 200,
    "DataSource": {
        "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
        "DataSourceId": "SampleRedshiftDatasouce",
        "Name": "Sample Redshift Datasouce",
        "Type": "REDSHIFT",
        "Status": "CREATION_SUCCESSFUL",
        "CreatedTime": 1574053515.41,
        "LastUpdatedTime": 1574053516.368,
        "DataSourceParameters": {
            "RedshiftParameters": {
                "Host": "redshift-cluster-1.c6qmmwnqxxxx.us-east-1.redshift.amazonaws.com",
                "Port": 5439,
                "Database": "dev",
                "ClusterId": "redshift-cluster-1"
            }
        },
        "VpcConnectionProperties": {
            "VpcConnectionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:vpcConnection/VPC to Redshift"
        },
        "SslProperties": {
            "DisableSsl": false
        }
    },
    "RequestId": "57d2d6d6-ebbf-4e8c-82ab-c38935cae8aa"
}

Because you granted permissions to Alice as part of the call to create-data-source, this data source will now be visible to her when she logs in to QuickSight. If you didn’t grant user permissions, this data source is only visible to programmatic users with appropriate permissions.
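
If you need to adjust access later, the update-data-source-permissions API covers that case. The following sketch grants Alice a reduced, read-oriented set of actions on the existing data source; the action list is only an illustration, so adjust it to your needs:

aws quicksight update-data-source-permissions --aws-account-id $AAI --data-source-id SampleRedshiftDatasouce --grant-permissions '[{"Principal":"arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice","Actions":["quicksight:DescribeDataSource","quicksight:DescribeDataSourcePermissions","quicksight:PassDataSource"]}]'

The same API also accepts a revoke-permissions list when you need to remove access.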

Creating a dataset

Before Alice can build her dashboard, you’ll need to create a dataset, which identifies the specific data in a data source you want to use. To create a dataset from this data source, you can call the create-data-set API. The create-data-set API allows you to join two tables to create a desired view of data for Alice. Below, PhysicalTableMap points to the tables in your Redshift data source that you want to join, and LogicalTableMap defines the join between them.

A sample dataset definition with joins included is shown below.

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSetId": "DemoDataSet1",
    "Name": "Demo Data Set 1",
    "PhysicalTableMap": {
        "DemoDataSet1-LineOrder": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Schema": "public",
                "Name": "lineorder",
                "InputColumns": [
                        {
                            "Name": "lo_custkey",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_revenue",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_quantity",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_orderkey",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_shipmode",
                            "Type": "STRING"
                        },
                        {
                            "Name": "lo_orderdate",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_ordertotalprice",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_tax",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "lo_discount",
                            "Type": "INTEGER"
                        }
                ]
	    }
            },
        "DemoDataSet1-Customer": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Schema": "public",
                "Name": "customer",
                "InputColumns": [
                        {
                            "Name": "c_custkey",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "c_phone",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_address",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_name",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_city",
                            "Type": "STRING"
                        }
                ]
            }
        }
    },
    "LogicalTableMap": {
        "DemoDataSet1-LineOrder-Logical": {
            "Alias": "LineOrder",
            "Source": {
                "PhysicalTableId": "DemoDataSet1-LineOrder"
            }
        },
        "DemoDataSet1-Customer-Logical": { 
            "Alias": "Customer",
            "Source": {
                "PhysicalTableId": "DemoDataSet1-Customer"
            }
        },
        "DemoDataSet1-Intermediate": { 
            "Alias": "Intermediate",
            "DataTransforms": [
                {   
                    "ProjectOperation": {
                        "ProjectedColumns": [
                                "lo_revenue",
                                "c_name",
                                "c_city"
                        ]
                    }
                }
            ],
            "Source": {
                "JoinInstruction": {
                    "LeftOperand": "DemoDataSet1-LineOrder-Logical",
                    "RightOperand": "DemoDataSet1-Customer-Logical",
                    "Type": "INNER",
                    "OnClause": "{lo_custkey} = {c_custkey}"
                }
            }
        }

    },
    "ImportMode": "DIRECT_QUERY",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDataSet1"
        }
    ]
}
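
As with the data source, save this configuration to a file and pass it to the create-data-set API; the file name below is an assumption for this walkthrough:

aws quicksight create-data-set --aws-account-id $AAI --cli-input-json file://./create-data-set-cli-input.json

Because ImportMode is DIRECT_QUERY, no SPICE ingestion is triggered; you can verify the result with the describe-data-set API shown later in this post.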

If you wanted to create a similar dataset with a SQL statement instead of specifying all the columns, call the create-data-set API specifying your SQL statement in PhysicalTableMap as shown below:

aws quicksight create-data-set --generate-cli-skeleton >create-data-set-cli-input-sql.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSetId": "DemoDataSet2",
    "Name": "Demo Data Set 2",
    "PhysicalTableMap": {
        "DemoDataSet2-CustomerSales": {
            "CustomSql": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Name": "Customer Sales",
                "SqlQuery":"SELECT lineorder.lo_revenue, customer.c_name, customer.c_city FROM public.lineorder INNER JOIN public.customer ON lineorder.lo_custkey = customer.c_custkey",
                    "Columns": [
                        {
                            "Name": "lo_revenue",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "c_name",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_city",
                            "Type": "STRING"
                        }
                    ]
 
	    }
        }
    },
    "LogicalTableMap": {
        "DemoDataSet2-CustomerSales-Logical": {
            "Alias": "Customer Sales",
            "Source": {
                "PhysicalTableId": "DemoDataSet2-CustomerSales"
            }
        }

    },
    "ImportMode": "DIRECT_QUERY",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDataSet2"
        }
    ]
}

Because you granted permissions to Alice, these data sets will be visible to her under Your Data Sets when she logs into QuickSight.
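
You can also confirm the datasets programmatically. For example, the list-data-sets API returns summaries of all datasets in the account, which is a quick way to pick up the dataset ARNs used by the dashboard APIs later in this post:

aws quicksight list-data-sets --aws-account-id $AAI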

This dataset can be used like any other dataset in Amazon QuickSight. Alice can log in to the QuickSight UI and create an analysis that utilizes this dataset, as shown below:

Creating a template

Once Alice creates her dashboard and it gets popular with her team, the next ask might be to provide the same view to 100 other teams within the company, using a different dataset for each team. Without APIs, this ask would be a challenge. However, with the APIs, you can now define a template from Alice’s analysis and then create dashboards as needed. A template is created by pointing to a source analysis in QuickSight, which creates references for all datasets used, so that new datasets can be referenced when using the template.

The following configuration describes a template created from Alice’s analysis:

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "TemplateId": "DemoDashboardTemplate",
    "Name": "Demo Dashboard Template",
    "SourceEntity": {
        "SourceAnalysis": {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:analysis/7975f7aa-261c-4e7c-b430-305d71e07a8e",
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "DS1",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet1"
                }
            ]
        }
    },
    "VersionDescription": "1"
}

Templates are not visible within the QuickSight UI; they are a developer/admin-managed asset that is only accessible via the APIs at this point. Save the template configuration above to a file and call the create-template API to start creating your template:

aws quicksight create-template --cli-input-json file://./create-template-cli-input.json

{
    "Status": 202,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate",
    "VersionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/version/1",
    "TemplateId": "DemoDashboardTemplate",
    "CreationStatus": "CREATION_IN_PROGRESS",
    "RequestId": "aa189aee-8ab5-4358-9bba-8f4e32450aef"
}

To confirm the template was created, call the describe-template API as shown below:

aws quicksight describe-template --aws-account-id $AAI --template-id DemoDashboardTemplate

{
    "Status": 200,
    "Template": {
        "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate",
        "Name": "Demo Dashboard Template",
        "Version": {
            "CreatedTime": 1574137165.136,
            "VersionNumber": 1,
            "Status": "CREATION_SUCCESSFUL",
            "DataSetConfigurations": [
                {
                    "Placeholder": "DS1",
                    "DataSetSchema": {
                        "ColumnSchemaList": [
                            {
                                "Name": "c_city",
                                "DataType": "STRING"
                            },
                            {
                                "Name": "lo_revenue",
                                "DataType": "INTEGER"
                            }
                        ]
                    },
                    "ColumnGroupSchemaList": []
                }
            ],
            "Description": "1",
            "SourceEntityArn": "arn:aws:quicksight:us-east-1: xxxxxxxxxxxx:analysis/7975f7aa-261c-4e7c-b430-305d71e07a8e"
        },
        "TemplateId": "DemoDashboardTemplate",
        "LastUpdatedTime": 1574137165.104,
        "CreatedTime": 1574137165.104
    }
}

Creating dashboards

You can then use this template to create a dashboard that is connected to the second, similar dataset you created earlier. When calling the create-dashboard API, you define a SourceEntity (in this case, the template you just created), permissions, and any dashboard publish options, as shown in the following configuration for create-dashboard-cli-input.json.

create-dashboard-cli-input.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DashboardId": "DemoDashboard1",
    "Name": "Demo Dashboard 1",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:DescribeDashboard",
                "quicksight:ListDashboardVersions",
                "quicksight:UpdateDashboardPermissions",
                "quicksight:QueryDashboard",
                "quicksight:UpdateDashboard",
                "quicksight:DeleteDashboard",
                "quicksight:DescribeDashboardPermissions",
                "quicksight:UpdateDashboardPublishedVersion"
            ]
        }
    ],
    "SourceEntity": {
        "SourceTemplate": {
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "DS1",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet2"
                }
            ],
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate"
        }
    },
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDashboard"
        }
    ],
    "VersionDescription": "1",
    "DashboardPublishOptions": {
        "AdHocFilteringOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "ExportToCSVOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "SheetControlsOption": {
            "VisibilityState": "EXPANDED"
        }
    }
}
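
With the configuration saved as create-dashboard-cli-input.json, invoke the create-dashboard API:

aws quicksight create-dashboard --cli-input-json file://./create-dashboard-cli-input.json

As with templates, dashboard creation is asynchronous, so the call returns a creation-in-progress status; the next step verifies the result.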

You can confirm you created the dashboard correctly with the describe-dashboard API. See the following code:

aws quicksight describe-dashboard --aws-account-id $AAI --dashboard-id DemoDashboard1

{
    "Status": 200,
    "Dashboard": {
        "DashboardId": "DemoDashboard1",
        "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dashboard/DemoDashboard1",
        "Name": "Demo Dashboard 1",
        "Version": {
            "CreatedTime": 1574138252.449,
            "Errors": [],
            "VersionNumber": 1,
            "Status": "CREATION_SUCCESSFUL",
            "SourceEntityArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/version/1",
            "Description": "1"
        },
        "CreatedTime": 1574138252.395,
        "LastPublishedTime": 1574138252.395,
        "LastUpdatedTime": 1574138252.449
    },
    "RequestId": "62945539-bb63-4faf-b897-8e84ea5644ae"
}

Because you granted permissions to Alice during creation, this dashboard will be visible to her under All dashboards when she logs into QuickSight as shown below:

Configuring SPICE datasets        

Alice’s datasets are direct query datasets and may cause higher load on your Redshift cluster during peak times, which might also cause Alice and her customers to wait longer for their dashboards to load. With SPICE, Amazon QuickSight’s in-memory data store, you can offload user traffic from your database, while also achieving high concurrency limits and fast, responsive performance for your end-users. The following configuration redefines our second dataset and sets ImportMode to SPICE as highlighted below:

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DataSetId": "DemoDataSet3",
    "Name": "Demo Data Set 3",
    "PhysicalTableMap": {
        "DemoDataSet2-CustomerSales": {
            "CustomSql": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:datasource/SampleRedshiftDatasouce",
                "Name": "Customer Sales",
                "SqlQuery":"SELECT lineorder.lo_revenue, customer.c_name, customer.c_city FROM public.lineorder INNER JOIN public.customer ON lineorder.lo_custkey = customer.c_custkey",
                    "Columns": [
                        {
                            "Name": "lo_revenue",
                            "Type": "INTEGER"
                        },
                        {
                            "Name": "c_name",
                            "Type": "STRING"
                        },
                        {
                            "Name": "c_city",
                            "Type": "STRING"
                        }
                    ]
 
	    }
        }
    },
    "LogicalTableMap": {
        "DemoDataSet2-CustomerSales-Logical": {
            "Alias": "Customer Sales",
            "Source": {
                "PhysicalTableId": "DemoDataSet2-CustomerSales"
            }
        }

    },
    "ImportMode": "SPICE",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:UpdateDataSetPermissions",
                "quicksight:DescribeDataSet",
                "quicksight:DescribeDataSetPermissions",
                "quicksight:PassDataSet",
                "quicksight:DescribeIngestion",
                "quicksight:ListIngestions",
                "quicksight:UpdateDataSet",
                "quicksight:DeleteDataSet",
                "quicksight:CreateIngestion",
                "quicksight:CancelIngestion"
            ]
        }
    ],
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDataSet2"
        }
    ]
}

You can call the create-data-set API with that configuration to create the dataset in SPICE mode:

aws quicksight create-data-set --aws-account-id $AAI --cli-input-json file://./create-data-set-cli-input-sql-spice.json

{
    "Status": 201,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3",
    "DataSetId": "DemoDataSet3",
    "IngestionArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/589eaf9d-0e21-4572-9ebe-f61b129886d9",
    "IngestionId": "589eaf9d-0e21-4572-9ebe-f61b129886d9",
    "RequestId": "7a442521-ab8f-448f-8668-1f0d6823f987"
}

When creating datasets in SPICE mode, you’ll consume SPICE capacity. The describe-data-set API provides details on the SPICE capacity consumed. See the relevant bits from the describe-data-set response highlighted in the code snippet below:

aws quicksight describe-data-set --aws-account-id $AAI --data-set-id DemoDataSet3|tail -5

        "ImportMode": "SPICE",
        "ConsumedSpiceCapacityInBytes": 107027200
    },
    "RequestId": "b0175568-94ed-40e4-85cc-b382979ca377"
}

For SPICE datasets, the APIs also provide the option of setting up ingestions or listing past ingestions. Listing past ingestions allows you to identify ingestion status and when the data in the dataset was last updated. To list past ingestions, call the list-ingestions API for your dataset:

aws quicksight list-ingestions --aws-account-id $AAI --data-set-id DemoDataSet3

{
    "Status": 200,
    "Ingestions": [
        {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionId": "589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionStatus": "COMPLETED",
            "ErrorInfo": {},
            "RowInfo": {
                "RowsIngested": 1126608,
                "RowsDropped": 0
            },
            "CreatedTime": 1574138852.564,
            "IngestionTimeInSeconds": 24,
            "IngestionSizeInBytes": 107027200,
            "RequestSource": "MANUAL",
            "RequestType": "INITIAL_INGESTION"
        }
    ],
    "RequestId": "ee1d9a6f-a290-418a-a526-8906f4689776"
}

Set up ingestions to update the data in your dataset from your data source. When you have new data in your Redshift cluster, you can use the create-ingestion API for your dataset to trigger a SPICE refresh:

aws quicksight create-ingestion --aws-account-id $AAI --data-set-id DemoDataSet3 --ingestion-id DemoDataSet3-Ingestion2

{
    "Status": 201,
    "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/DemoDataSet3-Ingestion2",
    "IngestionId": "DemoDataSet3-Ingestion2",
    "IngestionStatus": "INITIALIZED",
    "RequestId": "fc1f7eea-1327-41d6-9af7-c12f097ed343"
}

Listing ingestion history again shows the new ingestion in the state RUNNING. See the following code:

aws quicksight list-ingestions --aws-account-id $AAI --data-set-id DemoDataSet3

{
    "Status": 200,
    "Ingestions": [
        {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/DemoDataSet3-Ingestion2",
            "IngestionId": "DemoDataSet3-Ingestion2",
            "IngestionStatus": "RUNNING",
            "RowInfo": {
                "RowsIngested": 0
            },
            "CreatedTime": 1574139332.876,
            "RequestSource": "MANUAL",
            "RequestType": "FULL_REFRESH"
        },
        {
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet3/ingestion/589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionId": "589eaf9d-0e21-4572-9ebe-f61b129886d9",
            "IngestionStatus": "COMPLETED",
            "ErrorInfo": {},
            "RowInfo": {
                "RowsIngested": 1126608,
                "RowsDropped": 0
            },
            "CreatedTime": 1574138852.564,
            "IngestionTimeInSeconds": 24,
            "IngestionSizeInBytes": 107027200,
            "RequestSource": "MANUAL",
            "RequestType": "INITIAL_INGESTION"
        }
    ],
    "RequestId": "322c165e-fb90-45bf-8fa5-246d3034a08e"
}
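
To track a single refresh rather than listing the full history, you can also poll the describe-ingestion API until IngestionStatus reaches COMPLETED (or FAILED). A minimal sketch for the ingestion created above:

aws quicksight describe-ingestion --aws-account-id $AAI --data-set-id DemoDataSet3 --ingestion-id DemoDataSet3-Ingestion2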

Versioning and aliasing

As you may have noticed, the template and dashboard APIs support versioning. When you create a template or dashboard, version 1 of that entity is created, and each update of a template or dashboard results in a new version of that entity with an incremented version number. Versioning can be useful when you want to republish an old version of a dashboard or ensure you’re building off a known version of the template. For example, if you published your dashboard multiple times, but want to republish the first version, you can call the update-dashboard-published-version API as follows:

aws quicksight update-dashboard-published-version --aws-account-id $AAI --dashboard-id DemoDashboard1 --dashboard-version-number 1

{
   "DashboardArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dashboard/DemoDashboard1",
   "DashboardId": "DemoDashboard1",
   "RequestId": "c97cdbff-d954-4f5b-a887-28659ad9cc9a"
}
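
If you are not sure which versions a dashboard has, the list-dashboard-versions API returns its version history (there is a corresponding list-template-versions API for templates):

aws quicksight list-dashboard-versions --aws-account-id $AAI --dashboard-id DemoDashboard1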

Versioning becomes even more powerful with aliases. A template or dashboard alias is a pointer to a specific template or dashboard version. Aliases are helpful when you want to point to a logical version of a resource rather than an explicit version number that needs to be manually updated. This feature can be helpful for development workflows where you want to make changes to a template and only want production code to use an approved template version. For example, you can create a ‘PROD’ template alias for your approved production template and point it to version 1, as shown below:

aws quicksight create-template-alias --aws-account-id $AAI --template-id DemoDashboardTemplate --template-version-number 1 --alias-name PROD

{
   "RequestId": "c112a48c-14da-439e-b4d8-538b60c1f188",
   "TemplateAlias": {
      "AliasName": "PROD",
      "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/alias/PROD",
      "TemplateVersionNumber": 1
   }
}

When you create a dashboard from this template, you can specify the SourceEntity as the ‘PROD’ template alias to always publish the dashboard from the approved production template version. Note how the template Arn contains the template alias name in the call to the create-dashboard API below:

aws quicksight create-dashboard --generate-cli-skeleton > create-dashboard-cli-input.json

{
    "AwsAccountId": "xxxxxxxxxxxx",
    "DashboardId": "DemoDashboard2",
    "Name": "Demo Dashboard 2",
    "Permissions": [
        {
            "Principal": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:user/default/alice",
            "Actions": [
                "quicksight:DescribeDashboard",
                "quicksight:ListDashboardVersions",
                "quicksight:UpdateDashboardPermissions",
                "quicksight:QueryDashboard",
                "quicksight:UpdateDashboard",
                "quicksight:DeleteDashboard",
                "quicksight:DescribeDashboardPermissions",
                "quicksight:UpdateDashboardPublishedVersion"
            ]
        }
    ],
    "SourceEntity": {
        "SourceTemplate": {
            "DataSetReferences": [
                {
                    "DataSetPlaceholder": "DS1",
                    "DataSetArn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:dataset/DemoDataSet2"
                }
            ],
            "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/alias/PROD"
        }
    },
    "Tags": [
        {
            "Key": "Name",
            "Value": "API-DemoDashboard"
        }
    ],
    "VersionDescription": "1",
    "DashboardPublishOptions": {
        "AdHocFilteringOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "ExportToCSVOption": {
            "AvailabilityStatus": "ENABLED"
        },
        "SheetControlsOption": {
            "VisibilityState": "EXPANDED"
        }
    }
}
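
You can then pass this configuration to the create-dashboard API in the same way as before to create DemoDashboard2 from the aliased template:

aws quicksight create-dashboard --cli-input-json file://./create-dashboard-cli-input.json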

If you were to create new versions of the template, you could then update the ‘PROD’ template alias to point to the new template version, without touching your dashboard creation code from above. For example, if you wanted to set template version 3 to be your approved production version, you can call the update-template-alias API as shown below:

aws quicksight update-template-alias --aws-account-id $AAI --template-id DemoDashboardTemplate --template-version-number 3 --alias-name PROD

{
   "RequestId": "5ac9ff9e-28c1-4b61-aeab-5949986f5b2b",
   "TemplateAlias": {
      "AliasName": "PROD",
      "Arn": "arn:aws:quicksight:us-east-1:xxxxxxxxxxxx:template/DemoDashboardTemplate/alias/PROD",
      "TemplateVersionNumber": 3
   }
}

Keep an eye out for a future blog post that will detail more examples of using versioning and aliases for large scale deployments!

Amazon QuickSight’s data source, dataset, dashboard, and SPICE APIs are available in both Standard and Enterprise editions, with edition-specific support for functionality. Fine-grained access control APIs and template support for dashboard APIs are available in Amazon QuickSight Enterprise edition. APIs for Amazon QuickSight are part of the AWS SDK and are available in a variety of programming languages. For more information, see API Reference and Tools to Build on AWS.

Conclusion

With QuickSight’s new themes, APIs, and additional customization options underway, there are many creative ways to author the next chapter in your analytics story. To be a part of future API previews and connect with us, sign up for the embedding early access program.

About the Author

Jose Kunnackal John is a principal product manager for Amazon QuickSight.

Enhancing dashboard interactivity with Amazon QuickSight Actions

Post Syndicated from Sahitya Pandiri original https://aws.amazon.com/blogs/big-data/enhancing-dashboard-interactivity-with-amazon-quicksight-actions/

Amazon QuickSight now offers enhanced dashboard interactivity capabilities through QuickSight Actions. QuickSight Actions provide advanced filtering capabilities through single point-and-click actions on dashboards. With Actions, you can link visuals within a dashboard so that selecting a dimensional point on one visual provides you with granular insights on the selected point on other visuals within your dashboard. Therefore, you can start with summaries and dive deep into details of your business metrics, all within the same dashboard sheet. You can define which visuals within your dashboard are interactive and how they interact with each other. As of this writing, QuickSight Actions lets you define two primary types of actions: filter actions and URL actions. URL actions within Amazon QuickSight are not new, but the point of entry to create URL actions is now consolidated with Actions.

You can apply QuickSight Actions to any supported chart that holds at least one dimension. This post provides examples of getting started with Actions, configuring different Actions on a dashboard, and enabling different forms of interactivity for each action configured.

This post uses the following data sets:

B2B Sales
This data set holds order details for a fictitious company, ABCDO, for 2016 and 2017. The dashboard we build reports on sales metrics with industry, segment, and region as primary dimensions, and also provides granular details for each order purchased.

Product Availability
This data set holds available quantity for every product by ID.

Prerequisites

Before implementing Actions on your Amazon QuickSight dashboards, review how to create and publish dashboards.

Getting started with QuickSight Actions

This screenshot is a dashboard built from the above two data sets. It shows sales by category, industry, and region on row 1; segment sales by quarter and industry sales by segment on row 2; total profit, sales, discount, and quantity sold on row 3; an order details pivot on row 4; and a shipping details pivot on row 5.

Before getting started, note the following terminology:

  • Source visual – The visual on which an action is created. Choosing a point on the source visual triggers the action, and the dimensional values chosen are passed as filters to target visuals.
  • Target visual – The visual that is filtered by the dimensional values chosen on the source visual.
  • Activation – You can trigger an action either by selecting a data point directly (left-click) or by selecting from the menu options (right-click).
  • Action type – You can configure two types of actions on a visual: filter actions and URL actions. Filter actions pass select or all dimensions as filters across select or all visuals across the dashboard. URL actions allow you to navigate from the dashboard to an external site or different sheet within the same dashboard with the selected dimension passed along.

Setting up a click/select action

To set up a click/select action, complete the following steps:

  1. Select the source visual on the analysis and choose Actions. The following screenshot shows you this step.
  2. Within Actions, you can either create and configure your actions or use the 1-click action setup. The 1-click setup creates a default action on the visual. As a result, when a point on the visual is selected/clicked, all dimensions on a selected point are passed as filters across all other visuals on the dashboard. Under Quick create, choose Filter same-sheet visuals.

This creates a default action, which you can modify at any time.

After creating the 1-click action, you can select any point on the source visual. The following screenshot shows that the selected point is highlighted and all the underlying dimensions are used as filters on all other visuals on the analysis or dashboard.

In this screenshot, selecting Copper and Diamonds for the year 2017 passes these two as filters to all other visuals.

To verify what filter actions are applied on a particular visual, choose the filter icon on the top right. This shows all the filters applied through filter actions and also regular filters.

Setting up custom filter actions

To perform further analysis on the sales metrics for any segment, configure an action on the Industry sales by Segment visual. To configure a filter action to pass segment as the filter onto the KPI charts, complete the following steps:

  1. Choose the source visual and choose Actions.
  2. Choose Define a custom action.
  3. For Action name, enter a name for your action. You could also add dynamic placeholder values to the name to be more descriptive to your readers by choosing the plus sign next to the Action name. You have four configuration settings. While this post uses values specific for this use case, you can choose other values according to your specific needs.
  4. For Activation, select Menu option.
  5. For Action type, choose Filter action.
  6. For Filter scope, select Selected fields, segment.
  7. For Target visuals, select Select visuals and check needed target visuals Total quantity, Total discount offered, and Total profit by year.
  8. Choose Save.

To view menu options and choose a filter action, right-click any segment on the visual. This post selects Strategic. The following screenshot shows that all the Strategic points are highlighted and the KPIs are updated.

Adding filter actions on pivot tables

Filter actions on pivot tables provide exceptional flexibility to pass anything from the innermost to the outermost dimensions. Additionally, when a filter action is triggered, the source pivot table is highlighted to show values that are passed as filters to the target visuals.

This post adds click/select filter actions on the Order details pivot table to analyze shipping details for each order. This post also adds two filter actions to the menu options: one to filter by the product name and the other by the customer name.

To create click/select filter actions on order_id, complete the following steps:

  1. Choose the Order details pivot table chart.
  2. Choose Actions.
  3. Choose Define custom action.
  4. For Action name, enter your action name.
  5. For Activation, select Select.
  6. For Filter scope, select Selected fields and check order_id.
  7. For Target visuals, select Select visuals, and check Shipping details.
  8. Choose Save.

The following screenshot shows that when choosing any point on the pivot table, the order pertaining to the point is highlighted and the Shipping details table shows shipping details for the selected order.

You can only create one Selection action on a source visual and any number of menu option actions.

You can now create additional actions in the menu. To create new actions, go to Actions, choose the plus sign, and create these two actions.

After creating the two actions, they are visible in the Actions menu. See the following screenshot.

After setting up these action filters, you can right-click on the visual and trigger these filters from the menu.

The screenshots below show the menu that appears when you right-click a product name.

The screenshots below show the menu that appears when you right-click a customer name.

Removing applied filter actions

There are three ways to remove a triggered filter action:

  • On source visuals with filter actions triggered via select or menu options, clear the filter action by choosing the selected point again or a different point.
  • On source visuals with empty spaces within the visual (such as bar chart family or pie chart), clicking within the empty space also clears the filter action selection. This deselection is not applicable to heat maps and tree maps. Deselect points on these charts by choosing an already selected point.
  • On target visuals with a highlighted filter icon, choosing the X on the filter actions displayed removes the filter for this visual and all other visuals the filter is applied to.

Using action filters when the source and target visuals use different data sets

If you use multiple data sets within an analysis, Amazon QuickSight automatically maps fields across these data sets by the field name. For example, product_id is a field in the B2B Sales data set and Product Availability data set. When setting a filter action on product_id from a source visual using the B2B Sales data set, the target visual (the adjacent KPI) showing product quantity availability shows results for the selected product_id from the source visual. This implicit mapping of fields is case sensitive.

You can add the new data set Product Availability to this analysis. For more information, see Adding a Data Set to an Analysis.

Add a KPI chart to show count of product_id as in the screenshot below.

To show total quantity available of select products, create an action filter on the menu options, as seen in the screenshot below:

To drill down into product quantity availability, choose (right-click) product_id on the order details pivot table. The following screenshot shows the product availability details for the selected product_id.

Creating custom URL actions

Custom URL actions aren’t new to Amazon QuickSight, but the point of entry to create URL actions is now consolidated with Actions. To create a custom URL action on the Order details pivot table to see Google search results on a customer, complete the following steps:

  1. Choose the Order details pivot table.
  2. Choose Actions.
  3. Choose Add a new action.
  4. For Action name, enter an action name.
  5. For Action type, choose URL action.
  6. Enter the URL https://www.google.com/search?q=<<customer_name>>. Here, customer_name is the dimension field whose value is inserted into the URL for the selected row.
  7. Choose New browser tab.
  8. Choose Save.

When you right-click on the pivot table, menu options appear with your two filter actions created previously and the new custom URL action. See the following screenshot.

Conclusion

With Amazon QuickSight, you can create filter actions on any chart type with at least one dimensional value. You can also create URL actions on all chart types, including KPIs and gauge charts, which do not hold dimensional values. As of this writing, QuickSight Actions don’t support selecting multiple data points within the same source visual, cascading filter actions in which you choose one filter action on one visual and a subsequent filter on another visual, or adding custom mappings between different data sets used in the dashboard. You can expect to see these features in future releases.

QuickSight Actions is available on both Enterprise and Standard editions in all QuickSight-supported regions.

 


About the Author

Sahitya Pandiri is a technical program manager with Amazon Web Services. Sahitya has been in product/program management for 6 years now, and has built multiple products in the retail, healthcare, and analytics space.

 

 

 

Amazon QuickSight announces the all-new QuickSight Mobile app

Post Syndicated from Tina Kelleher original https://aws.amazon.com/blogs/big-data/announcing-the-new-mobile-app-for-amazon-quicksight/

AWS is happy to announce the release of QuickSight Mobile for iOS and Android devices. This release is both a major update to the existing iOS app and the launch of a new Android application. The app enables you to securely get insights from your data from anywhere; favorite, browse, and interact with your dashboards; explore your data with drilldowns and filters; stay ahead of the curve via forecasting; get email alerts when unexpected changes happen in your data; and share those insights with colleagues.

To download the QuickSight Mobile app, visit the iOS App Store and Google Play.

The new QuickSight Mobile app optimizes the dashboard consumption experience with newly added features and enhanced interactivity. In this post, we walk through the new mobile experience in detail.

Features and capabilities

Home page

After authenticating with the app, you land on the QuickSight home screen. From here, you have quick access to your favorite dashboards (synced from the browser application) and your most recently viewed dashboards. The recently viewed list can display up to the last 100 dashboards that you opened.

The following screenshots show your favorited and recently viewed dashboards.

Dashboards

You can browse through all your dashboards via the Dashboards tab. You can sort the dashboards by name or updated (published) date and change to a list or a grid view. You can also search for dashboards by name. Lastly, you can easily add any dashboard to your favorites by choosing the star icon next to the dashboard’s name.

The following screenshots show dashboards in grid view, the display preferences, and the search tool.

Dashboard view

The dashboard view is optimized for your mobile device by showing the visuals in vertical stack view, with each visual expanded to full width. A multi-sheet dashboard shows the first sheet by default. To navigate across sheets, choose the sheet title and select the sheet name you want to view.

The following screenshots show an opened dashboard with visuals in vertical stack view and how you navigate across different sheets within the dashboard.

To apply filters to visuals, choose the funnel icon to view all available filter controls and options for your dashboard sheet. After you’ve made your selection, choose Apply, and choose Done. You can see the number of filters applied via the small blue tag next to the funnel icon. You can also expand the filter control view and reset a given filter to the default.

The following screenshots show dashboard filters, the expanded filter pane, and how to select filter values.

Some visuals have a scroll bar, which allows you to customize the range to zoom in on a visual. A long press on a data point on most charts, such as a line or a bar, brings up the context menu. You can focus on this data point, drill up or down if the visual has a drill hierarchy, and exclude the chosen data point.

The following screenshots show interactions with zoom bars and context menu.

To go back to the full view, choose the … icon in the upper right and choose Undo. To exit the dashboard view, choose the back arrow.

If the visual is an anomaly detection widget, you can choose Explore anomalies to go to the detailed anomalies page. Here, you can see all of the anomalies detected for the latest period. To perform contribution analysis and uncover the hidden drivers, long-press the anomaly data point; the option appears in the bottom sheet. To find out more about anomaly detection and contribution analysis, see Amazon QuickSight Announces General Availability of ML Insights.

You can expect contribution analysis to extend support to all visuals beyond anomaly detection on the app in the near future.

The following screenshots show the anomaly details and contribution analysis flow.

Visual view

From the dashboard view, choose the expand icon in the upper right corner to expand the desired visual to full-screen. The app supports both portrait and landscape modes in the expanded visual, and you can use the same interactions such as zoom, drill, focus, and exclude.

The following screenshots show expanded visuals in both portrait and landscape modes.

Settings

The Settings tab allows you to change language, Region, and biometric authentication methods. You can also provide feedback by choosing Provide feedback, which initiates an email to our mobile team.

The following screenshot shows the options on the Settings page.

Setup and authentication

The QuickSight Mobile app supports authentication by any provisioned user. You can authenticate by using either your Amazon QuickSight username and password (with MFA, if this option is enabled), or through your own identity provider (IdP). After authentication, you can re-authenticate using biometric authentication such as face ID or touch ID, if supported by your mobile device, or an application-specific PIN.

The following screenshots show the generic sign-in flow.

IdP-initiated login

For IdP-initiated logins, set up your account with identity federation. For more information, see Federate Amazon QuickSight access with Okta. You can federate using the standard mobile federation processes that your IdP provides, either through an app or through the web. This post uses the Okta mobile app. After you federate, you are prompted to continue your session either in the mobile web experience or in the QuickSight Mobile app.

Account administrators can set the duration of mobile sessions to be anywhere from 1–30 days through the administrative console. The default session length for new accounts is 30 days.

The following screenshots show IdP-initiated login flow.

Microsoft Active Directory

If your account uses Microsoft Active Directory, you can log in to the app using your current AD credentials. If your account requires MFA, you also need to provide your MFA token to authenticate. For more information, see Can I use AWS Directory Service for Microsoft Active Directory to authenticate users in Amazon QuickSight?

Email Invitations

If you receive an email invitation to Amazon QuickSight, follow the instructions in the email to set your username and password through the website (mobile or desktop). Use these credentials to authenticate with the mobile applications.

IAM Users

If your account is an IAM user, you need to provision yourself as an Amazon QuickSight user before authenticating in the mobile application. For more information on provisioning users, see Provisioning Users for Amazon QuickSight.

Summary

This post provided an overview of the key features and functionalities as well as sign-in flows for the new QuickSight Mobile app. The app is available as a free download in Google Play and the iOS App Store. You can expect the app to continue to evolve its features and capabilities to give you the best experience for getting insights from your data while on the go!

If you have any questions or feedback, please leave a comment.

 


About the Authors

Susan Fang is a senior product manager for QuickSight with AWS.

 

 

 

 

Brian Stein is a software development manager at AWS.

 

 

 

 

Joining across data sources on Amazon QuickSight

Post Syndicated from Rakshith Dayananda original https://aws.amazon.com/blogs/big-data/joining-across-data-sources-on-amazon-quicksight/

Amazon QuickSight announced the launch of Cross Data Source Join, which allows you to connect to multiple data sources and join data across these sources in Amazon QuickSight directly to create data sets used to build dashboards. For example, you can join transactional data in Amazon Redshift that contains customer IDs with Salesforce tables that contain customer profile data to create an interactive dashboard with order and customer details. You can slice and dice transactional data by various customer dimensional data such as segment, geographic, or demographic without first pulling the data to a single source outside Amazon QuickSight.

With cross data source joins, you can join across all data sources supported by Amazon QuickSight, including file-to-file, file-to-database, and database-to-database joins using the built-in drag-and-drop UI, without heavily relying on the BI and data engineering teams to set up complex and time-consuming ETLs. Whether it is local CSV files, Amazon RDS databases, or JSON objects on an S3 bucket, you can now join any of these data sources together to create data sets.

Finally, you can schedule refreshes as frequently as every hour to keep the joined data set up to date with the latest information.

Getting started with Cross Data Source Join

The following screenshot shows all the data sources that you can connect to in QuickSight.

Amazon QuickSight allows you to connect to many different data sources, and it is common for businesses to have data spread across several of them. For example, you might have your web server logs stored in Amazon S3, customer details in Amazon Redshift tables, and order details in RDS. You may need to build reports from data combined from two or more of these different data sources.

You can accomplish this to some extent by building data pipelines that consolidate data from multiple sources into a single one. However, creating these pipelines duplicates data across various AWS services and adds cost in the effort and time required to move the data, after which you still have to build Amazon QuickSight data sets from that single source. With cross data source join available directly in Amazon QuickSight, you can eliminate this problem.

There is no size restriction on your largest source, as long as the post-join table can fit into your SPICE capacity per data set. The rest of the tables together need to be within 1 GB in size. For example, if you have 20 numeric columns in your smaller table, you can fit about 5 million rows until you exceed the 1 GB memory limit.

This post demonstrates how to create data sets from two CSV files and how to join an RDS table with an S3 file. The post uses an example table with orders-related data in one data source, and returns-related data in another data source. The final goal is to create one single data set that contains both orders and returns data.

Prerequisites

Before getting started, download the following CSV files to your local machine:

Also, to learn how to create, edit, and delete QuickSight data sources, see Working with Data Sources in Amazon QuickSight.

Joining multiple CSV files

To join two CSV files, complete the following steps:

  1. Use the orders CSV file that you downloaded from the S3 bucket earlier and upload it to QuickSight.
  2. After selecting the ‘orders’ sheet, go to the edit/preview data page, where your data set details appear.
  3. From the top menu, choose Add data. A pop-up window appears with the option to either switch data sources or upload a new CSV file to join with the current data set. The window also shows the existing source.
  4. Choose Upload a file and upload the ‘returns’ CSV file. After uploading the file, you can see the sheet names for both CSVs. The following screenshot shows orders and returns.
  5. Choose the two circles between the files. This step allows you to edit the join configuration between the two CSVs. In the Join configuration section, you can select the join type (inner, left, right, or full) and the column on which to apply the join. This post uses an inner join.
  6. For Join type, choose Inner. This post joins the two files on their order ID columns.
  7. For Join clauses, select Order ID in both drop-downs.
  8. Choose Apply.

You now have a data set that contains both orders and returns data from two different CSVs. You can save your data set and continue to create your analysis.
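
Conceptually, the join configuration above produces roughly the result of the following SQL sketch. The table and column names are illustrative only; QuickSight performs the join for you, so you never write this query yourself.

-- Rough SQL sketch of the inner join configured above (illustrative names)
SELECT o.*, r.*
FROM   orders o
INNER JOIN returns r
       ON o."Order ID" = r."Order ID";

Only orders that appear in both files are kept. A left join would instead keep every order and leave the returns columns empty where no match exists.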

Joining an RDS table with S3

In this example, you have orders data in RDS and returns data as a JSON file in an S3 bucket. To join this data, complete the following steps:

  1. Create a data set in QuickSight from the RDS table. To learn how to connect to RDS to create data sets, see Create a Database Data Set and an Analysis.
  2. Next, go to Manage data and select the RDS data set. This post uses the orders data set.
  3. Choose Edit data set. A page with your data set details appears.
  4. From the top menu, choose Add data. A pop-up window appears with the option to either switch data sources or upload a new CSV file to join with the current data set. The window also shows the existing source.
  5. Choose Switch data source. A list of the different data sets and their sources appears.
  6. Choose a data set. This post chooses Returns. You can now see both data sets linked together.
  7. Choose the two spheres between the data sets.
  8. Under Join configuration, choose your desired Join type. This post chooses Inner.
  9. For Join clauses, select the column on which to apply the join from the drop-down for each data set. This post chooses order_id for orders and Order ID for Returns.
  10. Choose Apply.

Your new data set contains both orders and returns data from two different data sources. You can save your data set and continue to create your analysis.

Conclusion

This post showed you how to join data from two CSV files and how to join an RDS table with data in S3. You can join data from any two supported data sources (except IoT) in Amazon QuickSight with this method. Cross data source join capability is now available in both Enterprise and Standard editions in all Amazon QuickSight Regions.

 


About the Author

Rakshith Dayananda is an IT app dev engineer at AWS.

 

 

 

Analyze Google Analytics data using Upsolver, Amazon Athena, and Amazon QuickSight

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyze-google-analytics-data-using-upsolver-amazon-athena-and-amazon-quicksight/

In this post, we present a solution for analyzing Google Analytics data using Amazon Athena. We’re including a reference architecture built on moving hit-level data from Google Analytics to Amazon S3, performing joins and enrichments, and visualizing the data using Amazon Athena and Amazon QuickSight. Upsolver is used for data lake automation and orchestration, enabling customers to get started quickly.

Google Analytics is a popular solution for organizations that want to understand the performance of their web properties and applications. Google Analytics data is collected and aggregated to help users extract insights quickly. This works great for simple analytics. It’s less than ideal, however, when you need to enrich Google Analytics data with other datasets to produce a comprehensive view of the customer journey.

Why analyze Google Analytics data on AWS?

Google Analytics has become the de-facto standard web analytics tool. It is offered for free at lower data volumes and provides tracking, analytics, and reporting.  It enables non-technical users to understand website performance by answering questions such as: where are users coming from? Which pages have the highest conversion rates? Where are users experiencing friction and abandoning their shopping cart?

While these questions are answered within the Google Analytics UI, there are some limitations, such as:

  • Data sampling: Google Analytics standard edition displays sampled data when running ad hoc queries on time periods that contain more than 500,000 sessions. Large websites can easily exceed this number on a weekly or even daily basis. This can create reliability issues between different reports, as each query can be fed by a different sample of the data.
  • Difficulty integrating with existing AWS stack: Many customers have built or are in the process of building their data and analytics platform on AWS. Customers want to use the AWS analytics and machine learning capabilities with their Google Analytics data to enable new and innovative use cases.
  • Joining with external data sources: Seeing the full picture of a business’ online activity might require combining web traffic data with other sources. Google Analytics does not offer a simple way to either move raw data in or out of the system. Custom dimensions in Google Analytics can be used, but they are limited to 20 for the standard edition and are difficult to use.
  • Multi-dimensional analysis: Google Analytics custom reports and APIs are limited to seven dimensions per query. This limits the depth of analysis and requires various workarounds for more granular slicing and dicing.
  • Lack of alternatives: Google Analytics 360, which allows users to export raw data to Google BigQuery, carries a hefty annual fee. This can be prohibitive for organizations. And even with this upgrade, the native integration is only with BigQuery, which means users still can’t use their existing AWS stack.

Building or buying a new web analytics solution (including cookie-based tracking) is also cost-prohibitive, and can interrupt existing workflows that rely on Google Analytics data.

Customers are looking for a solution to enable their analysts and business users to incorporate Google Analytics data into their existing workflows using familiar AWS tools.

Moving Google Analytics data to AWS: Defining the requirements

To provide an analytics solution with the same or better level of reporting as Google Analytics, we designed our solution around the following tenets:

  1. Analytics with a low technical barrier to entry: Google Analytics is built for business users, and our solution is designed to provide a similar experience. This means that beyond ingesting the data, we want to automate the data engineering work that goes into making the data ready for analysis.  This includes data retention, partitioning, and compression. All of this work must be done under the hood and remain invisible to the user querying the data.
  2. Hit-level data: Google Analytics tracks clickstream activity based on Hits – the lowest level of interaction between a user and a webpage. These hits are then grouped into Sessions – hits within a given time period, and Users – groups of sessions (more details here). The standard Google Analytics API is limited to session and user-based queries, and does not offer any simple way of extracting hit-level data. Our solution, however, does provide access to this granular data.
  3. Unsampled data: By extracting the data from Google Analytics and storing it on Amazon S3, we are able to bypass the 500K sessions limitation. We also have access to unsampled data for any query at any scale.
  4. Data privacy: If sensitive data is stored in Google Analytics, relying on third-party ETL tools can create risks around data privacy, especially in the era of GDPR. Therefore, our solution encrypts data in transit and relies exclusively on processing within the customer’s VPC.

Solution overview

The solution is built on extracting hit-level data and storing it in a data lake architecture on Amazon S3. We then use Amazon Athena and Amazon QuickSight for analytics and reporting. Upsolver, an AWS premier solution provider, is used to automate ingestion, ETL, and data management on S3. Upsolver also orchestrates the entire solution with a simple-to-use graphical user interface. The following diagram shows the high-level architecture of the solution.

Reference architecture showing the flow of data across Google Analytics, Amazon Athena and Amazon QuickSight

Using Upsolver’s GA connector we extract unsampled, hit-level data from Google Analytics. This data is then automatically ingested according to accepted data lake best practices and stored in an optimized form on Amazon S3. The following best practices are applied to the data:

  • Store data in Apache Parquet columnar file format to improve read performance and reduce the amount of data scanned per query.
  • Partition data by event (hit) time rather than by API query time.
  • Perform periodic compaction, in which small files are merged into larger ones, improving performance and optimizing compression.

Once data is stored on S3, we use Upsolver’s GUI to create structured fact tables from the Google Analytics data. Users can query them using Amazon Athena and Amazon Redshift. Upsolver provides simple to use templates to help users quickly create tables from their Google Analytics data.  Finally, we use Amazon QuickSight to create interactive dashboards to visualize the data.

The result is a complete view of our Google Analytics data. This view provides the level of self-service analytics that users have grown accustomed to, at any scale, and without the limitations outlined earlier.

Building the solution: Step by step guide

In this section, we walk through the steps to set up the environment, configure Upsolver’s Google Analytics plugin, extract the data, and begin exploring.

Step 1: Installation and permissions

  1. Sign up for Upsolver (can also be done via the AWS Marketplace).
  2. Allow Upsolver access to read data from Google Analytics and add new custom dimensions. Custom dimensions enable Upsolver to read non-sampled hit-level data directly from Google Analytics instead of creating parallel tracking mechanisms that aren’t as trustworthy.
  3. To populate the custom dimensions that were added to Google Analytics, allow Upsolver to run a small JavaScript code on your website. If you’re using GA360, this is not required.

Step 2: Review and clean the raw data

For supported data sources, Upsolver automatically discovers the schema and collects key statistics for every field in the table. Doing so gives users a glimpse into their data.

In the following screenshot, you can see schema-on-read information on the left side, stats per field and value distribution on the right side.

Screen shot of the Upsolver UI showing schema-on-read information on the left side, stats per field and value distribution on the right side

Step 3: Publishing to Amazon Athena

Upsolver comes with four templates for creating tables in your AWS based data lake according to the Google Analytics entity being analyzed:

  • Pageviews – used to analyze user flow and behavior on specific sections of the web property using metrics such as time on page and exit rate.
  • Events – user-defined interactions such as scroll depth and link clicks.
  • Sessions – monitor a specific journey in the web property (all pageviews and events).
  • Users – understand a user’s interaction with the web property or app over time.

All tables are partitioned by event time, which helps improve query performance.

Upsolver users can choose to run the templates as-is, modify them first or create new tables unique to their needs.

The following screenshot shows the schema produced by the Pageviews template:

Screen shot of the Upsolver UI showing the schema produced by the Pageviews template:

The following screenshot shows the Pageviews and Events tables as well as the Amazon Athena views for Sessions and Users generated by the Upsolver templates.

Screenshot showing the Pageviews and Events tables as well as the Athena views for Sessions and Users generated from the Upsolver templates.

The following are a couple example queries you may want to run to extract specific insights:

-- Popular page titles 
SELECT page_title, 
       Count(*) AS num_hits 
FROM   ga_hits_pageviews 
GROUP  BY page_title 
ORDER  BY 2 DESC;
-- User aggregations from hit data 
SELECT user_id, 
       Count(*)                   AS num_hits, 
       Count(DISTINCT session_id) AS num_of_sessions, 
       Sum(session_duration)      AS total_sessions_time 
FROM   ga_hits_pageviews 
GROUP  BY user_id;
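
As one more illustration, using only the columns already shown above, you could rank pages by how long visitors stay on them. This is a hypothetical example rather than one of the Upsolver templates, and it is only a rough approximation because session_duration repeats for every hit in a session:

-- Pages ranked by average session duration (hypothetical example)
SELECT page_title, 
       Avg(session_duration) AS avg_session_duration 
FROM   ga_hits_pageviews 
GROUP  BY page_title 
ORDER  BY 2 DESC;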

Step 4: Visualization in Amazon QuickSight

Now that the data has been ingested, cleansed, and written to S3 in a structured manner, we are ready to visualize it with Amazon QuickSight. Start by creating a dashboard to mimic the one provided by Google Analytics. But we don’t need to stop there. We can use QuickSight ML Insights to extract deeper insights from our data. We can also embed Amazon QuickSight visualizations into existing web portals and applications, making insights available to everyone.

Screenshot of QuickSight visualization showing several sections, one with a graph, several others with various statistics

Screen shot of QuickSight showing a global map with usage concentrations marked by bubbles, alongside a pie graph.

Screenshot of QuickSight showing a bar graph, alongside a table with various data values.

Conclusion

With minimal setup, we were able to extract raw hit-level Google Analytics data, prepare it, and store it in a data lake on Amazon S3. Using Upsolver, combined with Amazon Athena and Amazon QuickSight, we built a feature-complete solution for analyzing web traffic collected by Google Analytics on AWS.

Key technical benefits:

  • Schema on-read means data consumers don’t need to model the data into a table structure, and can instantly understand what their top dimensions are. For example, 85% of my users navigate my website using Google Chrome browser.
  • Graphical user interface that enables self-service consumption of Google Analytics data.
  • Fast implementation using pre-defined templates that map raw data from Google Analytics to tables in the data lake.
  • Ability to replay historical Google Analytics data stored on Amazon S3.
  • Ability to partition the data on Amazon S3 by hit time, reducing the complexity of handling late-arriving events.
  • Optimize data on Amazon S3 automatically for improved query performance.
  • Automatically manage tables and partitions in AWS Glue Data Catalog.
  • Fully integrated with a suite of AWS native services – Amazon S3, Amazon Athena, Amazon Redshift and Amazon QuickSight.

Now that we have feature parity, we can begin to explore integrating other data sources such as CRM, sales, and customer profile to build a true 360-degree view of the customer.  Furthermore, you can now begin using AWS Machine Learning services to optimize traffic to your websites, forecast demand and personalize the user experience.

We’d love to hear what you think. Please feel free to leave a comment with any feedback or questions you may have.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

 


About the Authors


Roy Hasson is the global business development lead of analytics and data lakes at AWS.
He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan, cheering his team on and hanging out with his family.

 

 

 

Eran Levy is the director of marketing at Upsolver.

 

 

 

 

Visualizing Sensor Data in Amazon QuickSight

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/visualizing-sensor-data-in-amazon-quicksight/

This post is courtesy of Moheeb Zara, Developer Advocate, AWS Serverless

The Internet of Things (IoT) is a term used wherever physical devices are networked in some meaningful connected way. Often, this takes the form of sensor data collection and analysis. As the number of devices and size of data scales, it can become costly and difficult to keep up with demand.

Using AWS Serverless Application Model (AWS SAM), you can reduce the cost and time to market of an IoT solution. This guide demonstrates how to collect and visualize data from a low-cost, Wi-Fi connected IoT device using a variety of AWS services. Much of this can be accomplished within the AWS Free Usage Tier, which is necessary for the following instructions.

Services used

The following services are used in this example:

What’s covered in this post?

This post covers:

  • Connecting an Arduino MKR 1010 Wi-Fi device to AWS IoT Core.
  • Forwarding messages from an AWS IoT Core topic stream to a Lambda function.
  • Using a Kinesis Data Firehose delivery stream to store data in S3.
  • Analyzing and visualizing data stored in S3 using Amazon QuickSight.

Connect the device to AWS IoT Core using MQTT

The Arduino MKR 1010 is a low-cost, Wi-Fi enabled, IoT device, shown in the following image.

An Arduino MKR 1010 Wi-Fi microcontroller

Its analog and digital input and output pins can be used to read sensors or to write to actuators. Arduino provides a detailed guide on how to securely connect this device to AWS IoT Core. The following steps build upon it to push arbitrary sensor data to a topic stream and ultimately visualize that data using Amazon QuickSight.

  1. Start by following this comprehensive guide to using an Arduino MKR 1010 with AWS IoT Core. Upon completion, your device is connected to AWS IoT Core using MQTT (Message Queuing Telemetry Transport), a protocol for publishing and subscribing to messages using topics.
  2. In the Arduino IDE, choose File, Sketch, Include Library, and Manage Libraries.
  3. In the window that opens, search for ArduinoJson and select the library by Benoit Blanchon. Choose install.

4. Add #include <ArduinoJson.h> to the top of your sketch from the Arduino guide.

5. Modify the publishMessage() function with this code. It publishes a JSON message with two keys: time (ms) and the current value read from the first analog pin.

void publishMessage() {  
  Serial.println("Publishing message");

  // send message, the Print interface can be used to set the message contents
  mqttClient.beginMessage("arduino/outgoing");
  
  // create json message to send
  StaticJsonDocument<200> doc;
  doc["time"] = millis();
  doc["sensor_a0"] = analogRead(0);
  serializeJson(doc, mqttClient); // print to client
  
  mqttClient.endMessage();
}

6. Save and upload the sketch to your board.

Create a Kinesis Firehose delivery stream

Amazon Kinesis Data Firehose is a service that reliably loads streaming data into data stores, data lakes, and analytics tools. Amazon QuickSight requires a data store to create visualizations of the sensor data. This simple Kinesis Data Firehose delivery stream continuously uploads data to an S3 storage bucket. The next sections cover how to add records to this stream using a Lambda function.

  1. In the Kinesis Data Firehose console, create a new delivery stream, called SensorDataStream.
  2. Leave the default source as a Direct PUT or other sources and choose Next.
  3. On the next screen, leave all the default values and choose Next.
  4. Select Amazon S3 as the destination and create a new bucket with a unique name. This is where records are continuously uploaded so that they can be used by Amazon QuickSight.
  5. On the next screen, choose Create New IAM Role, Allow. This gives the Firehose delivery stream permission to upload to S3.
  6. Review and then choose Create Delivery Stream.

It can take some time to fully create the stream. In the meantime, continue on to the next section.

Invoking Lambda using AWS IoT Core rules

Using AWS IoT Core rules, you can forward messages from devices to a Lambda function, which can perform actions such as uploading to an Amazon DynamoDB table or an S3 bucket, or running data against various Amazon Machine Learning services. In this case, the function transforms and adds a message to the Kinesis Data Firehose delivery stream, which then adds that data to S3.

AWS IoT Core rules use the MQTT topic stream to trigger interactions with other AWS services. An AWS IoT Core rule is created by using an SQL statement, a topic filter, and a rule action. The Arduino example publishes messages every five seconds on the topic arduino/outgoing. The following instructions show how to consume those messages with a Lambda function.

Create a Lambda function

Before creating an AWS IoT Core rule, you need a Lambda function to consume forwarded messages.

  1. In the AWS Lambda console, choose Create function.
  2. Name the function ArduinoConsumeMessage.
  3. For Runtime, choose Author From Scratch, Node.js 10.x. For Execution role, choose Create a new role with basic Lambda permissions. Choose Create.
  4. On the Execution role card, choose View the ArduinoConsumeMessage-role-xxxx on the IAM console.
  5. Choose Attach Policies. Then, search for and select AmazonKinesisFirehoseFullAccess.
  6. Choose Attach Policy. This applies the necessary permissions to add records to the Firehose delivery stream.
  7. In the Lambda console, in the Designer card, select the function name.
  8. Paste the following in the code editor, replacing SensorDataStream with the name of your own Firehose delivery stream. Choose Save.
const AWS = require('aws-sdk')

const firehose = new AWS.Firehose()
const StreamName = "SensorDataStream"

exports.handler = async (event) => {
    
    console.log('Received IoT event:', JSON.stringify(event, null, 2))
    
    let payload = {
        time: new Date(event.time),
        sensor_value: event.sensor_a0
    }
    
    let params = {
            DeliveryStreamName: StreamName,
            Record: { 
                Data: JSON.stringify(payload)
            }
        }
        
    return await firehose.putRecord(params).promise()

}

Create an AWS IoT Core rule

To create an AWS IoT Core rule, follow these steps.

  1. In the AWS IoT console, choose Act.
  2. Choose Create.
  3. For Rule query statement, copy and paste SELECT * FROM 'arduino/outgoing'. This subscribes to the outgoing message topic used in the Arduino example.
  4. Choose Add action, Send a message to a Lambda function, Configure action.
  5. Select the function created in the last set of instructions.
  6. Choose Create rule.

At this stage, any message published to the arduino/outgoing topic forwards to the ArduinoConsumeMessage Lambda function, which transforms and puts the payload on the Kinesis Data Firehose stream and also logs the message to Amazon CloudWatch. If you’ve connected an Arduino device to AWS IoT Core, it publishes to that topic every five seconds.

The following steps show how to test functionality using the AWS IoT console.

  1. In the AWS IoT console, choose Test.
  2. For Publish, enter the topic arduino/outgoing.
  3. Enter the following test payload:
    {
      "time": 1567023375013,
      "sensor_a0": 456
    }
  4. Choose Publish to topic.
  5. Navigate back to your Lambda function.
  6. Choose Monitoring, View logs in CloudWatch.
  7. Select a log item to view the message contents, as shown in the following screenshot.

Visualizing data with Amazon QuickSight

To visualize data with Amazon QuickSight, follow these steps.

  1. In the Amazon QuickSight console, sign up.
  2. Choose Manage Data, New Data Set. Select S3 as the data source.
  3. A manifest file is necessary for Amazon QuickSight to be able to fetch data from your S3 bucket. Copy the following into a file named manifest.json. Replace YOUR-BUCKET-NAME with the name of the bucket created for the Firehose delivery stream.
    {
       "fileLocations":[
          {
             "URIPrefixes":[
                "s3://YOUR-BUCKET-NAME/"
             ]
          }
       ],
       "globalUploadSettings":{
          "format":"JSON"
       }
    }
  4. Upload the manifest.json file.
  5. Choose Connect, then Visualize. You may have to give Amazon QuickSight explicit permissions to your S3 bucket.
  6. Finally, design the Amazon QuickSight visualizations in the drag and drop editor. Drag the two available fields into the center card to generate a Sum of Sensor_value by Time visual.

Conclusion

This post demonstrated visualizing data from a securely connected remote IoT device. This was achieved by connecting an Arduino to AWS IoT Core using MQTT, forwarding messages from the topic stream to Lambda using IoT Core rules, putting records on an Amazon Kinesis Data Firehose delivery stream, and using Amazon QuickSight to visualize the data stored within an S3 bucket.

With these building blocks, it is possible to implement highly scalable and customizable IoT data collection, analysis, and visualization. With the use of other AWS services, you can build a full end-to-end platform for an IoT product that can reliably handle volume. To further explore how hardware and AWS Serverless can work together, visit the Amazon Web Services page on Hackster.

Federate Amazon QuickSight access with Okta

Post Syndicated from Loc Trinh original https://aws.amazon.com/blogs/big-data/federate-amazon-quicksight-access-with-okta/

Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization. As a fully managed service, Amazon QuickSight lets you easily create and publish interactive dashboards that can then be accessed from any device and embedded into your applications, portals, and websites.

Amazon QuickSight supports identity federation through Security Assertion Markup Language 2.0 (SAML 2.0) in both Standard and Enterprise editions. With federation, you can manage users using your enterprise identity provider (IdP) and pass them to Amazon QuickSight at log-in. Such IdPs include Microsoft Active Directory Federation Services, Ping One Federation Server, and Okta.

This post provides step-by-step guidance for how to use Okta to federate access to Amazon QuickSight.

Create an Okta application

Sign in to your Okta admin dashboard. If you don’t already have one, you can create a free Okta Developer Edition account.

  1. From the Okta admin dashboard ribbon, choose Applications.
  2. If you are viewing the Developer Console, switch to Classic UI, as shown in the following screenshot.
  3. Choose Add Application.
  4. Search for Amazon Web Services and choose Add.
  5. Rename Application label to Amazon QuickSight and choose Next.
  6. For Sign-On Options, choose SAML 2.0.
  7. For Default Relay State, type https://quicksight.aws.amazon.com.
  8. Right-click on Identity Provider metadata and choose Save Link As…
  9. Save the XML file to disk and choose Done. You need to use this file in the next steps.

Create a SAML provider in AWS

Open a new window and sign in to the AWS Management Console.

  1. Open the IAM console.
  2. In the navigation pane, choose Identity Providers, Create Provider.
  3. For Provider Type, choose SAML and provide a Provider Name (for example, Okta).
  4. For Metadata Document, upload the XML file from the previous steps.
  5. Choose Next Step, Create.
  6. Locate the IdP that you just created and make note of the Provider ARN.

Create a role for federated users

This section describes the steps for creating an IAM SAML 2.0 federation role. While Okta is used for single sign-on, there are two ways to provision users in Amazon QuickSight:

  • Grant the federation role permission to create new Amazon QuickSight users when a user visits for the first time.
  • Pre-provision Amazon QuickSight users using the API and add users to the appropriate groups. This is preferred for adding users to groups within Amazon QuickSight, because you can provision the user and add them to the groups at the same time.

The following steps demonstrate how to create a federation role with permission to create new Amazon QuickSight users. If you would rather pre-provision Amazon QuickSight users, instructions for using the API are at the end of this post.

  1. Open the IAM console.
  2. In the navigation pane, choose Roles, Create Role, Select type of trusted entity as SAML 2.0 federation.
  3. For SAML provider, select the IdP that you created in the previous steps (Okta).
  4. Select Allow programmatic and AWS Management Console access.
  5. Choose Next: Permissions, Create policy.
  6. In the Create policy window, navigate to the JSON tab and type the following:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithSAML",
                "Resource": "<YOUR SAML IDENTITY PROVIDER ARN>",
                "Condition": {
                    "StringEquals": {
                        "saml:aud": "https://signin.aws.amazon.com/saml"
                    }
                }
            },
            {
                "Action": [
                    "quicksight:CreateReader"
                ],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:quicksight::<YOUR ACCOUNT ID>:user/${aws:userid}"
                ]
            }
        ]
    }

The IAM policy above grants the federation role permission to self-provision an Amazon QuickSight reader with the quicksight:CreateReader action. Best practice is to grant users in your organization reader access, and then upgrade users from within the application. Instructions for upgrading users are at the end of this post.

If you would rather pre-provision Amazon QuickSight users using the API, do not include any actions in the permission policy.

  1. Choose Review Policy.
  2. For Name, enter a value (for example, QuicksightOktaFederatedPolicy) and choose Create policy.
  3. On the Create role page, choose Refresh and select your new policy.
  4. Choose Next: Tags and Next: Review.
  5. Provide a Role name (for example, QuicksightOktaFederatedRole) and Role description.
  6. Choose Create role.

Create an AWS access key for Okta

To create an access key for Okta, follow these steps.

  1. Open the IAM console.
  2. In the navigation pane, choose Users, Add user.
  3. For User name, enter a value (for example, OktaSSOUser).
  4. For Access type, choose Programmatic access.
  5. Choose Next: Permissions, Attach existing policies directly, and Create policy.
  6. On the Create policy page, navigate to the JSON tab and type the following:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:ListRoles",
                    "iam:ListAccountAliases"
                ],
                "Resource": "*"
            }
        ]
    }

  7. Choose Review Policy.
  8. For a Name, enter a value (for example, OktaListRolesPolicy) and choose Create policy.
  9. On the Create user page, choose Refresh and select your new policy.
  10. Choose Next: Tags, Next: Review, and Create user.
  11. To save your access key and secret key, choose Download .csv.

Configure the Okta application

Return to the window with your Okta admin dashboard.

  1. For Identity Provider ARN (Required only for SAML SSO), provide the ARN (for example, arn:aws:iam::<YOUR ACCOUNT ID>:saml-provider/Okta) of the IdP that you created in previous steps.
  2. Choose Done.
  3. From the Applications dashboard, choose Provisioning.
  4. Choose Configure API Integration.
  5. Select Enable API Integration.
  6. For Access Key and Secret Key, provide the access key and secret key that you downloaded in previous steps.
  7. Choose Test API Credentials, and then choose Save.
  8. From the SETTINGS pane, navigate to To App.
  9. Choose Edit.
  10. Enable Create Users and choose Save.
  11. Choose Assignments, Assign and then select the users or groups to which to grant federated access.
  12. Select the Roles and SAML User Roles to grant to the users, as shown in the following screenshot.
  13. Choose Save and Go Back, Done.

Launch Amazon QuickSight

Log in to your Okta Applications dashboard with a user (if you are using an admin account, switch to user mode) that has been granted federated access. You should see a new application with your label (for example, Amazon QuickSight). Choose the application icon to launch Amazon QuickSight.

You can now manage your users and groups using Okta as your IdP and federate access to Amazon QuickSight.

Pre-provisioning Amazon QuickSight users

The outlined steps demonstrate how to grant users permission to self-provision Amazon QuickSight users when they visit Amazon QuickSight for the first time. If you would rather pre-provision Amazon QuickSight users, you can use the API to create users and groups and then add users to those groups.

  1. To create an Amazon QuickSight user, run the following AWS CLI command. Link the Amazon QuickSight user to your federated Okta username by providing the user-name parameter in the format <role name>\<email> (for example, QuicksightOktaFederatedRole\[email protected]).
    aws quicksight register-user \
        --aws-account-id=<YOUR ACCOUNT ID> \
        --namespace=default \
        --email=<[email protected]> \
        --user-name=<ROLE NAME>\<[email protected]> \
        --identity-type=QUICKSIGHT \
        --user-role=READER

  2. Optionally, create an Amazon QuickSight group.
    aws quicksight create-group \
        --aws-account-id=<YOUR ACCOUNT ID> \
        --namespace=default \
        --group-name="<YOUR GROUP NAME>" \
        --description="<YOUR GROUP DESCRIPTION>"

  3. Add users to groups.
    aws quicksight create-group-membership \
        --aws-account-id=<YOUR ACCOUNT ID> \
        --namespace=default \
        --group-name="<YOUR GROUP NAME>" \
        --member-name="<YOUR MEMBER USER NAME>"

By using the Amazon QuickSight API, you can manage users, groups, and group membership. After they’re created, groups automatically become available for use when modifying permissions to data sets, analyses, or dashboards by typing in the group name instead of a specific user. For other supported group and user management functions, see List of Actions by Function.

Managing users

You can upgrade users between reader and author or admin in the Manage users tab of the Manage QuickSight screen.

  1. In the Amazon QuickSight console, choose your user name in the upper-right corner and choose Manage QuickSight.
  2. In the navigation pane, choose Manage users.
  3. Locate the user to upgrade, and select the role to grant from the Role drop-down.

Deep-linking dashboards

Amazon QuickSight dashboards can be shared using the Okta application’s single sign-on URL so that users can be federated directly to specific dashboards.

To deep link to a specific Amazon QuickSight dashboard with single sign-on, first locate the Okta application’s single sign-on URL. This can be found by opening the metadata XML file that you downloaded in the Create an Okta application steps above. The URL is the value of the Location attribute in the md:SingleSignOnService element and ends with /sso/saml.

After you have the Okta application’s single sign-on URL, append ?RelayState= to the end of the URL followed by the URL to your Amazon QuickSight dashboard. For example, your deep link URL might look as follows:

https://my-test-org.okta.com/app/amazon_aws/abcdefg12345XYZ678/sso/saml?RelayState=https://us-east-1.quicksight.aws.amazon.com/sn/dashboards/11111111-abcd-1234-efghi-111111111111

By deep-linking dashboards, you can provide users with a way to use single sign-on and directly access specific dashboards.

Summary

This post provided a step-by-step guide for configuring Okta as your IdP, and using IAM roles to enable single sign-on to Amazon QuickSight. It also showed how users and groups can be managed using the Amazon QuickSight API.

Although this post demonstrated the integration of IAM and Okta, you can replicate this solution using your choice of SAML 2.0 IdPs. For other supported federation options, see Enabling Single Sign-On Access to Amazon QuickSight Using SAML 2.0.

If you have any questions or feedback, please leave a comment.

 


About the Authors

Loc Trinh is a solutions architect at Amazon Web Services.

 

 

 

 

Naresh Gautam is a senior solutions architect at Amazon Web Services.

 

 

 

Create advanced insights using Level Aware Aggregations in Amazon QuickSight

Post Syndicated from Arun Baskar original https://aws.amazon.com/blogs/big-data/create-advanced-insights-using-level-aware-aggregations-in-amazon-quicksight/

Amazon QuickSight recently launched Level Aware Aggregations (LAA), which enables you to perform calculations on your data to derive advanced and meaningful insights. In this blog post, we go through examples of applying these calculations to a sample sales dataset so that you can start using these for your own needs.

What are Level Aware Aggregations?

Level aware aggregations are aggregation calculations that can be computed at a desired level in the overall query evaluation order of QuickSight. Please check this link for details on QuickSight’s Order of Evaluation. Until now, the only types of aggregation possible in QuickSight were Display-level aggregations and Table calculations.

  • Display-level aggregations are aggregations that are defined by the dimensions and metrics present in the field wells of a QuickSight visual.
  • Table calculations are computed by windowing/rolling-up over the display-level aggregated values of the visual. Hence, by definition, these are calculated after the Display-level aggregations are computed.

With Level Aware Aggregations, QuickSight now allows you to aggregate values before the Display-level aggregation. For more information, please visit the Level Aware Aggregations documentation.

Customer use cases

Distribution of customers by lifetime orders

Customer question: How many customers have made one order, two orders, three orders, and so forth?

In this case, we first want to aggregate the total number of orders made by each customer, then use the output of that as a visual dimension. This isn’t feasible to compute without LAA.

Solution using LAA

1.) Compute the number of orders per customer.

Calculated field name : NumberOrdersPerCustomer

Calculated field expression : countOver({order_id}, [{Customer Id}], PRE_AGG)

This computes the number of orders per customer, before the display-level aggregation of the visual.
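
If it helps to reason about PRE_AGG in SQL terms, it behaves roughly like a window function evaluated over the unaggregated rows before the visual’s own grouping. The following sketch uses illustrative table and column names; QuickSight evaluates the calculated field for you.

-- Rough SQL analogue of countOver({order_id}, [{Customer Id}], PRE_AGG)
SELECT customer_id,
       order_id,
       COUNT(order_id) OVER (PARTITION BY customer_id) AS number_orders_per_customer
FROM   sales_data;

The visual’s display-level aggregation (Count Distinct of Customer Id, grouped by NumberOrdersPerCustomer) is then applied on top of this result.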

2.) Create the visual.

Create the visual with the above field NumberOrdersPerCustomer in the “X-Axis” well of the Field Wells. Add “Count Distinct” of “Customer Id” in the “Value” section of the Field Wells to create a histogram of the number of orders made by customers.

As we can see, there are around 5000 unique customers with one order, around 3500 customers with two orders, and so on.

Filter out customers based on lifetime spends

Customer Question: How do I filter out customers with lifetime spend less than $100,000? My visual’s dimension (group by) and metric definitions are independent of total spend per customer.

If the group dimensions of the aforementioned aggregation (spend) are exactly the same as the group dimensions in the field well, you can achieve this using the aggregated filters feature. But that’s not always the case. As mentioned in the customer question, the visual’s definition can be different from the filter’s aggregation.

Solution using LAA

1.) Compute sum of sales per customer.

Calculated field name :  salesPerCustomer

Calculated field expression : sumOver(sales,[{customer_id}],PRE_AGG)

PRE_AGG indicates that the computation must occur before display-level aggregation.
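
Again in rough SQL terms (illustrative table and column names only), the calculated field tags every raw row with that customer’s total spend, and the filter created in the next steps is applied to that pre-aggregated value rather than to the visual’s own aggregates:

-- Rough SQL analogue of sumOver(sales, [{customer_id}], PRE_AGG) plus the filter
SELECT *
FROM (
    SELECT customer_id,
           segment,
           sales,
           SUM(sales) OVER (PARTITION BY customer_id) AS sales_per_customer
    FROM   sales_data
) spend
WHERE  sales_per_customer > 100000;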

2.) Create the visuals.

The visual on the left shows sum of sales per segment and the visual on the right shows the total number of customers. Note that there are no filters applied at this point.

3.) Create the filter on salesPerCustomer. 

Create a filter on top of the above field salesPerCustomer to select items greater than $100,000.

4.) Apply the filter.

The above image shows applying the filter on “salesPerCustomer” greater than $100,000.

With the filter applied, we have excluded the customers whose total spend is less than $100,000, regardless of what we choose to display in the visuals.

Fixed percent of total sales even with filters applied

Customer Question: How much is the contribution of each industry to the entire company’s profit (percent of total)? I don’t want the total to recompute when filters are applied.

The existing table calculation function percentOfTotal isn’t able to solve this problem, since filters on categories are applied before computing the total. Using percentOfTotal would recalculate the total every time filters are applied. We need a solution that doesn’t consider the filtering when computing the total.

Solution using LAA

1.) Compute total sales before filters through a calculated field.

Calculated field name : totalSalesBeforeFilters

Calculated field expression : sumOver(sales,[],PRE_FILTER)

PRE_FILTER indicates that this computation must be done prior to applying filters.

The partition dimension list (second argument) is empty since we want to compute the overall total.

2.) Compute the fixed percent of total sales.

Calculated field name : fixedPercentOfTotal

Calculated field expression : sum(sales) / min(totalSalesBeforeFilters)

Note: totalSalesBeforeFilters is the same for every row of the unaggregated data. Since we want to use it post-aggregation, we are using the aggregation min on top of it. If all values are the same, max or avg aggregations can be used as well as it serves the same purpose.
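
In rough SQL terms (illustrative names and filter values only), PRE_FILTER means the grand total is computed from the unfiltered rows, while the visual’s filters apply only to the outer query:

-- Rough SQL analogue of the fixed percent of total
WITH total AS (
    SELECT SUM(sales) AS total_sales_before_filters
    FROM   sales_data                               -- no WHERE clause: filters are ignored here
)
SELECT industry,
       SUM(sales) AS industry_sales,
       SUM(sales) / MIN(t.total_sales_before_filters) AS fixed_percent_of_total
FROM   sales_data
CROSS JOIN total t
WHERE  industry IN ('Finance', 'Energy')            -- the visual's filter applies only here
GROUP  BY industry;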

3.) Create the visual.

Add the “industry” field to the “Rows” well. Add “sales (SUM)” and “fixedPercentOfTotal” to the “Values” section. Now the percent of total metric remains fixed even if we filter the data based on any underlying dimension or measure.

The visual shows sales per industry along with percent of total, computed using the table calculation percentOfTotal and using Level Aware Aggregation as described above. Both the percent of total values are currently the same since there aren’t any filters applied.

The visual shows the same metrics but with industries filtered only to 5 of them. As we can see “Percent of total sales” got re-adjusted to represent only the filtered data, whereas “Fixed Percent of total sales” remains the same even after filtering. Both the metrics are valuable customer use cases now feasible through QuickSight.

Compare sales in a category to industry average

Customer question: How do I compare sales in a category to the industry average? I want the industry average to include all categories even after filtering.

Since we want the industry average to stay fixed even with filtering, we need PRE_FILTER aggregation to achieve this.

Solution using LAA

1.) Compute the industry average.

Calculated field name : IndustryAverage

Calculated field expression : avgOver(sumOver(sales,[{category}],PRE_FILTER),[],PRE_FILTER)

We first compute the sum of sales per category and then average it across all categories. It’s important to note here that we first computed a finer level aggregation and fed that into a coarser level aggregation.

2.) Compute the difference from IndustryAverage.

Calculated field name : FixedDifferenceFromIndustryAverage

Calculated field expression : sum(sales) - min(IndustryAverage)

As mentioned in one of the examples above, we use the min aggregation to carry the per-row value through the display-level aggregation.

3.) Create the visual.

Create the visual by adding “Category” in “X axis” field well and SUM(Sales), IndustryAverage and FixedDifferenceFromIndustryAverage as the values in a bar chart.

The visual shows total sales per category, the average across all industries, and each category’s difference from the average.

This visual shows the same metrics, but with categories filtered to include only 6 of them. As we can see, the industry average remained the same before and after filtering, keeping the difference the same whether you choose to show all categories, some of them, or just one.

Categorize customers based on lifetime spend

Customer question: How do I classify customers based on cumulative sales contribution? I then want to use that classification as my visual’s grouping.

The objective here is to create custom-sized bins to classify customers. Even though we could do this classification after display-level aggregation, we wouldn’t be able to use it as a dimension/group-by in the visual.

Solution using LAA

1.) Compute sales per customer before display-level aggregation.

Calculated field name : salesPerCustomer

Calculated field expression : sumOver({sales amount},[{customer id}],PRE_AGG)

2.) Categorize Customers.

Calculated field name : Customer Category

Calculated field expression : ifelse(salesPerCustomer < 1000, 'VERY_LOW', salesPerCustomer < 10000, 'LOW', salesPerCustomer < 100000, 'MEDIUM', 'HIGH')
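Here is a rough pandas equivalent of steps 1 and 2, using hypothetical data, to show why the category can serve as a group-by dimension: the tag is attached to each unaggregated row before any display-level aggregation happens.

    import pandas as pd

    # Hypothetical unaggregated data: one row per transaction (not from the original example)
    df = pd.DataFrame({
        "customer_id":  [1, 1, 2, 3],
        "sales_amount": [400, 700, 25000, 250000],
    })

    # sumOver({sales amount}, [{customer id}], PRE_AGG): lifetime spend per customer,
    # attached to every unaggregated row before display-level aggregation
    df["sales_per_customer"] = df.groupby("customer_id")["sales_amount"].transform("sum")

    # ifelse(...): tag each row with a category that can then be used as a group-by dimension
    bins = [0, 1_000, 10_000, 100_000, float("inf")]
    labels = ["VERY_LOW", "LOW", "MEDIUM", "HIGH"]
    df["customer_category"] = pd.cut(df["sales_per_customer"], bins=bins, labels=labels, right=False)

    # Count distinct customers per category, as the visual does
    print(df.groupby("customer_category")["customer_id"].nunique())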

3.) Create the visual.

Create the visual by adding “Customer Category” to the “Y-axis” field well, and “Count Distinct” of “customer id” to the “Value” field well.

The image above shows the number of unique customers per customer category.

Filtering can be done on top of these categories as well to build other relevant visuals, since the categories are tagged before aggregation.

The image above shows the number of unique customers per customer category, split by gender.

Availability

Level aware aggregations are available in both Standard and Enterprise editions, in all supported AWS Regions. For more information, see the Amazon QuickSight documentation.

 


About the Author

Arun Baskar is a software development engineer for QuickSight at Amazon Web Services.

Creating custom Pinpoint dashboards using Amazon QuickSight, part 2

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/messaging-and-targeting/creating-custom-pinpoint-dashboards-using-amazon-quicksight-part-2/

Note: This post was written by Manan Nayar and Aprajita Arora, Software Development Engineers on the AWS Digital User Engagement team.


In our previous post, we discussed the process of visualizing specific, pre-aggregated Amazon Pinpoint metrics—such as delivery rate or open rate—using the Amazon Pinpoint Metrics APIs. In that example, we showed how to create a Lambda function that retrieves your metrics, and then make those metrics available for creating visualizations in Amazon QuickSight.

This post shows a different approach to exporting data from Amazon Pinpoint and using it to build visualizations. Rather than retrieve specific metrics, you can use the event stream feature in Amazon Pinpoint to export raw event data. You can use this data in Amazon QuickSight to create in-depth analyses of your data, as opposed to visualizing pre-calculated metrics. As an added benefit, when you use this solution, you don’t have to modify any code, and the underlying data is updated every few minutes.

Step 1: Configure the event stream in Amazon Pinpoint

The Amazon Pinpoint event stream includes information about campaign events (such as campaign.send) and application events (such as session.start). It also includes response information related to all of the emails and SMS messages that you send from your Amazon Pinpoint project, regardless of whether they were sent from campaigns or on a transactional basis. When you enable event streams, Amazon Pinpoint automatically sends this data to your S3 bucket (via Amazon Kinesis Data Firehose) every few minutes.

To set up the event stream

  1. Sign in to the Amazon Pinpoint console at http://console.aws.amazon.com/pinpoint. In the list of projects, choose the project that you want to enable event streaming for.
  2. Under Settings, choose Event stream.
  3. Choose Edit, and then configure the event stream to use Amazon Kinesis Data Firehose. If you don’t already have a Kinesis Data Firehose stream, follow the link to create one in the Kinesis console. Configure the stream to send data to an S3 bucket. For more information about creating streams, see Creating an Amazon Kinesis Data Firehose Delivery Stream.
  4. Under IAM role, choose Automatically create a role. Choose Save.
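If you prefer to script the console steps above, the event stream can also be configured with the Amazon Pinpoint PutEventStream API. The following boto3 sketch assumes you already have a Pinpoint project ID, a Kinesis Data Firehose delivery stream, and an IAM role that allows Amazon Pinpoint to write to that stream; all identifiers shown are placeholders.

    import boto3

    pinpoint = boto3.client("pinpoint")

    pinpoint.put_event_stream(
        ApplicationId="your-pinpoint-project-id",
        WriteEventStream={
            # Firehose delivery stream that lands events in your S3 bucket
            "DestinationStreamArn": "arn:aws:firehose:us-east-1:111122223333:deliverystream/pinpoint-events",
            # Role that grants Pinpoint permission to put records on the stream
            "RoleArn": "arn:aws:iam::111122223333:role/pinpoint-event-stream-role",
        },
    )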

Step 2: Add a data set in Amazon QuickSight

Now that you’ve started streaming your Amazon Pinpoint data to S3, you can set Amazon QuickSight to look for data in the S3 bucket. You connect QuickSight to sources of data by creating data sets.

To create a data set

    1. In a text editor, create a new file. Paste the following code:
      {
          "fileLocations": [
              {
                  "URIPrefixes": [ 
                      "s3://<bucketName>/"          
                  ]
              }
          ],
          "globalUploadSettings": {
              "format": "JSON"
          }
      }

      In the preceding code, replace <bucketName> with the name of the S3 bucket that you’re using to store the event stream data. Save the file as manifest.json.

    2. Sign in to the QuickSight console at https://quicksight.aws.amazon.com.
    3. Create a new S3 data set. When prompted, choose the manifest file that you created in step 1. For more information about creating S3 data sets, see Creating a Data Set Using Amazon S3 Files in the Amazon QuickSight User Guide.
    4. Create a new analysis. From here, you can start creating visualizations of your data. To learn more, see Creating an Analysis in the Amazon QuickSight User Guide.

Step 3: Set the refresh rate for the data set

You can configure your data sets in Amazon QuickSight to automatically refresh on a certain schedule. In this section, you configure the data set to refresh every day, one minute before midnight.

To set the refresh schedule

  1. Go to the QuickSight start page at https://quicksight.aws.amazon.com/sn/start. Choose Manage data.
  2. Choose the data set that you created in the previous section.
  3. Choose Schedule refresh. Follow the prompts to set up a daily refresh schedule.

Step 4: Create your visualizations

From this point, you can start creating visualizations of your data. To learn more about creating visualizations, see Creating an Analysis in the Amazon QuickSight User Guide.

Analyzing AWS WAF logs with Amazon ES, Amazon Athena, and Amazon QuickSight

Post Syndicated from Aaron Franco original https://aws.amazon.com/blogs/big-data/analyzing-aws-waf-logs-with-amazon-es-amazon-athena-and-amazon-quicksight/

AWS WAF now includes the ability to log all web requests inspected by the service. AWS WAF can store these logs in an Amazon S3 bucket in the same Region, but most customers deploy AWS WAF across multiple Regions—wherever they also deploy applications. When analyzing web application security, organizations need the ability to gain a holistic view across all their deployed AWS WAF Regions.

This post presents a simple approach to aggregating AWS WAF logs into a central data lake repository, which lets teams better analyze and understand their organization’s security posture. I walk through the steps to aggregate regional AWS WAF logs into a dedicated S3 bucket. I follow that up by demonstrating how you can use Amazon ES to visualize the log data. I also present an option to offload and process historical data using AWS Glue ETL. With the data collected in one place, I finally show you how you can use Amazon Athena and Amazon QuickSight to query historical data and extract business insights.

Architecture overview

The case I highlight in this post is the forensic use of the AWS WAF access logs to identify distributed denial of service (DDoS) attacks by a client IP address. This solution provides your security teams with a view of all incoming requests hitting every AWS WAF in your infrastructure.

I investigate what the IP access patterns look like over time and assess which IP addresses access the site multiple times in a short period of time. This pattern suggests that the IP address could be an attacker. With this solution, you can identify DDoS attackers for a single application, and detect DDoS patterns across your entire global IT infrastructure.

Walkthrough

This solution requires separate tasks for architecture setup, which allows you to begin receiving log files in a centralized repository, and analytics, which processes your log data into useful results.

Prerequisites

To follow along, you must have the following resources:

  • Two AWS accounts. Following AWS multi-account best practices, create two accounts:
    • A logging account
    • A resource account that hosts the web applications using AWS WAF
    For more information about multi-account setup, see AWS Landing Zone. Using multiple accounts isolates your logs from your resource environments. This helps maintain the integrity of your log files and provides a central access point for auditing all application, network, and security logs.
  • The ability to launch new resources into your account. The resources might not be eligible for Free Tier usage and so might incur costs.
  • An application running with an Application Load Balancer, preferably in multiple Regions. If you do not already have one, you can launch any AWS web application reference architecture to test and implement this solution.

For this walkthrough, you can launch an Amazon ECS example from the ecs-refarch-cloudformation GitHub repo. This is a “one click to deploy” example that automatically sets up a web application with an Application Load Balancer. Launch this in two different Regions to simulate a global infrastructure. You ultimately set up a centralized bucket that both Regions log into, which your forensic analysis tools then draw from. Choose Launch Stack to launch the sample application in your Region of choice.

Setup

Architecture setup allows you to begin receiving log files in a centralized repository.

Step 1: Provide permissions

Begin this process by providing appropriate permissions for one account to access resources in another. Your resource account needs cross-account permission to access the bucket in the logging account.

  1. Create your central logging S3 bucket in the logging account and attach the following bucket policy to it under the Permissions tab. Make a note of the bucket’s ARN. You need this information for future steps.
  2. Change RESOURCE-ACCOUNT-ID and CENTRAL-LOGGING-BUCKET-ARN to the correct values based on the actual values in your accounts:
     // JSON Document
     {
       "Version": "2012-10-17",
       "Statement": [
          {
             "Sid": "Cross Account AWS WAF Account 1",
             "Effect": "Allow",
             "Principal": {
                "AWS": "arn:aws:iam::RESOURCE-ACCOUNT-ID:root"
             },
             "Action": [
                "s3:GetObject",
                "s3:PutObject"
             ],
             "Resource": [
                "CENTRAL-LOGGING-BUCKET-ARN/*"
             ]
          }
       ]
    }
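If you manage the logging account programmatically, the same bucket policy can be attached with a few lines of boto3. This is only a sketch: the bucket name and account ID below are placeholders for your own values.

    import json
    import boto3

    s3 = boto3.client("s3")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Cross Account AWS WAF Account 1",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::RESOURCE-ACCOUNT-ID:root"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::YOUR-CENTRAL-LOGGING-BUCKET/*"],
        }],
    }

    # Attach the policy to the central logging bucket in the logging account
    s3.put_bucket_policy(Bucket="YOUR-CENTRAL-LOGGING-BUCKET", Policy=json.dumps(policy))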

Step 2: Manage Lambda permissions

Next, the Lambda function that you create in your resource account needs permissions to access the S3 bucket in your central logging account so it can write files to that location. You already provided basic cross-account access in the previous step, but Lambda still needs the granular permissions at the resources level. Remember to grant these permissions in both Regions where you launched the application that you intend to monitor with AWS WAF.

  1. Log in to your resource account.
  2. To create a policy for the Lambda function’s IAM role, in the IAM console, choose Policies, Create Policy.
  3. Choose JSON, and enter the following policy document. Replace YOUR-SOURCE-BUCKET and YOUR-DESTINATION-BUCKET with the ARNs of the buckets that you are using for this walkthrough.
    // JSON document
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSourceAndDestinationBuckets",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:ListBucketVersions"
                ],
                "Resource": [
                    "YOUR-SOURCE-BUCKET",
                    "YOUR-DESTINATION-BUCKET"
                ]
            },
            {
                "Sid": "SourceBucketGetObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion"
                ],
                "Resource": "YOUR-SOURCE-BUCKET/*"
            },
            {
                "Sid": "DestinationBucketPutObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": "YOUR-DESTINATION-BUCKET/*"
            }
        ]
     }

  4. Choose Review policy, enter your policy name, and save it.
  5. With the policy created, create a new role for your Lambda function and attach the custom policy to that role. To do this, navigate back to the IAM dashboard.
  6. Select Create role and choose Lambda as the service that uses the role. Select the custom policy that you created earlier in this step and choose Next. You can add tags if required and then name and create this new role.
  7. You must also add S3 as a trusted entity in the Trust Relationship section of the role. Choose Edit trust relationship and add s3.amazonaws.com to the policy, as shown in the following example.

Lambda and S3 now appear as trusted entities under the Trust relationships tab, as shown in the following screenshot.

Step 3: Create a Lambda function and copy log files

Create a Lambda function in the same Region as your resource account’s S3 bucket. This function reads log files from the resource account bucket and then copies that content to the logging account’s bucket. Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF.

  1. Log in to your resource account.
  2. Navigate to Lambda in your console and choose Create Function.
  3. Choose Author from scratch and name the function. Choose the IAM role you created in the previous step and attach it to the Lambda function.
  4. Choose Create function.
  5. This Lambda function receives a document from S3 that contains nested JSON string data. To handle this data, you must extract the JSON from this string to retrieve key names of both the document and the bucket. Your function then uses this information to copy the data to your central logging account bucket in the next step. To create this function, copy and paste the following code into the Lambda function that you created. Replace the bucket names with the names of the buckets that you created earlier. You modify this script later, after you decide on a partitioning strategy.
    // Load the AWS SDK
    const aws = require('aws-sdk');

    // Construct the AWS S3 Object
    const s3 = new aws.S3();

    // Destination bucket in the central logging account (replace with your bucket name)
    const destBucket = "YOUR-DESTINATION-BUCKET";

    //Main function
    exports.handler = (event, context, callback) => {
        console.log("Got WAF Item Event")
        // Source bucket and object key from the S3 event notification
        var _srcBucket = event.Records[0].s3.bucket.name;
        let _key = event.Records[0].s3.object.key;
        let _keySplit = _key.split("/")
        let _objName = _keySplit[ (_keySplit.length - 1) ];
        // Destination key; you update this later to match your partitioning strategy
        let _destPath = _keySplit[0]+"/"+_keySplit[1]+"/YOUR-DESTINATION-BUCKET/"+_objName;
        let _sourcePath = _srcBucket + "/" + _key;
        console.log(_destPath)
        // Copy the log file into the central logging bucket
        let params = { Bucket: destBucket, ACL: "bucket-owner-full-control", CopySource: _sourcePath, Key: _destPath };
        s3.copyObject(params, function(err, data) {
            if (err) {
                console.log(err, err.stack);
            } else {
                console.log("SUCCESS!");
            }
        });
        callback(null, 'All done!');
    };

Step 4: Set S3 to Lambda event triggers

This step sets up event triggers in your resource account’s S3 buckets. These triggers send the file name and location logged by AWS WAF logs to the Lambda function. The triggers also notify the Lambda function that it must move the newly arrived file into your central logging bucket. Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF.

  1. Go to the S3 dashboard and choose your S3 bucket, then choose the Properties tab. Under Advanced settings, choose Events.
  2. Give your event a name and select PUT from the Events check boxes.
  3. Choose Lambda from the Send To option and select your Lambda function as the destination for the event.
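For repeatability across Regions, the same trigger can also be created with boto3. The function name, bucket name, and ARNs below are placeholders; note that S3 needs permission to invoke the function, which the console normally adds on your behalf.

    import boto3

    lambda_client = boto3.client("lambda")
    s3 = boto3.client("s3")

    # Allow S3 to invoke the copy function
    lambda_client.add_permission(
        FunctionName="waf-log-copier",
        StatementId="AllowS3Invoke",
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn="arn:aws:s3:::YOUR-SOURCE-BUCKET",
    )

    # Fire the function on every PUT to the source bucket
    s3.put_bucket_notification_configuration(
        Bucket="YOUR-SOURCE-BUCKET",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:waf-log-copier",
                "Events": ["s3:ObjectCreated:Put"],
            }]
        },
    )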

Step 5: Add AWS WAF to the Application Load Balancer

Add an AWS WAF to the Application Load Balancer so that you can start logging events. You can optionally delete the original log file after Lambda copies it. This reduces costs, but your business and security needs might err on the side of retaining that data.

Create a separate prefix for each Region in your central logging account bucket waf-central-logs so that AWS Glue can properly partition them. For best practices of partitioning with AWS Glue, see Working with partitioned data in AWS Glue. AWS Glue ingests your data and stores it in a columnar format optimized for querying in Amazon Athena. This helps you visualize the data and investigate the potential attacks.

Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF. The procedure assumes that you already have an AWS WAF enabled that you can use for this exercise. To move forward with the next step, you need AWS WAF enabled and connected to Amazon Kinesis Data Firehose for log delivery.

Setting up and configuring AWS WAF

If you don’t already have a web ACL in place, set up and configure AWS WAF at this point. This solution handles logging data from multiple AWS WAF logs in multiple Regions from more than one account.

To do this efficiently, you should consider your partitioning strategy for the data. You can grant your security teams a comprehensive view of the network. Create each partition based on the Kinesis Data Firehose delivery stream for the specific AWS WAF associated with the Application Load Balancer. This partitioning strategy also allows the security team to view the logs by Region and by account. As a result, your S3 bucket name and prefix look similar to the following example:

s3://central-waf-logs/<account_id>/<region_name>/<kinesis_firehose_name>/...filename...
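As a simple illustration of that layout, the following Python sketch builds a destination key that follows the partitioning scheme. The account ID, Region, and delivery stream name are placeholders, and the Lambda code used in this walkthrough is written in Node.js; this is only to show how the key is composed.

    def build_destination_key(source_key, account_id, region, firehose_name):
        # Keep only the original object name from the source key
        object_name = source_key.split("/")[-1]
        # <account_id>/<region_name>/<kinesis_firehose_name>/<object_name>
        return f"{account_id}/{region}/{firehose_name}/{object_name}"

    key = build_destination_key(
        "AWSLogs/2018/12/some-waf-log-file.gz",
        account_id="111122223333",
        region="us-east-1",
        firehose_name="aws-waf-logs-delivery-stream",
    )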

Step 6: Copying logs with Lambda code

This step updates the Lambda function to start copying log files. Keep your partitioning strategy in mind as you update the Lambda function. Repeat this step for every Region where you launched the application that you intend to monitor with AWS WAF.

To accommodate the partitioning, modify your Lambda code to match the examples in the GitHub repo.

Replace <kinesis_firehose_name> in the example code with the name of the Kinesis Data Firehose delivery stream attached to the AWS WAF. Replace <central logging bucket name> with the S3 bucket name from your central logging account.

Kinesis Data Firehose should now begin writing files to your central S3 logging bucket with the correct partitioning. To generate logs, access your web application.

Analytics

Now that Kinesis Data Firehose can write collected files into your logging account’s S3 bucket, create an Elasticsearch cluster in your logging account in the same Region as the central logging bucket. You also must create a Lambda function to handle S3 events as the central logging bucket receives new log files. This creates a connection between your central log files and your search engine. Amazon ES gives you the ability to query your logs quickly to look for potential security threats. The Lambda function loads the data into your Amazon ES cluster. Amazon ES also includes a tool named Kibana, which helps with managing data and creating visualizations.

Step 7: Create an Elasticsearch cluster

  1. In your central Logging Account, navigate to the Elasticsearch Service in the AWS Console.
  2. Select Create Cluster, enter a domain name for your cluster, and choose version 3 from the Elasticsearch version dropdown. Choose Next. In this example, don’t implement any security policies for your cluster and only use one instance. For any real-world production tasks, keep your Elasticsearch cluster inside your VPC.
  3. For network configuration, choose Public access and choose Next.
  4. For the access policy in this tutorial, only allow access to the domain from a specified account ID or ARN. In this case, use your account ID to gain access.
  5. Choose Next, and confirm on the final screen. You generally want to create strict access policies for your domain and not allow public access. This example only uses these settings to quickly demonstrate the capabilities of AWS services; I would never recommend them in a production environment.

AWS takes a few minutes to provision and activate your Amazon ES domain. Once it goes live, you can see two endpoints. The Endpoint URL is the URL that you use to send data to the cluster.

Step 8: Create a Lambda function to copy log files

Add an event trigger to your central logs bucket. This trigger tells your Lambda function to write the data from the log file to Amazon ES. Before you create the S3 trigger, create a Lambda function in your logging account to handle the events.

For this Lambda function, we use code from the aws-samples GitHub repository that streams data from an S3 file line by line into Amazon ES. This example uses code taken from amazon-elasticsearch-lambda-samples. Name your new Lambda function myS3toES.

  1. Copy and paste the following code into a text file named index.js:
    // Load the AWS SDK and construct the S3 client
    const aws = require('aws-sdk');
    const s3 = new aws.S3();

    // Central logging bucket (replace with your bucket name)
    const destBucket = "<central logging bucket name>";

    exports.handler = (event, context, callback) => {
        // get the source bucket name
        var _srcBucket = event.Records[0].s3.bucket.name;
            // get the object key of the file that landed on S3
        let _key = event.Records[0].s3.object.key;

        // split the key by "/"
        let _keySplit = _key.split("/")
            // get the object name
        let _objName = _keySplit[ (_keySplit.length - 1) ];
            // reset the destination path
        let _destPath = _keySplit[0]+"/"+_keySplit[1]+"/<kinesis_firehose_name>/"+_objName;
            // setup the source path
        let _sourcePath = _srcBucket + "/" + _key;
            // build the params for the copyObject request to S3
        let params = { Bucket: destBucket, ACL: "bucket-owner-full-control", CopySource: _sourcePath, Key: _destPath };
            // execute the copyObject request
        s3.copyObject(params, function(err, data) {
            if (err) {
                console.log(err, err.stack);
            } else {
                console.log("SUCCESS!");
            }
        });
        callback(null, 'All done!');
    };

  2. Copy and paste this code into a text file and name it package.json:
    //JSON Document
    {
      "name": "s3toesfunction",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "scripts": {},
      "author": "",
      "dependencies": {
        "byline": "^5.0.0",
        "clf-parser": "0.0.2",
        "path": "^0.12.7",    "stream": "0.0.2"
      }
    }

  3. In the folder containing these files, execute the following command: npm install
  4. After the installation completes, create a .zip file that includes the index.js file and the node_modules folder.
  5. Log in to your logging account.
  6. Upload your .zip file to the Lambda function. For Code entry type, choose Upload a .zip file.
  7. This Lambda function needs an appropriate service role with a trust relationship to S3. Choose Edit trust relationships and add s3.amazonaws.com and lambda.amazonaws.com as trusted entities.
  8. Set up your IAM role with the following permissions: S3 Read Only permissions and Lambda Basic Execution. To grant the role the appropriate access, assign it to the Lambda function from the Lambda Execution Role section in the console.
  9. Set Environment variables for your Lambda function so it knows where to send the data. Add an endpoint and use the endpoint URL you created in Step 7. Add an index and enter your index name. Add a value for region and detail the Region where you deployed your application.

Step 9: Create an S3 trigger

After creating the Lambda function, create the event triggers on your S3 bucket to execute that function. This completes your log delivery pipeline to Amazon ES. This is a common pipeline architecture for streaming data from S3 into Amazon ES.

  1. Log in to your central logging account.
  2. Navigate to the S3 console, select your bucket, then open the Properties pane and scroll down to Events.
  3. Choose Add notification and name your new event s3toLambdaToEs.
  4. Under Events, select the check box for PUT. Leave Prefix and Suffix blank.
  5. Under Send to, select Lambda Function, and enter the name of the Lambda function that you created in the previous step—in this example, myS3toES.
  6. Choose Save.

With this complete, Lambda should start sending data to your Elasticsearch index whenever you access your web application.

Step 10: Configure Amazon ES

Your pipeline now automatically adds data to your Elasticsearch cluster. Next, use Kibana to visualize the AWS WAF logs in the central logging account’s S3 bucket. This is the final step in assembling your forensic investigation architecture.

Kibana provides tools to create visualizations and dashboards that help your security teams view log data. Using the log data, you can filter by IP address to see how many times an IP address has hit your firewall each month. This helps you track usage anomalies and isolate potentially malicious IP addresses. You can use this information to add web ACL rules to your firewall that adds extra protection against those IP addresses.

Kibana produces visualizations like the following screenshot.

In addition to the Number of IPs over Time visualization, you can also correlate the IP address to its country of origin. Correlation provides even more precise filtering for potential web ACL rules to protect against attackers. The visualization for that data looks like the following image.

Elasticsearch setup

To set up and visualize your AWS WAF data, follow this How to analyze AWS WAF logs using Amazon Elasticsearch Service post. With this solution, you can investigate your global dataset instead of isolated Regions.

An alternative to Amazon ES

Amazon ES is an excellent tool for forensic work because it provides high-performance search capability for large datasets. However, Amazon ES requires cluster management and complex capacity planning for future growth. To get top-notch performance from Amazon ES, you must adequately scale it. With the more straightforward data of these investigations, you could instead work with more traditional SQL queries.

Forensic data grows quickly, so using a relational database means you might quickly outgrow your capacity. Instead, take advantage of AWS serverless technologies like AWS Glue, Athena, and Amazon QuickSight. These technologies enable forensic analysis without the operational overhead you would experience with Elasticsearch or a relational database. To learn more about this option, consult posts like How to extract, transform, and load data from analytic processing using AWS Glue and Work with partitioned data in AWS Glue.

Athena query

With your forensic tools now in place, you can use Athena to query your data and analyze the results. This lets you refine the data for your Kibana visualizations, or directly load it into Amazon QuickSight for additional visualization. Use the Athena console to experiment until you have the best query for your visual needs. Having the database in your AWS Glue Catalog means you can make ad hoc queries in Athena to inspect your data.

In the Athena console, create a new Query tab and enter the following query:

# SQL Query
SELECT date_format(from_unixtime("timestamp"/1000), '%Y-%m-%d %h:%i:%s') as event_date, client_ip, country, account_id, waf_name, region FROM "<your-database-name>"."<your-table-name>" where year='2018' and month='12';

Replace <your-database-name> and <your-table-name> with the appropriate values for your environment. This query converts the numerical timestamp to a readable date format using SQL date functions, as described in the Presto 0.176 documentation. It should return results similar to the following.

You can see which IP addresses hit your environment the most over any period of time. In a production environment, you would run an ETL job to re-partition this data and transform it into a columnar format optimized for queries. If you would like more information about doing that, see the Best Practices When Using Athena with AWS Glue post.
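Once the query is settled, you can also run it programmatically, for example from a scheduled job, using the Athena API. The following boto3 sketch uses placeholder database, table, and output-location values.

    import boto3

    athena = boto3.client("athena")

    response = athena.start_query_execution(
        QueryString="""
            SELECT date_format(from_unixtime("timestamp"/1000), '%Y-%m-%d %h:%i:%s') AS event_date,
                   client_ip, country, account_id, waf_name, region
            FROM "your_database"."waf_logs_transformed"
            WHERE year = '2018' AND month = '12'
        """,
        QueryExecutionContext={"Database": "your_database"},
        # Athena writes results to this S3 location
        ResultConfiguration={"OutputLocation": "s3://aws-athena-query-results-111122223333-us-east-1/"},
    )

    print(response["QueryExecutionId"])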

Amazon QuickSight visualization

Now that you can query your data in Athena, you can visualize the results using Amazon QuickSight. First, grant Amazon QuickSight access to the S3 bucket where your Athena query results live.

  1. In the Amazon QuickSight console, log in.
  2. Choose Admin/username, Manage QuickSight.
  3. Choose Account settings, Security & permissions.
  4. Under QuickSight access to AWS services, choose Add or remove.
  5. Choose Amazon S3, then choose Select S3 buckets.
  6. Choose the output bucket for your central AWS WAF logs. Also, choose your Athena query results bucket. The query results bucket begins with aws-athena-query-results-*.

Amazon QuickSight can now access the data sources. To set up your visualizations, follow these steps:

  1. In the QuickSight console, choose Manage data, New data set.
  2. For Source, choose Athena.
  3. Give your new dataset a name and choose Validate connection.
  4. After you validate the connection, choose Create data source.
  5. Select Use custom SQL and give your SQL query a name.
  6. Input the same query that you used earlier in Athena, and choose Confirm query.
  7. Choose Import to SPICE for quicker analytics, Visualize.

Allow Amazon QuickSight several minutes. It alerts you after completing the import.

Now that you have imported your data into your analysis, you can apply a visualization:

  1. In Amazon QuickSight, New analysis.
  2. Select the last dataset that you created earlier and choose Create analysis.
  3. At the bottom left of the screen, choose Line Chart.
  4. Drag and drop event_date to the X-Axis
  5. Drag and drop client_ip to the ValueThis should create a visualization similar to the following image.
  6. Choose the right arrow at the top left of the visualization and choose Hide “other” categories.This should modify your visualization to look like the following image.

You can also map the countries from which the requests originate, allowing you to track global access anomalies. You can do this in QuickSight by selecting the “Points on map” visualization type and choosing the country as the data point to visualize.

You can also add a count of IP addresses to see if you have any unusual access patterns originating from specific IP addresses.

Conclusion

Although Amazon ES and Amazon QuickSight offer similar final results, there are trade-offs to the technical approaches that I highlighted. If your use case requires the analysis of data in real time, then Amazon ES is more suitable for your needs. If you prefer a serverless approach that doesn’t require capacity planning or cluster management, then the solution with AWS Glue, Athena, and Amazon QuickSight is more suitable.

In this post, I described an easy way to build operational dashboards that track key metrics over time. Doing this with AWS Glue, Athena, and Amazon QuickSight relieves the heavy lifting of managing servers and infrastructure. To monitor metrics in real time instead, the Amazon ES solution provides a way to do this with little operational overhead. The key here is the adaptability of the solution: putting different services together can provide different solutions to your problems to fit your exact needs.

Hopefully, you have found this post informative and the proposed solutions intriguing. As always, AWS welcomes all feedback and comments.

 


About the Authors

Aaron Franco is a solutions architect at Amazon Web Services.

Detect fraudulent calls using Amazon QuickSight ML insights

Post Syndicated from Guy Ben Baruch original https://aws.amazon.com/blogs/big-data/detect-fraudulent-calls-using-amazon-quicksight-ml-insights/

The financial impact of fraud in any industry is massive. According to the Financial Times article Fraud Costs Telecoms Industry $17bn a Year (paid subscription required), fraud costs the telecommunications industry $17 billion in lost revenues every year.

Fraudsters constantly look for new technologies and devise new techniques. This changes fraud patterns and makes detection difficult. Companies commonly combat this with a rules-based fraud detection system. However, once the fraudsters realize their current techniques or tools are being identified, they quickly find a way around it. Also, rules-based detection systems tend to struggle and slow down with a lot of data. This makes it difficult to detect fraud and act quickly, resulting in loss of revenue.

Overview

There are several AWS services that implement anomaly detection and could be used to combat fraud, but let’s focus on the following three:

When trying to detect fraud, there are two high-level challenges:

  • Scale – The amount of data to be analyzed. For example, each call generates a call detail record (CDR) event. These CDRs include many pieces of information such as originating and terminating phone numbers, and duration of call. Multiply these CDR events times the number of telephone calls placed each day and you can get an idea of the scale that operators must manage.
  • Machine learning knowledge and skill – The right set of skills to help solve business problems with machine learning. Developing these skills or hiring qualified data scientists with adequate domain knowledge is not simple.

Introducing Amazon QuickSight ML Insights

Amazon QuickSight is a fast, cloud-powered BI service that makes it easy for everyone in an organization to get business insights from their data through rich, interactive dashboards. With pay-per-session pricing and a dashboard that can be embedded into your applications, BI is now even more cost-effective and accessible to everyone.

However, as the volume of data that customers generate grows daily, it’s becoming more challenging to harness their data for business insights. This is where machine learning comes in. Amazon is a pioneer in using machine learning to automate and scale various aspects of business analytics in the supply chain, marketing, retail, and finance.

ML Insights integrates proven Amazon technologies into Amazon QuickSight to provide customers with ML-powered insights beyond visualizations.

  • ML-powered anomaly detection to help customers uncover hidden insights by continuously analyzing across billions of data points.
  • ML-powered forecasting and what-if analysis to predict key business metrics with point-and-click simplicity.
  • Auto-narratives to help customers tell the story of their dashboard in a plain-language narrative.

In this post, I demonstrate how a Telecom provider with little to no ML expertise can use Amazon QuickSight ML capabilities to detect fraudulent calls.

Prerequisites

To implement this solution, you need the following resources:

  • Amazon S3 to stage a Ribbon call detail record sample in CSV format.
  • AWS Glue running an ETL job in PySpark.
  • AWS Glue crawlers to discover the schema of the tables and update the AWS Glue Data Catalog.
  • Amazon Athena to query the Amazon QuickSight dataset.
  • Amazon QuickSight to build visualizations and perform anomaly detection using ML Insights.

Diagram of fraudulent call-detecting architecture, using a PySpark script to prepare the data and transform it into Parquet and an AWS Glue crawler to build the AWS Glue Data Catalog.

The dataset

For this post, I use a synthetic dataset, thanks to Ribbon Communications. The data was generated by call test generators, and is not customer or sensitive data.

Inspecting the data

The example below shows a typical STOP CDR, which is generated after a call has been terminated.


As you can see, there are a lot of values here. Most of them are not relevant for fraud identification or prevention.

Revenue share fraud

Revenue share fraud is one of the most common fraud schemes threatening the telecom industry today. It involves using fraudulent or stolen numbers to repeatedly call a premium-rate B-number, whose owner then shares the cash generated with the fraudster.

Say that you’d like to detect national and international revenue share fraud using Amazon QuickSight ML. Consider the typical traits of a revenue share fraud phone call. The pattern for revenue share fraud is multiple A-numbers calling the same B-number or a range of B-numbers with the same prefix. The call duration is usually higher than average and could be up to two hours, which is the maximum length of time international switches allow. Generally, the calls originate from one cell or a group of cells.

One SIM may make short test calls to a variety of B-numbers as a precursor to the fraud itself, which most often happens when the risk of detection is lowest, for example, Friday night, weekends, or holidays. Conference calling may be used to make several concurrent calls from one A-number.

Often, SIMs used for this type of fraud are sold or activated in bulk from the same distributor or group of distributors. SIMs could be topped up using fraudulent online or IVR payments, such as using stolen credit card numbers. Both PAYG credit and bundles may be used.

Based on the above use case, the following pieces of information are most relevant to detecting fraud:

  • Call duration
  • Calling number (A number)
  • Called number (B number)
  • Start time of the call
  • Accounting ID

You can use this reference to help identify those fields in a CDR.

Figure 2: Decoded CDR data, highlighting the relevant fields.

I identified the columns that I need out of 235 columns in the CDR.

Inspecting the raw sample data, I quickly see that it’s missing a header.

To make life easier, I converted the raw CSV data, added the column names, and converted to Parquet.

Discovering the data

In the AWS Glue console, set up a crawler and name it CDR_CRAWLER.

Point the crawler to s3://telco-dest-bucket/blog where the Parquet CDR data resides.

Next, create a new IAM role to be used by the AWS Glue crawler.

For Frequency, leave the default definition of Run on Demand.

Next, choose Add database and define the name of the database. This database contains the table discovered by the AWS Glue crawler.

Choose Next and review the crawler settings. When you’re satisfied, choose Finish.

Next, choose Crawlers, select the crawler that you just created (CDR_CRAWLER), and choose Run crawler.

The AWS Glue crawler starts crawling the database. This can take one minute or more to complete.

When it’s complete, under Data catalog, choose Databases.  You should be able to see the new database created by the AWS Glue crawler. In this case, the name of the database is blog.

To view the tables created under this database, select the relevant database and choose Tables. The crawler’s table also points to the location of the Parquet format CDRs.

To see the table’s schema, select the table created by the crawler.
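If you want to script this setup instead of using the console, the crawler can be created and started with the AWS Glue API. The following boto3 sketch uses a placeholder role name; the database name and S3 path match the walkthrough.

    import boto3

    glue = boto3.client("glue")

    glue.create_crawler(
        Name="CDR_CRAWLER",
        Role="AWSGlueServiceRole-cdr-crawler",          # IAM role created for the crawler
        DatabaseName="blog",                            # database the crawler populates
        Targets={"S3Targets": [{"Path": "s3://telco-dest-bucket/blog/"}]},
    )

    # Run on demand, as configured in the console walkthrough
    glue.start_crawler(Name="CDR_CRAWLER")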

Data preparation

You have defined the relevant dimensions to use in the ML model to detect fraud. Now, you can use a PySpark script that I built earlier using an Amazon SageMaker notebook and an AWS Glue endpoint. The script covers the following tasks:

  • Reduce the dataset and focus only on the relevant columns.
  • Create a timestamp column, which you need for creating an analysis using Amazon QuickSight.
  • Transform files from CSV to Parquet for improved performance.

You can run the PySpark script on the raw CSV format of the CDRs that you are using. Here is the location of the raw CSV format:

s3://telco-source-bucket/machine-learning-for-all/v1.0.0/data/cdr-stop/cdr_stop.csv

Here is the PySpark script that I created.

import sys    
from awsglue.transforms import *    
from awsglue.utils import getResolvedOptions    
from pyspark.context import SparkContext    
from awsglue.context import GlueContext    
from awsglue.job import Job    
import pyspark.sql.functions as fn    
from awsglue.dynamicframe import DynamicFrame    
    
    
sc = SparkContext.getOrCreate()    
glueContext = GlueContext(sc)    
spark = glueContext.spark_session    
    
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "demo_ml", table_name = "cdr_stop_csv", transformation_ctx = "datasource0")    
#apply mapping from source table to destination , we pick only the relevant columns     
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("col2", "string", "Accounting_ID", "string"), ("col13", "long", "Call_service_duration", "long"), ("col5", "string", "Start_Time_(MM/DD/YYYY)", "string"), ("col6", "string", "Start_Time_(HH/MM/SSs)", "string"), ("col19", "long", "Calling number", "string"), ("col20", "long", "Called number", "string")], transformation_ctx = "applymapping1")    
    
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")    
resolvechoice2.printSchema()    
    
resolvechoice3 = ResolveChoice.apply(frame = resolvechoice2, choice = "MATCH_CATALOG", database = "demo_ml", table_name = "cdr_stop_csv", transformation_ctx = "resolvechoice3")    
resolvechoice3.printSchema()    
    
customDF = resolvechoice3.toDF()    
#create timestamp column    
customDF = customDF.withColumn('timestamp', fn.concat(fn.col("Start_Time_(MM/DD/YYYY)"),fn.lit(" "),fn.col("Start_Time_(HH/MM/SSs)")))    
    
#create timestamp2 column which is a substring of timestamp column    
customDF = customDF.withColumn('timestamp2',fn.substring(fn.col("timestamp"),1,19))    
#create Date column     
customDF =customDF.withColumn("Date",fn.unix_timestamp(fn.col("timestamp2"),"MM/dd/yyyy HH:mm:ss").cast("timestamp"))    
    
#remove temporary fields     
customDF = customDF.drop('timestamp','timestamp2')    
    
customDynamicFrame = DynamicFrame.fromDF(customDF, glueContext, "customDF_df")    
#transform to parquet format and land in S3 path    
datasink4 = glueContext.write_dynamic_frame.from_options(frame = customDynamicFrame, connection_type = "s3", connection_options = {"path": "s3://telco-dest-bucket/blog/"}, format = "parquet", transformation_ctx = "datasink4")    

The dataset has been cataloged in AWS Glue Data Catalog and is queryable using Athena.

Amazon QuickSight and anomaly detection

Next, build out anomaly detection using Amazon QuickSight. To get started, follow these steps.

  1. In the Amazon QuickSight console, choose New analysis.
  2. Choose Create new data set.
  3. Choose Athena.
  4. Enter a data source name.
  5. Choose Create data source.
  6. From the drop-down lists, choose the relevant database and table that were created by the AWS Glue crawlers, and then choose Select.
  7. Choose Directly query your data, and then choose Visualize.

Visualizing the data using Amazon QuickSight

  1. Under visual types, choose Line chart.
  2. Drag call_service_duration to the Value field well.
  3. Drag timestamp_new to the X axis field well.

Amazon QuickSight generates a dashboard, as in the following screenshot.

The x-axis shows the timestamp. By default, it’s aggregated by day; you can change this by choosing a different aggregation level.

Because the timestamp is aggregated by day, the call duration shown is the sum of all call durations from all call records within that day. I can begin the search by looking for days where the total call duration is unusually high.

Anomaly detection

Now look at how to start using the ML insights anomaly detection feature.

  1. On the top of the Insights panel, choose Add anomaly to sheet. This creates an insights visual for anomaly detection.
  2. On the top of the screen, choose Field Wells and add at least one field to the Categories, as in the following example. I added the calling/called number, as those become relevant for fraud use cases; for example, one A-number calling multiple B-numbers or multiple A-numbers calling B-numbers.
    The categories represent the dimensional values by which Amazon QuickSight splits the metric. For example, you can analyze anomalies on sales across all product categories and product SKUs—assuming there are 10 product categories, each with 10 product SKUs. Amazon QuickSight splits the metric by the 100 unique combinations and runs anomaly detection on each of the split metric.
  3. To configure the anomaly detection job, choose Get Started.
  4. On the anomaly detection configuration screen, set up the following options:
  • Analyze all combinations of these categories—By default, if you have selected three categories, Amazon QuickSight runs anomaly detection on the following combinations hierarchically: A, AB, ABC. If you select this option, QuickSight analyzes all combinations: A, AB, ABC, BC, AC. If your data is not hierarchical, check this option.
  • Schedule—Set this option to run anomaly detection on your data hourly, daily, weekly, or monthly, depending on your data and needs. For Start schedule on and Timezone, enter values and choose OK. Important: The schedule does not take effect until you publish the analysis as a dashboard. Within the analysis, you have the option to run the anomaly detection manually (without the schedule).
  • Contribution analysis on anomaly – You can select up to four additional dimensions for Amazon QuickSight to analyze the top contributors when an anomaly is detected. For example, Amazon QuickSight can show you the top customers that contributed to a spike in sales. In my current example, I added one additional dimension: the accounting ID. If you think about a telecom fraud case, you can also consider fields like charging time or cell ID as additional dimensions.
  5. After setting the configuration, choose Run Now to execute the job manually. The visual displays the message “Detecting anomalies… This may take a while…” Depending on the size of your dataset, this may take a few minutes or up to an hour.
  6. When the anomaly detection job is complete, any anomalies are called out in the insights visual. By default, only the top anomalies for the latest time period in the data are shown in the insights visuals.

    Anomaly detection reveals several B numbers being called from multiple A numbers with a high call service duration on August 29, 2018. That looks interesting!
  7. To explore all anomalies for this insight, select the menu on the top-right corner of the visual and choose Explore Anomalies.
  8. On the Anomalies detailed page, you can see all the anomalies for the latest period.
    In the view, you can see that two anomalies were detected, showing two time series. The title of each visual represents the metric that is run on the unique combination of the categorical fields. In this case:
  • [All] | 9645000024
  • 3512000024 | [ALL]
    So the system detected anomalies for multiple A-numbers calling 9645000024, and for 3512000024 calling multiple B-numbers. In both cases, it observed a high call duration. The labeled data point on the chart represents the most recent anomaly detected for that time series.
  9. To expose a date picker, choose Show anomalies by date at the top-right corner. This chart shows the number of anomalies that were detected for each day (or hour, depending on your anomaly detection configuration). You can select a particular day to see the anomalies detected for that day. For example, selecting August 10, 2018 on the top chart shows the anomalies for that day:

    Important:
    The first 32 points in the dataset are used for training and are not scored by the anomaly detection algorithm. You may not see any anomalies on the first 32 data points. You can expand the filter controls at the top of the screen. With the filter controls, you can change the anomaly threshold to show high, medium, or low significance anomalies. You can choose to show only anomalies that are higher than expected or lower than expected. You can also filter by the categorical values that are present in your dataset to look at anomalies only for those categories.
  10. Look at the contributors columns. When you configured the anomaly detection, you defined the accounting ID as an additional dimension. If this were real call traffic instead of test data, you would be able to single out specific accounting IDs that contribute to the anomaly.
  11. When you’re done, choose Back to analysis.

Summary

In this post, I explored a common fraud pattern called revenue share fraud. I looked at how to extract the relevant data for training the anomaly detection model in Amazon QuickSight. I then used this data to detect anomalies based on call duration, calling party, and called party, looking at additional contributors like Accounting ID. The entire process used serverless technologies and little to no machine learning experience.

For more information about options and strategies, see Amazon QuickSight Announces General Availability of ML Insights.

If you have questions or suggestions, please comment below.

 


About the Author

Guy Ben Baruch is a solutions architect with Amazon Web Services.

Introducing Amazon QuickSight fine-grained access control over Amazon S3 and Amazon Athena

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/introducing-amazon-quicksight-fine-grained-access-control-over-amazon-s3-and-amazon-athena/

Today, AWS is excited to announce the availability of fine-grained access control for AWS Identity and Access Management (IAM)-permissioned resources in Amazon QuickSight. Fine-grained access control allows Amazon QuickSight account administrators to control authors’ default access to connected AWS resources. Fine-grained access control enables administrators to use IAM policies to scope down access permissions, limiting specific authors’ access to specific items within the AWS resources. Administrators can now apply this new level of access control to Amazon S3, Amazon Athena, and Amazon RDS/Redshift database discovery.

Fine-Grained Access Control Setup

Here’s how fine-grained access control works in Amazon QuickSight:

Imagine an AWS customer, Acme. Acme’s account contains three S3 buckets, called b1, b2, and b3. Amazon QuickSight is configured to read data via Athena. Assume that Acme’s administrators have configured Amazon QuickSight service role permissions with access to all three buckets. The new default access setting introduced today enables administrators to restrict access to Acme’s data for all users by default. Administrators then grant access to specific buckets (b1, b2, b3) to individual users or groups within Acme via IAM policies.

In the following example, the policies assigned to Acme users A and B, and Group X grant them access to buckets 1, 2, and 3. Group Y is not assigned, as shown in the following diagram.

When User A attempts to read data via Athena from bucket 1, AWS evaluates the IAM policy associated with the user. Since the policy assigned to User A grants access to bucket 1, the query succeeds. User A can access data from bucket 1. Similarly, User B and users in group X can access data in buckets 2 and 3, respectively.

However, when a user from group Y tries to access bucket 2, QuickSight doesn’t allow any access to data. Remember, group Y has no user-level assignments. Users from group Y are denied access to bucket 2 because Amazon QuickSight requires explicit permissions to access data, as shown in the following diagram.

In an S3 data lake, Amazon QuickSight enables administrators to restrict each author’s data access using IAM policies. (Other AWS mechanisms, outside of Amazon QuickSight, provide for the modification of the policies themselves.) This fine-grained access control relies on the underlying IAM policy evaluation infrastructure. Currently, there is a ten-policy limit on policies that may be linked together at runtime to evaluate the user’s permissions.
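As an illustration of how such a scoped policy might look, the following boto3 sketch creates a policy that limits an author to bucket b1 from the example above. The policy name and statement are hypothetical; your administrators define the actual policies and manage them with AWS mechanisms outside of Amazon QuickSight.

    import json
    import boto3

    iam = boto3.client("iam")

    policy_document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            # Grant read access only to bucket b1
            "Resource": ["arn:aws:s3:::b1", "arn:aws:s3:::b1/*"],
        }],
    }

    iam.create_policy(
        PolicyName="quicksight-author-bucket-b1",
        PolicyDocument=json.dumps(policy_document),
    )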

Amazon QuickSight also offers an updated UI for AWS permissions management. Administrators can now access an account’s default resource setting, as shown in the following screenshot:

You can set Default resource access to Allow or Deny, based on the administration model. Choose the appropriate option and press Update. (Or, if you decide not to proceed, press Cancel.)

As before, you can specify AWS resource permissions for the Amazon QuickSight account through the new UI. Select resources from the checkboxes on the right side of the screen, as shown in the following screenshot:

And, as shown in the next screenshot, S3 access can be granted to buckets within the AWS account or other AWS accounts:

To control access for specific users or groups, use the newly introduced fine-grained access control feature, illustrated in this screenshot:

The IAM policy assignments button leads you to a page displaying all assignments in the account and lets you create a new assignment, as shown in the following screenshot.

Creating a new policy assignment involves only two steps:

  1. Pick an IAM policy from those available in the AWS account.
  2. Assign to specific users or groups.

If you haven’t yet configured groups in Amazon QuickSight, you can do so using AWS APIs for accounts using SSO or Amazon QuickSight-native credentials. Groups are also natively available in an AD-connected account.
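For example, a group can be created and populated with the Amazon QuickSight APIs, as in the following boto3 sketch; the account ID, group name, and member name are placeholders.

    import boto3

    quicksight = boto3.client("quicksight")

    quicksight.create_group(
        AwsAccountId="111122223333",
        Namespace="default",
        GroupName="marketing-analysts",
        Description="Authors allowed to query the marketing bucket",
    )

    quicksight.create_group_membership(
        AwsAccountId="111122223333",
        Namespace="default",
        GroupName="marketing-analysts",
        MemberName="author-user-name",
    )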

Fine-grained access control in Amazon QuickSight, when combined with data in S3 and Athena, allows you to set up a secure environment for data exploration across the organization. This feature is available in Amazon QuickSight Enterprise Edition in all supported AWS Regions starting today.

 


About the Author

Jose Kunnackal is a principal product manager for Amazon QuickSight.

Amazon QuickSight updates: Multiple sheets in dashboards, axis label orientation options, and more

Post Syndicated from Sahitya Pandiri original https://aws.amazon.com/blogs/big-data/amazon-quicksight-updates-multiple-sheets-in-dashboards-axis-label-orientation-options-and-more/

Today, we are pleased to announce a set of updates to Amazon QuickSight:

  • Richer dashboards with multiple sheets in your regular and embedded dashboards
  • Multiple axis label orientation options for better readability of dashboards
  • More calculations such as standard deviation, variance and conditional string functions on SPICE
  • Enhanced URL actions for supporting a broader set of interaction scenarios
  • One-click duplication of visuals for faster authoring

Multiple sheets in Dashboards

First, let’s look at the update to dashboards that allows multiple sheets, accessed via a new tab control. This allows better organization of information within the dashboard, where visuals related to specific subject areas or topics can be organized in separate sheets and distinctly identified through the tab name. As a reader, you can then navigate to a single dashboard to get a comprehensive view of all the insights related to a topic.

Each sheet within a dashboard has a distinct URL, which allows you to bookmark the specific sheet, or share a link to the specific sheet with others. Additionally, dashboards can also be set up to filter content at the dashboard level or at the per-sheet level, with the filters passed either via on-screen controls or via the URL.

Filtering through on-screen controls

Navigating to another dashboard and passing filters via URL

If you are authoring a dashboard, you will find that every dashboard now starts off as a single sheet. You can use the “add sheet” icon in the tabs section to add a new sheet to your dashboard. Note that the tab control only shows up on a dashboard if there is more than one sheet. When authoring, you will also be able to rename a sheet, change the order of sheets or delete a sheet. The first sheet will always be the one presented to your readers by default, so this would be where you want to provide a summary of the dashboard with KPIs, ML-Insights, or high level trends. Email reports will send the first sheet of the dashboard to all your viewers; we are working on additional rendering options for email reports.

Previously, on the Big Data Blog, we covered ways to add parameters and on-screen controls to dashboards. Parameter values are global across a dashboard, which allow you to maintain user inputs and filter content according to user selections across the dashboard. On-screen controls are specific to each sheet, since the context of these controls is often confined to the sheet itself. If you want to discreetly filter data based on user or group-level attributes, you may also use row-level security to do so.

Multi-sheet dashboards can be shared through the regular Amazon QuickSight website or embedded in your custom application. To embed a dashboard in your application, see Embed interactive dashboards in your application with Amazon QuickSight.
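For the embedded case, the embed URL you load in your application points at the dashboard as a whole, and readers move between its sheets with the same tab control. As a minimal sketch, an embed URL can be fetched server-side with the GetDashboardEmbedUrl API; the account and dashboard IDs below are placeholders.

```python
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Placeholder identifiers; the calling IAM identity must be allowed to embed this dashboard
response = quicksight.get_dashboard_embed_url(
    AwsAccountId="111122223333",
    DashboardId="example-dashboard-id",
    IdentityType="IAM",
    SessionLifetimeInMinutes=60,
)

embed_url = response["EmbedUrl"]
# Pass embed_url to the QuickSight embedding SDK (or an iframe) in your front end
```

The returned URL is short-lived, so it is typically fetched on each page load rather than cached.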

Axis label orientation

When authoring a dashboard, you will see that Amazon QuickSight now automatically selects a horizontal or angled orientation for axis labels to provide better readability. You can also set the orientation manually: navigate to a visual’s options menu, choose “Format visual”, and then choose the axis label orientation best suited to your dashboard.

Duplicate visual option

When authoring a dashboard, there are often situations where you want to quickly clone a visual and make changes. The new “Duplicate visual” option lets you do this with a one-click action from the visual’s options menu. It creates a copy of the visual, along with its filters and other settings such as size and format, on the same sheet or a different sheet within the dashboard, removing the overhead of creating visuals from scratch each time.

URL action open options

URL actions allow you to configure actions outside of QuickSight that can be triggered by a user viewing a QuickSight dashboard. With today’s release, you can choose how these URLs open: in the viewer’s current QuickSight browser tab, in a new browser tab, or in a new browser window.

In addition to these features that improve the author and reader experience, Amazon QuickSight now supports additional statistical capabilities through new functions in calculated fields. These include standard deviation and variance calculations, as well as conditional functions on string fields. With standard deviation and variance, you can determine the dispersion of a metric relative to its mean, either across your entire dataset or partitioned by a chosen dimension. For more information, see Functions by Category in the Amazon QuickSight User Guide.

With conditional string functions on SPICE, you can now use coalesce, ifelse, isNotNull, isNull, and nullif while building reports on SPICE datasets. For more information, see Calculated Field Function and Operator Reference for Amazon QuickSight in the Amazon QuickSight User Guide.

All of these new features are now available in both Standard and Enterprise Editions in all supported Amazon QuickSight Regions: US East (N. Virginia and Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Singapore, Sydney, and Tokyo).


About the Author

Sahitya Pandiri is a technical program manager with Amazon Web Services. Sahitya has worked in product and program management for five years and has built multiple products in the retail, healthcare, and analytics spaces. She enjoys problem solving and leveraging technology to simplify processes.

Enabling serverless security analytics using AWS WAF full logs, Amazon Athena, and Amazon QuickSight

Post Syndicated from Umesh Ramesh original https://aws.amazon.com/blogs/security/enabling-serverless-security-analytics-using-aws-waf-full-logs/

Traditionally, analyzing data logs required you to extract, transform, and load your data before using a number of data warehouse and business intelligence tools to derive insights from that data, all while maintaining the servers that ran behind these tools.

This blog post will show you how to analyze AWS Web Application Firewall (AWS WAF) logs and quickly build multiple dashboards, without booting up any servers. With the new AWS WAF full logs feature, you can now log all traffic inspected by AWS WAF into Amazon Simple Storage Service (Amazon S3) buckets by configuring Amazon Kinesis Data Firehose. In this walkthrough, you’ll create an Amazon Kinesis Data Firehose delivery stream to which AWS WAF full logs can be sent, and you’ll enable AWS WAF logging for a specific web ACL. Then you’ll set up an AWS Glue crawler job and an Amazon Athena table. Finally, you’ll set up Amazon QuickSight dashboards to help you visualize your web application security. You can use these same steps to build additional visualizations to draw insights from AWS WAF rules and the web traffic traversing the AWS WAF layer. Security and operations teams can monitor these dashboards directly, without needing to depend on other teams to analyze the logs.

The following architecture diagram highlights the AWS services used in the solution:

Figure 1: Architecture diagram

AWS WAF is a web application firewall that lets you monitor HTTP and HTTPS requests that are forwarded to an Amazon API Gateway API, to Amazon CloudFront or to an Application Load Balancer. AWS WAF also lets you control access to your content. Based on conditions that you specify—such as the IP addresses from which requests originate, or the values of query strings—API Gateway, CloudFront, or the Application Load Balancer responds to requests either with the requested content or with an HTTP 403 status code (Forbidden). You can also configure CloudFront to return a custom error page when a request is blocked.

Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk. With Kinesis Data Firehose, you don’t need to write applications or manage resources. You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified. You can also configure Kinesis Data Firehose to transform your data before delivering it.

AWS Glue can be used to run serverless queries against your Amazon S3 data lake. AWS Glue can catalog your S3 data, making it available for querying with Amazon Athena and Amazon Redshift Spectrum. With crawlers, your metadata stays in sync with the underlying data (more details about crawlers later in this post). Amazon Athena and Amazon Redshift Spectrum can directly query your Amazon S3 data lake by using the AWS Glue Data Catalog. With AWS Glue, you access and analyze data through one unified interface without loading it into multiple data silos.

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Amazon QuickSight is a business analytics service you can use to build visualizations, perform one-off analysis, and get business insights from your data. It can automatically discover AWS data sources and also works with your data sources. Amazon QuickSight enables organizations to scale to hundreds of thousands of users and delivers responsive performance by using a robust in-memory engine called SPICE.

SPICE stands for Super-fast, Parallel, In-memory Calculation Engine. SPICE supports rich calculations to help you derive insights from your analysis without worrying about provisioning or managing infrastructure. Data in SPICE is persisted until it is explicitly deleted by the user. SPICE also automatically replicates data for high availability and enables Amazon QuickSight to scale to hundreds of thousands of users who can all simultaneously perform fast interactive analysis across a wide variety of AWS data sources.

Step one: Set up a new Amazon Kinesis Data Firehose delivery stream

  1. In the AWS Management Console, open the Amazon Kinesis Data Firehose service and choose the button to create a new stream.
    1. In the Delivery stream name field, enter a name for your new stream that starts with aws-waf-logs- as shown in the screenshot below. AWS WAF filters all streams starting with the keyword aws-waf-logs when it displays the delivery streams. Note the name of your stream since you’ll need it again later in the walkthrough.
    2. For Source, choose Direct PUT, since AWS WAF logs will be the source in this walkthrough.

      Figure 2: Select the delivery stream name and source

  2. Next, you have the option to enable an AWS Lambda function to transform your data before delivering it to your destination. (You can learn more about data transformation in the Amazon Kinesis Data Firehose documentation.) In this walkthrough, no transformations are needed, so for Record transformation, choose Disabled.
    Figure 3: Select “Disabled” for record transformations

    1. You’ll have the option to convert the JSON object to Apache Parquet or Apache ORC format for better query performance. In this example, you’ll be reading the AWS WAF logs in JSON format, so for Record format conversion, choose Disabled.

      Figure 4: Choose “Disabled” to not convert the JSON object

  3. On the Select destination screen, for Destination, choose Amazon S3.
    Figure 5: Choose the destination

    1. For the S3 destination, you can either enter the name of an existing S3 bucket or create a new S3 bucket. Note the name of the S3 bucket since you’ll need the bucket name in a later step in this walkthrough.
    2. For Source record S3 backup, choose Disabled, because the destination in this walkthrough is an S3 bucket.

      Figure 6: Enter the S3 bucket name, and select “Disabled” for the source record S3 backup

  4. On the next screen, leave the default settings for Buffer size, Buffer interval, S3 compression, and S3 encryption as they are. However, we recommend that you initially set Error logging to Enabled for troubleshooting purposes.
    1. For IAM role, select Create new or choose. This opens a new window that prompts you to create firehose_delivery_role, as shown in the following screenshot. Choose Allow in this window to accept the role creation, which grants the Kinesis Data Firehose service access to the S3 bucket.

      Figure 7: Select "Create new or choose" for IAM Role

      Figure 7: Select “Allow” to create the IAM role “firehose_delivery_role”

  5. On the last step of configuration, review all the options you’ve chosen, and then select Create delivery stream. The delivery stream initially displays as “Creating” under Status. In a couple of minutes, the status changes to “Active,” as shown in the following screenshot. (If you prefer to script this setup, see the boto3 sketch after these steps.)

    Figure 8: Review the options you selected
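As noted in step 5, the same delivery stream can also be created with a single API call instead of through the console. Here is a minimal boto3 sketch; the role and bucket ARNs are placeholders, and the buffering and compression values mirror the defaults described above.

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# The stream name must start with "aws-waf-logs-"; the ARNs below are placeholders
firehose.create_delivery_stream(
    DeliveryStreamName="aws-waf-logs-us-east-1",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose_delivery_role",
        "BucketARN": "arn:aws:s3:::my-waf-logs-bucket",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},  # default-style buffering
        "CompressionFormat": "UNCOMPRESSED",
    },
)
```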

Step two: Enable AWS WAF logging for a specific Web ACL

  1. From the AWS Management Console, open the AWS WAF service and choose Web ACLs. Open your Web ACL resource, which can either be deployed on a CloudFront distribution or on an Application Load Balancer.
    1. Choose the Web ACL for which you want to enable logging. (In the following screenshot, we’ve selected a Web ACL in the US East Region.)
    2. On the Logging tab, choose Enable Logging.

      Figure 9: Choose “Enable Logging”

  2. The next page displays all the delivery streams that start with aws-waf-logs. Choose the Amazon Kinesis Data Firehose delivery stream that you created for AWS WAF logs at the start of this walkthrough. (In the screenshot below, our example stream name is “aws-waf-logs-us-east-1”.)
    1. You can also choose to redact certain fields that you wish to exclude from being captured in the logs. In this walkthrough, you don’t need to choose any fields to redact.
    2. Select Create.

      Figure 10: Choose your delivery stream, and select “Create”

After a couple of minutes, you’ll be able to inspect the S3 bucket that you defined in the Kinesis Data Firehose delivery stream. The log files are created in directories by year, month, day, and hour.
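Enabling logging can also be done through the API rather than the console. The sketch below uses the classic AWS WAF client (for a web ACL attached to an Application Load Balancer, use the waf-regional client instead); the web ACL and delivery stream ARNs are placeholders.

```python
import boto3

# "waf" covers CloudFront-scoped web ACLs; use "waf-regional" for regional web ACLs
waf = boto3.client("waf", region_name="us-east-1")

waf.put_logging_configuration(
    LoggingConfiguration={
        "ResourceArn": "arn:aws:waf::111122223333:webacl/EXAMPLE-WEB-ACL-ID",  # placeholder
        "LogDestinationConfigs": [
            "arn:aws:firehose:us-east-1:111122223333:deliverystream/aws-waf-logs-us-east-1"
        ],
        "RedactedFields": [],  # no fields are redacted in this walkthrough
    }
)
```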

Step three: Set up an AWS Glue crawler job and Amazon Athena table

The purpose of a crawler within your Data Catalog is to traverse your data stores (such as S3) and extract the metadata fields of the files. The output of the crawler consists of one or more metadata tables defined in your Data Catalog. When the crawler runs, the first classifier in your list that successfully recognizes your data store is used to create a schema for your table. AWS Glue provides built-in classifiers to infer schemas from common file formats, including JSON, CSV, and Apache Avro.

  1. In the AWS Management Console, open the AWS Glue service and choose Crawlers to set up a crawler job.
  2. Choose Add crawler to launch a wizard that sets up the crawler job. For Crawler name, enter a relevant name, and then select Next.

    Figure 11: Enter “Crawler name,” and select “Next”

  3. For Choose a data store, select S3 and include the path of the S3 bucket that stores your AWS WAF logs, which you made note of in step 1.3. Then choose Next.

    Figure 12: Choose a data store

  4. When you’re given the option to add another data store, choose No.
  5. Then, choose Create an IAM role and enter a name. This role grants the AWS Glue service access to the S3 bucket containing the log files.

    Figure 13: Choose “Create an IAM role,” and enter a name

  6. Next, set the frequency to Run on demand. You can also schedule the crawler to run periodically to make sure any changes in the file structure are reflected in your data catalog.

    Figure 14: Set the “Frequency” to “Run on demand”

  7. For output, choose the database in which the Athena table is to be created and add a prefix to identify your table name easily. Select Next.

    Figure 15: Choose the database, and enter a prefix

  8. Review all the options you’ve selected for the crawler job and complete the wizard by selecting the Finish button.
  9. Now that the crawler job parameters are set up, on the left panel of the console, choose Crawlers to select your job, and then choose Run crawler. The job creates an Amazon Athena table; the duration depends on the size of the log files. (If you’d rather automate the crawler setup, see the boto3 sketch after these steps.)

    Figure 16: Choose “Run crawler” to create an Amazon Athena table

  10. To see the Amazon Athena table created by the AWS Glue crawler job, from the AWS Management Console, open the Amazon Athena service. You can filter by your table name prefix.
      1. To view the data, choose Preview table. This displays the table data with certain fields showing data in JSON object structure.
    Figure 17: Choose “Preview table” to view the data
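As noted in step 9, the crawler can also be defined and started programmatically. Here is a minimal boto3 sketch; the crawler name, IAM role, table prefix, and bucket path are placeholders, and the database name matches the one referenced by the custom SQL in step four.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Placeholder names; the role must allow AWS Glue to read the WAF log bucket
glue.create_crawler(
    Name="waf-logs-crawler",
    Role="AWSGlueServiceRole-waf-logs",
    DatabaseName="sampledb",
    TablePrefix="waf_",
    Targets={"S3Targets": [{"Path": "s3://my-waf-logs-bucket/"}]},
)

# Run once on demand; add a schedule if the log layout may change over time
glue.start_crawler(Name="waf-logs-crawler")
```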

Step four: Create visualizations using Amazon QuickSight

  1. From the AWS Management Console, open Amazon QuickSight.
  2. In the Amazon QuickSight window, in the top left, choose New Analysis. Choose New Data set, and for the data source choose Athena. Enter an appropriate data source name and choose Create data source.

    Figure 18: Enter the “Data source name,” and choose “Create data source”

  3. Next, choose Use custom SQL to extract all the fields in the JSON object using the following SQL query:
    
        ```
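        -- The CTE "d" uses UNNEST plus map_agg to flatten each request's
        -- header array (httprequest.headers) into a single name -> value map (kv);
        -- the outer SELECT then pulls individual headers out as columns.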
        with d as (select
            waf.timestamp,
            waf.formatversion,
            waf.webaclid,
            waf.terminatingruleid,
            waf.terminatingruletype,
            waf.action,
            waf.httpsourcename,
            waf.httpsourceid,
            waf.HTTPREQUEST.clientip as clientip,
            waf.HTTPREQUEST.country as country,
            waf.HTTPREQUEST.httpMethod as httpMethod,
            map_agg(f.name,f.value) as kv
        from sampledb.jsonwaflogs_useast1 waf,
        UNNEST(waf.httprequest.headers) as t(f)
        group by 1,2,3,4,5,6,7,8,9,10,11)
        select d.timestamp,
            d.formatversion,
            d.webaclid,
            d.terminatingruleid,
            d.terminatingruletype,
            d.action,
            d.httpsourcename,
            d.httpsourceid,
            d.clientip,
            d.country,
            d.httpMethod,
            d.kv['Host'] as host,
            d.kv['User-Agent'] as UA,
            d.kv['Accept'] as Acc,
            d.kv['Accept-Language'] as AccL,
            d.kv['Accept-Encoding'] as AccE,
            d.kv['Upgrade-Insecure-Requests'] as UIR,
            d.kv['Cookie'] as Cookie,
            d.kv['X-IMForwards'] as XIMF,
            d.kv['Referer'] as Referer
        from d;
        ```        
        

  4. To extract individual fields, copy the previous SQL query and paste it in the New custom SQL box, then choose Edit/Preview data. (To validate the query outside of QuickSight first, see the Athena sketch at the end of this section.)
    Figure 19: Paste the SQL query in “New custom SQL query”

    1. In the Edit/Preview data view, for Data source, choose SPICE, then choose Finish.

      Figure 20: Choose “Spice” and then “Finish”

  5. Back in the Amazon QuickSight console, under the Fields section, open the drop-down menu for the timestamp field and change its data type to Date.

    Figure 21: In the Amazon Quicksight console, change the data type to “Date”

  6. After you see the Date column appear, enter an appropriate name for the visualizations at the top of the page, then choose Save.

    Figure 22: Enter the name for the visualizations, and choose “Save”

  7. You can now create various visualization dashboards with multiple visual types by using the drag-and-drop feature. You can drag and drop combinations of fields such as Action, Client IP, Country, Httpmethod, and User Agents. You can also add filters on Date to view dashboards for a specific timeline. Here are some sample screenshots:
    Figure 23a: Visualization dashboard samples

    Figure 23b: Visualization dashboard samples

    Figure 23c: Visualization dashboard samples

    Figure 23d: Visualization dashboard samples
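If you want to sanity-check the data before building visuals (see the note in step 4 above), you can run a query against the crawler’s table directly in Athena, either in the console or via the API. Here is a minimal boto3 sketch; the results bucket is a placeholder, and the database and table names match those used in the custom SQL earlier.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Count requests by WAF action; swap in the full custom SQL from step four if you prefer
query = "SELECT action, count(*) AS requests FROM sampledb.jsonwaflogs_useast1 GROUP BY action"

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sampledb"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/waf/"},  # placeholder bucket
)

# Poll get_query_execution() with this ID until the state is SUCCEEDED, then fetch results
print(response["QueryExecutionId"])
```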

Conclusion

You can stream AWS WAF full logs to Amazon S3 buckets by configuring Amazon Kinesis Data Firehose, and analyze the logs as they arrive. You can further enhance this solution by automating the streaming of data and using AWS Lambda for any data transformations based on your specific requirements. Amazon Athena and Amazon QuickSight make it easy to analyze the logs and build visualizations and dashboards for executive leadership teams. With these solutions, you can go serverless and let AWS do the heavy lifting for you.

Umesh Kumar Ramesh

Umesh is a Cloud Infrastructure Architect with Amazon Web Services. He delivers proof-of-concept projects and topical workshops, and leads implementation projects for various AWS customers. He holds a Bachelor’s degree in Computer Science & Engineering from the National Institute of Technology, Jamshedpur (India). Outside of work, Umesh enjoys watching documentaries, biking, and practicing meditation.

Muralidhar Ramarao

Muralidhar is a Data Engineer with the Amazon Payment Products Machine Learning Team. He has a Bachelor’s degree in Industrial and Production Engineering from the National Institute of Engineering, Mysore, India. Outside of work, he loves to hike. You will find him with his camera or snapping pictures with his phone, and always looking for his next travel destination.