All posts by Peter Chung

Building a Showback Dashboard for Cost Visibility with Serverless Architectures

Post Syndicated from Peter Chung original https://aws.amazon.com/blogs/architecture/building-a-showback-dashboard-for-cost-visibility-with-serverless-architectures/

Enterprises with centralized IT organizations and multiple lines of businesses frequently use showback or chargeback mechanisms to hold their departments accountable for their technology usage and costs. Chargeback involves actually billing a department for the cost of their division’s usage. Showback focuses on visibility to make the department more cost conscientious and encourage operational efficiency.

Building a showback mechanism can be potentially challenging for business and financial analysts of an AWS Organization. You may not have the scripting or data engineering skills needed to coordinate workflows and build reports at scale. Although you can use AWS Cost Explorer as starting point, you may want greater customizability, larger datasets beyond a one-year period, and more of a business intelligence (BI) experience.

In this post, we discuss the benefits of building a showback dashboard using the AWS Cost and Usage Report (AWS CUR). You can track costs by cost center, business unit, or project using managed services. Using a showback strategy, you can consolidate and present costs to a business unit to show resource use over a set period of time for your entire AWS Organization. Building this solution with managed services allows you to spend time understanding your costs rather than maintaining the underlying infrastructure.

This solution highlights AWS Glue DataBrew to prepare your data into the appropriate format for dashboards. We recommend DataBrew because it provides a no-code environment for data transformation. It allows anyone to create dashboards similar to those built in the Cloud Intelligence Dashboards Workshop for your Organization.

Figure 1. QuickSight showback dashboard using CUR data transformed by Glue DataBrew and leveraging QuickSight insights

Figure 1. QuickSight showback dashboard using CUR data transformed by Glue DataBrew and leveraging QuickSight insights

Tags for cost allocation

The success of your showback dashboard partially depends on your cost allocation tagging strategy. Typically, customers use business tags such as cost center, business unit, or project to associate AWS costs with traditional financial reporting dimensions within their organization.

The CUR supports the ability to break down AWS costs by tag. For example, if a group of resources are labeled with the same tag, you’ll be able to see the total cost and usage of that group of resources. Read more about Tagging Best Practices to develop a tagging strategy for your organization.

A serverless data workflow for showback dashboards

You can build showback dashboards with managed services such as Amazon QuickSight, without the need to write any code or manage any servers.

Figure 2. A serverless architecture representing data workflow

Figure 2. A serverless architecture representing data workflow

AWS automatically delivers the data you need for showback dashboards through the CUR. Once this data arrives in an Amazon Simple Storage Service (S3) bucket, you can transform the data without the need to write any code by using DataBrew. You can also automatically identify the data schema, and catalog the data’s properties to run queries using Amazon Athena. Lastly, you can visualize the results by publishing and sharing dashboards to key stakeholders within your organization using Amazon QuickSight.

The key benefits of this approach are:

  • Automatic data delivery
  • No-code data transformation
  • Automatic cataloging and querying
  • Serverless data visualization

Let’s take a look at each in more detail.

Automatic data delivery

The CUR is the source for historical cost and usage data. The CUR provides the most comprehensive set of cost and usage data available and will include your defined cost allocation tags for your entire Organization. You configure CUR to deliver your billing data to an Amazon S3 bucket at the payer account level. This will consolidate data for all linked accounts. After delivery starts, Amazon updates the CUR files at least once a day.

No-code data transformation

You can use DataBrew to transform the data in the Amazon S3 bucket aggregating cost and usage according to your tagging strategy. DataBrew summarizes your data for discovery. You can also run transformations called “jobs” in DataBrew without writing any code, using over 250 built-in transforms. Figures 3 through 5 show several job examples.

Figure 3. DataBrew recipe action: rename column

Figure 3. DataBrew recipe action: rename column

Figure 4. DataBrew recipe action: Create column from function

Figure 4. DataBrew recipe action: Create column from function

Figure 5. DataBrew recipe action: fill missing values

Figure 5. DataBrew recipe action: fill missing values

For a full list of columns available in CUR, review the CUR Data Dictionary. Following is a list of relevant columns for an executive summary showback dashboard:

  • bill_billing_period_start_date
  • line_item_usage_account_id
  • line_item_line_item_type
  • product_product_name
  • product_product_family
  • product_product_transfer_type
  • savings_plan_savings_plan_effective cost
  • reservation_effective_cost
  • line_item_unblended_cost

Based on data refresh and business requirements, DataBrew can run a job on a recurring basis (for example, every 12 hours). This can be run at a particular time of day, or as defined by a valid CRON expression. This helps you automate your transformation workflows.

Automatic cataloging and querying

You can use a Glue crawler to automatically classify your data to determine the data’s format, schema, and associated properties. The crawlers write metadata to an AWS Glue Data Catalog to help data users find the data they need.

With the results in Amazon S3, and the metadata in the Glue Data Catalog, you can run standard SQL to queries with Athena. This will help you make more informed business decisions by tracking financial metrics and optimizing costs. This is done directly in Amazon S3 without having to move around data. Using standard SQL, you can create views that aggregate cost and usage by your defined tags.

Serverless data visualization

You can use Amazon QuickSight to create and share dashboards with your teams for cost visibility. QuickSight provides native integration with Athena and S3, and lets you easily create and publish interactive BI dashboards that include ML-powered insights. When building a showback dashboard such as the example in Figure 1, QuickSight authors create visuals and publish interactive dashboards.

Readers log in using your preferred authentication mechanism to view the shared dashboard. You can then filter data based on billing periods, account number, or cost allocation tags. You can also drill down to details using a web browser or mobile app.

Conclusion

In this blog, we’ve discussed designing and building a data transformation process and a showback dashboard. This gives you highly granular cost visualization without having to provision and manage any servers. You can use managed services such as AWS Glue DataBrew, Amazon Athena, and Amazon QuickSight to crawl, catalog, analyze, and visualize your data.

We recommend defining your organization tagging strategy to be able to view costs by tags. You can then get started by creating Cost and Usage Reports. With the data in Amazon S3, you can use the services described in this post to transform the data that works for your business. Additionally, you can get started today by experimenting with the Cloud Intelligence Dashboards Workshop. This workshop provides examples of visualizations that you can build using native AWS services on top of your Cost and Usage Report. You will be able to get cost, usage, and operational insights about your AWS Cloud usage.

Building a Cloud-based OLAP Cube and ETL Architecture with AWS Managed Services

Post Syndicated from Peter Chung original https://aws.amazon.com/blogs/architecture/building-a-cloud-based-olap-cube-and-etl-architecture-with-aws-managed-services/

For decades, enterprises used online analytical processing (OLAP) workloads to answer complex questions about their business by filtering and aggregating their data. These complex queries were compute and memory-intensive. This required teams to build and maintain complex extract, transform, and load (ETL) pipelines to model and organize data, oftentimes with commercial-grade analytics tools.

In this post, we discuss building a cloud-based OLAP cube and ETL architecture that will yield faster results at lower costs without sacrificing performance by:

  • Connecting your on-premises database to the cloud for data profiling, discovery, and transformation
  • Running OLAP workloads without costly third-party software licenses, dedicated infrastructure, or the need to migrate data
  • Using AWS Glue Data Catalog, Amazon Athena, Amazon QuickSight, and Amazon SageMaker to catalog and visualize data with machine learning (ML)

Data analytics pipeline with AWS Managed Services

The proposed architecture in Figure 1 relies on AWS Managed Services. AWS Glue DataBrew is a no-code data transformation service that you can use to quickly build your transformation jobs. AWS Glue crawlers collect metadata from the transformed data and catalogs it for analytics and visualization using Athena and QuickSight. SageMaker will build, train, and deploy ML models.

This architecture will help you get answers from your data to your users as fast as possible without needing to migrate your data to AWS. There is no coding required, so you can leverage data transformation, cataloging, analytics, and ML quickly.

Figure 1. Example architecture using AWS Managed Services

Figure 1. Example architecture using AWS Managed Services

Benefits of AWS Managed Services for data analytics

Immediate connectivity to on-premises databases

The example architecture in Figure 1 begins with an online transaction processing (OLTP) database running in your corporate data center. Figure 2 shows how you can establish a Java database connectivity (JDBC) connection from the OLTP database to DataBrew running in AWS to run OLAP workloads. DataBrew supports data sources using JDBC for common data stores such as Microsoft SQL Server, MySQL, Oracle, and PostgreSQL.

DataBrew - JDBC connection to data source

Figure 2. DataBrew – JDBC connection to data source

Automatic data discovery

Figures 3 through 6 show how DataBrew summarizes your data for discovery. You can profile your data to understand patterns and detect anomalies. You can also run transformations called “jobs” in DataBrew without writing any code using over 250 built-in transforms.

DataBrew - dataset profiling overview

Figure 3. DataBrew – dataset profiling overview

 

DataBrew - data correlation patterns

Figure 4. DataBrew – data correlation patterns

 

DataBrew - data points distribution

Figure 5. DataBrew – data points distribution

No-code data transformation and cataloging

To run OLAP-type transactions, you can create jobs based on the transformation steps shown in Figure 6. These steps collectively are referred to as DataBrew recipes. These recipe results can be run as a job and outputted to an Amazon Simple Storage Service (Amazon S3) bucket.

A DataBrew project user interface view with sample data and transformation functions

Figure 6. A DataBrew project user interface view with sample data and transformation functions

Scheduled DataBrew jobs act similarly to scheduled ETL pipelines in OLAP. Based on data refresh and business requirements, DataBrew can run a job on a recurring basis (for example, every 12 hours). This can be run at a particular time of day, or as defined by a valid CRON expression. This helps you automate your transformation workflows.

The OLAP catalog is a set of metadata that sits between the actual OLAP data stored and applications. To create a Data Catalog, you can use AWS Glue crawlers to automatically classify your data to determine the data’s format, schema, and associated properties. Figure 7 shows the results of a crawler’s results written to Data Catalog as metadata to help data users find the data they need.

AWS Glue crawler metadata table output of column names and data types

Figure 7. AWS Glue crawler metadata table output of column names and data types

Data analytics without third-party software licenses

You can run analytics on your data by referring to the metadata definitions in the Data Catalog as references to the actual data in Amazon S3 using Athena. Athena is well suited for running one-time queries using standard SQL to query the transformed data directly in Amazon S3 without having to move data around. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Enterprises often supplement their OLAP workloads with separate visualization and business intelligence (BI) tools. These tools often come with their own licensing, server management, and security considerations.

You can visualize curated data using QuickSight, a scalable, serverless, embeddable, ML-powered BI service. QuickSight lets you easily create and publish interactive BI dashboards that include ML-powered insights, as shown in Figure 8. These dashboards can be shared with other users and embedded within your own applications.

A sample of data visualization options with Amazon QuickSight

Figure 8. A sample of data visualization options with Amazon QuickSight

Finally, you can incorporate ML workloads to OLAP workloads using SageMaker. In the past, ML workloads were often expensive, resource-intensive, and inaccessible. SageMaker provides a fully managed ML service to quickly and easily build and train ML models and directly deploy them into a production-ready hosted environment.

Conclusion

In this post, we show you how to connect your on-premises database using a JDBC connection to DataBrew for data profiling, discovery, and transformation. We looked at how you can use DataBrew recipes and jobs to run OLAP workloads without costly third-party software licenses, dedicated infrastructure, or the need to migrate any data. We also looked at AWS capabilities in data cataloging, visualization, and machine learning using Data Catalog, Athena, QuickSight, and SageMaker without having to manage any servers.

Laying the foundation to modernize an analytics workflow is critical for many enterprises that are looking to reduce the time it takes to understand their business. With AWS, you can perform enterprise-scale analytics with our portfolio of analytics services.