Tag Archives: AWS Cost and Usage Report

How potential performance upside with AWS Graviton helps reduce your costs further

Post Syndicated from Markus Adhiwiyogo original https://aws.amazon.com/blogs/compute/how-potential-performance-upside-with-aws-graviton-helps-reduce-your-costs-further/

Amazon Web Services (AWS) provides many mechanisms to optimize the price performance of workloads running on Amazon Elastic Compute Cloud (Amazon EC2), and the selection of the optimal infrastructure to run on can be one of the most impactful levers. When we started building the AWS Graviton processor, our goal was to optimize AWS Graviton features and capabilities to deliver a processor that provides the best price performance across a broad array of cloud workloads running on Amazon EC2. That goal continues to be our guiding principle, and today customers who adopt AWS Graviton-based EC2 instances see up to 40% better price performance on their cloud workloads when compared to equivalent non-Graviton EC2 instances. The price performance improvement is the result of both the performance improvement and the lower price in using AWS Graviton-based instances.

Price performance blends the cost of infrastructure with the amount of work you can achieve with infrastructure usage. After talking to many AWS Graviton customers, we’ve learned that the cost savings go beyond the lower AWS Graviton-based instances price. Many AWS Graviton customers told us that the performance increase from AWS Graviton allows them to consume fewer computing hours than comparable non-Graviton instances for equivalent workload throughput. In turn, this leads to further cost reduction.

The following are some of examples from our customers:

  • Pinterest achieved 47% cost savings and 38% savings on compute resources while reducing carbon emissions by 62% for its web API workload.
  • SAP powers its SAP HANA Cloud with AWS Graviton to enhance its price performance by 35% while lowering carbon impact by 45%.
  • Sprinklr improved their machine learning (ML) inference workloads’ throughput by up to 20% while reducing costs by up to 25%.

You can find more customer examples in the AWS Graviton testimonials page.

To help organizations capture similar benefits, we’ve enhanced the AWS Graviton Savings Dashboard (GSD) with new features that account for both pricing and performance improvements. In the following section we explore these new capabilities and how they can help optimize your infrastructure costs.

Understanding performance-driven cost optimization in the GSD

The GSD helps organizations identify ideal workloads for AWS Graviton migration through automated resource matching and data-driven visualizations. You can learn the GSD details and setup in this AWS compute post.

Although the dashboard has traditionally focused on calculating direct cost savings from the AWS Graviton pricing advantages, we’ve observed that customers often experience more benefits when their applications perform more efficiently on AWS Graviton processors, leading to decreased compute resource usage. To better reflect these real-world scenarios, we’ve enhanced the dashboard with new features highlighting Normalized Instance Hours (NIH) analysis capabilities so that you can model potential savings based on both pricing benefits and compute hour reductions. Although this tool helps estimate potential savings, actual performance improvements can only be determined by testing your specific workloads on AWS Graviton instances. Performance is always workload and use case specific, so we encourage you to test your AWS Graviton-based workloads using the Optimization and Performance Runbook to help you determine the actual possible NIH percent reduction.

Key dashboard components

This section outlines the following three key dashboard components: NIH reduction analysis, enhanced cost analysis visualizations, and detailed savings analysis.

NIH reduction analysis

The dashboard now features a new slider that lets you model potential cost savings by inputting the percentage reduction in NIH. Many organizations have found it challenging to calculate their total possible savings since the benefits come from two sources: the lower instance pricing of AWS Graviton and the reduced compute hours.

You can use the slider to model different cost scenarios by adjusting a theoretical NIH reduction between 0% and 40%. You can use this slider to input NIH reductions validated through your workload testing, model the combined impact of both pricing benefits and reduced compute hours, and explore different scenarios to help prioritize which workloads to test first.

Figure 1: NIH slider location

Figure 1: NIH slider location

Assume that your testing shows that your workload runs just as effectively with 15% fewer normalized instance hours on AWS Graviton. You can now plug that exact number into the slider to see your modeled savings combining both pricing differences and compute hour reductions. Although we’ve heard success stories of significant reductions from customers, we recommend starting your initial estimate with a conservative 10% baseline and adjusting based on your own testing results.

Enhanced cost analysis visualizations

The dashboard presents key visualizations that demonstrate the direct relationship between NIH reduction and cost savings. First, you see the Potential Graviton Base Savings from pricing differences alone. In the following diagram, we can observe an example of $61.54K of cost savings from migrating to equivalent AWS Graviton instances. Next, the Estimated Additional Savings Due to Performance in the same diagram shows $42.40K in savings if your performance testing confirms a 15% NIH reduction in your workload. Finally, the dashboard sums these two values into the Total Potential Graviton Savings of $103.94K. The Total Potential Graviton Savings helps visualize how both pricing benefits and any validated compute hour reductions could contribute to your overall savings.

Figure 2: Visualization with relationship between NIH reduction and cost savings

Figure 2: Visualization with relationship between NIH reduction and cost savings

The Amortized Cost Breakdown and Normalized Instance Hrs Breakdown charts in the following figure show 6-month historical trends, helping you spot patterns such as seasonal spikes or high-usage periods. These patterns can help you identify where even small efficiency improvements might yield significant savings, for example, workloads with consistently high usage or predictable peak periods that would be good candidates for testing.

Figure 3: Amortized Cost, NIH, and Total Potential Savings Breakdown charts

Figure 3: Amortized Cost, NIH, and Total Potential Savings Breakdown charts

Detailed savings analysis

Building on our commitment to help customers optimize cloud costs, we’ve enhanced the Potential Graviton Savings Details table with two columns focused on performance-based savings modeling. The Estimated Additional Savings Due to Performance column shows the modeled savings based on your chosen NIH reduction percentage, while Total Potential Graviton Savings combines this with the base pricing benefits.

Figure 4: Potential Graviton Savings Details table

Figure 4: Potential Graviton Savings Details table

As you examine your current instance family, you can observe both baseline AWS Graviton savings and these added saving opportunities clearly laid out in a comprehensive breakdown. The analysis presents your total savings potential in both dollar amounts and percentages. This allows you to build a compelling business case for migration. Although this detailed breakdown provides valuable planning insights, remember that actual savings may vary depending on your specific workload patterns, implementation approaches, and operational considerations.

Conclusion

The Graviton Savings Dashboard (GSD) serves as a powerful analytics tool that streamlines your journey to cost-effective cloud computing. The GSD provides clear visualizations and interactive features to help you understand and maximize potential savings when migrating to AWS Graviton-based instances. To further explore the new features, navigate to the GSD interactive demo, where you can model an example of potential savings using the NIH reduction slider and detailed cost breakdowns.

Ready to explore how AWS Graviton can transform your infrastructure costs? Visit the GSD page to deploy or update your GSD dashboard. Access implementation guides, such as the CFM Technical Implementation Playbook (CFM TIPs), and start optimizing your cloud spend today with the enhanced capabilities of the GSD.

Over 85,000 AWS customers have discovered the benefits of AWS Graviton, with many completing their adoptions in just hours. We have created this resource guide so that you can accelerate your AWS Graviton adoption with minimal effort and enjoy significant price performance benefits.

“What I always tell customers is one week, one application, one engineer, and see what you can do. They always are pleasantly surprised by how much progress they can make. If you’re out there and you haven’t yet moved to AWS Graviton, what are you waiting for? Let’s make it happen!”

Dave Brown, VP, AWS Compute & ML Services

Important note about performance testing
The GSD does not attempt to estimate the potential NIH percent reduction or your workload’s performance when transitioned to AWS Graviton. You can use it to perform what-if analysis of your potential savings for a projected NIH percent reduction. In the absence of this variable, GSD only considers the price delta between instance types and misses an important contributor to the overall savings potential of AWS Graviton from the performance upside. Compute performance is always workload and use case specific, so we encourage you to test your AWS Graviton-based workloads using the Optimization and Performance Runbook to help you determine the actual possible NIH percent reduction.

Monitor, analyze, and manage capacity usage from a single interface with Amazon EC2 Capacity Manager

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/monitor-analyze-and-manage-capacity-usage-from-a-single-interface-with-amazon-ec2-capacity-manager/

Today, I’m happy to announce Amazon EC2 Capacity Manager, a centralized solution to monitor, analyze, and manage capacity usage across all accounts and AWS Regions from a single interface. This service aggregates capacity information with hourly refresh rates and provides prioritized optimization opportunities, streamlining capacity management workflows that previously required custom automation or manual data collection from multiple AWS services.

Organizations using Amazon Elastic Compute Cloud (Amazon EC2) at scale operate hundreds of instance types across multiple Availability Zones and accounts, using On-Demand Instances, Spot Instances, and Capacity Reservations. This complexity means customers currently access capacity data through various AWS services including the AWS Management Console, Cost and Usage Reports, Amazon CloudWatch, and EC2 describe APIs. This distributed approach can create operational overhead through manual data collection, context switching between tools, and the need for custom automation to aggregate information for capacity optimization analysis.

EC2 Capacity Manager helps you overcome these operational complexities by consolidating all capacity data into a unified dashboard. You can now view cross-account and cross-Region capacity metrics for On-Demand Instances, Spot Instances, and Capacity Reservations across all commercial AWS Regions from a single location, eliminating the need to build custom data collection tools or navigate between multiple AWS services.

This consolidated visibility can help you discover cost savings by highlighting underutilized Capacity Reservations, analyzing usage patterns across instance types, and providing insights into Spot Instance interruption patterns. By having access to comprehensive capacity data in one place, you can make more informed decisions about rightsizing your infrastructure and optimizing your EC2 spending.

Let me show you the capabilities of EC2 Capacity Manager in detail.

Getting started with EC2 Capacity Manager
On the AWS Management Console, I navigate to Amazon EC2 and select Capacity Manager from the navigation pane. I enable EC2 Capacity Manager through the service settings. The service aggregates historical data from the previous 14 days during initial setup.

The main Dashboard displays capacity utilization across all instance types through a comprehensive overview section that presents key metrics at a glance. The capacity overview cards for Reservations, Usage, and Spot show trend indicators and percentage changes to help you identify capacity patterns quickly. You can apply filtering through the date filter controls, which include date range selection, time zone configuration, and interval settings.

You can select different units to analyze data by vCPUs, instance counts, or estimated costs to understand resource consumption patterns. Estimated costs are based on published On-Demand rates and do not include Savings Plans or other discounts. This pricing reference helps you compare the relative impact of underutilized capacity across different instance types—for example, 100 vCPU hours of unused p5 reservations represents a larger cost impact than 100 vCPU hours of unused t3 reservations.

The dashboard includes detailed Usage metrics with both total usage visualization and usage over time charts. The total usage section shows the breakdown between reserved usage, unreserved usage, and Spot usage. The usage over time chart provides visualization that tracks capacity trends over time, helping you identify usage patterns and peak demand periods.

Under Reservation metrics, Reserved capacity trends visualizes used and unused reserved capacity across the selected period, showing the proportion of reserved vCPU hours that remain unutilized compared with those actively consumed, helping you track reservation efficiency patterns and identify periods of consistent low utilization. This visibility can help you reduce costs by identifying underutilized reservations and helping you to make informed decisions about capacity adjustments.

The Unused capacity section lists underutilized capacity reservations by instance type and Availability Zone combinations, displaying specific utilization percentages and instance types across different Availability Zones. This prioritized list helps you identify potential savings with direct visibility into unused capacity costs.

The Usage tab provides detailed historical trends and usage statistics across all AWS Regions for Spot Instances, On-Demand Instances, Capacity Reservations, Reserved Instances, and Savings Plans. Dedicated Hosts usage is not included. The Dimension filter helps you group by and filter capacity data by Account ID, Region, Instance Family, Availability Zone, and Instance Type, creating custom views that reveal usage patterns across your accounts and AWS Organizations. This helps you analyze specific configurations and compare performance across accounts or Regions.

The Aggregations section provides a comprehensive usage table across EC2 and Spot Instances. You can select different units to analyze data by vCPUs, instance counts, or estimated costs to understand resource consumption patterns. The table shows instance family breakdowns with total usage statistics, reserved usage hours, unreserved usage hours, and Spot usage data. Each row includes a View breakdown action for a detailed analysis.

The Capacity usage or estimated cost trends section visualizes usage trends, reserved usage, unreserved usage, and Spot usage. You can filter the displayed data and adjust the unit of measurement to view historical patterns. These filtering and analysis tools help you identify usage trends, compare costs across dimensions, and make informed decisions for capacity planning and optimization.

When you choose View breakdown from the Aggregations table, you access detailed Usage breakdown based on the dimension filters you selected. This breakdown view shows usage patterns for individual instance types within the selected family and Availability Zone combinations, helping you identify specific optimization opportunities.

The Reservations tab displays capacity reservation utilization with automated analysis capabilities that generate prioritized lists of optimization opportunities. Similar to the Usage tab, you can apply dimension filters by Account ID, Region, Instance Family, Availability Zone, and Instance Type along with additional options related to the reservation details. On each of the tabs you can drill down to see data for individual line items. For reservations specifically, you can view specific reservations and access detailed information about On-Demand Capacity Reservations (ODCRs), including utilization history, configuration parameters, and current status. When the ODCR exists in the same account as Capacity Manager, you can modify reservation parameters directly from this interface, eliminating the need to navigate to separate EC2 console sections for reservation management.

The Statistics section provides summary metrics, including total reservations count, overall utilization percentage, reserved capacity totals, used and unused capacity volumes, average scheduled reservations, and counts of accounts, instance families, and Regions with reservations.

This consolidated view helps you understand reservation distribution and utilization patterns across your infrastructure. For example, you might discover that your development accounts consistently show 30% reservation utilization while production accounts exceed 95%, indicating an opportunity to redistribute or modify reservations. Similarly, you could identify that specific instance families in certain Regions have sustained low utilization rates, suggesting candidates for reservation adjustments or workload optimization. These insights help you make data-driven decisions about reservation purchases, modifications, or cancellations to better align your reserved capacity with actual usage patterns.

The Spot tab focuses on Spot Instance usage and displays the amount of time your Spot instances run before being interrupted. This analysis of Spot Instance usage patterns helps you identify optimization opportunities for Spot Instance workloads. You can use Spot placement score recommendations to improve workload flexibility.

For organizations requiring data export capabilities, Capacity Manager includes data exports to Amazon Simple Storage Service (Amazon S3) buckets for capacity analysis. You can view and manage your data exports through the Data exports tab, which helps you create new exports, monitor delivery status, and configure export schedules to analyze capacity data outside the AWS Management Console.

Data exports extend your analytical capabilities by storing capacity data beyond the 90-day retention period available through the console and APIs. This extended retention enables long-term trend analysis and historical capacity planning. You can also integrate exported data with existing analytics workflows, business intelligence tools, or custom reporting systems to incorporate EC2 capacity metrics into broader infrastructure analysis and decision-making processes.

The Settings section provides configuration options for AWS Organizations integration, enabling centralized capacity management across multiple accounts. Organization administrators can enable enterprise-wide capacity visibility or delegate access to specific accounts while maintaining appropriate permissions and access controls.

Now available
EC2 Capacity Manager eliminates the operational overhead of collecting and analyzing capacity data from multiple sources. The service provides automated optimization opportunities, centralized multi-account visibility, and direct access to capacity management tools. You can reduce manual analysis time while improving capacity utilization and cost optimization across your EC2 infrastructure.

Amazon EC2 Capacity Manager is available at no additional cost. To begin using Amazon EC2 Capacity Manager, visit the Amazon EC2 console or access the service APIs. The service is available in all commercial AWS Regions.

To learn more, visit the EC2 Capacity Manager documentation.

— Esra

Perform per-project cost allocation in Amazon SageMaker Unified Studio

Post Syndicated from Enrique Salgado Hernández original https://aws.amazon.com/blogs/big-data/perform-per-project-cost-allocation-in-amazon-sagemaker-unified-studio/

Amazon SageMaker Unified Studio is a single data and AI development environment where you can find and access your data and act on it using AWS resources for SQL analytics, data processing, model development, and generative AI application development.

SageMaker Unified Studio is part of the next generation of Amazon SageMaker. SageMaker brings together AWS artificial intelligence and machine learning (AI/ML) and analytics capabilities and delivers an integrated experience for analytics and AI with unified access to data.

With SageMaker Unified Studio, you can create domains and projects, providing a single interface to build, deploy, execute, and monitor end-to-end workflows. This approach helps drive collaboration across teams and facilitates agile development.

SageMaker Unified Studio implements resource tagging when AWS resources are provisioned. You can use these tags to track and allocate costs for the various resources created as part of the domains and projects within SageMaker Unified Studio.

This post demonstrates how to perform cost allocation using these resource tags, so finance analysts and business analysts can implement and follow Financial Operations (FinOps) best practices to control and track cloud infrastructure costs.

Solution overview

The following diagram illustrates how tagging works within SageMaker domains.

High level diagram that illustrates SageMaker Unified Studio entities (domains, projects and environments) are organized and how tags are applied to each of them

Before reviewing the implementation details, let’s explore several key SageMaker concepts: domain, project, project profile, and environment blueprint. For more information, refer to the SageMaker Unified Studio Administrator Guide.

  • Domain – A domain is an organizing entity created by an administrator. Administrators assign users to domains to enable collaboration using similar tools, assets, and resources. A domain can represent a business organization or a business unit containing people who collaborate and share resources. After creating a domain, administrators share the URL with users to access the portal.
  • Projects – Projects exist within each domain. A project provides a boundary where users can collaborate on a business use case. Users can create and share data, computing, and other resources within projects.
  • Project profile – When you create a project, you must select a project profile. A project profile is a template that governs infrastructure for the project, simplifying project creation with preconfigured settings and resources ready for use.
  • Environment blueprints – Environment blueprints are reusable templates for creating environments. They define settings for resource deployment and provide information for provisioning. Each blueprint uses an AWS CloudFormation template to create resources in a repeatable and scalable manner.

For effective cost tracking and allocation, make sure your SageMaker resources have proper tags. You can configure these as cost allocation tags to group and filter across AWS Billing and Cost Management tools (such as AWS Cost Explorer and AWS Data Exports).

As of this writing, SageMaker domains support tagging at the blueprint, domain, project, and environment level. When you create projects or add resources within an existing project, the following tags are automatically added to resources through CloudFormation resource tags, configured for each blueprint stack:

  • AmazonDataZoneBlueprint – Type of blueprint corresponding to this blueprint’s CloudFormation template (for example, Tooling)
  • AmazonDataZoneDomainAmazon DataZone domain associated with this CloudFormation template
  • AmazonDataZoneEnvironment – Amazon DataZone environment ID associated with this CloudFormation template
  • AmazonDataZoneProject – Amazon DataZone project associated with this CloudFormation template

To track costs in SageMaker Unified Studio, you will perform the following steps:

  1. Create a SageMaker domain and project.
  2. Configure cost and billing settings by enabling cost allocation tags.
  3. (Optional) Generate costs for your project.
  4. Track costs using Cost Explorer and Data Exports.

Prerequisites

This post requires the following configurations in your AWS account:

  • AWS IAM Identity Center enabled in your organization management account (preferred) or in the member account where you will use SageMaker Unified Studio. For instructions on enabling IAM Identity Center, refer to Enable IAM Identity Center.
  • Cost Explorer enabled in your organization management account (preferred) or in the member account where you will use SageMaker Unified Studio. For configuration steps, refer to Enabling Cost Explorer.

Either legacy AWS Cost and Usage Reports (AWS CUR) with Amazon Athena integration or Data Exports configured and integrated with Athena for queries. For setup instructions, refer to creating Data Exports.

Create a SageMaker Unified Studio domain and project

Complete the following steps to set up your domain and project:

  1. Create a SageMaker Unified Studio domain using the Quick setup option (recommended for new users) or manual setup.

After domain creation, you will be redirected to the domain overview page.

  1. Choose Open Unified Studio.
  2. On the SageMaker Unified Studio console, choose Create project.
  3. For Project profile, choose SQL analytics, then choose Continue.

SageMaker Unified Studio create project wokflow (configuration page)

  1. Choose Continue to keep the default blueprint parameters.
  2. Review the configuration summary, then choose Create project.

SageMaker Unified Studio create project wokflow (confirmation page)

After the project is created, you will be redirected to the project overview page. Record the project ID and domain ID.

Project details page showing various details such as project id, project name and project IAM role ARN

Cost and billing configuration

As mentioned earlier, to track costs in SageMaker Unified Studio, you must configure cost allocation tags. Refer to Organizing and tracking costs using AWS cost allocation tags for more information about this feature.

Complete the following steps:

  1. On the AWS Billing and Cost Management console, under Cost organization in the navigation pane, choose Cost allocation tags.
  2. Select the following tags and choose Activate:
    1. AmazonDataZoneDomain
    2. AmazonDataZoneProject
    3. AmazonDataZoneEnvironment
    4. AmazonDataZoneBlueprint

The AmazonDataZoneProject and AmazonDataZoneDomain tags correspond to the project and domain ID values you recorded earlier.

AWS cost allocation tags interface showing the AWS tags that are currently configured as cost allocation tags

Cost allocation tags configuration doesn’t apply retroactively. If you want to monitor costs associated with these tags in the AWS Billing and Cost Management tools before the activation date, you must request a cost allocation tag backfill. The backfill operation can take several hours to complete.

Generate costs for the project

This section explains how to generate costs associated with the underlying data backend (Amazon Redshift in this case) to examine them using AWS billing tools. You can skip this section if you’re tracking costs on an active project.

To generate costs, we use the table structure used in the Redshift Immersion Labs. Refer to Create Tables for more details.

To run queries in SageMaker Unified Studio, follow these steps:

  1. In your project, choose New and then Query.

Image that shows the query button within the SageMaker Unified Studio project overview page allowing users to open the query editor tool

  1. Use the Amazon Redshift Serverless compute configured for the project to generate the costs:
    1. Choose the Redshift (Lakehouse) connection.
    2. Choose the dev database.
    3. Choose the project schema.
    4. Choose Choose.

Image that shows the conection selector available in SageMaker Unified Studio. In this case Redshift LakeHouse connection is selected with dev database and project schema selected underneath

  1. Copy and execute the SQL statements provided in the following GitHub repo into the SageMaker Unified Studio query editor to create, load, and validate data on the tables.

View of the Query editor within the SageMaker Unified Studio portal. Image contains two SQL queries (create tables and COPY data operation)

After running these steps, you will have generated some Amazon Redshift costs that will be present for further analysis in AWS Billing and Cost Management tools. However, these tools (Cost Explorer and Data Exports) are refreshed least one time every 24 hours, so you might need to wait up to 24 hours before proceeding to the next section.

Tracking costs in AWS Billing and Cost Management tools

With the cost allocation tags enabled, you can use AWS Billing and Cost Management tools to analyze and track costs, including Cost Explorer and Data Exports. For more information about using these tools, refer to the AWS Billing and Cost Management User Guide.

Check costs in Cost Explorer

You can check your SageMaker Unified Studio costs using Cost Explorer. With this tool, you can view and analyze your costs and usage through an interface with pre-built filters and aggregation capabilities for various metrics. For more information, refer to the Analyzing your costs and usage with AWS Cost Explorer.

To access Cost Explorer, complete the following steps:

  1. On the AWS Management Console, choose your account name in the top right corner and choose Billing Dashboard, or search for “Cost Explorer” in the console search bar.
  2. On the Billing Dashboard, choose Cost Explorer in the navigation pane.
  3. For first-time users, choose Launch Cost Explorer to enable the service.

AWS can take up to 24 hours to prepare your cost data.

  1. To view overall costs per project, configure the following report parameters:
    1. For Date Range, enter your range.
    2. For Granularity, choose Monthly.
    3. For Dimension, choose Tag.
    4. For Tag, enter your tag (AmazonDataZoneProject).

Image that shows how to group by a particular dimension (tag) in cost explorer

The following screenshot shows a sample report.

AWS cost explorer report showing costs by SageMaker Unified Studio project

  1. To view different service costs for a specific project, update the following parameters:
    1. For Dimension, choose Service.Image that shows how to group by a particular dimension (service) in cost explorer
    2. For Tag¸ choose AmazonDataZoneProject and choose the value of the project you want to inspect (in this case, 4z9d694nbsnyqx).

Image that illustrates how to filter by a specific dimension (tag) and value in cost explorer

The results should look similar to the following screenshot.

AWS cost explorer report showing service costs for a particular SageMaker Unified Studio project

Check costs using Data Exports

With Data Exports, you can query your cost and usage in AWS with the maximum flexibility degree compared to other tools such as Cost Explorer. It provides a comprehensive set of measures and dimensions that you can include in the export to create a personalized report. This report is then delivered to Amazon Simple Storage Service (Amazon S3) so you can configure it with Athena, so it can be queried using SQL or business intelligence (BI) tools such as Amazon QuickSight.

This post assumes you have already configured a data export and you have it integrated with Athena (refer to Processing data exports for more information). For instructions on setting up CUR and Athena integration, refer to Creating reports.

Check costs by project

Use the following query to check costs by project:

SELECT product_servicecode,
    product_product_family,
    resource_tags[ 'user_amazon_data_zone_project' ] as user_amazon_data_zone_project,
    round(sum(line_item_unblended_cost), 2) costs,
    line_item_line_item_description 
FROM "data_exports"."data_exportdata"
where resource_tags [ 'user_amazon_data_zone_project' ] != ''
group by product_product_family,
    product_servicecode,
    resource_tags[ 'user_amazon_data_zone_project' ],
    line_item_line_item_description
order by round(sum(line_item_unblended_cost), 2) DESC;

Results will look similar to the following screenshot on the Athena console.

Athena SQL query results when querying cost and usage data from data exports

The preceding query shows your costs grouped by:

  • Project (using tags)
  • Service
  • Product family, which corresponds to the subtype for a given product usage charge (for example, ML Instance for SageMaker, or Managed Storage for Amazon Redshift)

Check costs for individual projects

To check costs for a specific SageMaker Unified Studio project (for example, the sample project 4z9d694nbsnyqx created during this walkthrough), you can use the following query:

SELECT product_servicecode,
    product_product_family,
    resource_tags[ 'user_amazon_data_zone_project' ] as user_amazon_data_zone_project,
    round(sum(line_item_unblended_cost), 2) costs,
    line_item_line_item_description 
FROM "data_exports"."data_exportdata"
where resource_tags [ 'user_amazon_data_zone_project' ] != ''
and resource_tags [ 'user_amazon_data_zone_project' ] = <provide the project id here>
group by product_product_family,
    product_servicecode,
    resource_tags[ 'user_amazon_data_zone_project' ],
    line_item_line_item_description
order by round(sum(line_item_unblended_cost), 2) DESC;

Monitor costs with Data Exports and QuickSight

If you enabled Athena to work with Data Exports, you can also configure QuickSight to query this data source. With QuickSight, you can create interactive dashboards to track SageMaker costs in SageMaker Unified Studio at scale.

Configure access and permissions

To create CUR dashboards in QuickSight, first complete the following steps:

  1. Subscribe to QuickSight and have an author user account. For instructions on subscribing to QuickSight, refer to Signing up for an Amazon QuickSight subscription.
  2. Enable access to Athena and your CUR S3 bucket in the Security & permissions section of the QuickSight administration console. You need QuickSight administrator permissions to access this console.

Image shows QuickSight administration console where administrators can edit the AWS services (Athena in this case) that QuickSight is allowed to access

  1. If you’re using AWS Lake Formation, make sure your QuickSight user is authorized to query the CUR database and table. For more information about granting access in Lake Formation, refer to Granting permissions on Data Catalog resources.

Create a QuickSight dataset

The next step is to create a dataset in QuickSight using a SQL query. For instructions on creating a dataset with SQL, refer to Using SQL to customize data. Use the following SQL expression:

SELECT product_servicecode,
    product_product_family,
    resource_tags[ 'user_amazon_data_zone_environment' ] as user_amazon_data_zone_environment,
    resource_tags[ 'user_amazon_data_zone_project' ] as user_amazon_data_zone_project,
    resource_tags[ 'user_amazon_data_zone_domain' ] as user_amazon_data_zone_domain,
    line_item_unblended_cost,
    line_item_usage_start_date,
    line_item_line_item_description
FROM "data_exports"."data_exportdata"
where resource_tags [ 'user_amazon_data_zone_environment' ] != '' or resource_tags [ 'user_amazon_data_zone_project' ] != ''

Image of QuickSight dataset preparation page. Shows a SQL query that is used to extract data from the data exports previously configured.

The preceding query includes only cost and usage data that’s tagged with either user_amazon_data_zone_environment or user_amazon_data_zone_project to focus on SageMaker associated costs. To include other AWS costs, you must modify these filters.

Create QuickSight dashboards

Using the authoring capabilities of QuickSight, you can create interactive dashboards where business stakeholders can explore and track costs associated with SageMaker Unified Studio projects. You can use these dashboards to review relevant cost metrics at a glance that are derived from the Data Exports dimensions and metrics included in your dataset, as shown in the following screenshot. For more information about adding visuals to analyses, refer to Adding visuals to Amazon QuickSight analyses.

Example of a QuickSight dashboard consuming data exports cost and usage data. Dashboard contains multiple visuals that illustrate SageMaker Unified Studio costs by project and service

The preceding example shows a dashboard built using QuickSight connected to a Data Exports dataset. The dashboard contains the following visuals:

  • KPI visual showing the current monthly costs for SageMaker Unified Studio along with the month over month (MoM) variation and history
  • Autonarrative visual analyzing SageMaker Unified Studio costs (highest) by month
  • Vertical stacked bar chart showing SageMaker Unified Studio costs by month (grouped by project)
  • Donut chart showing SageMaker Unified Studio cost by service
  • Heat map visual correlating costs by project ID and service

Using this approach (QuickSight and Data Exports), you can create highly customizable dashboards to explore and monitor your SageMaker Unified Studio costs. Furthermore, you can create automated reports using the QuickSight reporting feature to send these by email to the relevant stakeholders.

Clean up

Delete the resources you created as part of this post when you’re done with them to avoid monthly charges. This includes SageMaker resources, created Data Export reports and the QuickSight subscription (in case it was created to visualize costs).

  1. Delete SageMaker resources
    1. Log in to the SageMaker domain using an admin role.
    2. Delete the project you created.
    3. Delete the SageMaker domain.
  2. Delete Data Exports reports
    1. On the AWS Billing console, in the navigation pane, choose Cost & Usage Reports.
    2. Select the report you want to delete.
    3. Choose Delete.
    4. Confirm the deletion by choosing Delete report.

For more information about managing Data Exports, refer to Deleting exports.

  1. Unsubscribe from QuickSight
    1. On the QuickSight console, choose your profile name in the top right corner.
    2. Choose Manage QuickSight.
    3. Choose Account settings.
    4. At the bottom of the page, choose Delete your QuickSight account.
    5. Review the information about data deletion.
    6. Enter delete to confirm.
    7. Choose Delete.

IMPORTANT NOTE: Before unsubscribing, make sure you backed up any dashboards or analyses you want to keep. After deletion, you can’t recover your QuickSight assets. For more information about managing your QuickSight subscription, refer to Deleting your Amazon QuickSight subscription and closing the account.

Conclusion

Managing costs on a unified platform like SageMaker can seem challenging because it aggregates many tools and services with different cost models. In this post, we showed how to use AWS Billing and Cost Management tools to aggregate and categorize costs across the various services used within SageMaker. With this approach, you can monitor and track respective service costs, either in aggregate or focusing on a particular project.

Start taking control of your analytics and ML costs today. With AWS Billing and Cost Management tools with SageMaker, you can:

  • Track and monitor your service costs
  • Break down expenses by project or service
  • Implement efficient back charging mechanisms to the different business units or organizations using SageMaker within your organization

For further reading, refer to Analyzing your costs and usage with AWS Cost Explorer and Processing Data Exports (using Athena).


About the authors

Enrique Salgado Hernández is a Senior Specialist Solutions Architect at AWS with more than 10 years of experience working in the cloud. He specializes in designing and implementing large-scale analytics architectures across various industry sectors. He is passionate about working with customers to solve their problems by supporting them during their cloud journey.

Angel Conde Manjon is a Senior EMEA Data & AI PSA, based in Madrid. He previously worked on research related to data analytics and AI in diverse European research projects. In his current role, Angel helps partners develop businesses centered on data and AI.

Managing AWS Lambda runtime upgrades

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/managing-aws-lambda-runtime-upgrades/

This post is written by Julian Wood, Principal Developer Advocate, and Dan Fox, Principal Specialist Serverless Solutions Architect.

AWS Lambda supports multiple programming languages through the use of runtimes. A Lambda runtime provides a language-specific execution environment, which provides the OS, language support, and additional settings, such as environment variables and certificates that you can access from your function code.

You can use managed runtimes that Lambda provides or build your own. Each major programming language release has a separate managed runtime, with a unique runtime identifier, such as python3.11 or nodejs20.x.

Lambda automatically applies patches and security updates to all managed runtimes and their corresponding container base images. Automatic runtime patching is one of the features customers love most about Lambda. When these patches are no longer available, Lambda ends support for the runtime. Over the next few months, Lambda is deprecating a number of popular runtimes, triggered by end of life of upstream language versions and of Amazon Linux 1.

Runtime Deprecation
Node.js 14 Nov 27, 2023
Node.js 16 Mar 11, 2024
Python 3.7 Nov 27, 2023
Java 8 (Amazon Linux 1) Dec 31, 2023
Go 1.x Dec 31, 2023
Ruby 2.7 Dec 07, 2023
Custom Runtime (provided) Dec 31, 2023

Runtime deprecation is not unique to Lambda. You must upgrade code using Python 3.7 or Node.js 14 when those language versions reach end of life, regardless of which compute service your code is running on. Lambda can help make this easier by tracking which runtimes you are using and providing deprecation notifications.

This post contains considerations and best practices for managing runtime deprecations and upgrades when using Lambda. Adopting these techniques makes managing runtime upgrades easier, especially when working with a large number of functions.

Specifying Lambda runtimes

When you deploy your function as a .zip file archive, you choose a runtime when you create the function. To change the runtime, you can update your function’s configuration.

Lambda keeps each managed runtime up to date by taking on the operational burden of patching the runtimes with security updates, bug fixes, new features, performance enhancements, and support for minor version releases. These runtime updates are published as runtime versions. Lambda applies runtime updates to functions by migrating the function from an earlier runtime version to a new runtime version.

You can control how your functions receive these updates using runtime management controls. Runtime versions and runtime updates apply to patch updates for a given Lambda runtime. Lambda does not automatically upgrade functions between major language runtime versions, for example, from nodejs14.x to nodejs18.x.

For a function defined as a container image, you choose a runtime and the Linux distribution when you create the container image. Most customers start with one of the Lambda base container images, although you can also build your own images from scratch. To change the runtime, you create a new container image from a different base container image.

Why does Lambda deprecate runtimes?

Lambda deprecates a runtime when upstream runtime language maintainers mark their language end-of-life or security updates are no longer available.

In almost all cases, the end-of-life date of a language version or operating system is published well in advance. The Lambda runtime deprecation policy gives end-of-life schedules for each language that Lambda supports. Lambda notifies you by email and via your Personal Health Dashboard if you are using a runtime that is scheduled for deprecation.

Lambda runtime deprecation happens in several stages. Lambda first blocks creating new functions that use a given runtime. Lambda later also blocks updating existing functions using the unsupported runtime, except to update to a supported runtime. Lambda does not block invocations of functions that use a deprecated runtime. Function invocations continue indefinitely after the runtime reaches end of support.

Lambda is extending the deprecation notification period from 60 days before deprecation to 180 days. Previously, blocking new function creation happened at deprecation and blocking updates to existing functions 30 days later. Blocking creation of new functions now happens 30 days after deprecation, and blocking updates to existing functions 60 days after.

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

Can Lambda automatically upgrade my runtime?

Moving from one major version of the language runtime to another has a significant risk of being a breaking change. Some libraries and dependencies within a language have deprecation schedules and do not support versions of a language past a certain point. Moving functions to new runtimes could potentially impact large-scale production workloads that customers depend on.

Since Lambda cannot guarantee backward compatibility between major language versions, upgrading the Lambda runtime used by a function is a customer-driven operation.

Lambda function versions

You can use function versions to manage the deployment of your functions. In Lambda, you make code and configuration changes to the default function version, which is called $LATEST. When you publish a function version, Lambda takes a snapshot of the code, runtime, and function configuration to maintain a consistent experience for users of that function version. When you invoke a function, you can specify the version to use or invoke the $LATEST version. Lambda function versions are required when using Provisioned Concurrency or SnapStart.

Some developers use an auto-versioning process by creating a new function version each time they deploy a change. This results in many versions of a function, with only a single version actually in use.

While Lambda applies runtime updates to published function versions, you cannot update the runtime major version for a published function version, for example from Node.js 16 to Node.js 20. To update the runtime for a function, you must update the $LATEST version, then create a new published function version if necessary. This means that different versions of a function can use different runtimes. The following shows the same function with version 1 using Node.js 14.x and version 2 using Node.js 18.x.

Version 1 using Node.js 14.x

Version 1 using Node.js 14.x

Version 2 using Node.js 18.x

Version 2 using Node.js 18.x

Ensure you create a maintenance process for deleting unused function versions, which also impact your Lambda storage quota.

Managing function runtime upgrades

Managing function runtime upgrades should be part of your software delivery lifecycle, in a similar way to how you treat dependencies and security updates. You need to understand which functions are being actively used in your organization. Organizations can create prioritization based on security profiles and/or function usage. You can use the same communication mechanisms you may already be using for handling security vulnerabilities.

Implement preventative guardrails to ensure that developers can only create functions using supported runtimes. Using infrastructure as code, CI/CD pipelines, and robust testing practices makes updating runtimes easier.

Identifying impacted functions

There are tools available to check Lambda runtime configuration and to identify which functions and what published function versions are actually in use. Deleting a function or function version that is no longer in use is the simplest way to avoid runtime deprecations.

You can identify functions using deprecated or soon to be deprecated runtimes using AWS Trusted Advisor. Use the AWS Lambda Functions Using Deprecated Runtimes check, in the Security category that provides 120 days’ notice.

AWS Trusted Advisor Lambda functions using deprecated runtimes

AWS Trusted Advisor Lambda functions using deprecated runtimes

Trusted Advisor scans all versions of your functions, including $LATEST and published versions.

The AWS Command Line Interface (AWS CLI) can list all functions in a specific Region that are using a specific runtime. To find all functions in your account, repeat the following command for each AWS Region and account. Replace the <REGION> and <RUNTIME> parameters with your values. The --function-version ALL parameter causes all function versions to be returned; omit this parameter to return only the $LATEST version.

aws lambda list-functions --function-version ALL --region <REGION> --output text —query "Functions[?Runtime=='<RUNTIME>'].FunctionArn"

You can use AWS Config to create a view of the configuration of resources in your account and also store configuration snapshot data in Amazon S3. AWS Config queries do not support published function versions, they can only query the $LATEST version.

You can then use Amazon Athena and Amazon QuickSight to make dashboards to visualize AWS Config data. For more information, see the Implementing governance in depth for serverless applications learning guide.

Dashboard showing AWS Config data

Dashboard showing AWS Config data

There are a number of ways that you can track Lambda function usage.

You can use Amazon CloudWatch metrics explorer to view Lambda by runtime and track the Invocations metric within the default CloudWatch metrics retention period of 15 months.

Track invocations in Amazon CloudWatch metrics

Track invocations in Amazon CloudWatch metrics

You can turn on AWS CloudTrail data event logging to log an event every time Lambda functions are invoked. This helps you understand what identities are invoking functions and the frequency of their invocations.

AWS Cost and Usage Reports can show which functions are incurring cost and in use.

Limiting runtime usage

AWS CloudFormation Guard is an open-source evaluation tool to validate infrastructure as code templates. Create policy rules to ensure that developers only chose approved runtimes. For more information, see Preventative Controls with AWS CloudFormation Guard.

AWS Config rules allow you to check that Lambda function settings for the runtime match expected values. For more information on running these rules before deployment, see Preventative Controls with AWS Config. You can also reactively flag functions as non-compliant as your governance policies evolve. For more information, see Detective Controls with AWS Config.

Lambda does not currently have service control policies (SCP) to block function creation based on the runtime

Upgrade best practices

Use infrastructure as code tools to build and manage your Lambda functions, which can make it easier to manage upgrades.

Ensure you run tests against your functions when developing locally. Include automated tests as part of your CI/CD pipelines to provide confidence in your runtime upgrades. When rolling out function upgrades, you can use weighted aliases to shift traffic between two function versions as you monitor for errors and failures.

Using runtimes after deprecation

AWS strongly advises you to upgrade your functions to a supported runtime before deprecation to continue to benefit from security patches, bug-fixes, and the latest runtime features. While deprecation does not affect function invocations, you will be using an unsupported runtime, which may have unpatched security vulnerabilities. Your function may eventually stop working, for example, due to a certificate expiry.

Lambda blocks function creation and updates for functions using deprecated runtimes. To create or update functions after these operations are blocked, contact AWS Support.

Conclusion

Lambda is deprecating a number of popular runtimes over the next few months, reflecting the end-of-life of upstream language versions and Amazon Linux 1. This post covers considerations for managing Lambda function runtime upgrades.

For more serverless learning resources, visit Serverless Land.

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Post Syndicated from Sandeep Bajwa original https://aws.amazon.com/blogs/big-data/manage-your-data-warehouse-cost-allocations-with-amazon-redshift-serverless-tagging/

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) models with Redshift Serverless.

Tags allows you to assign metadata to your AWS resources. You can define your own key and value for your resource tag, so that you can easily manage and filter your resources. Tags can also improve transparency and map costs to specific teams, products, or applications. This way, you can raise cost awareness and also make teams and users accountable for their own cost and usage.

You can now use tagging in Redshift Serverless to categorize the following resources based on your grouping needs:

  • Namespace – A collection of database objects and users
  • Workgroup – A collection of compute resources
  • Snapshot – Point-in-time backups of a cluster
  • Recovery point – Recovery points in Redshift Serverless are created every 30 minutes and saved for 24 hours

When using Redshift Serverless, you may have to manage data across many business departments, environments, and billing groups. In doing so, you’re usually faced with one of the following tasks:

  • Cost allocation and financial management – You want to know what you’re spending on AWS for a given project, line of business, or environment
  • Operations support and incident management – You want to send issues to the right teams and users
  • Access control – You want to constrain user access to certain resources
  • Security risk management – You want to group resources based on their level of security or data sensitivity and make sure proper controls are in place

In this post, we focus on tagging Redshift Serverless resources for cost allocation and reporting purposes. Knowing where you have incurred costs at the resource, workload, team, and organization level enhances your ability to budget and manage cost.

Solution overview

Let’s say that your company has two departments: marketing and finance. Each department has multiple cost centers and environments, as illustrated in the following figure. In AWS Cost Explorer, you want to create cost reports for Redshift Serverless by department, environment, and cost center.

We start with creating and applying user-defined tags to Amazon Serverless workgroups for respective departments, environments, and cost centers. You can use both the AWS Command Line Interface (AWS CLI) and Redshift Serverless console to tag serverless resources.

The high-level steps are as follows:

  1. Create tags.
  2. View and edit tags.
  3. Set up cost allocation tags.
  4. Create cost reports.

Create tags

To create tags, complete the following steps:

  1. On the Amazon Redshift console, choose Manage tags in the navigation pane.
    Amazon Redshift console
  2. For Filter by resource type, you can filter by Workgroup, Namespace, Snapshot, and Recovery Point.
  3. Optionally, you can search for resources by an existing tag by entering values for Tag key or Tag value. For this post, we don’t include any tag filters, so we can view all the resources across our account.
    search for resources by an existing tag by entering values
  4. Select your resource from the search results and choose Manage tags to customize the tag key and value parameters.

Here, you can add new tags, remove tags, save changes, and cancel your changes if needed.

  1. Because we want to allocate cost across the various departments, we add a new key called department and a new value called marketing.
  1. Choose Save changes.
    Save Changes
  2. Confirm the changes by choosing Apply changes.
    Apply changes

For more details on tagging, refer to Tagging resources overview.

View and edit tags

If you already have resources such as workgroups (listed on the Workgroup configuration page) or snapshots (listed on the Data backup page), you can create new tags or edit existing tags on the given resource. In the following example, we manage tags on an existing workgroup.

  1. On the Amazon Redshift console, choose Workgroup configuration in the navigation pane.
  2. Select your workgroup and on the Actions menu, choose Manage tags.
    Select your workgroup and on the Actions menu, choose Manage tags.

Now we can remove existing tags or add new tags. For our use case, let’s assume that the marketing department is no longer using the default workgroup, so we want to remove the current tag.

  1. Choose Remove next to the marketing tag.
    Choose Remove next to the marketing tag.

We are given the option to choose Undo if needed.

  1. Choose Save changes and then Apply the changes to confirm.
    Choose Save changes and then Apply the changes to confirm.

After we apply the tags, we can view the full list of resources. The number of tags applied to each resource is found in the Tags column.
view the full list of resources

Set up cost allocation tags

After you create and apply the user-defined tags to your Redshift Serverless workgroups, it can take up to 24 hours for the tags to appear on your cost allocation tags page for activation. You can activate tags by using the AWS Billing console for cost allocation tracking with the following steps:

  1. On the AWS Billing console, choose Cost allocation tags in the navigation pane.
  2. Under User-defined cost allocation tags¸ select the tags you created and applied (for this example, cost-center).
  3. Choose Activate.
    Choose Activate

After we activate all the tags we created, we can view the full list by choosing Active on the drop-down menu.

Create cost reports

After you activate the cost allocation tags, they appear on your cost allocation reports in Cost Explorer.

Cost Explorer helps you manage your AWS costs by giving you detailed insights into the line items in your bill. In Cost Explorer, you can visualize daily, monthly, and forecasted spend by combining an array of available filters. Filters allow you to narrow down costs according to AWS service type, linked accounts, and tags.

The following screenshot shows the preconfigured reports in Cost Explorer.
preconfigured reports in Cost Explorer.

To create custom reports for your cost and usage data, complete the following steps:

  1. On the AWS Cost Management console, choose Reports in the navigation pane.
  2. Choose Create new report.
  3. Select the report type you want to create (for this example, we select Cost and usage).
  4. Choose Create Report.
    Create Report
  5. To view weekly Redshift Serverless cost by cost center, choose the applicable settings in the Report parameters pane. For this post, we group data by the cost-center tag and filter data by the department tag.
  6. Save the report for later use by choosing Save to report library.
    Save the report for later use by choosing Save to report library.
  7. Enter a name for your report, then choose Save report.
    Save Report

The following screenshot shows a sample report for daily Redshift Serverless cost by department.

sample report for daily Redshift Serverless cost by department.

The following screenshot shows an example report of weekly Redshift Serverless cost by environment.

example report of weekly Redshift Serverless cost by environment.

Conclusion

Tagging resources in Amazon Redshift helps you maintain a central place to organize and view resources across the service for billing management. This feature saves you hours of manual work you would spend in grouping your Amazon Redshift resources via a spreadsheet or other manual alternatives.

For more tagging best practices, refer to Tagging AWS resources.


About the Authors

Sandeep Bajwa is a Sr. Analytics Specialist based out of Northern Virginia, specialized in the design and implementation of analytics and data lake solutions.

Michael Yitayew is a Product Manager for Amazon Redshift based out of New York. He works with customers and engineering teams to build new features that enable data engineers and data analysts to more easily load data, manage data warehouse resources, and query their data. He has supported AWS customers for over 3 years in both product marketing and product management roles.

How Strategic Blue uses Amazon QuickSight and AWS Cost and Usage Reports to help their customers save millions

Post Syndicated from Frank Contrepois original https://aws.amazon.com/blogs/big-data/how-strategic-blue-uses-amazon-quicksight-and-aws-cost-and-usage-reports-to-help-their-customers-save-millions/

This is a guest post co-written with Frank Contrepois from Strategic Blue.

For over 10 years, Strategic Blue has helped organizations unlock the most value from the cloud by enabling their customers to purchase non-standard commitments. By taking a commodity trading approach to purchasing from AWS, Strategic Blue helps customers purchase commitments for varying lengths of time, such as a 9-month Reserved Instance or an 18-month Savings Plan. Over the years, they’ve been able to help customers save millions by maximizing commitments.

In this post, we share how Strategic Blue uses Amazon QuickSight and AWS Cost and Usage Reports to help their customers save costs.

The challenge

Buying and selling AWS differently means that the cost and usage data available to Strategic Blue on the AWS Management Console only matches what they purchased—not what their customers purchased. In order to accurately bill their customers, Strategic Blue has built a billing system that takes AWS Cost and Usage Reports (CUR) as input and outputs a pro forma CUR that matches what their customers would expect to see. Although the CUR remains a critical source of data, it can have hundreds of columns and millions of rows, making it very difficult for Strategic Blue’s customers to visualize and understand.

Initially, in order to give their customers a way to visualize and understand their spend, Strategic Blue started building a dashboard from scratch using QuickSight, AWS Glue, and Amazon Athena. The idea was to build a dashboard accessible to all of Strategic Blue’s customers (as soon as they were onboarded) that enabled them to see their true cost and usage. QuickSight provides modern interactive dashboards that enable fast time-to-insights and have features that allow for the easy embedding of dashboards into web applications linked to a customer’s single sign-on (SSO) solution. Combined with row-level-security, a dashboard can be pre-filtered to show an individual user’s cost and usage information for only the accounts they are responsible for.

However, figuring out how to accurately show cost and usage insights from the CUR from scratch was a challenge. The amount of work required to decode the raw content of the CUR to build something that matches what you would see in AWS Cost Explorer quickly became overwhelming.

The solution

To help you visualize the treasure trove of data available in a CUR file, AWS created the Cloud Intelligence Dashboards solution. The Cloud Intelligence Dashboards provide a full stack of capabilities, deployed via AWS CloudFormation or other methods, that include six different QuickSight dashboards geared towards helping you optimize your spend on AWS.

Strategic Blue deployed the CUDOS Dashboard (one of the six dashboards available as of this writing) to solve their business case for providing their customers with accurate, comprehensive insights into their cost and usage.

“The Cloud Intelligence Dashboards provided all of the accurate CUR calculations and maintenance we need. This saves us many hours of reverse engineering to keep pace with AWS innovation and changes as their product and billing mechanisms mature.”

– Frank Contrepois, Head of FinOps at Strategic Blue.

The following screenshot shows an example of the dashboard.

Embedding and scaling to all customers

Strategic Blue uses the embedding and row-level security features of QuickSight to scale the dashboards to every customer the moment they are onboarded. By integrating their SSO Amazon Cognito with QuickSight, Strategic Blue is able to provide customers instant access to an instance of CUDOS through a portal as soon as they are onboarded to Strategic Blue’s platform. The row-level security setup managed by administrators within Strategic Blue makes sure that their customers only ever see their own cost and usage data. This helps Strategic Blue offer the protection and security customers expect.

To see the Strategic Blue portal with embedded CUDOS in action, watch the following demo video, or to unlock your savings potential in the cloud, sign up with Strategic Blue now.

Now that CUDOS is embedded into Strategic Blue’s portal, they are able to offer additional value free of charge to their customers. Strategic Blue’s portal uses the customer’s CUDOS dashboard to find opportunities to save and optimize. After the engagement, Strategic Blue will continue to analyze the impact of their customer’s cost optimization measures, reporting back to them what they’ve saved and what additional opportunities there are to further optimize.

Strategic Blue is already seeing an impact. Since enabling an embedded version of CUDOS for every customer and for internal teams, Strategic Blue is seeing a 27% increase in positive feedback for regular meetings, 31% fewer support tickets related to questions about cost, and a move from tactical to strategic FinOps conversations.

The future

Strategic Blue plans to evaluate and embed the rest of the Cloud Intelligence Dashboards, such as the KPI Dashboard to help customers track goals and metrics around cost optimization, and the Trusted Advisor Organizational Dashboard, which helps customers stay on top of underutilized resources as well as operational information such as security and fault tolerance postures. With the foundation in place and QuickSight’s ease of use, Strategic Blue will continue working closely with customers to customize and refine the dashboards to meet specific customer needs, improving and expanding their offerings.

Conclusion

In this post, we covered how Strategic Blue used Cloud Intelligence Dashboards built on QuickSight to help customers get accurate insights into their cloud cost and usage. Strategic Blue’s customers can access stunning and interactive dashboards that can be customized with ease and without the need for technical skills. The comprehensive embedding capabilities, granular row-level-security, and SSO integrations of QuickSight ensure that Strategic Blue is able to provide customers secure and seamless access to their Cloud Intelligence Dashboards.

It’s easy to install Cloud Intelligence Dashboards in your account. Visit Cloud Intelligence Dashboards for details on how to deploy the dashboards today, and let us know how it goes.


About the Authors

Frank Contrepois is the Head of FinOps at Strategic Blue. Frank and his team support a wide range of customers to implement Cloud FinOps including small-and medium-sized organizations, private equity (PE) funds and investors, enterprises and global cloud resellers striking significant deals. He is an AWS APN Ambassador and has several AWS and GCP pro-level certifications. Outside of work, he is a husband, father of two wonderful boys, and co-host of the “What’s new with in Cloud FinOps” podcast.

Aaron Edell is Head of GTM for Customer Cloud Intelligence for Amazon Web Services. He is responsible for building and scaling businesses around Cloud Financial Management, FinOps, and the Well-Architected Cost Optimization pillar. He focuses his GTM efforts on the Cloud Intelligence Dashboards and remains obsessed with helping all customers get better visibility and access to their cost and usage data.

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

Post Syndicated from Dagar Katyal original https://aws.amazon.com/blogs/big-data/analyze-amazon-s3-storage-costs-using-aws-cost-and-usage-reports-amazon-s3-inventory-and-amazon-athena/

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. As the application portfolio grows, customers tend to store data from multiple application and different business functions in a single S3 bucket, which can grow the storage in S3 buckets to hundreds of TBs. The AWS Billing console provides a way to look at the total storage cost of data stored in Amazon S3, but sometimes IT organizations need to understand the breakdown of costs of a particular S3 bucket by various prefixes or objects corresponding to a particular user or application. There are various reasons to analyze the costs of S3 buckets, such as to identify the spend breakdown, do internal chargebacks, understand the cost breakdown by business unit and application, and many more. As of this writing, there is no easy way to do a cost breakdown of S3 buckets by objects and prefixes.

In this post, we discuss a solution using Amazon Athena to query AWS Cost and Usage Reports and Amazon S3 Inventory reports to analyze the cost by prefixes and objects in an S3 bucket.

Overview of solution

The following figure shows the architecture for this solution. First, we enable the AWS Cost and Usage Reports (AWS CUR) and Amazon S3 Inventory features, which save the output into two separate pre-created S3 buckets. We then use Athena to query these S3 buckets for AWS CUR data and S3 object inventory data to correlate and allocate the cost breakdown at the object or prefix level.

architecture diagram

To implement the solution, we complete the following steps:

  1. Create S3 buckets for AWS CUR, S3 object inventory, and Athena results. Alternatively, you can create these respective buckets when enabling the respective individual features, but for the purpose of this post, we create all of them at the beginning.
  2. Enable the Cost and Usage Reports.
  3. Enable Amazon S3 Inventory configuration.
  4. Create AWS Glue Data Catalog tables for the CUR and S3 object inventory to query using Athena.
  5. Run queries in Athena.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create S3 buckets

Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.

For this post, we use the S3 bucket s3-object-cost-allocation as the primary bucket for cost allocation. This S3 bucket is conveniently modeled to contain several prefixes and objects of different sizes for which cost allocation needs to be done based on the overall cost of the bucket. In a real-world scenario, you should use a bucket that has data for multiple teams and for which you need to allocate costs by prefix or object. Going forward, we refer to this bucket as the primary object bucket.

The following screenshot shows our S3 bucket and folders.

example Folders created

Now let’s create the three additional operational S3 buckets to store the datasets generated to calculate costs for the objects. You can create the following buckets or any existing buckets as needed:

  • cur-cost-usage-reports-<account_number> – This bucket is used to save the Cost and Usage Reports for the account.
  • S3-inventory-configurations-<account_number> – This bucket is used to save the inventory configurations of our primary object bucket.
  • athena-query-bucket-<account_number> – This bucket is used to save the query results from Athena.

Complete the following steps to create your S3 buckets:

  • On the Amazon S3 console, choose Buckets in the navigation pane.
  • Choose Create bucket.
  • For Bucket name, enter the name of your bucket (cur-cost-usage-reports-<account_number>).
  • For AWS Region, choose your preferred Region.
  • Leave all other settings at default (or according to your organization’s standards).
  • Choose Create bucket.
    create s3 bucket
  • Repeat these steps to create s3-inventory-configurations-<account_number> and athena-query-bucket-<account_number>.

Enable the Cost and Usage Reports

The AWS Cost and Usage Reports (AWS CUR) contains the most comprehensive set of cost and usage data available. You can use Cost and Usage Reports to publish your AWS billing reports to an S3 bucket that you own. You can receive reports that break down your costs by the hour, day, or month; by product or product resource; or by tags that you define yourself.

Complete the following steps to enable Cost and Usage Reports for your account:

  • On the AWS Billing console, in the navigation pane, choose Cost & Usage Reports.
  • Choose Create report.
  • For Report name, enter a name for your report, such as account-cur-s3.
  • For Additional report details, select Include resource IDs to include the IDs of each individual resource in the report.Including resource IDs will create individual line items for each of your resources. This can increase the size of your Cost and Usage Reports files significantly, which can affect the S3 storage costs for your CUR, based on your AWS usage. We need this feature enabled for this post.
  • For Data refresh settings, select whether you want the Cost and Usage Reports to refresh if AWS applies refunds, credits, or support fees to your account after finalizing your bill.When a report refreshes, a new report is uploaded to Amazon S3.
  • Choose Next.
  • For S3 bucket, choose Configure.
  • For Configure S3 Bucket, select an existing bucket created in the previous section (cur-cost-usage-reports-<account_number>) and choose Next.
  • Review the bucket policy, select I have confirmed that this policy is correct, and choose Save. This default bucket policy provides Cost and Usage Reports access to write data to Amazon S3.
  • For Report path prefix, enter cur-data/account-cur-daily.
  • For Time granularity, choose Daily.
  • For Report versioning, choose Overwrite existing report.
  • For Enable report data integration for, select Amazon Athena.
  • Choose Next.
  • After you have reviewed the settings for your report, choose Review and Complete.
    create cost and usage report

The Cost and Usage reports will be delivered to the S3 buckets within 24 hours.

The following sample CUR in CSV format shows different columns of the Cost and Usage Report, including bill_invoice_id, bill_invoicing_entity, bill_payer_account_id, and line_item_product_code, to name a few.

sample cost and usage report

Enable Amazon S3 Inventory configuration

Amazon S3 Inventory is one of the tools Amazon S3 provides to help manage your storage. You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs. Amazon S3 Inventory provides comma-separated values (CSV), Apache Optimized Row Columnar (ORC), or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (objects that have names that begin with a common string).

Complete the following steps to enable Amazon S3 Inventory on the primary object bucket:

  • On the Amazon S3 console, choose Buckets in the navigation pane.
  • Choose the bucket for which you want to configure Amazon S3 Inventory.
    This will be the existing bucket in your account that has data that needs to be analyzed. This could be your data lake or application S3 bucket. We created the bucket s3-object-cost-allocation with some sample data and folder structure.
  • Choose Management.
  • Under Inventory configurations, choose Create inventory configuration.
  • For Inventory configuration name, enter s3-object-cost-allocation.
  • For Inventory scope, leave Prefix blank.
    This is to ensure that all objects are covered for the report.
  • For Object Versions, select Current version only.
  • For Report details, choose This account.
  • For Destination, choose the destination bucket we created (s3-inventory-configurations-<account_number>).
  • For Frequency, choose Daily.
  • For Output format, choose as Apache Parquet.
  • For Status, choose Enable.
  • Keep server-side encryption disabled. To use server-side encryption, choose Enable and specify the encryption key.
  • For Additional fields, select the following to add to the inventory report:
    • Size – The object size in bytes.
    • Last modified date – The object creation date or the last modified date, whichever is the latest.
    • Multipart upload – Specifies that the object was uploaded as a multipart upload. For more information, see Uploading and copying objects using multipart upload.
    • Replication status – The replication status of the object. For more information, see Using the S3 console.
    • Encryption status – The server-side encryption used to encrypt the object. For more information, see Protecting data using server-side encryption.
    • Bucket key status – Indicates whether a bucket-level key generated by AWS KMS applies to the object.
    • Storage class – The storage class used for storing the object.
    • Intelligent-Tiering: Access tier – Indicates the access tier of the object if it was stored in Intelligent-Tie
      create s3 inventory
  • Choose Create.
    s3 inventory configuration

It may take up to 48 hours to deliver the first report.

Create AWS Glue Data Catalog tables for CUR and Amazon S3 Inventory reports

Wait for up to 48 hours for the previous step to generate the reports. In this section, we use Athena to create and define AWS Glue Data Catalog tables for the data that has been created using Cost and Usage Reports and Amazon S3 Inventory reports.

Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives.

Complete the following steps to create the tables using Athena:

  • Navigate to the Athena console.
  • If you’re using Athena for the first time, you need to set up a query result location in Amazon S3. If you preconfigured this in Athena , you can skip this step.
    • Choose View settings.
      athena setup query bucket
    • Choose Manage.
    • In the section Query result location and encryption, choose Browse S3 and choose the bucket that we created (athena-query-bucket-<account_number>).
    • Choose Save.
      Athena Config
    • Navigate back to the Athena query editor.
  • Run the following query in Athena to create a table for Cost and Usage Reports. Verify and update the section for <<LOCATION>> at the end of the query and point it to the correct S3 bucket and location. Note that the new table name should be account_cur.
    CREATE EXTERNAL TABLE `account_cur`(
    `identity_line_item_id` string,
    `identity_time_interval` string,
    `bill_invoice_id` string,
    `bill_billing_entity` string,
    `bill_bill_type` string,
    `bill_payer_account_id` string,
    `bill_billing_period_start_date` timestamp,
    `bill_billing_period_end_date` timestamp,
    `line_item_usage_account_id` string,
    `line_item_line_item_type` string,
    `line_item_usage_start_date` timestamp,
    `line_item_usage_end_date` timestamp,
    `line_item_product_code` string,
    `line_item_usage_type` string,
    `line_item_operation` string,
    `line_item_availability_zone` string,
    `line_item_resource_id` string,
    `line_item_usage_amount` double,
    `line_item_normalization_factor` double,
    `line_item_normalized_usage_amount` double,
    `line_item_currency_code` string,
    `line_item_unblended_rate` string,
    `line_item_unblended_cost` double,
    `line_item_blended_rate` string,
    `line_item_blended_cost` double,
    `line_item_line_item_description` string,
    `line_item_tax_type` string,
    `line_item_legal_entity` string,
    `product_product_name` string,
    `product_availability` string,
    `product_description` string,
    `product_durability` string,
    `product_event_type` string,
    `product_fee_code` string,
    `product_fee_description` string,
    `product_free_query_types` string,
    `product_from_location` string,
    `product_from_location_type` string,
    `product_from_region_code` string,
    `product_group` string,
    `product_group_description` string,
    `product_location` string,
    `product_location_type` string,
    `product_message_delivery_frequency` string,
    `product_message_delivery_order` string,
    `product_operation` string,
    `product_platopricingtype` string,
    `product_product_family` string,
    `product_queue_type` string,
    `product_region` string,
    `product_region_code` string,
    `product_servicecode` string,
    `product_servicename` string,
    `product_sku` string,
    `product_storage_class` string,
    `product_storage_media` string,
    `product_to_location` string,
    `product_to_location_type` string,
    `product_to_region_code` string,
    `product_transfer_type` string,
    `product_usagetype` string,
    `product_version` string,
    `product_volume_type` string,
    `pricing_rate_code` string,
    `pricing_rate_id` string,
    `pricing_currency` string,
    `pricing_public_on_demand_cost` double,
    `pricing_public_on_demand_rate` string,
    `pricing_term` string,
    `pricing_unit` string,
    `reservation_amortized_upfront_cost_for_usage` double,
    `reservation_amortized_upfront_fee_for_billing_period` double,
    `reservation_effective_cost` double,
    `reservation_end_time` string,
    `reservation_modification_status` string,
    `reservation_normalized_units_per_reservation` string,
    `reservation_number_of_reservations` string,
    `reservation_recurring_fee_for_usage` double,
    `reservation_start_time` string,
    `reservation_subscription_id` string,
    `reservation_total_reserved_normalized_units` string,
    `reservation_total_reserved_units` string,
    `reservation_units_per_reservation` string,
    `reservation_unused_amortized_upfront_fee_for_billing_period` double,
    `reservation_unused_normalized_unit_quantity` double,
    `reservation_unused_quantity` double,
    `reservation_unused_recurring_fee` double,
    `reservation_upfront_value` double,
    `savings_plan_total_commitment_to_date` double,
    `savings_plan_savings_plan_a_r_n` string,
    `savings_plan_savings_plan_rate` double,
    `savings_plan_used_commitment` double,
    `savings_plan_savings_plan_effective_cost` double,
    `savings_plan_amortized_upfront_commitment_for_billing_period` double,
    `savings_plan_recurring_commitment_for_billing_period` double,
    `resource_tags_user_bucket_name` string,
    `resource_tags_user_cost_tracking` string)
    PARTITIONED BY (
    `year` string,
    `month` string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
    '<<LOCATION>>'
  • Run the following query in Athena to create the table for Amazon S3 Inventory. Verify and update the section for <<LOCATION>> at the end of the query and point it to the correct S3 bucket and location.
    • To get the exact value of the location, navigate to the bucket where inventory configurations are stored and navigate to the folder path Hive . Use the S3 URI to replace <<LOCATION>> in the query.query path location
      CREATE EXTERNAL TABLE s3_object_inventory(
               bucket string,
               key string,
               version_id string,
               is_latest boolean,
               is_delete_marker boolean,
               size bigint,
               last_modified_date bigint,
               storage_class string,
               is_multipart_uploaded boolean,
               replication_status string,
               encryption_status string,
               intelligent_tiering_access_tier string,
               bucket_key_status string
      ) PARTITIONED BY (
              dt string
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
        STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
        LOCATION '<<LOCATION>>';
      
  • We need to refresh the partitions and add new inventory lists to the table. Use the following commands to add data to the CUR table and Amazon S3 Inventory table:
    MSCK REPAIR TABLE `account_cur`;
    
    MSCK REPAIR TABLE s3_object_inventory;

Run queries in Athena to allocate the cost of objects in an S3 bucket

Now we can query the data we have available to get a cost allocation breakdown at the prefix level.

We need to provide some information in the following queries:

  • Update <<YYYY-MM-DD>> with the date for which you want to analyze the data
  • Update <<prefix>> with the prefix values for your bucket that needs to be analyzed
  • Update <<bucket_name>> with the name of the bucket that needs to be analyzed

We use the following part of the query to calculate the size of storage being used by the target prefix that we want to calculate the cost for:

select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast (sum(size) as double) targetPrefixBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
and key like '<<prefix>>/%'
group by dt

Next, we calculate the total size of the bucket on that particular date:

select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast (sum(size) as double) totalBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
group by dt

We query the CUR table to get the cost of a particular bucket on a particular date:

select line_item_usage_start_date as dt, sum(line_item_blended_cost) as line_item_blended_cost
from "account_cur"
where line_item_product_code = 'AmazonS3'
and product_servicecode = 'AmazonS3'
and line_item_operation = 'StandardStorage'
and line_item_resource_id = '<<bucket_name>>'
and line_item_usage_start_date = cast('<<YYYY-MM-DD>>' as timestamp)
group by line_item_usage_start_date

Putting all of this together, we can calculate the cost of a particular prefix (folder or a file) on a specific date. The complete query is as follows:

with
cost as (select line_item_usage_start_date as dt, sum(line_item_blended_cost) as line_item_blended_cost
from "account_cur"
where line_item_product_code = 'AmazonS3'
and product_servicecode = 'AmazonS3'
and line_item_operation = 'StandardStorage'
and line_item_resource_id = '<<bucket_name>>'
and line_item_usage_start_date = cast('<<YYYY-MM-DD>>' as timestamp)
group by line_item_usage_start_date),
total as (select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast (sum(size) as double) totalBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
group by dt),
target as (select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast (sum(size) as double) targetPrefixBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
and key like '<<prefix>>/%'
group by dt)
select target.dt,
(target.targetPrefixBytes/ total.totalBytes * 100) percentUsed,
cost.line_item_blended_cost totalCost,
cost.line_item_blended_cost*(target.targetPrefixBytes/ total.totalBytes) as prefixCost
from target, total, cost
where target.dt = total.dt
and target.dt = cost.dt

The following screenshot shows the results table for the sample data we used in this post. We get the following information:

  • dt – Date
  • percentUsed – The percentage of prefix space compared to overall bucket space
  • totalCost – The total cost of the bucket
  • prefixCost – The cost of the space used by the prefix

final result percetage

Clean up

To stop incurring costs, be sure to disable Amazon S3 Inventory and Cost and Usage Reports when you’re done.

Delete the S3 buckets created for the Amazon S3 Inventory reports and Cost and Usage Reports to avoid storage charges.

Other methods for Amazon S3 storage analysis

Amazon S3 Storage Lens can provide a single view of object storage usage and activity across your entire Amazon S3 storage. With S3 Storage Lens, you can understand, analyze, and optimize storage with over 29 usage and activity metrics and interactive dashboards to aggregate data for your entire organization, specific accounts, Regions, buckets, or prefixes. All of this data is accessible on the Amazon S3 console or as raw data in an S3 bucket.

S3 Storage Lens doesn’t provide cost analysis based on an object or prefix in a single bucket. If you want visibility of storage usage and trends across the entire storage footprint along with recommendations on cost efficiency and data protection best practices, S3 Storage Lens is the right option. But if you want a cost analysis of specific S3 buckets and looking for ways to get cost allocation of S3 objects at the object or prefix level, the solution in this post would be the best fit.

Conclusion

In this post, we detailed how to create a cost breakdown model at the object or prefix level for S3 buckets that contains data for multiple business units and applications. We used Athena to query the reports and datasets produced by the AWS CUR and Amazon S3 Inventory features that, when correlated, give us the cost allocation at the object and prefix level. This solution gives you an easy way to calculate costs for independent objects and prefixes, which can be used for internal chargebacks or just to know the per-object or per-prefix spending in a shared S3 bucket.


About the Authors


Dagar Katyal
is a Senior Solutions Architect at AWS, based in Chicago, Illinois. He works with customers and provides guidance for key strategic initiatives important for their business. Dagar has an MBA and has spent years over 15 years working with customers on projects on analytics strategy, roadmap, and using data as a key differentiator. When not working with customers, Dagar spends time with his family and doing home improvement projects.


Saiteja Pudi
is a Solutions Architect at AWS, based in Dallas, Tx. He has been with AWS for more than 3 years now, helping customers derive the true potential of AWS by being their trusted advisor. He comes from an application development background, interested in Data Science and Machine Learning.