Maximize Amazon EC2 Capacity Reservations with Capacity Manager data exports

Post Syndicated from Venu Geddam original https://aws.amazon.com/blogs/compute/maximize-amazon-ec2-capacity-reservations-with-capacity-manager-data-exports/

In our previous post, we introduced Amazon EC2 Capacity Manager and its data export capability. Amazon EC2 Capacity Manager provides centralized visibility into your Amazon Elastic Compute Cloud (Amazon EC2) capacity usage across all accounts and Regions in your organization. It tracks capacity usage for three types of EC2 capacity: On-Demand instances, Spot instances, and On-Demand Capacity Reservations (ODCR). On the AWS Management Console, it provides 90 days of historical capacity data. With data exports to Amazon Simple Storage Service (Amazon S3), you can retain and analyze capacity trends beyond this period using your preferred analytics tools.

In this post, we demonstrate how to configure EC2 Capacity Manager data exports to Amazon S3 and query historical capacity data using Amazon Athena. This approach helps you identify long-term usage patterns, plan capacity needs, and optimize resource allocation across your organization.

Solution overview

The following diagram illustrates the solution architecture. EC2 Capacity Manager exports capacity data to Amazon S3, where Amazon Athena queries it using SQL with automatic partition discovery.

Architecture diagram showing EC2 Capacity Manager exporting capacity data to Amazon S3 on a scheduled basis, and Amazon Athena querying the data directly from S3 using SQL with partition projection for automatic partition discovery

The solution involves the following steps:

  1. Set up an S3 bucket for capacity data export.
  2. Configure EC2 Capacity Manager data export.
  3. Set up Amazon Athena to query the exported data.
  4. Run queries to analyze capacity patterns.

Prerequisites:

  • An AWS account with permissions to create S3 buckets and configure EC2 Capacity Manager.
  • AWS Command Line Interface (AWS CLI) installed and configured (optional, for CLI-based setup).
  • Familiarity with SQL for querying data in Athena.

Setting up data export to Amazon S3

EC2 Capacity Manager can export capacity data in compressed CSV (Gzip) or compressed Parquet (Snappy) format. Use Parquet format for query performance in Athena (Parquet’s columnar format is designed to optimize analytical queries).

Configure the data export

You can configure data export through the EC2 Capacity Manager console or AWS CLI.

To configure data export using the console:

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, choose Capacity Manager.
  3. Choose the Data exports tab.
  4. Choose Create data export.
  5. For Output format, select Parquet.
  6. For S3 bucket, choose Create bucket for me to create a new bucket with the required permissions, or select an existing bucket from the list.
  7. If you selected an existing bucket, add the bucket policy shown in the following section to grant EC2 Capacity Manager write access.
  8. (Optional) For S3 prefix, enter a prefix to organize your exported files (for example, capacity-data/).
  9. For Schedule, select Hourly.
  10. Choose Create.

Screenshot of EC2 Capacity Manager Create data export page in AWS Console showing export properties: Data export name, output format set to Parquet, S3 location for export delivery and Tags

After you create the data export, EC2 Capacity Manager displays the export details.

Screenshot showing EC2 Capacity Manager data export status with ‘Latest Delivery’ column displaying ‘delivered’ status, indicating the export is ready for Athena setup

Update the S3 bucket policy (existing buckets only)

If you use an existing S3 bucket, you must add the bucket policy shown in the following example to grant EC2 Capacity Manager write access.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.capacitymanager.amazonaws.com"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_NAME>",
                "arn:aws:s3:::<BUCKET_NAME>/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "<AWS_ACCOUNT_NUMBER>"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:ec2:<AWS_REGION>:<AWS_ACCOUNT_NUMBER>:capacity-manager-data-export/*"
                }
            }
        }
    ]
}

Replace <AWS_ACCOUNT_NUMBER> with your AWS account number, <AWS_REGION> with your AWS Region (for example, us-west-2), and <BUCKET_NAME> with your bucket name.

To configure data export using the AWS CLI :

aws ec2 create-capacity-manager-data-export \
    --s3-bucket-name <BUCKET_NAME> \
    --s3-bucket-prefix <BUCKET_PREFIX>/ \
    --schedule hourly \
    --output-format <FORMAT> \
    --region <AWS_REGION>

# Replace:
# <BUCKET_NAME> with your bucket name,
# <BUCKET_PREFIX> with your bucket prefix (for example, capacity-data/),
# <FORMAT> with parquet or csv, and
# <AWS_REGION> with your AWS Region (for example, us-west-2)

# The output of the above command would give you a Data Export ID

{ "CapacityManagerDataExportId": "cmde-00a7d0e64e43889f1" }

After creating the data export, wait for the first export to complete before proceeding to set up Athena. You can check the export status using the following command:

aws ec2 describe-capacity-manager-data-exports --region <AWS_REGION>

The command returns details about your data export configuration, including the delivery status:

{
    "CapacityManagerDataExports": [
        {
            "CapacityManagerDataExportId": "cmde-00a7d0e64e43889f1",
            "S3BucketName": "capacity-manager-exports-123456789012",
            "S3BucketPrefix": "capacity-data/",
            "Schedule": "hourly",
            "OutputFormat": "parquet",
            "CreateTime": "2026-04-10T19:04:36.824000+00:00",
            "LatestDeliveryStatus": "delivered",
            "LatestDeliveryStatusMessage": "Successfully delivered to s3://capacity-manager-exports-123456789012/y=2026/m=04/d=10/h=15/...",
            "LatestDeliveryTime": "2026-04-10T19:05:39.176000+00:00"
        }
    ]
}

Wait until LatestDeliveryStatus shows "delivered" before proceeding to the next section. The first export typically appears in your S3 bucket in a couple of hours. Subsequent exports follow your configured schedule.

Setting up Amazon Athena to query capacity data

After EC2 Capacity Manager exports data to your S3 bucket, you can use Amazon Athena to query the data using standard SQL. Athena uses AWS Glue as its metadata store. Specifically, it relies on the AWS Glue Data Catalog, which contains table definitions that tell Athena where you have stored your data in S3 and how you have structured it. When you create tables in Athena, you’re actually creating metadata entries in the Data Catalog that Athena references when running queries.

Create an Athena database and table

You can create the table using AWS Glue crawler or manually with SQL. AWS Glue crawler automatically discovers the complete schema from your exported Parquet files, including optional fields like resource tags if enabled. It helps minimize manual schema definition efforts. If the export format changes in the future, you can re-run the crawler to update the table definition. For detailed instructions on creating a Glue Crawler, see Use a crawler to add a table in the Amazon Athena User Guide.

In this post, we create the table manually using a SQL statement. We also use partition projection for automatic partition discovery. We do this because EC2 Capacity Manager continuously adds new partitions to the S3 bucket according to your configured schedule. As new partitions arrive in S3, Athena doesn’t know about them until you run MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION to update the AWS Glue Data Catalog. This becomes an overhead when data arrives frequently.

With partition projection, you define the partition scheme and specify its range/rules in the table properties. Athena then computes the partitions at query time instead of looking them up in the Glue Data Catalog. So partition projection automatically makes new partitions visible as soon as EC2 Capacity Manager exports the data to S3, eliminating the need for you to update metadata. The CREATE TABLE statement that follows defines the schema for EC2 Capacity Manager exports. If your capacity reservations are already tagged, add the corresponding tag columns (for example, tag_environment string or tag_costcenter string). Alternatively, use an AWS Glue crawler to automatically discover your complete schema, including tag columns.

  1. Open the Athena console at https://console.aws.amazon.com/athena/
  2. If prompted, configure a query result location in S3. This is where Athena writes query output. It is separate from the S3 bucket that stores your capacity data.
  3. Run the following query to create a database:
CREATE DATABASE IF NOT EXISTS capacity_manager_db;
  1. Create a table for Parquet format data:
CREATE EXTERNAL TABLE IF NOT EXISTS capacity_manager_db.capacity_data (
    metricgroupname string,
    periodstarttimestamp string,
    periodendtimestamp string,
    orgid string,
    accountid string,
    region string,
    `az-id` string,
    instancefamily string,
    instancetype string,
    platform string,
    tenancy string,
    reservationid string,
    `reservation arn` string,
    unusedfinancialowner string,
    reservationtype string,
    instancematchcriteria string,
    reservationcreatetimestamp string,
    reservationstarttimestamp string,
    reservationendtimestamp string,
    reservationenddatetype string,
    reservationstate string,
    reservationtotalcapacityhrsvcpu string,
    reservationtotalcapacityhrsinst string,
    reservationtotalestimatedcost string,
    reservationmaxsizevcpu string,
    reservationmaxsizeinst string,
    reservationminsizevcpu string,
    reservationminsizeinst string,
    reservationunusedtotalcapacityhrsvcpu string,
    reservationunusedtotalcapacityhrsinst string,
    reservationunusedtotalestimatedcost string,
    reservationmaxunusedsizevcpu string,
    reservationmaxunusedsizeinst string,
    reservationminunusedsizevcpu string,
    reservationminunusedsizeinst string,
    reservationmaxutilization string,
    reservationminutilization string,
    reservationavgutilizationvcpu string,
    reservationavgutilizationinst string,
    reservationavgfuturesizevcpu string,
    reservationavgfuturesizeinst string,
    reservationmaxfuturesizevcpu string,
    reservationmaxfuturesizeinst string,
    reservationminfuturesizevcpu string,
    reservationminfuturesizeinst string,
    reservationavgcommittedsizevcpu string,
    reservationavgcommittedsizeinst string,
    reservationmaxcommittedsizevcpu string,
    reservationmaxcommittedsizeinst string,
    reservationmincommittedsizevcpu string,
    reservationmincommittedsizeinst string,
    reservedtotalusagehrsvcpu string,
    reservedtotalusagehrsinst string,
    unreservedtotalusagehrsvcpu string,
    unreservedtotalusagehrsinst string,
    reservedtotalestimatedcost string,
    unreservedtotalestimatedcost string,
    spottotalusagehrsvcpu string,
    spottotalusagehrsinst string,
    spottotalestimatedcost string,
    spotavgruntimebeforeinterruptioninst string,
    spotmaxruntimebeforeinterruptioninst string,
    spotminruntimebeforeinterruptioninst string,
    spottotalinterruptionsinst string,
    spottotalinterruptionsvcpu string,
    spottotalcountinst string,
    spottotalcountvcpu string,
    spotinterruptionrateinst string,
    spotinterruptionratevcpu string
)
PARTITIONED BY (
    y string,
    m string,
    d string,
    h string
)
ROW FORMAT SERDE
    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
    's3://<BUCKET_NAME>/'
TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.y.type' = 'integer',
    'projection.y.range' = '2024,2030',
    'projection.y.digits' = '4',
    'projection.m.type' = 'integer',
    'projection.m.range' = '01,12',
    'projection.m.digits' = '2',
    'projection.d.type' = 'integer',
    'projection.d.range' = '01,31',
    'projection.d.digits' = '2',
    'projection.h.type' = 'integer',
    'projection.h.range' = '00,23',
    'projection.h.digits' = '2',
    'storage.location.template' = 's3://<BUCKET_NAME>/y=${y}/m=${m}/d=${d}/h=${h}/'
);

Replace <BUCKET_NAME> in the LOCATION clause and storage.location.template property with your bucket name, and capacity-data/ with table name of your choice.

Now that your table is set up, you can explore how to query the exported data.

Example queries for common use cases

The following queries demonstrate how to analyze your EC2 capacity data for cost optimization and capacity planning. The queries use date values (year=‘2026’, month=‘04’) and table name capacity_data. Adjust the partition values to match your actual data’s time period and table name to match your value. When querying EC2 Capacity Manager export data:

  1. Metric group filtering: EC2 Capacity Manager exports contain two types of data that the metricgroupname column identifies: Reservation Usage for analyzing ODCR utilization and optimization opportunities, and Instance Usage for analyzing overall capacity consumption across reserved, unreserved, and Spot instances. Always filter by the appropriate metric group for your analysis needs.
  2. Partition filtering: Always include partition filters (y, m, d, h) to improve query performance.
  3. Numeric operations: Use CAST to convert string columns to numeric types for proper comparison and sorting (for example, CAST(reservationavgutilizationinst AS double)).
  4. NULL handling: Use COALESCE to handle NULL values in calculations (for example, COALESCE(CAST(column AS double), 0)) to prevent NULL results in totals. Without COALESCE, when you add a column value NULL to a non-NULL value, the result is NULL.

Use case 1: Identify underutilized ODCRs

Discover On-Demand Capacity Reservations with low utilization that are generating cost waste. Identify specific reservations to downsize, cancel, or share with other teams to reduce unnecessary spending.

SELECT
    reservationid,
    instancetype,
    region,
    "az-id",
    reservationstate,
    ROUND(CAST(reservationavgutilizationinst AS double) * 100, 2) AS utilization_pct,
    CAST(reservationtotalcapacityhrsinst AS double) AS total_capacity_hrs,
    CAST(reservationunusedtotalcapacityhrsinst AS double) AS unused_capacity_hrs,
    ROUND(CAST(reservationunusedtotalestimatedcost AS double), 4) AS wasted_cost_usd
FROM capacity_data
WHERE metricgroupname = 'Reservation Usage'
    AND CAST(reservationavgutilizationinst AS double) < 0.5
    AND reservationavgutilizationinst IS NOT NULL
    AND y = '2026'
    AND m = '04'
ORDER BY wasted_cost_usd DESC
LIMIT 3;

Sample output:

reservationid instancetype region az-id reservationstate utilization_pct total_capacity_hrs unused_capacity_hrs wasted_cost_usd
cr-041aedfba865106c1 m5.8xlarge us-west-2 usw2-az2 active 0 1 1 1.536
cr-089f17dc178e993a8 m5.8xlarge us-west-2 usw2-az1 active 0 1 1 1.536
cr-089f17dc178e993a8 m5.8xlarge us-west-2 usw2-az1 active 0 1 1 1.536

Use case 2: ODCR utilization summary by instance type

Get a comprehensive view of ODCR utilization across instance types to identify which instance families have the worst utilization rates. This helps prioritize optimization efforts on the reservations with the highest cost impact.

SELECT
    instancetype,
    COUNT(DISTINCT accountid) AS account_count,
    COUNT(DISTINCT reservationid) AS reservation_count,
    ROUND(SUM(COALESCE(CAST(reservationtotalcapacityhrsinst AS double), 0)), 2) AS total_odcr_capacity,
    ROUND(SUM(COALESCE(CAST(reservationunusedtotalcapacityhrsinst AS double), 0)), 2) AS total_unused_capacity,
    ROUND(AVG(COALESCE(CAST(reservationavgutilizationinst AS double), 0)) * 100, 2) AS avg_utilization_pct,
    ROUND(SUM(COALESCE(CAST(reservationunusedtotalestimatedcost AS double), 0)), 2) AS total_unused_cost_usd
FROM capacity_data
WHERE metricgroupname = 'Reservation Usage'
    AND y = '2026'
    AND m = '04'
GROUP BY instancetype
ORDER BY total_unused_cost_usd DESC
LIMIT 3;

Sample output:

instancetype account_count reservation_count total_odcr_capacity total_unused_capacity avg_utilization_pct total_unused_cost_usd
m5.8xlarge 1 2 311.99 311.99 0.0 479.22
c5.9xlarge 1 1 211.0 211.0 0.0 322.83
t3.micro 1 1 1055.0 1055.0 0.0 10.97

Use case 3: Identify peak usage patterns

Analyze average hourly usage patterns across reserved, unreserved, and Spot capacity to identify when your workloads typically hit peak demand. This breakdown helps you understand your capacity mix, plan for peak periods, and optimize your purchasing strategy.

SELECT
    h AS hour,
    ROUND(AVG(COALESCE(CAST(reservedtotalusagehrsinst AS double), 0)), 2) AS avg_reserved_usage_hours,
    ROUND(AVG(COALESCE(CAST(unreservedtotalusagehrsinst AS double), 0)), 2) AS avg_unreserved_usage_hours,
    ROUND(AVG(COALESCE(CAST(spottotalusagehrsinst AS double), 0)), 2) AS avg_spot_usage_hours,
    ROUND(AVG(COALESCE(CAST(reservedtotalusagehrsinst AS double), 0) + COALESCE(CAST(unreservedtotalusagehrsinst AS double), 0) + COALESCE(CAST(spottotalusagehrsinst AS double), 0)), 2) AS avg_total_usage_hours
FROM capacity_data
WHERE metricgroupname = 'Instance Usage'
    AND y = '2026'
    AND m = '04'
GROUP BY h
ORDER BY avg_total_usage_hours DESC
LIMIT 3;

Sample output:

hour avg_reserved_usage_hours avg_unreserved_usage_hours avg_spot_usage_hours avg_total_usage_hours
09 0.75 0.5 0 1.25
10 0.75 0.5 0 1.25
15 0.75 0.5 0 1.25

Use case 4: Regional capacity distribution

Understand how your ODCR capacity is distributed across AWS Regions and instance types. This geographic view helps you identify Regions with excess capacity that could be redistributed or consolidated to improve utilization and reduce costs.

SELECT
    region,
    instancetype,
    ROUND(SUM(COALESCE(CAST(reservationtotalcapacityhrsinst AS double), 0)), 2) AS total_reserved_capacity,
    ROUND(SUM(COALESCE(CAST(reservationunusedtotalcapacityhrsinst AS double), 0)), 2) AS unused_reserved_capacity,
    ROUND(AVG(COALESCE(CAST(reservationavgutilizationinst AS double), 0)) * 100, 2) AS avg_utilization_pct
FROM capacity_data
WHERE metricgroupname = 'Reservation Usage'
    AND y = '2026'
    AND m = '04'
GROUP BY region, instancetype
ORDER BY region, total_reserved_capacity DESC
LIMIT 3;

Sample output:

region instancetype total_reserved_capacity unused_reserved_capacity avg_utilization_pct
us-west-2 t2.nano 1484.0 1060.0 33.34
us-west-2 t3.micro 1060.0 1060.0 0.0
us-west-2 t2.micro 848.0 636.0 25.0

Use case 5: Unused capacity reservations by Region and Availability Zone

Pinpoint exactly where you have unused ODCR capacity at the Availability Zone level. This granular view enables you to share unused capacity with other teams in the same AZ or modify reservations to better match actual usage patterns.

SELECT
    region,
    "az-id",
    instancetype,
    ROUND(SUM(COALESCE(CAST(reservationunusedtotalcapacityhrsinst AS double), 0)), 2) AS unused_capacity_instances,
    ROUND(AVG(COALESCE(CAST(reservationavgutilizationinst AS double), 0)) * 100, 2) AS avg_utilization_pct,
    ROUND(SUM(COALESCE(CAST(reservationunusedtotalestimatedcost AS double), 0)), 2) AS unused_cost_usd
FROM capacity_data
WHERE metricgroupname = 'Reservation Usage'
    AND y = '2026'
    AND m = '04'
    AND CAST(reservationunusedtotalcapacityhrsinst AS double) > 0
GROUP BY region, "az-id", instancetype
ORDER BY unused_cost_usd DESC
LIMIT 3;

Sample output:

region az-id instancetype unused_capacity_instances avg_utilization_pct unused_cost_usd
us-west-2 usw2-az1 c5.9xlarge 212.0 0.0 324.36
us-west-2 usw2-az2 m5.8xlarge 157.0 0.0 241.15
us-west-2 usw2-az1 m5.8xlarge 157.0 0.0 241.15

Clean up

To avoid incurring future charges, delete the resources you created:

Warning: This permanently deletes the table definition. Verify that you no longer need to query this data before proceeding.

  1. Delete the Athena table by running the following query: DROP TABLE IF EXISTS capacity_manager_db.capacity_data;
  2. Delete the database by running the following query: DROP DATABASE IF EXISTS capacity_manager_db;
  3. Navigate to the Athena console settings.
  4. Note the query result location S3 bucket.
  5. If you created this bucket specifically for this tutorial:
    1. Empty the S3 bucket by running: aws s3 rm s3://<QUERY_RESULT_BUCKET_NAME> --recursive
    2. Delete the S3 bucket by running: aws s3 rb s3://<QUERY_RESULT_BUCKET_NAME>
  6. Delete the data export configuration by using AWS Management Console or AWS CLI.
    1. If using AWS Console, select the “Delete data export” option in the Actions menu.

Screenshot of AWS Management Console showing the Delete option in the Actions menu for an EC2 Capacity Manager data export configuration

  1. To delete the configuration using AWS CLI:
    # List your data export configurations to find the export ID.
    aws ec2 describe-capacity-manager-data-exports --region <AWS_REGION>
    
    # The output shows details for export configuration which has the Data Export ID
    # Sample output is shown in the configure data export section above.
    
    # Then delete the export configuration using the ID from the output
    aws ec2 delete-capacity-manager-data-export \
        --data-export-id <EXPORT_ID> \
        --region <AWS_REGION>

    Replace <EXPORT_ID> with your data export ID and <AWS_REGION> with your AWS Region.

    Warning: Deleting the S3 bucket permanently removes all exported capacity data. Verify that you have backed up any data you need before proceeding.

  1. (Optional) Delete the S3 bucket and data: If you no longer need the exported data, complete the following steps:
    1. Empty the S3 bucket by running: aws s3 rm s3://<BUCKET_NAME> --recursive
    2. Delete the S3 bucket by running: aws s3 rb s3://<BUCKET_NAME>

Conclusion

In this post, we demonstrated how to configure EC2 Capacity Manager data exports to Amazon S3 and query historical capacity data using Amazon Athena. This approach enables you to retain capacity data beyond the 90-day console limit.

As you scale your capacity management practices, consider integrating these exports with your existing analytics and monitoring workflows. By combining EC2 Capacity Manager data with your broader infrastructure metrics, you can make data-driven decisions about capacity allocation and optimization across your organization.

To deepen your understanding, explore the EC2 Capacity Manager documentation for additional features, learn more about Amazon Athena for advanced query capabilities, and review EC2 capacity optimization best practices. Share your feedback and tell us how you’re using EC2 Capacity Manager data exports to optimize your capacity planning in the comments.