Tag Archives: Amazon Elastic Block Store (Amazon EBS)

AWS Weekly Roundup: Kiro waitlist, EBS Volume Clones, EC2 Capacity Manager, and more (October 20, 2025)

2025-10-20 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-kiro-waitlist-ebs-volume-clones-ec2-capacity-manager-and-more-october-20-2025/

I’ve been inspired by all the activities that tech communities around the world have been hosting and participating in throughout the year. Here in the southern hemisphere we’re starting to dream about our upcoming summer breaks and closing out on some of the activities we’ve initiated this year. The tech community in South Africa is participating in Amazon Q Developer coding challenges that my colleagues and I are hosting throughout this month as a fun way to wind down activities for the year. The first one was hosted in Johannesburg last Friday with Durban and Cape Town coming up next.

Last week’s launches
These are the launches from last week that caught my attention:

Kiro is now available for every developer — Since its launch more than 90 days ago, more than 100,000 developers have joined the waitlist to try Kiro out. The waitlist is gone so if you want to try out this spec-driven approach to coding with AI, sign up now.
Amazon EC2 Capacity Manager — If you’re using Amazon Elastic Compute Cloud (Amazon EC2) at scale operate hundreds of instance types across multiple Availability Zones and accounts, using On-Demand Instances, Spot Instances, and Capacity Reservations, you’ll be pleased to learn that EC2 Capacity Manager is now available to provide you a centralized solution to monitor, analyze, and manage capacity usage across all account and AWS Regions from a single interface.
Amazon EBS Volume Clones — Sometimes you need production data to test a fix in a non-production environment before implementing it in production. Usually you’d take an EBS snapshot of this data and then create a new volume from that snapshot, meanwhile dealing with the operational overhead of this process. Learn about the availability of Amazon EBS Volume Clones, a new capability for you to create instant point-in-time copies of your EBS volumes within the same Availability Zone.

Additional updates
I thought these projects, blog posts, and news items were also interesting:

AWS Transfer Family SFTP connectors now support VPC-based connectivity — AWS Transfer Family SFTP connectors now support connectivity to remote SFTP servers through Amazon Virtual Private Cloud (Amazon VPC) environments.
As your business evolves, you might need to migrate workloads between AWS Regions. Perhaps you’re looking to reduce latency for users in new geographic areas, meet Region-specific compliance requirements, or you’re an ISV expanding your product’s availability. Whatever your reason, cross-Region migration needs careful planning, especially when dealing with encrypted resources. Read how to migrate encrypted Amazon EC2 instances across AWS Regions without sharing AWS KMS keys.
Internet of Things (IoT) devices have transformed how we interact with our environments, from homes to industrial settings. However, as the number of connected devices grows, so does the complexity of managing them. Learn how to build a device management agent with Amazon Bedrock AgentCore.

Upcoming AWS events
Keep a look out and be sure to sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

– Veliswa.

Introducing Amazon EBS Volume Clones: Create instant copies of your EBS volumes

2025-10-15 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/introducing-amazon-ebs-volume-clones-create-instant-copies-of-your-ebs-volumes/

As someone that used to work at Sun Microsystems, where ZFS was invented, I’ve always loved working with storage systems that offer instant volume copies for my development and testing needs.

Today, I’m excited to share that AWS is bringing similar capabilities to Amazon Elastic Block Store (Amazon EBS) with the launch of Amazon EBS Volume Clones, a new capability that lets you create instant point-in-time copies of your EBS volumes within the same Availability Zone.

Many customers need to create copies of their production data to support development and testing activities in a separate nonproduction environment. Until now, this process required taking an EBS snapshot (stored in Amazon Simple Storage Service (Amazon S3)) and then creating a new volume from that snapshot. Although this approach works, the process creates operational overhead due to multiple steps.

With Amazon EBS Volume Clones, you can now create copies of your EBS volumes with a single API call or console click. The copied volumes are available within seconds and provide immediate access to your data with single-digit millisecond latency. This makes Volume Clones particularly useful for quickly setting up test environments with production data or creating temporary copies of databases for development purposes.

Let me show you how Volume Clones works
For this post, I created a small Amazon Elastic Compute Cloud (Amazon EC2) instance, with an attached volume. I created a file on the root file system with the command echo "Hello CopyVolumes" > hello.txt.

To initiate the copy, I open a browser on the AWS Management Console and I navigate to EC2, Elastic Block Store, Volumes. I select the volume I want to copy.

Note that, at the time of publication of this post, only encrypted volumes can be copied.

On the Actions menu, I choose the Copy Volume option.

Next, I choose the details of the target volume. I can change the Volume type and adjust the Size, IOPS, and Throughput parameters. I choose Copy volume to start the Volume Clone operation.

The copied volume enters the Creating state and becomes available within seconds. I can then attach it to an EC2 instance and start using it immediately.

Data blocks are copied from the source volume and written to the volume copy in the background. The volume remains in the Initializing state until the process is complete. I can monitor its progress with the describe-volume-status API. The initializing operation doesn’t affect the performance of the source volume. I can continue using it normally during the copy process.

I love that the copied volume is available immediately. I don’t need to wait for its initialization to complete. During the initialization phase, my copied volume delivers performance based on the lowest of: a baseline of 3,000 IOPS and 125 MiB/s, the source volume’s provisioned performance, or the copied volume’s provisioned performance.

After initialization is completed, the copied volume becomes fully independent of the source volume and delivers its full provisioned performance.

Alternatively, I can use the AWS Command Line Interface (AWS CLI) to initiate the copy:

aws ec2 copy-volumes                          \
     --source-volume-id vol-1234567890abcdef0 \
     --size 500                               \
     --volume-type gp3

After the volume copy is created, I attach it to my EC2 instance and mount it. I can check the file I created at start is present.

First, I attach the volume from my laptop, using the attach-volume command:

aws ec2 attach-volume \
         --volume-id 'vol-09b700e3a23a9b4ad' \
         --instance-id 'i-079e6504ad25b029e'   \
         --device '/dev/sdb'

Then, I connect to the instance, and I type these commands:

$ sudo lsblk -f
NAME          FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1                                                                              
├─nvme0n1p1   xfs          /     49e26d9d-0a9d-4667-b93e-a23d1de8eacd    6.2G    22% /
└─nvme0n1p128 vfat   FAT16       3105-2F44                               8.6M    14% /boot/efi
nvme1n1                                                                              
├─nvme1n1p1   xfs          /     49e26d9d-0a9d-4667-b93e-a23d1de8eacd                
└─nvme1n1p128 vfat   FAT16       3105-2F44     

$ sudo mount -t xfs /dev/nvme1n1p1 /data

$ df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          4.0M     0  4.0M   0% /dev
tmpfs             924M     0  924M   0% /dev/shm
tmpfs             370M  476K  369M   1% /run
/dev/nvme0n1p1    8.0G  1.8G  6.2G  22% /
tmpfs             924M     0  924M   0% /tmp
/dev/nvme0n1p128   10M  1.4M  8.7M  14% /boot/efi
tmpfs             185M     0  185M   0% /run/user/1000
/dev/nvme1n1p1    8.0G  1.8G  6.2G  22% /data

$ cat /data/home/ec2-user/hello.txt 
Hello CopyVolumes

Things to know
Volume Clones creates copies within the same Availability Zone as your source volume. You can create copies from encrypted volumes only, and the size of your copy must be equal to or greater than the source volume.

Volume Clones creates crash-consistent copies of your volumes, exactly like snapshots. For application consistency, you need to pause application I/O operations before creating the copy. For example, with PostgreSQL databases, you can use the pg_start_backup() and pg_stop_backup() functions to pause writes and create a consistent copy. At the operating system level on Linux with XFS, you can use the xfs_freeze command to temporarily suspend and resume access to the file system and ensure all cached updates are written to disk.

Although Volume Clones creates point-in-time copies, it complements rather than replaces EBS snapshots for backup purposes. EBS snapshots remain the recommended solution for data backup and protection against AZ-level and volume failures. Snapshots provide incremental backups to Amazon S3 with 11 nines of durability, compared to Volume Clones which maintains EBS volume durability (99.999% for io2, 99.9% for other volume types). Consider using Volume Clones specifically for test and development environment scenarios where you need instant access to volume copies.

Copied volumes exist independently of their source volumes and continue to incur standard EBS volume charges until you delete them. To manage costs effectively, implement governance rules to identify and remove copied volumes that are no longer needed for your development or testing activities.

Pricing and availability
Volume Clones supports all EBS volume types and works with volumes in the same AWS account and Availability Zone. This new capability is available in all AWS commercial Regions, selected Local Zones, and in the AWS GovCloud (US).

For pricing, you’re charged a one-time fee per GiB of data on the source volume at initiation and standard EBS pricing for the new volume.

I find Volume Clones particularly valuable for database workloads and continuous integration (CI) scenarios. For instance, you can quickly create a copy of your production database for testing new features or troubleshooting issues without impacting your production environment or waiting for data to hydrate from Amazon S3.

To get started with Amazon EBS Volume Clones, visit the Amazon EBS section on the console or check out the EBS documentation. I look forward to hearing how you use this capability to improve your development workflows.

— seb

Accelerate the transfer of data from an Amazon EBS snapshot to a new EBS volume

2025-05-07 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/accelerate-the-transfer-of-data-from-an-amazon-ebs-snapshot-to-a-new-ebs-volume/

Today we are announcing the general availability of Amazon Elastic Block Store (Amazon EBS) Provisioned Rate for Volume Initialization, a feature that accelerates the transfer of data from an EBS snapshot, a highly durable backup of volumes stored in Amazon Simple Storage Service (Amazon S3) to a new EBS volume.

With Amazon EBS Provisioned Rate for Volume Initialization, you can create fully performant EBS volumes within a predictable amount of time. You can use this feature to speed up the initialization of hundreds of concurrent volumes and instances. You can also use this feature when you need to recover from an existing EBS Snapshot and need your EBS volume to be created and initialized as quickly as possible. You can use this feature to quickly create copies of EBS volumes with EBS Snapshots in a different Availability Zone, AWS Region, or AWS account. Provisioned Rate for Volume Initialization for each volume is charged based on the full snapshot size and the specified volume initialization rate.

This new feature expedites the volume initialization process by fetching the data from an EBS Snapshot to an EBS volume at a consistent rate that you specify between 100 MiB/s and 300 MiB/s. You can specify this volume initialization rate at which the snapshot blocks are to be downloaded from Amazon S3 to the volume.

With specifying the volume initialization rate, you can create a fully performant volume in a predictable time, enabling increased operational efficiency and visibility on the expected time of completion. If you run utilities like fio/dd to expedite volume initialization for your workflows like application recovery and volume copy for testing and development, it will remove the operational burden of managing such scripts with the consistency and predictability to your workflows.

Get started with specifying the volume initialization rate
To get started, you can choose the volume initialization rate when you launch your EC2 instance or create your volume from the snapshot.

1. Create a volume in the EC2 launch wizard
When launching new EC2 instances in the launch wizard of EC2 console, you can enter a desired Volume initialization rate in the Storage (volumes) section.

You can also set the volume initialization rate when creating and modifying the EC2 Launch Templates.

In the AWS Command Line Interface (AWS CLI), you can add VolumeInitializationRate parameter to the block device mappings when call run-instances command.

aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t2.micro \
    --subnet-id subnet-08fc749671b2d077c \
    --security-group-ids sg-0b0384b66d7d692f9 \
    --key-name MyKeyPair \
    --block-device-mappings file://mapping.json

Contents of mapping.json. This example adds /dev/sdh an empty EBS volume with a size of 8 GiB.

[
    {
        "DeviceName": "/dev/sdh",
        "Ebs": {
            "VolumeSize": 8
            "VolumeType": "gp3",            
            "VolumeInitializationRate": 300
		 } 
     } 
]

To learn more, visit block device mapping options, which defines the EBS volumes and instance store volumes to attach to the instance at launch.

2. Create a volume from snapshots
When you create a volume from snapshots, you can also choose Create volume in the EC2 console and specify the Volume initialization rate.

Confirm your new volume with the initialization rate.

In the AWS CLI, you can use VolumeInitializationRate parameter and when calling create-volume command.

aws ec2 create-volume --region us-east-1 --cli-input-json '{
    "AvailabilityZone": "us-east-1a",
    "VolumeType": "gp3",
    "SnapshotId": "snap-07f411eed12ef613a",
    "VolumeInitializationRate": 300
}'

If the command is run successfully, you will receive the result below.

{
    "AvailabilityZone": "us-east-1a",
    "CreateTime": "2025-01-03T21:44:53.000Z",
    "Encrypted": false,
    "Size": 100,
    "SnapshotId": "snap-07f411eed12ef613a",
    "State": "creating",
    "VolumeId": "vol-0ba4ed2a280fab5f9",
    "Iops": 300,
    "Tags": [],
    "VolumeType": "gp2",
    "MultiAttachEnabled": false,
    "VolumeInitializationRate": 300
}

You can also set the volume initialization rate when replacing root volumes of EC2 instances and provisioning EBS volumes using the EBS Container Storage Interface (CSI) driver.

After creation of the volume, EBS will keep track of the hydration progress and publish an Amazon EventBridge notification for EBS to your account when the hydration completes so that they can be certain when their volume is fully performant.

To learn more, visit Create an Amazon EBS volume and Initialize Amazon EBS volumes in the Amazon EBS User Guide.

Now available
Amazon EBS Provisioned Rate for Volume Initialization is now available and supported for all EBS volume types today. You will be charged based on the full snapshot size and the specified volume initialization rate. To learn more, visit Amazon EBS Pricing page.

To learn more about Amazon EBS including this feature, take the free digital course on the AWS Skill Builder portal. Course includes use cases, architecture diagrams and demos.

Give this feature a try in the Amazon EC2 console today and send feedback to AWS re:Post for Amazon EBS or through your usual AWS Support contacts.

— Channy

How is the News Blog doing? Take this 1 minute survey!

(This survey is hosted by an external company. AWS handles your information as described in the AWS Privacy Notice. AWS will own the data gathered via this survey and will not share the information collected with survey respondents.)

Time-based snapshot copy for Amazon EBS

2024-11-27 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/time-based-snapshot-copy-for-amazon-ebs/

You can now specify a desired completion duration (15 minutes to 48 hours) when you copy an Amazon Elastic Block Store (Amazon EBS) snapshot within or between AWS Regions and/or accounts. This will help you to meet time-based compliance and business requirements for critical workloads. For example:

Testing – Distribute fresh data on a timely basis as part of your Test Data Management (TDM) plan.

Development – Provide your developers with updated snapshot data on a regular and frequent basis.

Disaster Recovery – Ensure that critical snapshots are copied in order to meet a Recovery Point Objective (RPO).

Regardless of your use case, this new feature gives you consistent and predictable copies. This does not affect the performance or reliability of standard copies—you can choose the option and timing that works best for each situation.

Creating a Time-Based Snapshot Copy
I can create time-based snapshot copies from the AWS Management Console, CLI (copy-snapshot), or API (CopySnapshot). While working on this post I created two EBS volumes (100 GiB and 1 TiB), filled each one with files, and created snapshots:

To create a time-based snapshot, I select the source as usual and choose Copy snapshot from the Action menu. I enter a description for the copy, choose the us-east-1 AWS Region as the destination, select Enable time-based copy, and (because this is a time-critical snapshot), enter a 15 minute Completion duration:

When I click Copy snapshot, the request will be accepted (and the copy will become Pending) only if my account’s throughput quotas are not already exceeded due to the throughput consumed by other active copies that I am making to the destination region. If the account level throughput quota is already exceeded, the console will display an error.

I can click Launch copy duration calculator to get a better idea of the minimum achievable copy duration for the snapshot. I open the calculator, enter my account’s throughput limit, and choose an evaluation period:

The calculator then uses historical data collected over the course of previous snapshot copies to tell me the minimum achievable completion duration. In this example I copied 1,800,000 MiB in the last 24 hours; with time-based copy and my current account throughput quota of 2000 MiB/second I can copy this much data in 15 minutes.

While the copy is in progress, I can monitor progress using the console or by calling DescribeSnapshots and examining the progress field of the result. I can also use the following Amazon EventBridge events to take actions (if the copy operation crosses regions, the event is sent in the destination region):

copySnapshot – Sent after the copy operation completes.

copyMissedCompletionDuration – Sent if the copy is still pending when the deadline has passed.

Things to Know
And that’s just about all there is to it! Here’s what you need to know about time-based snapshot copies:

CloudWatch Metrics – The SnapshotCopyBytesTransferred metric is emitted in the destination region, and reflect the amount of data transferred between the source and destination region in bytes.

Duration – The duration can range from 15 minutes to 48 hours in 15 minute increments, and is specified on a per-copy basis.

Concurrency – If a snapshot is being copied and I initiate a second copy of the same snapshot to the same destination, the duration for the second one starts when the first one is completed.

Throughput – There is a default per-account limit of 2000 MiB/second between each source and destination pair. If you need additional throughput in order to meet your RPO you can request an increase via the AWS Support Center. Maximum per-snapshot throughput is 500 MiB/second and cannot be increased.

Pricing – Refer to the Amazon EBS Pricing page for complete pricing information.

Regions – Time-based snapshot copies are available in all AWS Regions.

— Jeff;

How AWS powered Prime Day 2024 for record-breaking sales

2024-08-13 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/how-aws-powered-prime-day-2024-for-record-breaking-sales/

The last Amazon Prime Day 2024 (July 17-18) was Amazon’s biggest Prime Day shopping event ever, with record sales and more items sold during the two-day event than any previous Prime Day event. Prime members shopped for millions of deals and saved billions across more than 35 categories globally.

I live in South Korea, but luckily I was staying in Seattle to attend the AWS Heroes Summit during Prime Day 2024. I signed up for a Prime membership and used Rufus, my new AI-powered conversational shopping assistant, to search for items quickly and easily. Prime members in the U.S. like me chose to consolidate their deliveries on millions of orders during Prime Day, saving an estimated 10 million trips. This consolidation results in lower carbon emissions on average.

We know from Jeff’s annual blog post that AWS runs the Amazon website and mobile app that makes these short-term, large scale global events feasible. (check out his 2016, 2017, 2019, 2020, 2021, 2022, and 2023 posts for a look back). Today I want to share top numbers from AWS that made my amazing shopping experience possible.

Prime Day 2024 – all the numbers
Here are some of the most interesting and/or mind-blowing metrics:

Amazon EC2 – Since many of Amazon.com services such as Rufus and Search use AWS artificial intelligence (AI) chips under the hood, Amazon deployed a cluster of over 80,000 Inferentia and Trainium chips for Prime Day. During Prime Day 2024, Amazon used over 250K AWS Graviton chips to power more than 5,800 distinct Amazon.com services (double that of 2023).

Amazon EBS – In support of Prime Day, Amazon provisioned 264 PiB of Amazon EBS storage in 2024, a 62 percent increase compared to 2023. When compared to the day before Prime Day 2024, Amazon.com performance on Amazon EBS jumped by 5.6 trillion read/write I/O operations during the event, or an increase of 64 percent compared to Prime Day 2023. Also, when compared to the day before Prime Day 2024, Amazon.com transferred an incremental 444 petabytes of data during the event, or an increase of 81 percent compared to Prime Day 2023.

Amazon Aurora – On Prime Day, 6,311 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed more than 376 billion transactions, stored 2,978 terabytes of data, and transferred 913 terabytes of data.

Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made tens of trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 146 million requests per second.

Amazon ElastiCache – ElastiCache served more than quadrillion requests on a single day with a peak of over 1 trillion requests per minute.

Amazon QuickSight – Over the course of Prime Day 2024, one Amazon QuickSight dashboard used by Prime Day teams saw 107K unique hits, 1300+ unique visitors, and delivered over 1.6M queries.

Amazon SageMaker – SageMaker processed more than 145B inference requests during Prime Day.

Amazon Simple Email Service (Amazon SES) – SES sent 30 percent more emails for Amazon.com during Prime Day 2024 vs 2023, delivering 99.23 percent of those emails to customers.

Amazon GuardDuty – During Prime Day 2024, Amazon GuardDuty monitored nearly 6 trillion log events per hour, a 31.9% increase from the previous year’s Prime Day.

AWS CloudTrail – CloudTrail processed over 976 billion events in support of Prime Day 2024.

Amazon CloudFront – CloudFront handled a peak load of over 500 million HTTP requests per minute, for a total of over 1.3 trillion HTTP requests during Prime Day 2024, a 30 percent increase in total requests compared to Prime Day 2023.

Prepare to Scale
As Jeff noted in every year, rigorous preparation is key to the success of Prime Day and our other large-scale events. For example, 733 AWS Fault Injection Service experiments were run to test resilience and ensure Amazon.com remains highly available on Prime Day.

If you are preparing for a similar business-critical events, product launches, and migrations, I strongly recommend that you take advantage of newly-branded AWS Countdown, a support program designed for your project lifecycle to assess operational readiness, identify and mitigate risks, and plan capacity, using proven playbooks developed by AWS experts. For example, with additional help from AWS Countdown, Legal Zoom successfully migrated 450 servers with minimal issues and continues to leverage AWS Countdown Premium to streamline and expedite the launch of SaaS applications.

We look forward to seeing what other records will be broken next year!

— Channy & Jeff;

Using Amazon APerf to go from 50% below to 36% above performance target

2024-08-07 Macey Neff

Post Syndicated from Macey Neff original https://aws.amazon.com/blogs/compute/using-amazon-aperf-to-go-from-50-below-to-36-above-performance-target/

This post is written by Tyler Jones, Senior Solutions Architect – Graviton, AWS.

Performance tuning the Renaissance Finagle-http benchmark

Sometimes software doesn’t perform the way it’s expected to across different systems. This can be due to a configuration error, code bug, or differences in hardware performance. Amazon APerf is a powerful tool designed to help identify and address performance issues on AWS instances and other computers. APerf captures comprehensive system metrics simultaneously and then visualizes them in an interactive report. The report allows users to analyze metrics such as CPU usage, interrupts, memory usage, and CPU core performance counters (PMU) together. APerf is particularly useful for performance tuning workloads across different instance types, as it can generate side-by-side reports for easy comparison. APerf is valuable for developers, system administrators, and performance engineers who need to optimize application performance on AWS. From here on we use the Renaissance benchmark as an example to demonstrate how APerf is used to debug and find performance bottlenecks.

The example

The Renaissance finagle-http benchmark was unexpectedly found to run 50% slower on a c7g.16xl Graviton3 than on a reference instance, both initially using the Linux-5.15.60 kernel. This is unexpected behavior.Graviton3 should be performing as good or better than our reference instance as it does for other Java based workloads. It’s likely there is a configuration problem somewhere. The Renaissance finagle-http benchmark is written in Scala but produces Java byte code, so our investigation will focus on the Java JVM as well as system-level configurations.

Overview

System performance tuning is an iterative process that is conducted in two main phases, the first focuses on overall system issues, and the second focuses on CPU core bottlenecks. APerf is used to assist in both phases.

APerf can render several instances’ data in one report, side by side, typically a reference system and the system to be tuned. The reference system provides values to compare against. In isolation, metrics are harder to evaluate. A metric may be acceptable in general but the comparison to the reference system makes it easier to spot room for improvement.

APerf helps to identify unusual system behavior, such as high interrupt load, excessive I/O wait, unusual network layer patterns, and other such issues in the first phase. After adjustments are made to address these issues, for example by modifying JVM flags, the second phase starts. Using the system tuning of the first phase, fresh APerf data is collected and evaluated with a focus on CPU core performance metrics.

Any inferior metric of the SUT CPU core, as compared to the reference system, holds potential for improvement. In the following section we discuss the two phases in detail.
For more background on system tuning, refer to the Performance Runbook in the AWS Graviton Getting Started guide.

Initial data collection

Here is an example for how 240 seconds of system data is collected with APerf:

#enable PMU access
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
#APerf has to open more than the default limit of files.
ulimit -n 65536
#usually aperf would be run in another terminal. 
#For illustration purposes it is send to the background here
./aperf record --run-name finagle_1 --period 240 &
#With 64 CPUs it takes APerf a little less than 15s to report readiness 
#for data collection.
sleep 15
java -jar renaissance-gpl-0.14.2.jar -r 8 finagle-http

The APerf report is generated as follows:

./aperf report --run finagle_1

Then, the report can be viewed with a web browser:

firefox aperf_report_finagle_1/index.html

The APerf report can render several data sets in the same report, side-by-side.

./aperf report --run finagle_1_c7g --run finagle_1_reference --run ...

Note that it is crucial to examine the CPU usage over time shown in APerf. Some data may be gathered while the system is idle. The metrics during idle times have no significant value.

First phase: system level

In this phase we look for differences on the system level. To do this APerf collects data during runs of finagle-http on c7g.16xl and the reference. The reference system provides the target numbers to compare against. Any large difference warrants closer inspection.

The first differences can be seen in the following figure.

The APerf CPU usage plot shows larger drops on Graviton3 (highlighted in red) at the beginning of each run than on the reference instance.

Figure 1: CPU usage. c7g.16xl on the left, reference on the right.

The log messages about the GC runtime hint at a possible reason as they coincide with the dips in CPU usage.

====== finagle-http (web) [default], iteration 0 started ======
GC before operation: completed in 32.692 ms, heap usage 87.588 MB → 31.411 MB.
====== finagle-http (web) [default], iteration 0 completed (6534.533 ms) ======

The JVM tends to spend a significant amount of time in garbage collection, during which time it has to suspend all threads, and choosing a different GC may have a positive impact.

The default GC on OpenJDK17 is G1GC. Using parallelGC is an alternative given that the instances have 64 CPUs, and thus GC can be performed highly parallel. The Graviton Getting Started guide also recommends checking the GC log when working on Java performance issues. A cross check using the JVM’s -Xlog:gc option confirms the reduced GC time with parallelGC.

The second difference is evident in the following figure, the CPU-to-CPU interrupts (IPI). There is more than 10x higher activity on Graviton 3, which means additional IRQ work c7g.16xl on which the reference system does not have to spend CPU cycles.

Figure 2: IPI0/RES Interrupts. c7g.16xl on the left, reference on the right.

Grepping through kernel commit messages can help find patches that address a particular issue, such as IPI inefficiencies.

This scheduling patch improves performance by 19%. Another IPI patch provides an additional 1% improvement. Switching to Linux-5.15.61 allows us to use these IPI improvements. The following figure shows the effect in APerf.

Figure 3: IPI0 Interrupts (c7g.16xl)

Second phase: CPU core level

Now that the system level issues are mitigated, the focus is on the CPU cores. The data collected by APerf shows PMU data where the reference instance and Graviton3 differ significantly.

PMU metric	Graviton3	Reference system
Branch Prediction Misses/1000 Instructions	16	4
Instruction Translation Lookaside Buffer (TLB) Misses/1000 Instructions	8.3	4
Instructions Per Clock cycle	0.6	0.6*

*Note that reference has a 20% higher CPU clock than c7g.16xl. As a rule of thumb, instructions per clock (IPC) multiplied by clock rate equals work done by a CPU.

Addressing CPU Core bottlenecks

The first improvement in PMU metrics stems from the parallelGC option. Although the intention was to increase CPU usage, the following figure shows lowered branch miss counts as well. Limiting the JIT tiered compilation to only use C2 mode helps branch prediction by reducing branch indirection and increasing the locality of executed code. Finally adding Transparent Huge Pages helps branch prediction logic and avoids lengthy address translation look-ups in DDR memory. The following graphs show the effects of the chosen JVM options.

Branch misses/1000 instructions (c7g.16xl)

Figure 4: Branch misses per 1k instructions

JVM Options from left to right:

-XX:+UseParallelGC
-XX:+UseParallelGC -XX:-TieredCompilation
-XX:+UseParallelGC -XX:-TieredCompilation -XX:+UseTransparentHugePages

With the options listed under the preceding figure, APerf shows the branch prediction miss rate decreasing from the initial 16 to 11. Branch mis-predictions incur significant performance penalties as they result in wasted cycles spent computing results that ultimately need to be discarded. Furthermore, these mis-predictions cause the prefetching and cache subsystems to fail to load the necessary subsequent instructions into cache. Consequently, costly pipeline stalls and frontend stalls occur, preventing the CPU from executing instructions.

Code sparsity

Figure 5: Code sparsity

JVM Options from left to right:

-XX:+UseParallelGC
-XX:+UseParallelGC -XX:-TieredCompilation
-XX:+UseParallelGC -XX:-TieredCompilation -XX:+UseTransparentHugePages

Code sparsity is a measure of how compact the instruction code is packed and how closely related code is placed. This is where turning off tiered compilation shows its effect. Lower sparsity helps branch prediction and the cache subsystem.

Instruction TLB misses/1000 instructions (c7g.16xl)

Figure 6: Instruction TLB misses per 1k Instructions

JVM Options from left to right:

-XX:+UseParallelGC
-XX:+UseParallelGC -XX:-TieredCompilation
-XX:+UseParallelGC -XX:-TieredCompilation -XX:+UseTransparentHugePages

The big decrease in TLB misses is caused by the use of transparent huge pages, which increase the likelihood that a virtual address translation is present in the TLB, since fewer entries are needed. Translation table walks are avoided that otherwise need to traverse entries in DDR memory that cost hundreds of CPU cycles to read.

Instructions per clock cycle (IPC)

Figure 7: Instructions per clock cycle

JVM Options from left to right:

-XX:+UseParallelGC
-XX:+UseParallelGC -XX:-TieredCompilation
-XX:+UseParallelGC -XX:-TieredCompilation -XX:+UseTransparentHugePages

The preceding figure shows IPC increasing from 0.58 to 0.71 as JVM flags are added.

Results

This table summarizes the measures taken for performance improvement and their results.

JVM option set	baseline	1	2	3
IPC	0.6	0.6	0.63	0.71
Branch misses/1000 instructions	16	14	12	11
ITLB misses/1000 instructions	8.3	7.7	8.8	1.1
Benchmark runtime [ms]	6000	3922	3843	3512
execution time improvement vs baseline

+parallelGC
+parallelGC -tieredCompilation
+parallelGC -tieredCompilation +UseTransparentHugePages

Re-examining the setup of our testing enviornment: Changing where the load-generator lives

With the preceding efforts, c7g.16xl is within 91% of the reference system. The c7g.16xl branch prediction miss rate is still higher at 11 than the references at 4. As shown in the preceding figure, reduced branch prediction misses have a strong positive effect on performance. What follows is an experiment to achieve parity or better with the reference system based on the reduction of branch prediction misses.

Finagle-http serves HTTP requests generated by wrk2, which is a load generator implemented in C. The expectation is that the c7g.16xl branch predictor works better with the native wrk2 binary, unlike the Renaissance load generator, which is executing on the JVM. The wrk2 load generator and the finagle-http are assigned through taskset to separate sets of CPUs: 16 CPUs for wrk2 and 48 CPUs for finagle-http. The idea here is to have the branch predictors on these CPU sets focus on a limited code set. The following diagram illustrates the difference between the Renaissance and the experimental setup.

With this CPU performance tuned setup, c7g.16xl can now handle a 36% higher request load than the reference using the same configuration, at an average latency limit of 1.5ms. This illustrates the impact that system tuning with APerf can have. The same system that scored 50% lower than the comparison system now exceeds it by 36%.
The following APerf data shows the improvement of key PMU metrics that lead to the performance jump.

Branch prediction misses/1000 instructions

Figure 8: Branch misses per 1k instructions

Left chart: Optimized Java-only setup. Right chart: Finagle-http with wrk2 load generator

The branch prediction miss rate is reduced to 1.5 from 11 with the Java-only setup.

IPC

Figure 9: Instructions Per Clock Cycle

Left chart: Optimized Java-only setup. Right chart: Finagle-http with wrk2 load generator

The IPC steps up from 0.7 to 1.5 due to the improvement in branch prediction.

Code sparsity

Figure 10: Code Sparsity

Left chart: Optimized Java-only setup. Right chart: Finagle-http with wrk2 load generator

The code sparsity decreases to 0.014 from 0.21, a factor of 15.

Conclusion

AWS created the APerf tool to aid in root cause analysis and help address performance issues for any workload. APerf is a standalone binary that captures relevant data simultaneously as a time series, such as CPU usage, interrupt frequency, memory usage, and CPU core metrics (PMU counters). APerf can generate reports for multiple data captures, making it easy to spot differences between instance types. We were able to use this data to analyze why Graviton3 was underperforming and to also see the impact of our changes in terms of performance. Using APerf we were able to successfully adjust configuration parameters and go from 50% below our performance target to 36% more performant than our reference system and associated performance target. Without Aperf, collecting these metrics and visualizing them is a non-trival task. With Aperf you can capture and visualize these metrics with two short commands, saving you time and effort so you can focus on what matters most: getting the most performance from your application.

Amazon RDS now supports io2 Block Express volumes for mission-critical database workloads

2024-03-06 Abhishek Gupta

Post Syndicated from Abhishek Gupta original https://aws.amazon.com/blogs/aws/amazon-rds-now-supports-io2-block-express-volumes-for-mission-critical-database-workloads/

Today, I am pleased to announce the availability of Provisioned IOPS (PIOPS) io2 Block Express storage volumes for all database engines in Amazon Relational Database Service (Amazon RDS). Amazon RDS provides you the flexibility to choose between different storage types depending on the performance requirements of your database workload. io2 Block Express volumes are designed for critical database workloads that require high performance and high throughput at low latency.

Lower latency and higher availability for I/O intensive workloads
With io2 Block Express volumes, your database workloads will benefit from consistent sub-millisecond latency, enhanced durability to 99.999 percent over io1 volumes, and drive 20x more IOPS from provisioned storage (up to 1,000 IOPS per GB) at the same price as io1. You can upgrade from io1 volumes to io2 Block Express volumes without any downtime, significantly improving the performance and reliability of your applications without increasing storage cost.

“We migrated all of our primary Amazon RDS instances to io2 Block Express within 2 weeks,” said Samir Goel, Director of Engineering at Figma, a leading platform for teams that design and build digital products. “Io2 Block Express has had a profound impact on the availability of the database layer at Figma. We have deeply appreciated the consistency of performance with io2 Block Express — in our observations, the latency variability has been under 0.1ms.”

io2 Block Express volumes support up to 64 TiB of storage, up to 256,000 Provisioned IOPS, and a maximum throughput of 4,000 MiB/s. The throughput of io2 Block Express volumes varies based on the amount of provisioned IOPS and volume storage size. Here is the range for each database engine and storage size:

Database engine	Storage size	Provisioned IOPS	Maximum throughput
Db2, MariaDB, MySQL, and PostgreSQL	Between 100 and 65,536 GiB	1,000–256,000 IOPS	4,000 MiB/s
Oracle	Between 100 and 199 GiB	1,000–199,000 IOPS	4,000 MiB/s
Oracle	Between 200 and 65,536 GiB	1,000–256,000 IOPS	4,000 MiB/s
SQL Server	Between 20 and 16,384 GiB	1,000–64,000 IOPS	4,000 MiB/s

Getting started with io2 Block Express in Amazon RDS
You can use the Amazon RDS console to create a new RDS instance configured with an io2 Block Express volume or modify an existing instance with io1, gp2, or gp3 volumes.

Here’s how you would create an Amazon RDS for PostgreSQL instance with io2 Block Express volume.

Start with the basic information such as engine and version. Then, choose Provisioned IOPS SDD (io2) from the Storage type options:

Use the following AWS CLI command to create a new RDS instance with io2 Block Express volume:

aws rds create-db-instance --storage-type io2 --db-instance-identifier new-db-instance --db-instance-class db.t4g.large --engine mysql --master-username masteruser --master-user-password <enter password> --allocated-storage 400 --iops 3000

Similarly, to modify an existing RDS instance to use io2 Block Express volume:

aws rds modify-db-instance --db-instance-identifier existing-db-instance --storage-type io2 --allocated-storage 500 --iops 3000 --apply-immediately

Things to know

io2 Block Express volumes are available on all RDS databases using AWS Nitro System instances.
io2 Block Express volumes support an IOPS to allocated storage ratio of 1000:1. As an example, With an RDS for PostgreSQL instance, the maximum IOPS can be provisioned with volumes 256 GiB and larger (1,000 IOPS × 256 GiB = 256,000 IOPS).
For DB instances not based on the AWS Nitro System, the ratio of IOPS to allocated storage is 500:1. In this case, maximum IOPS can be achieved with 512 GiB volume (500 IOPS x 512 GiB = 256,000 IOPS).

Available now
Amazon RDS io2 Block Express storage volumes are supported for all RDS database engines and are available in US East (Ohio, N. Virginia), US West (N. California, Oregon), Asia Pacific (Hong Kong, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), and Middle East (Bahrain) Regions.

In terms of pricing and billing, io1 volumes and io2 Block Express storage volumes are billed at the same rate. For more information, see the Amazon RDS pricing page.

Learn more by reading about Provisioned IOPS SSD storage in the Amazon RDS User Guide.

— Abhishek

Amazon ECS supports a native integration with Amazon EBS volumes for data-intensive workloads

2024-01-12 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/amazon-ecs-supports-a-native-integration-with-amazon-ebs-volumes-for-data-intensive-workloads/

Today we are announcing that Amazon Elastic Container Service (Amazon ECS) supports an integration with Amazon Elastic Block Store (Amazon EBS), making it easier to run a wider range of data processing workloads. You can provision Amazon EBS storage for your ECS tasks running on AWS Fargate and Amazon Elastic Compute Cloud (Amazon EC2) without needing to manage storage or compute.

Many organizations choose to deploy their applications as containerized packages, and with the introduction of Amazon ECS integration with Amazon EBS, organizations can now run more types of workloads than before.

You can run data workloads requiring storage that supports high transaction volumes and throughput, such as extract, transform, and load (ETL) jobs for big data, which need to fetch existing data, perform processing, and store this processed data for downstream use. Because the storage lifecycle is fully managed by Amazon ECS, you don’t need to build any additional scaffolding to manage infrastructure updates, and as a result, your data processing workloads are now more resilient while simultaneously requiring less effort to manage.

Now you can choose from a variety of storage options for your containerized applications running on Amazon ECS:

Your Fargate tasks get 20 GiB of ephemeral storage by default. For applications that need additional storage space to download large container images or for scratch work, you can configure up to 200 GiB of ephemeral storage for your Fargate tasks.
For applications that span many tasks that need concurrent access to a shared dataset, you can configure Amazon ECS to mount the Amazon Elastic File System (Amazon EFS) file system to your ECS tasks running on both EC2 and Fargate. Common examples of such workloads include web applications such as content management systems, internal DevOps tools, and machine learning (ML) frameworks. Amazon EFS is designed to be available across a Region and can be simultaneously attached to many tasks.
For applications that need high-performance, low-cost storage that does not need to be shared across tasks, you can configure Amazon ECS to provision and attach Amazon EBS storage to your tasks running on both Amazon EC2 and Fargate. Amazon EBS is designed to provide block storage with low latency and high performance within an Availability Zone.

To learn more, see Using data volumes in Amazon ECS tasks and persistent storage best practices in the AWS documentation.

Getting started with EBS volume integration to your ECS tasks
You can configure the volume mount point for your container in the task definition and pass Amazon EBS storage requirements for your Amazon ECS task at runtime. For most use cases, you can get started by simply providing the size of the volume needed for the task. Optionally, you can configure all EBS volume attributes and the file system you want the volume formatted with.

1. Create a task definition
Go to the Amazon ECS console, navigate to Task definitions, and choose Create new task definition.

In the Storage section, choose Configure at deployment to set EBS volume as a new configuration type. You can provision and attach one volume per task for Linux file systems.

When you choose Configure at task definition creation, you can configure existing storage options such as bind mounts, Docker volumes, EFS volumes, Amazon FSx for Windows File Server volumes, or Fargate ephemeral storage.

Now you can select a container in the task definition, the source EBS volume, and provide a mount path where the volume will be mounted in the task.

You can also use $aws ecs register-task-definition --cli-input-json file://example.json command line to register a task definition to add an EBS volume. The following snippet is a sample, and task definitions are saved in JSON format.

{
    "family": "nginx"
    ...
    "containerDefinitions": [
        {
            ...
            "mountPoints": [
                "containerPath": "/foo",
                "sourceVoumne": "new-ebs-volume"
            ],
            "name": "nginx",
            "image": "nginx"
        }
    ],
    "volumes": [
       {
           "name": "/foo",
           "configuredAtRuntime": true
       }
    ]
}

2. Deploy and run your task with EBS volume
Now you can run a task by selecting your task in your ECS cluster. Go to your ECS cluster and choose Run new task. Note that you can select the compute options, the launch type, and your task definition.

Note: While this example goes through deploying a standalone task with an attached EBS volume, you can also configure a new or existing ECS service to use EBS volumes with the desired configuration.

You have a new Volume section where you can configure the additional storage. The volume name, type, and mount points are those that you defined in your task definition. Choose your EBS volume types, sizes (GiB), IOPs, and the desired throughput.

You cannot attach an existing EBS volume to an ECS task. But if you want to create a volume from an existing snapshot, you have the option to choose your snapshot ID. If you want to create a new volume, then you can leave this field empty. You can choose the file system type, either ext3 or ext4 file systems on Linux.

By default, when a task is terminated, Amazon ECS deletes the attached volume. If you need the data in the EBS volume to be retained after the task exits, check Delete on termination. Also, you need to create an AWS Identity and Access Management (IAM) role for volume management that contains the relevant permissions to allow Amazon ECS to make API calls on your behalf. For more information on this policy, see infrastructure role in the AWS documentation.

You can also configure encryption on your EBS volumes using either Amazon managed keys and customer managed keys. To learn more about the options, see our Amazon EBS encryption in the AWS documentation.

After configuring all task settings, choose Create to start your task.

3. Deploy and run your task with EBS volume
Once your task has started, you can see the volume information on the task definition details page. Choose a task and select the Volumes tab to find your created EBS volume details.

Your team can organize the development and operations of EBS volumes more efficiently. For example, application developers can configure the path where your application expects storage to be available in the task definition, and DevOps engineers can configure the actual EBS volume attributes at runtime when the application is deployed.

This allows DevOps engineers to deploy the same task definition to different environments with differing EBS volume configurations, for example, gp3 volumes in the development environments and io2 volumes in production.

Now available
Amazon ECS integration with Amazon EBS is available in nine AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm). You only pay for what you use, including EBS volumes and snapshots. To learn more, see the Amazon EBS pricing page and Amazon EBS volumes in ECS in the AWS documentation.

Give it a try now and send feedback to our public roadmap, AWS re:Post for Amazon ECS, or through your usual AWS Support contacts.

— Channy

P.S. Special thanks to Maish Saidel-Keesing, a senior enterprise developer advocate at AWS for his contribution in writing this blog post.

AWS Weekly Roundup — AWS Lambda, AWS Amplify, Amazon OpenSearch Service, Amazon Rekognition, and more — December 18, 2023

2023-12-18 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-lambda-aws-amplify-amazon-opensearch-service-amazon-rekognition-and-more-december-18-2023/

My memories of Amazon Web Services (AWS) re:Invent 2023 are still fresh even when I’m currently wrapping up my activities in Jakarta after participating in AWS Community Day Indonesia. It was a great experience, from delivering chalk talks and having thoughtful discussions with AWS service teams, to meeting with AWS Heroes, AWS Community Builders, and AWS User Group leaders. AWS re:Invent brings the global AWS community together to learn, connect, and be inspired by innovation. For me, that spirit of connection is what makes AWS re:Invent always special.

Here’s a quick look of my highlights at AWS re:Invent and AWS Community Day Indonesia:

If you missed AWS re:Invent, you can watch the keynotes and sessions on demand. Also, check out the AWS News Editorial Team’s Top announcements of AWS re:Invent 2023 for all the major launches.

Recent AWS launches
Here are some of the launches that caught my attention in the past two weeks:

Query MySQL and PostgreSQL with AWS Amplify – In this post, Channy wrote how you can now connect your MySQL and PostgreSQL databases to AWS Amplify with just a few clicks. It generates a GraphQL API to query your database tables using AWS CDK.

Migration Assistant for Amazon OpenSearch Service – With this self-service solution, you can smoothly migrate from your self-managed clusters to Amazon OpenSearch Service managed clusters or serverless collections.

AWS Lambda simplifies connectivity to Amazon RDS and RDS Proxy – Now you can connect your AWS Lambda to Amazon RDS or RDS proxy using the AWS Lambda console. With a guided workflow, this improvement helps to minimize complexities and efforts to quickly launch a database instance and correctly connect a Lambda function.

New no-code dashboard application to visualize IoT data – With this announcement, you can now visualize and interact with operational data from AWS IoT SiteWise using a new open source Internet of Things (IoT) dashboard.

Amazon Rekognition improves Face Liveness accuracy and user experience – This launch provides higher accuracy in detecting spoofed faces for your face-based authentication applications.

AWS Lambda supports additional concurrency metrics for improved quota monitoring – Add CloudWatch metrics for your Lambda quotas, to improve visibility into concurrency limits.

AWS Malaysia now supports 3D-Secure authentication – This launch enables 3DS2 transaction authentication required by banks and payment networks, facilitating your secure online payments.

Announcing AWS CloudFormation template generation for Amazon EventBridge Pipes – With this announcement, you can now streamline the deployment of your EventBridge resources with CloudFormation templates, accelerating event-driven architecture (EDA) development.

Enhanced data protection for CloudWatch Logs – With the enhanced data protection, CloudWatch Logs helps identify and redact sensitive data in your logs, preventing accidental exposure of personal data.

Send SMS via Amazon SNS in Asia Pacific – With this announcement, now you can use SMS messaging across Asia Pacific from the Jakarta Region.

Lambda adds support for Python 3.12 – This launch brings the latest Python version to your Lambda functions.

CloudWatch Synthetics upgrades Node.js runtime – Now you can use Node.js 16.1 runtimes for your canary functions.

Manage EBS Volumes for your EC2 fleets – This launch simplifies attaching and managing EBS volumes across your EC2 fleets.

See you next year!
This is the last AWS Weekly Roundup for this year, and we’d like to thank you for being our wonderful readers. We’ll be back to share more launches for you on January 8, 2024.

Happy holidays!

— Donnie

Amazon EBS Snapshots Archive is now available with AWS Backup

2023-11-27 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/amazon-ebs-snapshots-archive-is-now-available-with-aws-backup/

Today we announce the availability of Amazon Elastic Block Store (Amazon EBS) Snapshots Archive with AWS Backup. Previously available only in the Amazon EC2 console or Amazon Data Lifecycle Manager, this feature gives you the ability to transition your infrequently accessed Amazon EBS Snapshots to low-cost archive, long-term storage of your rarely-accessed snapshots that do not need frequent or fast retrieval.

Amazon EBS Snapshots Archive in the AWS Backup console
Snapshots Archive with AWS Backup is only available for snapshots with a backup frequency of one month or longer (28-day cron expression) and a retention of more than 90 days. This is a protective measure to ensure that you don’t archive snapshots, such as hourly snapshots that wouldn’t benefit from the transition to the cold tier.

The ability to archive Amazon EBS Snapshots is a new parameter of the Lifecycle section of the AWS Backup Plans. You must explicitly opt into moving your Amazon EBS Snapshots to cold storage, because this has different properties of our existing cold storage including:

Always converting an incremental backup to a full backup.
Longer recovery time objective (RTO) (up to 72 hours).
Limitations on the frequency of backups that can be transitioned to cold storage (monthly or greater).

Time in warm storage indicates how long the backups will remain in warm storage before they are transitioned to cold storage. Total retention period is the total time the backups will be retained by AWS Backup, and its value is the sum of both warm and cold storage. For backups in cold storage, the minimum retention period is 90 days. This is why the default total retention is 98 days (8 days in warm + 90 days in cold). The bar graph shows the total retention of your backups and where the backups will reside during that time. In the example shown in this graph, 8 days is in warm storage (red bar), and 90 days is in cold storage (blue bar).

To restore or use the archived Amazon EBS snapshot today (outside of AWS Backup), you have to follow a two-step process:

Temporarily or permanently restore the snapshot from archive to standard tier.
Once it’s in standard tier, call the CreateVolume API from the standard tier.

With this announcement, using either the AWS Backup console or the API to restore the archived Amazon EBS snapshot in AWS Backup, the following restore workflow applies:

Enter the number of days you want to temporarily restore your snapshot from cold to standard tier.
Choose your volume configuration.

The end result will be a restored EBS volume. You will not have to manually move the snapshot from cold to standard tier, then restore the volume, this will be done automatically for you.

Now available
Amazon EBS Snapshots Archive with AWS Backup is available for you today in all AWS Regions except China and AWS GovCloud (US).

As usual, you pay as you go, with no minimum or fixed fees. There are two metrics that influence Amazon EBS Snapshots Archive billing: data storage and data retrieval. You are charged for a 90-day period at minimum. This means that if you delete a snapshot archive or permanently restore it less than 90 days after creation, then we charge for the full 90-day period. The AWS Backup pricing page has the details.

– Veliswa

New – Amazon EBS Snapshot Lock

2023-11-16 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-ebs-snapshot-lock/

You can now lock individual Amazon Elastic Block Store (Amazon EBS) snapshots in order to enforce better compliance with your data retention policies. Locked snapshots cannot be deleted until the lock is expired or released, giving you the power to keep critical backups safe from accidental or malicious deletion, including ransomware attacks.

The Need for Locking
AWS customers use EBS snapshots for backups, disaster recovery, data migration, and compliance. Customers in financial services and health care often need to meet specific compliance requirements, with prescribed time frames for retention, and also need to ensure that the snapshots are truly Write Once Read Many (WORM). In order to meet these requirements, customers have implemented solutions that use multiple AWS accounts with one-way “air gaps” between them.

EBS Snapshot Lock
The new EBS Snapshot Lock feature helps you to meet your retention and compliance requirements without the need for custom solutions. You can lock new and existing EBS snapshots using a lock duration that can range from one day to about 100 years. The snapshot is locked for the specified duration and cannot be deleted.

There are two lock modes:

Governance – This mode protects snapshots from deletions by all users. However, with the proper IAM permissions, the lock duration can be extended or shortened, the lock can be deleted, and the mode can be changed from Governance mode to Compliance mode.

Compliance – This mode protects snapshots from actions by the root user and all IAM users. After a cooling-off period of up to 72 hours, neither the snapshot nor the lock can be deleted until the lock duration expires, and the mode cannot be changed. With the proper IAM permissions the lock duration can be extended, but it cannot be shortened.

Snapshots in either mode can still be shared or copied. They can be archived to the low-cost Amazon EBS Snapshots Archive tier, and locks can be applied to snapshots that have already been archived.

Using Snapshot Lock
From the EBS Console I select a snapshot (Snap-Monthly-2023-09) and choose Manage snapshot lock from Snapshot Settings in the Actions menu:

This is a monthly snapshot and I want to lock it for one year. I choose Governance mode and select the duration, then click Save lock settings:

I try to delete it, and the deletion fails, as it should:

Now I would like to lock one of my annual snapshots for 5 years, using Compliance mode this time:

I set my cooling-off period to 24 hours, just in case I change my mind. Perhaps I have to run some kind of audit or final date validation on the snapshot before committing to keeping it around for five years.

Programmatically, I can use new API functions to establish and control locks on my EBS snapshots:

LockSnapshot – Lock a snapshot in governance or compliance mode, or modify the settings of a snapshot that is already locked.

UnlockSnapshot – Unlock a snapshot that is is governance mode, or is in compliance mode but within the cooling-off period.

DescribeLockedSnapshots – Get information about the lock status of my snapshots, with optional filtering based on the state of the lock.

IAM users must have the appropriate permissions (ec2:lockSnapshot, ec2:UnlockSnapshot, and ec2:DescribeLockedSnapshots) in order to use these functions.

Things to Know
Here are a couple of things to keep in mind about this new feature:

AWS Backup – AWS Backup independently manages retention for the snapshots that it creates. We do not recommend locking them.

Pricing – There is no extra charge for the use of this feature. You pay the usual rates for storage of snapshots and archived snapshots.

Regions – EBS Snapshot Locking is available in all commercial AWS Regions.

KMS Key Retention – If you are using customer-managed AWS Key Management Service (AWS KMS) keys to encrypt your EBS volumes and snapshots, you need to make sure that the key will remain valid for the lifetime of the snapshot.

— Jeff;

AWS Weekly Roundup – CloudFront security dashboard, EBS snapshots improvements, and more – November 13, 2023

2023-11-13 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-cloudfront-security-dashboard-ebs-snapshots-improvements-and-more-november-13-2023/

This week, it was really difficult to choose what to recap here because, as we’re getting closer to AWS re:Invent, service teams are delivering new capabilities at an incredible pace.

Last week’s launches
Here are some of the launches that caught my attention last week:

Amazon Aurora – Aurora MySQL zero-ETL integration with Amazon Redshift is now generally available. Get a walk-through in our AWS News Blog post. Here’s a recap of data integration innovations at AWS. Optimized reads for Aurora PostgreSQL provide up to 8x improved query latency and up to 30 percent cost savings for I/O-intensive applications. Here’s more of a deep dive from the AWS Database Blog.

Amazon EBS – You can now block public sharing of EBS snapshots. Read more about how that works in the launch post.

Amazon Data Lifecycle Manager – Support for pre- and post-script automation of EBS snapshots simplifies application-consistent snapshots. Here’s how to use it with Windows applications.

AWS Health – There’s now improved visibility into planned lifecycle events like end of standard support of a Kubernetes version in Amazon EKS, Amazon RDS certificate rotations, and end of support for other open source software. Here’s how it works.

Amazon CloudFront – Unified security dashboard to enable, monitor, and manage common security protections for your web applications directly from the CloudFront console. Read more at Introducing CloudFront Security Dashboard, a Unified CDN and Security Experience.

Amazon Connect – Reduced outbound telephony pricing across Europe and South America. It’s also easier now to deliver persistent chat experiences for end users.

AWS Lambda – Busy week for the Lambda team! There is now support for Amazon Linux 2023 as both a managed runtime and a container base image. More details in this Compute Blog post. There’s also enhanced auto scaling for Kafka event sources (the Compute Blog has a post with more details) and faster polling scale-up rate for Amazon SQS events when AWS Lambda functions are configured with SQS.

AWS CodeBuild – Now supports AWS Lambda compute to build and test software packages. Read about how it works in this post.

Amazon SQS – Now supports JSON protocol to reduce latency and client-side CPU usage. More in the launch post. There’s also a new integration for Amazon SQS in the Amazon EventBridge Pipes console (the week before that, Amazon Kinesis Data Streams was also integrated into the EventBridge Pipes console).

Amazon SNS – FIFO topics now support 3,000 messages per second by default.

Amazon EventBridge – There are 22 additional Amazon CloudWatch metrics to help you monitor the performance of your event buses. More info in this post from the AWS Compute Blog.

Amazon OpenSearch Service – Neural search makes it easier to create and manage semantic search applications.

Amazon Timestream – The UNLOAD statement simplifies exporting time-series data for additional insights.

Amazon Comprehend – New trust and safety features with toxicity detection and prompt safety classification. Read how to apply that to generative AI applications using LangChain.

AWS App Runner – Now available in London, Mumbai, and Paris AWS Regions.

AWS Application Migration Service – Support for AWS App2Container replatforming of .NET and Java based applications.

Amazon FSx for OpenZFS – Now available in ten additional AWS Regions with support for additional deployment types in seven Regions.

AWS Global Accelerator – There’s now IPv6 support for Network Load Balancer (NLB) endpoints. It was already available for Application Load Balancers (ALBs) and Amazon Elastic Compute Cloud (Amazon EC2) instances.

Amazon GuardDuty – New machine learning (ML) capability enhances threat detection for Amazon EKS.

Other AWS news
Some other news and blog posts that you might have missed:

AWS Local Zones Credit Program – If you have low-latency or data residency requirements for your application, our Local Zones Credit Program can get you started. Fill out our form to receive $500 in AWS credits and apply it to a Local Zones workload.

Amazon CodeWhisperer – Customizing coding companions for organizations and optimizing for sustainability.

Sharing what we have learned – Creating a correction of errors document to understand what went wrong and what would be done to prevent it from happening again.

Good tips for containers – Securing API endpoints using Amazon API Gateway and Amazon VPC Lattice.

Another post in this amazing series – Let’s Architect! Tools for developers.

A few highlights from Community.AWS:

Don’t miss the latest AWS open source newsletter by my colleague Ricardo.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: Uruguay (November 14), Central Asia (Kazakhstan, Uzbekistan, Kyrgyzstan, and Mongolia on November 17–18), and Guatemala (November 18).

AWS re:Invent (November 27 – December 1) – Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Browse the session catalog and attendee guides and check out the highlights for generative AI. In the AWS re:Invent Builder Hub you can find developer-focused sessions, events, competitions, and content.

Here you can browse all upcoming AWS-led in-person and virtual events and developer-focused events.

And that’s all from me for this week. We’re now taking a break. The next weekly roundup will be after re:Invent!

— Danilo

This post is part of our Weekly Roundup series. Check back for a quick roundup of interesting news and announcements from AWS!

New – Block Public Sharing of Amazon EBS Snapshots

2023-11-09 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-block-public-sharing-of-amazon-ebs-snapshots/

You now have the ability to disable public sharing of new, and optionally existing, Amazon Elastic Block Store (Amazon EBS) snapshots on a per-region, per-account basis. This provides you with another level of protection against accidental or inadvertent data leakage.

EBS Snapshot Review
You have had the power to create EBS snapshots since the launch of EBS in 2008, and have been able to share them privately or publicly since 2009. The vast majority of snapshots are kept private and are used for periodic backups, data migration, and disaster recovery. Software vendors use public snapshots to share trial-use software and test data.

Block Public Sharing
EBS snapshots have always been private by default, with the option to make individual snaps public as needed. If you do not currently use and do not plan to use public snapshots, you can now disable public sharing using the AWS Management Console, AWS Command Line Interface (AWS CLI), or the new EnableSnapshotBlockPublicAccess function. Using the Console, I visit the EC2 Dashboard and click Data protection and security in the Account attributes box:

Then I scroll down to the new Block public access for EBS snapshots section, review the current status, and click Manage:

I click Block public access, choose Block all public sharing, and click Update:

This is a per-region setting, and it takes effect within minutes. I can see the updated status in the console:

I inspect one of my snapshots in the region, and see that I cannot share it publicly:

As you can see, I still have the ability to share the snapshot with specific AWS accounts.

If I have chosen Block all public sharing, any snapshots that I have previously shared will no longer be listed when another AWS customer calls DescribeSnapshots in pursuit of publicly accessible snapshots.

Things to Know
Here are a couple of really important things to know about this new feature:

Region-Level – This is a regional setting, and must be applied in each region where you want to block the ability to share snapshots publicly.

API Functions & IAM Permissions – In addition to EnableSnapshotBlockPublicAccess, other functions for managing this feature include DisableSnapshotBlockPublicAccess and GetSnapshotBlockPublicAccessState. To use these functions (or their console/CLI equivalents) you must have the ec2:EnableSnapshotBlockPublicAccess, ec2:DisableSnapshotBlockPublicAccess, and ec2:GetSnapshotBlockPublicAccessState IAM permissions.

AMIs – This does not affect the ability to share Amazon Machine Images (AMIs) publicly. To learn how to manage public sharing of AMIs, visit Block public access to your AMIs.

— Jeff;

New – Create application-consistent snapshots using Amazon Data Lifecycle Manager and custom scripts

2023-11-07 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-create-application-consistent-snapshots-using-amazon-data-lifecycle-manager-and-custom-scripts/

Amazon Data Lifecycle Manager now supports the use of pre-snapshot and post-snapshot scripts embedded in AWS Systems Manager documents. You can use these scripts to ensure that Amazon Elastic Block Store (Amazon EBS) snapshots created by Data Lifecycle Manager are application-consistent. Scripts can pause and resume I/O operations, flush buffered data to EBS volumes, and so forth. As part of this launch we are also publishing a set of detailed blog posts that show you how to use this feature with self-managed relational databases and Windows Volume Shadow Copy Service (VSS).

Data Lifecycle Manager (DLM) Recap
As a quick recap, Data Lifecycle Manager helps you to automate the creation, retention, and deletion of Amazon EBS volume snapshots. Once you have completed the prerequisite steps such as onboarding your EC2 instance to AWS Systems Manager, setting up an IAM role for DLM, and tagging your SSM documents, you simply create a lifecycle policy and indicate (via tags) the applicable Amazon Elastic Compute Cloud (Amazon EC2) instances, set a retention model, and let DLM do the rest. The policies specify when they are to be run, what is to be backed up, and how long the snapshots must be retained. For a full walk-through of DLM, read my 2018 blog post, New – Lifecycle Management for Amazon EBS Snapshots.

Application Consistent Snapshots
EBS snapshots are crash-consistent, meaning that they represent the state of the associated EBS volume at the time that the snapshot was created. This is sufficient for many types of applications, including those that do not use snapshots to capture the state of an active relational database. To make a snapshot that is application-consistent, it is necessary to take pending transactions into account (either waiting for them to finish or causing them to fail), momentarily pause further write operations, take the snapshot, and then resume normal operations.

And that’s where today’s launch comes in. DLM now has the ability to tell the instance to prepare for an application-consistent backup. The pre-snapshot script can manage pending transactions, flush in-memory data to persistent storage, freeze the filesystem, or even bring the application or database to a stop. Then the post-snapshot script can bring the application or database back to life, reload in-memory caches from persistent storage, thaw the filesystem, and so forth.

In addition to the base-level support for custom scripts, you can also use this feature to automate the creation of VSS Backup snapshots:

Pre and Post Scripts
The new scripts apply to DLM policies for instances. Let’s assume that I have created a policy that references SSM documents with pre-snapshot and post-snapshot scripts, and that it applies to a single instance. Here’s what happens when the policy is run per its schedule:

The pre-snapshot script is started from the SSM document.
Each command in the script is run and the script-level status (success or failure) is captured. If enabled in the policy, DLM will retry failed scripts.
Multi-volume EBS snapshots are initiated for EBS volumes attached to the instance, with further control via the policy.
The post-snapshot script is started from the SSM document,
Each command in the script is run and and the script-level status (success or failure) is captured.

The policy contains options that give you control over the actions that are taken (retry, continue, or skip) when either of the scripts times out or fails. The status is logged, Amazon CloudWatch metrics are published, Amazon EventBridge events are emitted, and the status is also encoded in tags that are automatically assigned to each snapshot.

The pre-snapshot and post-snapshot scripts can perform any of the actions that are allowed in a command document: running shell scripts, running PowerShell scripts, and so forth. The actions must complete within the timeout specified in the policy, with an allowable range of 10 seconds to 120 seconds.

Getting Started
You will need to have a detailed understanding of your application or database in order to build a robust pair of scripts. In addition to handling the “happy path” when all goes well, your scripts need to plan for several failure scenarios. For example, a pre-snapshot script should fork a background task that will serve as a failsafe in case the post-snapshot script does not work as expected. Each script must return a shell-level status code, as detailed here.

Once I have written and tested my scripts and packaged them as SSM documents, I open the Data Lifecycle Manager page in the EC2 Console, select EBS snapshot policy, and click Next step:

I target all of my instances that are tagged with a Mode of Production, and use the default IAM role (if you use a different role, it must enable access to SSM), leave the rest of the values as-is, and proceed:

On the next page I scroll down to Pre and post scripts and expand the section. I click Enable pre and post scripts, choose Custom SSM document, and then select my SSM document from the menu. I also set the timeout and retry options, and choose to default to a crash-consistent backup if one of my scripts fails. I click Review policy, do one final check, and click Create policy on the following page:

My policy is created, and will take effect right away. After it has run at least once, I can inspect the CloudWatch metrics to check for starts, completions, and failures:

Additional Reading
Here are the first of the detailed blog posts that I promised you earlier:

We have more in the works for later this year and I will update the list above when they are published.

You can also read the documentation to learn more.

DLM Videos
While I’ve got your attention, I would like to share a couple of helpful videos with you:

This new feature is available now and you can start using it today!

— Jeff;

Quickly Restore Amazon EC2 Mac Instances using Replace Root Volume capability

2023-10-12 Macey Neff

Post Syndicated from Macey Neff original https://aws.amazon.com/blogs/compute/new-reset-amazon-ec2-mac-instances-to-a-known-state-using-replace-root-volume-capability/

This post is written by Sebastien Stormacq, Principal Developer Advocate.

Amazon Elastic Compute Cloud (Amazon EC2) now supports replacing the root volume on a running EC2 Mac instance, enabling you to restore the root volume of an EC2 Mac instance to its initial launch state, to a specific snapshot, or to a new Amazon Machine Image (AMI).

Since 2021, we have offered on-demand and pay-as-you-go access to Amazon EC2 Mac instances, in the same manner as our Intel, AMD and Graviton-based instances. Amazon EC2 Mac instances integrate all the capabilities you know and love from macOS with dozens of AWS services such as Amazon Virtual Private Cloud (VPC) for network security, Amazon Elastic Block Store (EBS) for expandable storage, Elastic Load Balancing (ELB) for distributing build queues, Amazon FSx for scalable file storage, and AWS Systems Manager Agent (SSM Agent) for configuring, managing, and patching macOS environments.

Just like for every EC2 instance type, AWS is responsible for protecting the infrastructure that runs all of the services oﬀered in the AWS cloud. To ensure that EC2 Mac instances provide the same security and data privacy as other Nitro-based EC2 instances, Amazon EC2 performs a scrubbing workflow on the underlying Dedicated Host as soon as you stop or terminate an instance. This scrubbing process erases the internal SSD, clears the persistent NVRAM variables, and updates the device firmware to the latest version enabling you to run the latest macOS AMIs. The documentation has more details about this process.

The scrubbing process ensures a sanitized dedicated host for each EC2 Mac instance launch and takes some time to complete. Our customers have shared two use cases where they may need to set back their instance to a previous state in a shorter time period or without the need to initiate the scrubbing workflow. The first use case is when patching an existing disk image to bring OS-level or applications-level updates to your fleet, without manually patching individual instances in-place. The second use case is during continuous integration and continuous deployment (CI/CD) when you need to restore an Amazon EC2 Mac instance to a defined well-known state at the end of a build.

To restart your EC2 Mac instance in its initial state without stopping or terminating them, we created the ability to replace the root volume of an Amazon EC2 Mac instance with another EBS volume. This new EBS volume is created either from a new AMI, an Amazon EBS Snapshot, or from the initial volume state during boot.

You just swap the root volume with a new one and initiate a reboot at OS-level. Local data, additional attached EBS volumes, networking configurations, and IAM profiles are all preserved. Additional EBS volumes attached to the instance are also preserved, as well as the instance IP addresses, IAM policies, and security groups.

Let’s see how Replace Root Volume works

To prepare and initiate an Amazon EBS root volume replacement, you can use the AWS Management Console, the AWS Command Line Interface (AWS CLI), or one of our AWS SDKs. For this demo, I used the AWS CLI to show how you can automate the entire process.

To start the demo, I first allocate a Dedicated Host and then start an EC2 Mac instance, SSH-connect to it, and install the latest version of Xcode. I use the open-source xcodeinstall CLI tool to download and install Xcode. Typically, you also download, install, and configure a build agent and additional build tools or libraries as required by your build pipelines.

Once the instance is ready, I create an Amazon Machine Image (AMI). AMIs are disk images you can reuse to launch additional and identical EC2 Mac instances. This can be done from any machine that has the credentials to make API calls on your AWS account. In the following, you can see the commands I issued from my laptop’s Terminal application.

#
# Find the instance’s ID based on the instance name tag
#
~ aws ec2 describe-instances \
--filters "Name=tag:Name,Values=RRV-Demo" \
--query "Reservations[].Instances[].InstanceId" \
--output text 

i-0fb8ffd5dbfdd5384

#
# Create an AMI based on this instance
#
~ aws ec2 create-image \
--instance-id i-0fb8ffd5dbfdd5384 \
--name "macOS_13.3_Gold_AMI"	\
--description "macOS 13.2 with Xcode 13.4.1"

{
 
"ImageId": "ami-0012e59ed047168e4"
}

It takes a few minutes to complete the AMI creation process.

After I created this AMI, I can use my instance as usual. I can use it to build, test, and distribute my application, or make any other changes on the root volume.

When I want to reset the instance to the state of my AMI, I initiate the replace root volume operation:

~ aws ec2 create-replace-root-volume-task	\
--instance-id i-0fb8ffd5dbfdd5384 \
--image-id ami-0012e59ed047168e4
{
"ReplaceRootVolumeTask": {
"ReplaceRootVolumeTaskId": "replacevol-07634c2a6cf2a1c61", "InstanceId": "i-0fb8ffd5dbfdd5384",
"TaskState": "pending", "StartTime": "2023-05-26T12:44:35Z", "Tags": [],
"ImageId": "ami-0012e59ed047168e4", "SnapshotId": "snap-02be6b9c02d654c83", "DeleteReplacedRootVolume": false
}
}

The root Amazon EBS volume is replaced with a fresh one created from the AMI, and the system triggers an OS-level reboot.

I can observe the progress with the DescribeReplaceRootVolumeTasks API

~ aws ec2 describe-replace-root-volume-tasks \
--replace-root-volume-task-ids replacevol-07634c2a6cf2a1c61

{
"ReplaceRootVolumeTasks": [
{
"ReplaceRootVolumeTaskId": "replacevol-07634c2a6cf2a1c61", "InstanceId": "i-0fb8ffd5dbfdd5384",
"TaskState": "succeeded", "StartTime": "2023-05-26T12:44:35Z",
"CompleteTime": "2023-05-26T12:44:43Z", "Tags": [],
"ImageId": "ami-0012e59ed047168e4", "DeleteReplacedRootVolume": false
}
]
}

After a short time, the instance becomes available again, and I can connect over ssh.

~ ssh [email protected]
Warning: Permanently added '3.0.0.86' (ED25519) to the list of known hosts.
Last login: Wed May 24 18:13:42 2023 from 81.0.0.0

┌───┬──┐	 |  |_ )
│ ╷╭╯╷ │	_| (	/
│ └╮	│   |\  |  |
│ ╰─┼╯ │ Amazon EC2
└───┴──┘ macOS Ventura 13.2.1
 
ec2-user@ip-172-31-58-100 ~ %

Additional thoughts

There are a couple of additional points to know before using this new capability:

By default, the old root volume is preserved. You can pass the –-delete-replaced-root-volume option to delete it automatically. Do not forget to delete old volumes and their corresponding Amazon EBS Snapshots when you don’t need them anymore to avoid being charged for them.
During the replacement, the instance will be unable to respond to health checks and hence might be marked as unhealthy if placed inside an Auto Scaled Group. You can write a custom health check to change that behavior.
When replacing the root volume with an AMI, the AMI must have the same product code, billing information, architecture type, and virtualization type as that of the instance.
When replacing the root volume with a snapshot, you must use snapshots from the same lineage as the instance’s current root volume.
The size of the new volume is the largest of the AMI’s block device mapping and the size of the old Amazon EBS root volume.
Any non-root Amazon EBS volume stays attached to the instance.
Finally, the content of the instance store (the internal SSD drive) is untouched, and all other meta-data of the instance are unmodified (the IP addresses, ENI, IAM policies etc.).

Pricing and availability

Replace Root Volume for EC2 Mac is available in all AWS Regions where Amazon EC2 Mac instances are available. There is no additional cost to use this capability. You are charged for the storage consumed by the Amazon EBS Snapshots and AMIs.

Check other options available on the API or AWS CLI and go configure your first root volume replacement task today!

New – NVMe Reservations for Amazon Elastic Block Store io2 Volumes

2023-09-19 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-nvme-reservations-for-amazon-elastic-block-store-io2-volumes/

Amazon Elastic Block Store (Amazon EBS) io2 and io2 Block Express volumes now support storage fencing using NVMe reservations. As I learned while writing this post, storage fencing is used to regulate access to storage for a compute or database cluster, ensuring that just one host in the cluster has permission to write to the volume at any given time. For example, you can set up SQL Server Failover Cluster Instances (FCI) and get higher application availability within a single Availability Zone without the need for database replication.

As a quick refresher, io2 Block Express volumes are designed to meet the needs of the most demanding I/O-intensive applications running on Nitro-based Amazon Elastic Compute Cloud (Amazon EC2) instances. Volumes can be as big as 64 TiB, and deliver SAN-like performance with up to 256,000 IOPS/volume and 4,000 MB/second of throughput, all with 99.999% durability and sub-millisecond latency. The volumes support other advanced EBS features including encryption and Multi-Attach, and can be reprovisioned online without downtime. To learn more, you can read Amazon EBS io2 Block Express Volumes with Amazon EC2 R5b Instances Are Now Generally Available.

Using Reservations
To make use of reservations, you simply create an io2 volume with Multi-Attach enabled, and then attach it to one or more Nitro-based EC2 instances (see Provisioned IOPS Volumes for a full list of supported instance types):

If you have existing io2 Block Express volumes, you can enable reservations by detaching the volumes from all of the EC2 instances, and then reattaching them. Reservations will be enabled as soon as you make the first attachment. If you are running Windows Server using AMIs data-stamped 2023.08 or earlier you will need to install the aws_multi_attach driver as described in AWS NVMe Drivers for Windows Instances.

Things to Know
Here are a couple of things to keep in mind regarding NVMe reservations:

Operating System Support – You can use NVMe reservations with Windows Server (2012 R2 and above, 2016, 2019, and 2022), SUSE SLES 12 SP3 and above, RHEL 8.3 and above, and Amazon Linux 2 & later (read NVMe reservations to learn more).

Cluster and Volume Managers – Windows Server Failover Clustering is supported; we are currently working to qualify other cluster and volume managers.

Charges – There are no additional charges for this feature. Each reservation counts as an I/O operation.

— Jeff;

AWS Weekly Roundup: Farewell EC2-Classic, EBS at 15 Years, and More (Sept. 4, 2023)

2023-09-05 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-farewell-ec2-classic-ebs-at-15-years-and-more-sept-4-2023/

Last week, there was some great reading about Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Block Store (Amazon EBS) written by AWS tech leaders.

Dr. Werner Vogels wrote Farewell EC2-Classic, it’s been swell, celebrating the 17 years of loyal duty of the original version that started what we now know as cloud computing. You can read how it made the process of acquiring compute resources simple, even though the stack running behind the scenes was incredibly complex.

We have come a long way since 2006, and we’re not done innovating for our customers. As celebrated in this year’s AWS Storage Day, Amazon EBS was launched 15 years ago this month. James Hamilton, SVP and distinguished engineer at Amazon, wrote Amazon EBS at 15 Years, about how the service has evolved to handle over 100 trillion I/O operations a day, and transfers over 13 exabytes of data daily.

As Dr. Werner said in his piece, “it’s a reminder that building evolvable systems is a strategy, and revisiting your architectures with an open mind is a must.” Our innovation efforts driven by customer feedback continue today, and this week is no different.

Last Week’s Launches
Here are some launches that got my attention:

Renaming Amazon Kinesis Data Analytics to Amazon Managed Service for Apache Flink – You can now use Amazon Managed Service for Apache Flink, a fully managed and serverless service for you to build and run real-time streaming applications using Apache Flink. All your existing running applications in Kinesis Data Analytics will work as-is, without any changes. To learn more, see my blog post.

Extended Support for Amazon Aurora and Amazon RDS – You can now get more time for support, up to three years, for Amazon Aurora and Amazon RDS database instances running MySQL 5.7, PostgreSQL 11, and higher major versions. This e will allow you time to upgrade to a new major version to help you meet your business requirements even after the community ends support for these versions.

Enhanced Starter Template for AWS Step Functions Workflow Studio – You can now use starter templates to streamline the process of creating and prototyping workflows swiftly, plus a new code mode, which enables builders to move easily between design and code authoring views. With the improved authoring experience in Workflow Studio, you can seamlessly alternate between a drag-and-drop visual builder experience or the new code editor so that you can pick your preferred tool to accelerate development.

To learn more, see Enhancing Workflow Studio with new features for streamlined authoring in the AWS Compute Blog.

Email Delivery History for Every Email in Amazon SES – You can now troubleshoot individual email delivery problems, confirm delivery of critical messages, and identify engaged recipients on a granular, single email basis. Email senders can investigate trends in delivery performance and see delivery and engagement status for each email sent using Amazon SES Virtual Deliverability Manager.

Response Streaming through Amazon SageMaker Real-time Inference – You can now continuously stream inference responses back to the client to help you build interactive experiences for various generative AI applications such as chatbots, virtual assistants, and music generators.

For more details on how to use response streaming along with examples, see Invoke to Stream an Inference Response and How containers should respond in the AWS documentation, and Elevating the generative AI experience: Introducing streaming support in Amazon SageMaker hosting in the AWS Machine Learning Blog.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you might have missed:

AI & Sports: How AWS & the NFL are Changing the Game – Over the last 5 years, AWS has partnered with the National Football League (NFL), helping fans better understand the game, helping broadcasters tell better stories, and helping teams use data to improve operations and player safety. Watch AWS CEO, Adam Selipsky, former NFL All-Pro Larry Fitzgerald, and the NFL Network’s Cynthia Frelund during their earlier livestream discussing the intersection of artificial intelligence and machine learning in sports.

Amazon Bedrock Story from Amazon Science – This is a good article explaining the benefits of using Amazon Bedrock to build and scale generative AI applications with leading foundation models, including Amazon’s Titan FMs, which focus on responsible AI to avoid toxic content.

Amazon EC2 Flexibility Score – This is an open source tool developed by AWS to assess any configuration used to launch instances through an Auto Scaling Group (ASG) against the recommended EC2 best practices. It converts the best practice adoption into a “flexibility score” that can be used to identify, improve, and monitor the configurations.

To learn more open-source news and updates, see this newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS re:Invent – Ready to start planning your re:Invent? Browse the session catalog now. Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community.

AWS Global Summits – The last in-person AWS Summit will be held in Johannesburg on Sept. 26.

AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: Aotearoa (Sept. 6), Lebanon (Sept. 9), Munich (Sept. 14), Argentina (Sept. 16), Spain (Sept. 23), and Chile (Sept. 30). Visit the landing page to check out all the upcoming AWS Community Days.

CDK Day – A community-led fully virtual event on Sept. 29 with tracks in English and Spanish about CDK and related projects. Learn more at the website.

You can browse all upcoming AWS-led in-person and virtual events, and developer-focused events such as AWS DevDay.

— Channy

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Welcome to AWS Storage Day 2023

2023-08-09 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/welcome-to-aws-storage-day-2023/

Welcome to the fifth annual AWS Storage Day! This virtual event is happening today starting at 9:00 AM Pacific Time (12:00 PM Eastern Time) and is available for you to watch on the AWS On Air Twitch channel. The first AWS Storage Day was hosted in 2019, and this event has grown into an innovation day that we look forward to delivering to you every year. In last year’s Storage Day post, I wrote about the constant innovations in AWS Storage aimed at helping you put your data to work while keeping it secure and protected. This year, Storage Day is focused on storage for AI/ML, data protection and resiliency, and the benefits of moving to the cloud.

AWS Storage Day Key Themes
When it comes to storage for AI/ML, data volumes are increasing at an unprecedented rate, exploding from terabytes to petabytes and even to exabytes. With a modern data architecture on AWS, you can rapidly build scalable data lakes, use a broad and deep collection of purpose-built data services, scale your systems at a low cost without compromising performance, share data across organizational boundaries, and manage compliance, security, and governance, allowing you to make decisions with speed and agility at scale.
To train machine learning models and build Generative AI applications, you must have the right data strategy in place. So, I’m happy to see that, among the list of sessions to look forward to at the live event, the Optimize generative AI and ML with AWS Infrastructure session will discuss how you can transform your data into meaningful insights.

Whether you’re just getting started with the cloud, planning to migrate applications to AWS, or already building applications on AWS, we have resources to help you protect your data and meet your business continuity objectives. Our data protection and resiliency features and solutions can help you meet your business continuity goals and deliver disaster recovery during data loss events, across recovery point and time objectives (RPO and RTO). With the unprecedented data growth happening in the world today, determining where your data is stored, how it’s secured, and who has access to it is a higher priority than ever. Be sure to join the Protect data in AWS amid a rapidly evolving cyber landscape session to learn more.

When moving data to the cloud, you need to understand where you’re moving it for different use cases, the types of data you’re moving, and the network resources available, among other considerations. There are many reasons to move to the cloud, recently, Enterprise Strategy Group (ESG) validated that organizations reduced compute, networking, and storage costs by up to 66 percent by migrating on-premises workloads to AWS Cloud infrastructure. ESG confirmed that migrating on-premises workloads to AWS provides organizations with reduced costs, increased performance, improved operational efficiency, faster time to value, and improved business agility.
We have a number of sessions that discuss how to move to the cloud, based on your use case. I’m most looking forward to the Hybrid cloud storage and edge compute: AWS, where you need it session, which will discuss considerations for workloads that can’t fully move to the cloud.

Tune in to learn from experts about new announcements, leadership insights, and educational content related to the broad portfolio of AWS Storage services and features that address all these themes and more. Today, we have announcements related to Amazon Simple Storage Service (Amazon S3), Amazon FSx for Windows File Server, Amazon Elastic File System (Amazon EFS), Amazon FSx for OpenZFS, and more.

Let’s get into it.

15 Years of Amazon EBS
Not long ago, I was reading Jeff Barr’s post titled 15 Years of AWS Blogging! In this post, Jeff mentioned a few posts he wrote for the earliest AWS services and features. Amazon Elastic Block Store (Amazon EBS) is on this list as a service that simplifies the use of Amazon EC2.

Well, it’s been 15 years since the launch of Amazon EBS was announced, and today we celebrate 15 years of this service. If you were one of the original users who put Amazon EBS to good use and provided us with the very helpful feedback that helped us invent and simplify, iterate and improve, I’m sure you can’t believe how time flies. Today, Amazon EBS handles more than 100 trillion I/O operations daily, and over 390 million EBS volumes are created every day.

If you’re new to Amazon EBS, join us for a fireside chat with Matt Garman, Senior Vice President, Sales, Marketing, and Global Services at AWS, and learn the strategy and customer challenges behind the launch of the service in 2008. You’ll also hear from long-term EBS customer, Stripe, about its growth with EBS since Stripe was launched 12 years ago.

Amazon EBS has continuously improved its scalability and performance to support more customer workloads as the direct storage attachment for Amazon EC2 instances. With the launch of Amazon EC2 M7i instances, powered by custom 4th Generation Intel Xeon Scalable processors, on August 2, you can attach up to 128 Amazon EBS volumes, an increase from 28 on a previous generation M6i instance. The higher number of volume attachments means you can increase storage density per instance and improve resource utilization, reducing total compute cost.

You can host up to 127 containers per instance for larger database applications and scale them more cost effectively before needing to provision more instances and only pay for resources you need. With a higher number of volume attachments, you can fully utilize the memory and vCPU available on these powerful M7i instances as your database storage footprint grows. EBS is also increasing the number of multi-volume snapshots you can create, for up to 128 EBS volumes attached to an instance, enabling you to create crash-consistent backups of all volumes attached to an instance.

Join the 15 years of innovations with Amazon EBS session for a discussion about how the original vision for Amazon EBS has evolved to meet your growing demands for cloud infrastructure.

Mountpoint for Amazon S3
Now generally available, Mountpoint for Amazon S3 is a new open source file client that delivers high throughput access, lowering compute costs for data lakes on Amazon S3. Mountpoint for Amazon S3 is a file client that translates local file system API calls to S3 object API calls. Using Mountpoint for Amazon S3, you can mount an Amazon S3 bucket as a local file system on your compute instance, to access your objects through a file interface with the elastic storage and throughput of Amazon S3. Mountpoint for Amazon S3 supports sequential and random read operations on existing files, and sequential write operations for creating new files.

The Deep dive and demo of Mountpoint for Amazon S3 session demonstrates how to use the file client to access objects in Amazon S3 using file APIs, making it easier to store data at scale and maximize the value of your data with analytics and machine learning workloads. Read this blog post to learn more about Mountpoint for Amazon S3 and how to get started, including a demo.

Put Cold Storage to Work Faster with Amazon S3 Glacier Flexible Retrieval
Amazon S3 Glacier Flexible Retrieval improves data restore time by up to 85 percent, at no additional cost. Faster data restores automatically apply to the Standard retrieval tier when using Amazon S3 Batch Operations. These restores begin to return objects within minutes, so you can process restored data faster. Processing restored data in parallel with ongoing restores helps you accelerate data workflows and quickly respond to business needs. Now, whether you’re transcoding media, restoring operational backups, training machine learning models, or analyzing historical data, you can speed up your data restores from archive.

Coupled with the S3 Glacier improvements to restore throughput by up to 10 times for millions of objects announced in 2022, S3 Glacier data restores of all sizes now benefit from both faster starts and shorter completion times.

Join the Maximize the value of cold data with Amazon S3 Glacier session to learn how Amazon S3 Glacier is helping organizations of all sizes and from all industries transform their data archiving to unlock business value, increase agility, and save on storage costs. Read this blog post to learn more about the Amazon S3 Glacier Flexible Retrieval performance improvements and follow step-by-step guidance on how to get started with faster standard retrievals from S3 Glacier Flexible Retrieval.

Supporting a Broad Spectrum of File Workloads
To serve a broad spectrum of use cases that rely on file systems, we offer a portfolio of file system services, each targeting a different set of needs. Amazon EFS is a serverless file system built to deliver an elastic experience for sharing data across compute resources. Amazon FSx makes it easier and cost-effective for you to launch, run, and scale feature-rich, high-performance file systems in the cloud, enabling you to move to the cloud with no changes to your code, processes, or how you manage your data.

Power ML research and big data analytics with Amazon EFS
Amazon EFS offers serverless and fully scalable file storage, designed for high scalability in both storage capacity and throughput performance. Just last week, we announced enhanced support for faster read and write IOPS, making it easier to power more demanding workloads. We’ve improved the performance capabilities of Amazon EFS by adding support for up to 55,000 read IOPS and up to 25,000 write IOPS per file system. These performance enhancements help you to run more demanding workflows, such as machine learning (ML) research with KubeFlow, financial simulations with IBM Symphony, and big data processing with Domino Data Lab, Hadoop, and Spark.

Join the Build and run analytics and SaaS applications at scale session to hear how recent Amazon EFS performance improvements can help power more workloads.

Multi-AZ file systems on Amazon FSx for OpenZFS
You can now use a multi-AZ deployment option when creating file systems on Amazon FSx for OpenZFS, making it easier to deploy file storage that spans multiple AWS Availability Zones to provide multi-AZ resilience for business-critical workloads. With this launch, you can take advantage of the power, agility, and simplicity of Amazon FSx for OpenZFS for a broader set of workloads, including business-critical workloads like database, line-of-business, and web-serving applications that require highly available shared storage that spans multiple AZs.

The new multi-AZ file systems are designed to deliver high levels of performance to serve a broad variety of workloads, including performance-intensive workloads such as financial services analytics, media and entertainment workflows, semiconductor chip design, and game development and streaming, up to 21 GB per second of throughput and over 1 million IOPS for frequently accessed cached data, and up to 10 GB per second and 350,000 IOPS for data accessed from persistent disk storage.

Join the Migrate NAS to AWS to reduce TCO and gain agility session to learn more about multi-AZs with Amazon FSx for OpenZFS.

New, Higher Throughput Capacity Levels on Amazon FSx for Windows File Server
Performance improvements for Amazon FSx for Windows File Server help you accelerate time-to-results for performance-intensive workloads such as SQL Server databases, media processing, cloud video editing, and virtual desktop infrastructure (VDI).

We’re adding four new, higher throughput capacity levels to increase the maximum I/O available up to 12 GB per second from the previous I/O of 2 GB per second. These throughput improvements come with correspondingly higher levels of disk IOPS, designed to deliver an increase up to 350,000 IOPS.

In addition, by using FSx for Windows File Server, you can provision IOPS higher than the default 3 IOPS per GiB for your SSD file system. This allows you to scale SSD IOPS independently from storage capacity, allowing you to optimize costs for performance-sensitive workloads.

Join the Migrate NAS to AWS to reduce TCO and gain agility session to learn more about the performance improvements for Amazon FSx for Windows File Server.

Logically Air-Gapped Vault for AWS Backup
AWS Backup is a fully managed, policy-based data protection solution that enables customers to centralize and automate backup restores across 19 AWS services (spanning compute, storage, and databases) and third-party applications such as VMware Cloud on AWS and on-premises, as well as SAP HANA on Amazon EC2.

Today, we’re announcing the preview of logically air-gapped vault as a new type of AWS Backup Vault that acts as an additional layer of protection to mitigate against malware events. With logically air-gapped vault, customers can recover their application data through a different trusted account.

Join the Deep dive on data recovery for ransomware events session to learn more about logically air-gapped vault for AWS Backup.

Copy Data to and from Other Clouds with AWS DataSync
AWS DataSync is an online data movement and discovery service that simplifies data migration and helps you quickly, easily, and securely transfer your file or object data to, from, and between AWS storage services. In addition to support of data migration to and from AWS storage services, DataSync supports copying to and from other clouds such as Google Cloud Storage, Azure Files, and Azure Blob Storage. Using DataSync, you can move your object data at scale between Amazon S3 compatible storage on other clouds and AWS storage services such as Amazon S3. We’re now expanding the support of DataSync for copying data to and from other clouds to include DigitalOcean Spaces, Wasabi Cloud Storage, Backblaze B2 Cloud Storage, Cloudflare R2 Storage, and Oracle Cloud Storage.

Join the Identify and accelerate data migrations at scale session to learn more about this expanded support for DataSync.

Join Us Online
Join us today for the AWS Storage Day virtual event on the AWS On Air channel on Twitch. The event will be live starting at 9:00 AM Pacific Time (12:00 PM Eastern Time) on August 9. All sessions will be available on demand approximately two days after Storage Day.

We look forward to seeing you on Twitch!

– Veliswa

AWS Cloud service considerations for designing multi-tenant SaaS solutions

2023-08-04 Dennis Greene

Post Syndicated from Dennis Greene original https://aws.amazon.com/blogs/architecture/aws-cloud-service-considerations-for-designing-multi-tenant-saas-solutions/

An increasing number of software as a service (SaaS) providers are considering the move from single to multi-tenant to utilize resources more efficiently and reduce operational costs. This blog aims to inform customers of considerations when evaluating a transformation to multi-tenancy in the Amazon Web Services (AWS) Cloud. You’ll find valuable information on how to optimize your cloud-based SaaS design to reduce operating expenses, increase resiliency, and offer a high-performing experience for your customers.

Single versus multi-tenancy

In a multi-tenant architecture, resources like compute, storage, and databases can be shared among independent tenants. In contrast, a single-tenant architecture allocates exclusive resources to each tenant.

Let’s consider a SaaS product that needs to support many customers, each with their own independent deployed website. Using a single-tenant model (see Figure 1), the SaaS provider may opt to utilize a dedicated AWS account to host each tenant’s workloads. To contain their respective workloads, each tenant would have their own Amazon Elastic Compute Cloud (Amazon EC2) instances organized within an Auto Scaling group. Access to the applications running in these EC2 instances would be done via an Application Load Balancer (ALB). Each tenant would be allocated their own database environment using Amazon Relational Database Service (RDS). The website’s storage (consisting of PHP, JavaScript, CSS, and HTML files) would be provided by Amazon Elastic Block Store (EBS) volumes attached to the EC2 instances. The SaaS provider would have a control plane AWS account used to create and modify these tenant-specific accounts.

Figure 1. Single-tenant configuration

To transition to a multi-tenant pattern, the SaaS provider can use containerization to package each website, and a container orchestrator to deploy the websites across shared compute nodes (EC2 instances). Kubernetes can be employed as a container orchestrator, and a website would then be represented by a Kubernetes deployment and its associated pods. A Kubernetes namespace would serve as the logical encapsulation of the tenant-specific resources, as each tenant would be mapped to one Kubernetes namespace. The Kubernetes HorizontalPodAutoscaler can be utilized for autoscaling purposes, dynamically adjusting the number of replicas in the deployment on a given namespace based on workload demands.

When additional compute resources are required, tools such as the Cluster Autoscaler, or Karpenter, can dynamically add more EC2 instances to the shared Kubernetes Cluster. An ALB can be reused by multiple tenants to route traffic to the appropriate pods. For RDS, SaaS providers can use tenant-specific database schemas to separate tenant data. For static data, Amazon Elastic File System (EFS) and tenant-specific directories can be employed. The SaaS provider would still have a control plane AWS account that would now interact with the Kubernetes and AWS APIs to create and update tenant-specific resources.

This transition to a multi-tenant design utilizing Kubernetes, Amazon Elastic Kubernetes Service (EKS), and other managed services offers numerous advantages. It enables efficient resource utilization by leveraging containerization and auto-scaling capabilities, reducing costs, and optimizing performance (see Figure 2).

Figure 2. Multi-tenant configuration

EKS cluster sizing and customer segmentation considerations in multi-tenancy designs

A high concentration of SaaS tenants hosted within the same system results in a large “blast radius.” This means a failure within the system has the potential to impact all resident tenants. This situation can lead to downtime for multiple tenants at once. To address this problem, SaaS providers are encouraged to partition their customers amongst multiple AWS accounts, each with their own deployments of this multi-tenant architecture. The number of tenants that can be present in a single cluster is a determination that can only be made by the SaaS provider after weighing the risks. Compare the shared fate of some subset of their customers, against the possible efficiency benefits of a multi-tenant architecture.

EKS security

SaaS providers must evaluate whether it’s appropriate for them to make use of containers as a workload isolation boundary. This is of particular importance in multi-tenant Kubernetes architectures, given that containers running on a single Amazon EC2 instance will share the underlying Linux kernel. Security vulnerabilities place this shared resource (the EC2 instance) at risk from attack vectors from the host Linux instance. Risk is elevated when any container running in a Kubernetes Pod cluster initiates untrusted code. This risk is heightened if SaaS providers permit tenants to “bring their code”. Kubernetes is a single tenant orchestrator, but with a multi-tenant approach to SaaS architectures, a single instance of the Amazon EKS control plane will be shared among all the workloads running within a cluster. Amazon EKS considers the cluster as the hard isolation security boundary. Every Amazon EKS managed Kubernetes cluster is isolated in a dedicated single-tenant Amazon VPC. At present, hard multi-tenancy can only be implemented by provisioning a unique cluster for each tenant.

EFS considerations

A SaaS provider may consider EFS as the storage solution for the static content of the multiple tenants. This provides them with a straightforward, serverless, and elastic file system. Directories may be used to separate the content for each tenant. While this approach of creating tenant-specific directories in EFS provides many benefits, there may be challenges harvesting per-tenant utilization and performance metrics. This can result in operational challenges for providers that need to granularly meter per-tenant usage of resources. Consequently, noisy neighbors will be difficult to identify and remediate. To resolve this, SaaS providers should consider building a custom solution to monitor the individual tenants in the multi-tenant file system by leveraging storage and throughput/IOPS metrics.

RDS considerations

Multi-tenant workloads, where data for multiple customers or end users is consolidated in the same RDS database cluster, can present operational challenges regarding per-tenant observability. Both MySQL Community Edition and open-source PostgreSQL have limited ability to provide per-tenant observability and resource governance. AWS customers operating multi-tenant workloads often use a combination of ‘database’ or ‘schema’ and ‘database user’ accounts as substitutes. AWS customers should use alternate mechanisms to establish a mapping between a tenant and these substitutes. This will give you the ability to process raw observability data from the database engine externally. You can then map these substitutes back to tenants, and distinguish tenants in the observability data.

Conclusion

In this blog, we’ve shown what to consider when moving to a multi-tenancy SaaS solution in the AWS Cloud, how to optimize your cloud-based SaaS design, and some challenges and remediations. Invest effort early in your SaaS design strategy to explore your customer requirements for tenancy. Work backwards from your SaaS tenants end goals. What level of computing performance do they require? What are the required cyber security features? How will you, as the SaaS provider, monitor and operate your platform with the target tenancy configuration? Your respective AWS account team is highly qualified to advise on these design decisions. Take advantage of reviewing and improving your design using the AWS Well-Architected Framework. The tenancy design process should be followed by extensive prototyping to validate functionality before production rollout.

Related information

Prime Day 2023 Powered by AWS – All the Numbers

2023-08-02 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/prime-day-2023-powered-by-aws-all-the-numbers/

As part of my annual tradition to tell you about how AWS makes Prime Day possible, I am happy to be able to share some chart-topping metrics (check out my 2016, 2017, 2019, 2020, 2021, and 2022 posts for a look back).

This year I bought all kinds of stuff for my hobbies including a small drill press, filament for my 3D printer, and irrigation tools. I also bought some very nice Alphablock books for my grandkids. According to our official release, the first day of Prime Day was the single largest sales day ever on Amazon and for independent sellers, with more than 375 million items purchased.

Prime Day by the Numbers
As always, Prime Day was powered by AWS. Here are some of the most interesting and/or mind-blowing metrics:

Amazon Elastic Block Store (Amazon EBS) – The Amazon Prime Day event resulted in an incremental 163 petabytes of EBS storage capacity allocated – generating a peak of 15.35 trillion requests and 764 petabytes of data transfer per day. Compared to the previous year, Amazon increased the peak usage on EBS by only 7% Year-over-Year yet delivered +35% more traffic per day due to efficiency efforts including workload optimization using Amazon Elastic Compute Cloud (Amazon EC2) AWS Graviton-based instances. Here’s a visual comparison:

AWS CloudTrail – AWS CloudTrail processed over 830 billion events in support of Prime Day 2023.

Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 126 million requests per second.

Amazon Aurora – On Prime Day, 5,835 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed 318 billion transactions, stored 2,140 terabytes of data, and transferred 836 terabytes of data.

Amazon Simple Email Service (SES) – Amazon SES sent 56% more emails for Amazon.com during Prime Day 2023 vs. 2022, delivering 99.8% of those emails to customers.

Amazon CloudFront – Amazon CloudFront handled a peak load of over 500 million HTTP requests per minute, for a total of over 1 trillion HTTP requests during Prime Day.

Amazon SQS – During Prime Day, Amazon SQS set a new traffic record by processing 86 million messages per second at peak. This is 22% increase from Prime Day of 2022, where SQS supported 70.5M messages/sec.

Amazon Elastic Compute Cloud (EC2) – During Prime Day 2023, Amazon used tens of millions of normalized AWS Graviton-based Amazon EC2 instances, 2.7x more than in 2022, to power over 2,600 services. By using more Graviton-based instances, Amazon was able to get the compute capacity needed while using up to 60% less energy.

Amazon Pinpoint – Amazon Pinpoint sent tens of millions of SMS messages to customers during Prime Day 2023 with a delivery success rate of 98.3%.

Prepare to Scale
Every year I reiterate the same message: rigorous preparation is key to the success of Prime Day and our other large-scale events. If you are preparing for a similar chart-topping event of your own, I strongly recommend that you take advantage of AWS Infrastructure Event Management (IEM). As part of an IEM engagement, my colleagues will provide you with architectural and operational guidance that will help you to execute your event with confidence!

— Jeff;

The example

Overview

Initial data collection

First phase: system level

Second phase: CPU core level

Addressing CPU Core bottlenecks

Code sparsity

Instruction TLB misses/1000 instructions (c7g.16xl)

Instructions per clock cycle (IPC)

Results

Re-examining the setup of our testing enviornment: Changing where the load-generator lives

Branch prediction misses/1000 instructions

IPC

Code sparsity

Conclusion

Let’s see how Replace Root Volume works

Additional thoughts

Pricing and availability

Single versus multi-tenancy

EKS cluster sizing and customer segmentation considerations in multi-tenancy designs

EKS security

EFS considerations

RDS considerations

Conclusion

The collective thoughts of the interwebz