Tag Archives: AMD

Recently we announced the availability of Amazon Elastic Compute Cloud (Amazon EC2) R8a instances, the latest addition to the AMD memory-optimized instance family. These instances are powered by the 5th Generation AMD EPYC (codename Turin) processors with a maximum frequency of 4.5 GHz. In this post I take these instances for a spin and benchmark MySQL later on, but first I discuss the top things you should know about these instances.

Notable characteristics of R8a instances

Each vCPU on an R8a instance corresponds to a physical CPU core (something we started with 7th generation AMD instances). This means that there is no simultaneous multi-threading (SMT). Because each vCPU is mapped to a dedicated physical core, threads don’t share core resources or interfere with one another, so you get more predictable and consistent performance. This is particularly important for performance-sensitive workloads where consistent latency is essential. When evaluating and adopting R8a instances, make sure that you re-evaluate your thresholds for CPU usage. You can likely squeeze more out of each instance’s CPU without impacting any of your workload’s SLA metrics.
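One way to check the one-vCPU-per-core mapping from inside a Linux guest is to look at the "Thread(s) per core" field that lscpu prints. The following is a minimal sketch that parses that text; the function names are illustrative, and it assumes the usual lscpu text layout.

```python
import re


def threads_per_core(lscpu_output):
    """Return the 'Thread(s) per core' value found in lscpu text output."""
    # Newer lscpu versions indent this field, so allow leading whitespace.
    match = re.search(r"^\s*Thread\(s\) per core:\s*(\d+)", lscpu_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Thread(s) per core' line found")
    return int(match.group(1))


def smt_enabled(lscpu_output):
    """True when more than one hardware thread shares a core."""
    return threads_per_core(lscpu_output) > 1
```

To use it on an instance, capture the output of lscpu (for example with subprocess.run(["lscpu"], capture_output=True, text=True)) and pass the text to smt_enabled. A value of False is what the description above would lead you to expect on an instance without SMT.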

R8a instances feature sizes of up to 192 vCPU with 1,536 GiB RAM. The following table shows the detailed specs:

Instance size	vCPU	Memory (GiB)	Instance storage	Network bandwidth (Gbps)	EBS bandwidth (Gbps)
r8a.medium	1	8	EBS Only	Up to 12.5	Up to 10
r8a.large	2	16	EBS Only	Up to 12.5	Up to 10
r8a.xlarge	4	32	EBS Only	Up to 12.5	Up to 10
r8a.2xlarge	8	64	EBS Only	Up to 15	Up to 10
r8a.4xlarge	16	128	EBS Only	Up to 15	Up to 10
r8a.8xlarge	32	256	EBS Only	15	10
r8a.12xlarge	48	384	EBS Only	22.5	15
r8a.16xlarge	64	512	EBS Only	30	20
r8a.24xlarge	96	768	EBS Only	40	30
r8a.48xlarge	192	1536	EBS Only	75	60
r8a.metal-24xl	96	768	EBS Only	40	30
r8a.metal-48xl	192	1536	EBS Only	75	60

Testing MySQL performance using HammerDB

R8a instances are a great choice for MySQL databases, so MySQL seemed like a good place to showcase some of these instances’ capabilities. To test MySQL, I used a series of scripts written by my colleagues to track MySQL performance across software versions and different EC2 instances. These scripts are stored in the repro-collection repository, which is an open source, extensible framework for performance testing that addresses real-world workloads rather than micro-benchmarks. It is built to provide a performance measurement reference usable across multiple organizations, and it’s currently centered on MySQL and actively used in discussions with Linux Kernel developers and maintainers. Furthermore, it helps track any performance impacts created by code changes to MySQL. The scripts contained in this repository set up a MySQL database to be tested, and a load generator running the HammerDB benchmark.

For this benchmark I used an r6a.24xlarge instance for the load generator, and r6a.xlarge, r7a.xlarge, and r8a.xlarge instances for the MySQL database servers, all deployed in the same AWS Availability Zone (AZ). I chose a single AZ setup to minimize any latency variability from crossing multiple AZs. This is not meant to be a production-like setup, and I highly recommend using multiple AZs for production workloads. Each MySQL instance was tested separately using the same HammerDB load generator. Each test was run three times, and the results were averaged across the three runs. A diagram of the architecture is shown in the following figure:

HammerDB overall results

R8a instances show great results in the HammerDB benchmark for MySQL databases. For HammerDB’s overall score category, R8a instances outscored R7a instances by 55% and outscored R6a instances by 74%.

HammerDB transactions per minute test

R8a instances also showed a notable improvement in this category. R8a outperformed previous generation R7a instances by 32% and R6a instances by 63%.

HammerDB P99 latency results

R8a instances showed improvement in P99 latency results, reflecting the efficiency gains driven by the new 5th Generation AMD EPYC CPUs and higher memory bandwidth. R8a shows a 14% latency reduction when compared to R7a, and a 25% latency reduction when compared to R6a.
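The percentages in the results sections are relative changes between averaged runs. The post does not publish the raw scores, so the sketch below only illustrates the arithmetic: the function names are hypothetical and the run values are made-up placeholders, not measurements from these tests.

```python
from statistics import mean


def percent_gain(new_runs, old_runs):
    """Relative gain of new over old in percent (higher-is-better metrics)."""
    return (mean(new_runs) / mean(old_runs) - 1.0) * 100.0


def percent_latency_reduction(new_runs, old_runs):
    """Relative reduction of new versus old in percent (lower-is-better metrics)."""
    return (1.0 - mean(new_runs) / mean(old_runs)) * 100.0


# Placeholder values for three runs each; NOT data from this post.
old_throughput = [200.0, 210.0, 190.0]
new_throughput = [250.0, 260.0, 240.0]
print(f"throughput gain: {percent_gain(new_throughput, old_throughput):.1f}%")
```

Averaging each set of three runs first and then taking the ratio matches the procedure described in the test setup, where results were averaged across the three runs before being compared.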

Conclusion

Built on the AWS Nitro System using sixth generation Nitro Cards, R8a instances are ideal for high performance, memory-intensive workloads, such as SQL and NoSQL databases, as demonstrated by the benchmarking shown in this post, as well as distributed web scale in-memory caches, in-memory databases, real-time big data analytics, and Electronic Design Automation (EDA) applications. R8a instances offer 12 sizes, including 2 bare metal sizes. Amazon EC2 R8a instances are SAP-certified and provide 38% more SAPS when compared to R7a instances. If you’re still running 6th generation R6a instances, then I highly encourage you to migrate to the 8th generation instances to take advantage of their clear price performance benefits. Staying on modern infrastructure is a great way to drive down costs and provide more features for your customers, and there are clear gains to be had based on the testing shown in this post.

Start optimizing your high performance memory intensive workloads today by migrating to R8a instances. Visit the Amazon EC2 R8a instances page to learn more and get started on your upgrades to use the increased price performance of R8a instances today!

HPE Shows off AMD EPYC Venice and SP7 Supercomputing Node at SC25

2025-11-18 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/hpe-shows-off-amd-epyc-venice-and-sp7-supercomputing-node-at-sc25/

HPE showed off its next-generation AMD EPYC “Venice” socket SP7 supercomputing platform at SC25 along with its Slingshot 400 interconnect

The post HPE Shows off AMD EPYC Venice and SP7 Supercomputing Node at SC25 appeared first on ServeTheHome.

HPE Launches New AMD EPYC Venice Instinct MI400 and NVIDIA Vera Rubin Compute Blades

2025-11-15 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/hpe-launches-new-amd-venice-instinct-mi400-and-nvidia-vera-rubin-compute-arm/

HPE has three new blades based on AMD EPYC Venice, MI430X, and NVIDIA Vera Rubin, along with Slingshot 400 for 2027 HPC

The post HPE Launches New AMD EPYC Venice Instinct MI400 and NVIDIA Vera Rubin Compute Blades appeared first on ServeTheHome.

MLPerf Training v5.1 NVIDIA Dominates while AMD Has a Strong Showing and Cisco Silicon

2025-11-14 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/mlperf-training-v5-1-nvidia-dominates-while-amd-has-a-strong-showing-and-cisco-silicon/

MLPerf Training v5.1 is out, with NVIDIA dominating. AMD showed up and performed well. Cisco Silicon One and MangoBoost DPUs made cameos

The post MLPerf Training v5.1 NVIDIA Dominates while AMD Has a Strong Showing and Cisco Silicon appeared first on ServeTheHome.

Gigabyte B343-C40-AAJ1 Review 10-Node AMD EPYC 4005 Goes High-Density

2025-11-01 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/gigabyte-b343-c40-aaj1-review-10-node-amd-epyc-4005-goes-high-density/

We review the Gigabyte B343-C40-AAJ1, a 3U 10-node AMD EPYC 4005 and Ryzen 9000 server that seeks to drive up density and drive out costs

The post Gigabyte B343-C40-AAJ1 Review 10-Node AMD EPYC 4005 Goes High-Density appeared first on ServeTheHome.

ORNL Discovery and Lux Powered by HPE and AMD Announced

2025-10-27 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/ornl-discovery-and-lux-powered-by-hpe-and-amd-announced/

ORNL announced the AMD-powered Discovery and Lux systems alongside HPE’s announcement of its new supercomputing portfolio

The post ORNL Discovery and Lux Powered by HPE and AMD Announced appeared first on ServeTheHome.

AMD Helios MI450 Rack at OCP Summit with a Different Version from Meta

2025-10-17 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/amd-helios-mi450-rack-at-ocp-summit-with-a-different-version-from-meta/

At OCP Summit 2025, we saw the AMD Helios rack along with Meta’s Helios rack, which was very different, swapping power, networking, and more

The post AMD Helios MI450 Rack at OCP Summit with a Different Version from Meta appeared first on ServeTheHome.

Beelink GTR9 Pro Review AMD Ryzen AI Max 395 System with 128GB and dual 10GbE

2025-10-11 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/beelink-gtr9-pro-review-amd-ryzen-ai-max-395-system-with-128gb-and-dual-10gbe/

In our Beelink GTR9 Pro review, we see why this AMD Ryzen AI Max+ 395 system is fast packed with an Apple-like design and great features

The post Beelink GTR9 Pro Review AMD Ryzen AI Max 395 System with 128GB and dual 10GbE appeared first on ServeTheHome.

AMD Solarflare X4 NICs Launched for Low Latency Trading

2025-10-08 Rohit Kumar

Post Syndicated from Rohit Kumar original https://www.servethehome.com/amd-solarflare-x4-nics-launched-for-low-latency-trading/

The new AMD Solarflare X4 NICs are out for low-latency trading applications with the newest generation of Solarflare IP

The post AMD Solarflare X4 NICs Launched for Low Latency Trading appeared first on ServeTheHome.

Minisforum N5 Pro Review An Awesome NAS Platform

2025-10-07 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/minisforum-n5-pro-review-an-awesome-nas-platform/

We look at the Minisforum N5 Pro, an awesome 5-bay NAS with 10GbE and a super-fast AMD Ryzen AI 9 HX Pro 370 CPU with ECC memory support

The post Minisforum N5 Pro Review An Awesome NAS Platform appeared first on ServeTheHome.

AMD and OpenAI Ink Megadeal for 6GW of Future AI Compute

2025-10-06 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/amd-and-openai-ink-megadeal-for-6gw-of-future-ai-compute/

AMD and OpenAI inked a megadeal for AI compute covering 6GW of compute including 1GW of MI450 targeting 2H 2026 deployment

The post AMD and OpenAI Ink Megadeal for 6GW of Future AI Compute appeared first on ServeTheHome.

Building New STH Studio Storage NAS

2025-10-01 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/building-new-sth-studio-nas-storage-qnap-solidigm-amd-nvidia/

We built a new 360TB studio NAS using high-capacity Solidigm SSDs, a QNAP NAS, and a host of customizations

The post Building New STH Studio Storage NAS appeared first on ServeTheHome.

AMD EPYC Embedded 4005 Series Launched

2025-09-17 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/amd-epyc-embedded-4005-series-launched/

The AMD EPYC Embedded 4005 series was launched with a 16 core 3D V-Cache part that has 7 year availability

The post AMD EPYC Embedded 4005 Series Launched appeared first on ServeTheHome.

Tuning guide for AMD Amazon EC2 instances

2025-09-12 Suyash Nadkarni

Post Syndicated from Suyash Nadkarni original https://aws.amazon.com/blogs/compute/tuning-guide-for-amd-amazon-ec2-instances/

As organizations migrate more mission-critical workloads to the cloud, optimizing for price-performance becomes a key consideration. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by AMD EPYC processors deliver high core density, large memory bandwidth, and hardware-enabled security features, making them a strong option for a wide range of compute, memory, and I/O-intensive workloads. In this post, we explain how to choose the right AMD-based Amazon EC2 instance types and describe tuning techniques that can help users improve workload efficiency. Whether you’re running simulations, large-scale analytics, or inference workloads, this post provides practical guidance for optimizing AMD-powered Amazon EC2 instances.

Amazon EC2 offers AMD-based instances built on multiple generations of AMD EPYC processors. This post focuses on optimization strategies for the 3rd and 4th generation families, which provide enhanced capabilities for compute and memory-intensive workloads.

3rd generation (M6a, R6a, C6a, Hpc6a): Balances compute, memory, and storage—well-suited for analytics, web servers, and high-performance computing.
4th generation (M7a, R7a, C7a, Hpc7a): Delivers up to 50% better performance over earlier AMD generations. These instances introduce AVX-512 support and DDR5 memory, and they run with Simultaneous Multithreading (SMT) turned off. SMT is a technology that allows a single physical core to run multiple threads concurrently; with SMT disabled, each virtual CPU (vCPU) maps directly to a physical core, which can improve workload isolation and consistency.

Choosing the right AMD EPYC powered Amazon EC2 instance type

Selecting the right AMD EPYC powered Amazon EC2 instance type starts with understanding how your application uses compute, memory, storage, and networking resources. Each instance family is optimized for specific workload characteristics.

Compute-intensive workloads

These workloads involve large-scale calculations, simulations, or encoding tasks, and they often need high CPU throughput and advanced instruction set support.

Recommended instances: C7a, Hpc7a, C6a, Hpc6a
Use cases: Scientific computing, financial modelling, media transcoding, encryption, machine learning (ML) inference

Big data and analytics

Applications that process and analyze large datasets benefit from high memory bandwidth and a balanced compute-to-memory ratio.

Recommended instances: R7a, M7a, R6a, M6a
Use cases: Stream processing, real-time analytics, business intelligence tools, distributed caching

Database workloads

Database workloads typically need consistent memory performance and high I/O throughput for read/write operations.

Recommended instances: R7a, M7a, R6a, M6a
Use cases: Relational databases (MySQL, PostgreSQL), NoSQL databases (MongoDB, Cassandra), in-memory databases (Redis)

Web and application servers

These applications handle variable request loads and benefit from balanced compute, memory, and network performance.

Recommended instances: C7a, M7a, C6a, M6a
Use cases: Web servers, content management systems, e-commerce platforms, API endpoints

AI/ML on CPU

ML tasks that do not need GPUs—such as inference or preprocessing—can run efficiently on CPU-based instances.

Recommended instances: M7a, R7a, C7a
Use cases: Model inference, natural language processing, computer vision, recommendation engines

High Performance Computing (HPC)

These workloads need high core counts, memory bandwidth, and low-latency networking for tightly coupled computations.

Recommended instances: Hpc7a, Hpc6a, R7a, M7a
Use cases: Computational fluid dynamics, genomics, seismic analysis, engineering simulations

Aligning your instance type with the needs of your workload helps provide predictable performance and cost efficiency. Services such as Amazon EC2 Auto Scaling and AWS Compute Optimizer can assist with ongoing instance selection and scaling decisions.

Optimizing AMD EPYC powered Amazon EC2 instances

Amazon EC2 instances powered by 4th generation AMD EPYC processors use a modular chiplet architecture, as shown in the following figure. Each processor includes multiple Core Complex Dies (CCDs), and each CCD contains one or more core complexes (CCXs). A CCX groups up to eight physical cores, with each core having 1 MB of dedicated L2 cache and all eight cores sharing a 32 MB L3 cache. These CCDs are connected to a central I/O die, which manages memory and interconnects across the chip.

Figure 1: Layout of the ‘Zen 4’ CPU die with 8 cores per die

The modular architecture of 4th generation AMD EPYC processors enables Amazon EC2 instances such as m7a.24xlarge and m7a.48xlarge to support high core counts, up to 96 physical cores per socket. For example:

m7a.24xlarge provides 96 physical cores from a single socket.
m7a.48xlarge spans two sockets, offering 192 physical cores.
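The core and cache figures above imply some simple arithmetic. The sketch below assumes fully populated eight-core CCXs, each with its own 32 MB L3 slice, as described earlier in this section. The function name is illustrative, and how smaller instance sizes are mapped onto CCXs is not covered here.

```python
import math

CORES_PER_CCX = 8    # from the text: a CCX groups up to eight physical cores
L3_PER_CCX_MB = 32   # from the text: those eight cores share a 32 MB L3 cache


def l3_layout(physical_cores):
    """Return (number of L3 slices, total L3 in MB) for a given core count."""
    slices = math.ceil(physical_cores / CORES_PER_CCX)
    return slices, slices * L3_PER_CCX_MB


for name, cores in (("m7a.24xlarge", 96), ("m7a.48xlarge", 192)):
    slices, l3_mb = l3_layout(cores)
    print(f"{name}: {cores} cores, {slices} L3 slices, {l3_mb} MB L3 in total")
```

Under these assumptions, a 96-core socket works out to 12 L3 slices, which is why thread placement relative to those slices matters on the larger sizes.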

Understanding how Amazon EC2 instance sizes map to physical processor layouts can help you optimize for performance and cache locality. Workloads that involve shared memory access or thread synchronization, such as high-performance computing or in-memory databases, can benefit from selecting instance sizes that minimize cross-socket communication and make efficient use of shared L3 cache, as shown in the following figure.

Figure 2: Layout of the ‘EPYC Chiplet’ CPU

Amazon EC2 instances powered by 4th generation AMD EPYC processors operate with SMT turned off. In this configuration, each vCPU maps directly to a physical core, eliminating resource sharing such as execution units and cache between sibling threads. This design can reduce intra-core interference and help provide more consistent performance under certain workloads. Users can isolate threads at the core level and observe lower variability and more stable throughput for workloads, such as high-performance computing, ML inference, and transactional databases.

CPU optimizations

Tools such as htop can help identify CPU usage patterns, system load averages, and per-process resource consumption. CPU usage should be evaluated in the context of your workload and performance requirements. If usage consistently reaches 100%, then it may indicate that the workload is CPU-bound and not optimally balanced. Before modifying the instance size, enabling Auto Scaling, or switching instance families, evaluate tuning opportunities that could improve performance without changing infrastructure. Load averages that regularly exceed the number of vCPUs can also signal compute saturation and may warrant further optimization.
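The load-average rule of thumb above can be checked programmatically. This is a minimal sketch using Python's standard os.getloadavg() and os.cpu_count(); the helper names and the default threshold of 1.0 runnable task per vCPU are assumptions for illustration, not a recommendation from this post.

```python
import os


def load_per_vcpu(load_average, vcpus):
    """Normalize a load average by the number of vCPUs."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return load_average / vcpus


def is_saturated(load_average, vcpus, threshold=1.0):
    """True when the load average exceeds `threshold` tasks per vCPU."""
    return load_per_vcpu(load_average, vcpus) > threshold


def current_status():
    """Check the 1, 5, and 15 minute load averages of this host (Unix only)."""
    vcpus = os.cpu_count() or 1
    return {
        window: is_saturated(load, vcpus)
        for window, load in zip(("1m", "5m", "15m"), os.getloadavg())
    }
```

A single spike in the 1 minute value is rarely meaningful on its own; the 5 and 15 minute windows are closer to the "regularly exceed" condition described above.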

L3 cache usage

The L3 cache is a shared, high-speed memory layer used by a group of CPU cores. On AMD-based Amazon EC2 instances, cores are organized into L3 cache slices, each shared by a subset of cores on the same socket. Threads scheduled within the same slice can access shared data more efficiently, reducing memory latency. On 4th generation AMD instances such as m7a.2xlarge or r7a.2xlarge, all vCPUs typically map to cores within a single L3 slice, which keeps cache locality consistent. For larger sizes (for example m7a.8xlarge and above), thread pinning, which assigns threads to specific physical cores, can help maintain this locality and reduce performance variability in workloads with shared-memory access patterns.

You can pin threads using the taskset command:

taskset -c 0-3 ./your_application

This example pins your application to CPU cores 0 through 3. To determine which cores share the same L3 cache region, use tools such as lscpu or lstopo to inspect the system’s CPU topology. Grouping related threads on cores that share an L3 cache can improve performance consistency for workloads with frequent shared-memory access.
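As an alternative to reading lscpu or lstopo output by eye, the L3 sharing groups can be read from Linux sysfs. The sketch below assumes the kernel exposes cache/index3/shared_cpu_list under each CPU directory, which is the usual location for L3 information on x86 systems. The function names are illustrative, and pin_to_first_l3_group uses os.sched_setaffinity, which is Linux only.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return the distinct groups of CPUs that share an L3 cache."""
    pattern = os.path.join(
        sysfs_cpu_root, "cpu[0-9]*", "cache", "index3", "shared_cpu_list"
    )
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as handle:
            groups.add(tuple(parse_cpu_list(handle.read())))
    return sorted(groups)


def pin_to_first_l3_group():
    """Restrict the current process to the first L3 group, if one is found."""
    groups = l3_groups()
    if groups:
        os.sched_setaffinity(0, set(groups[0]))
```

Each tuple returned by l3_groups() is a candidate argument for taskset -c, so related threads can be kept on cores that share one cache slice.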

Docker container optimization

In containerized environments running on AMD-based Amazon EC2 instances, tuning CPU-related settings can improve workload consistency and efficiency—particularly for compute-intensive or latency-sensitive applications. Although default configurations work for many general-purpose scenarios, certain workloads may benefit from more explicit control over how CPU resources are allocated. By default, container runtimes such as Docker allow the operating system to schedule containers across any available CPU cores. This flexible scheduling can lead to variability in performance when containers move across cores that don’t share cache. To reduce this variability and improve cache efficiency, containers can be pinned to specific cores using the --cpuset-cpus flag.

docker run --cpuset-cpus="1,3" my-container

This setting restricts the container to use only the specified cores. In this example, cores 1 and 3 are used for demonstration. The actual core selection should be based on CPU topology to ensure cache-efficient scheduling. Pinning containers to cores that share L3 cache can reduce scheduling overhead and improve consistency for workloads with shared-memory access patterns.
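When the core list comes from a topology query rather than being typed by hand, it helps to generate the --cpuset-cpus value. The following sketch turns a list of core IDs into the compact list-and-range form that the flag accepts and builds the argument list for docker run. The helper names are hypothetical, and my-container is the placeholder image name used above.

```python
def format_cpuset(cpus):
    """Format core IDs as a compact cpuset string, for example '0-3,8'."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(
        str(first) if first == last else f"{first}-{last}" for first, last in ranges
    )


def docker_run_args(image, cpus):
    """Build an argument list suitable for subprocess.run()."""
    return ["docker", "run", f"--cpuset-cpus={format_cpuset(cpus)}", image]


print(" ".join(docker_run_args("my-container", [0, 1, 2, 3])))
```

Passing the arguments as a list to subprocess.run avoids shell quoting issues with the cpuset string.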

CPU frequency governor settings

Some operating systems adjust CPU frequency dynamically to save power. This is typically controlled by a setting called the CPU frequency governor. Although this behavior is efficient for general-purpose workloads, it may introduce latency or performance variability in compute-sensitive environments. For workloads that need consistently high CPU performance—such as high-throughput data processing, simulations, or real-time applications—we recommend setting the CPU governor to performance mode. This makes sure that the CPU runs at its maximum frequency under load, avoiding time spent ramping up from lower power states.

You can apply this setting on bare metal instances or Amazon EC2 Dedicated Hosts using the following command:

sudo cpupower frequency-set -g performance

Before applying, consider benchmarking workload performance with other CPU frequency governors (such as ondemand or schedutil) to make sure that the performance setting provides measurable benefits without unnecessary energy trade-offs.
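To see which governor is currently active before and after changing it, the per-CPU scaling_governor files in sysfs can be read directly. This is a sketch that assumes the standard Linux cpufreq sysfs layout; the function names are illustrative. On instances where the cpufreq interface is not exposed to the guest, the result is simply empty.

```python
import glob
import os


def read_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Map CPU number to its active cpufreq governor; empty if not exposed."""
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in glob.glob(pattern):
        # The path looks like <root>/cpu12/cpufreq/scaling_governor.
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as handle:
            governors[int(cpu_dir[3:])] = handle.read().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())
```

Calling all_performance(read_governors()) after running cpupower gives a quick confirmation that the setting took effect on every core.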

Use architecture-specific compiler flags

When compiling performance-sensitive C or C++ applications, architecture-specific flags such as -march=znverX can unlock AMD EPYC–specific optimizations, including improved vectorization and floating-point performance. Although this is beneficial for compute-heavy workloads, it may reduce portability across architectures. To balance performance and flexibility, consider implementing runtime feature detection and dispatching, an approach used by many optimized libraries to adapt behavior based on the underlying CPU.

Before using these flags, verify that your compiler version supports them and make sure that the target EC2 instance architecture matches the specified flag. For example, a binary compiled with -march=znver4 may fail with an illegal instruction error (SIGILL) if run on earlier-generation instances such as M5a. The following table outlines the appropriate flags and minimum supported compiler versions for each AMD EPYC generation:

AMD EPYC Generation	-march Flag	Minimum GCC Version	Minimum LLVM/Clang Version
4th generation (for example M7a)	znver4	GCC 12	Clang 15
3rd generation (for example M6a)	znver3	GCC 11	Clang 13
2nd generation (for example M5a)	znver2	GCC 9	Clang 11

The flags for each generation are as follows (see the preceding table for the minimum compiler version that each one needs):

# 4th Gen EPYC (M7a, R7a, C7a, Hpc7a)
-march=znver4

# 3rd Gen EPYC (M6a, R6a, C6a)
-march=znver3

# 2nd Gen EPYC (M5a, R5a, C5a)
-march=znver2
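In a build script, the table above can be encoded as a small lookup that refuses to emit a flag the compiler is too old for. This is a sketch only: the version numbers are copied from the table in this post, and the function and dictionary names are hypothetical.

```python
# Minimum major versions, copied from the table in this post.
MIN_COMPILER_VERSION = {
    "znver4": {"gcc": 12, "clang": 15},
    "znver3": {"gcc": 11, "clang": 13},
    "znver2": {"gcc": 9, "clang": 11},
}

GENERATION_TO_MARCH = {4: "znver4", 3: "znver3", 2: "znver2"}


def march_flag(generation, compiler, major_version):
    """Return the -march flag for an EPYC generation, or raise if unsupported."""
    march = GENERATION_TO_MARCH[generation]
    minimum = MIN_COMPILER_VERSION[march][compiler]
    if major_version < minimum:
        raise ValueError(
            f"{compiler} {major_version} is older than {minimum}, "
            f"the minimum listed for -march={march}"
        )
    return f"-march={march}"


print(march_flag(4, "gcc", 13))
```

Failing at configure time is usually cheaper than discovering an unsupported flag in the middle of a long build.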

When to enable AVX-512 and VNNI instructions

4th generation AMD EPYC powered Amazon EC2 instances support advanced single instruction, multiple data (SIMD) instruction sets such as AVX2, AVX-512, and VNNI. These can improve throughput for vector-heavy workloads such as ML inference, image processing, or scientific simulations. However, these instruction sets are generation-specific. Attempting to run binaries compiled with AVX-512 on unsupported instances (for example 2nd generation M5a) may result in runtime errors such as illegal instruction (SIGILL).

When compiling C or C++ code:

gcc -mavx2 -mavx512f -O2 your_program.c -o your_program

To better understand which optimizations are applied, use the following:

-fopt-info-vec-optimized -fopt-info-vec-missed

This helps identify loops that benefit from vectorization and those that don’t. Only enable these optimizations if your workload benefits and you’ve validated compatibility with the instance generation in use. Avoid applying AVX flags indiscriminately, because it may reduce portability and increase binary complexity.
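One way to validate compatibility before launching an AVX-512 binary is to check the CPU feature flags that Linux reports in /proc/cpuinfo. The sketch below parses that text. The helper names are illustrative, and the flag spellings (avx2, avx512f, avx512_vnni) are the names the Linux kernel uses for these features.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("avx2", "avx512f")):
    """List the required features that the CPU does not report."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text on a Linux host."""
    with open(path) as handle:
        return handle.read()
```

A deployment script could call missing_features(read_cpuinfo()) and fall back to a portable build when the list is not empty, instead of hitting SIGILL at run time.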

AMD Optimizing CPU Libraries

The AMD Optimizing CPU Libraries (AOCL) provide performance-tuned math libraries specifically designed for AMD EPYC processors. These libraries include optimized implementations of commonly used functions in scientific computing, engineering, and ML workloads. You can link your applications against AOCL to use processor-specific optimizations without rewriting your code. AOCL includes libraries for vector and scalar math, random number generation, FFT, BLAS, and LAPACK, among others.

Setting up AOCL

Set the AOCL_ROOT environment variable to point to the installation directory:
```
export AOCL_ROOT=/path/to/aocl
```

Compile your application with the appropriate include and library paths:

gcc -I$AOCL_ROOT/include your_program.c -o your_program -L$AOCL_ROOT/lib -lamdlibm -lm

Vector and scalar math optimization: you can enable more vectorized or scalar math tuning flags for specific workloads:

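# Vector math optimization (-fveclib is a Clang option, for example in AMD's AOCC compiler; GCC does not accept it)
clang -fveclib=AMDLIBM your_program.c -o your_program -lamdlibm -lm

# Faster scalar math (-fsclrlib is an option of the AOCC Clang compiler)
clang -fsclrlib=AMDLIBM your_program.c -o your_program -lamdlibm -lamdlibmfast -lm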

AOCL runtime profiling: AOCL supports runtime profiling, which helps developers identify which mathematical operations dominate execution time. To enable profiling, run the following:
```
export AOCL_PROFILE=1
./your_program
```

After running this, a report file named aocl_profile_report.txt is generated. It provides a function-level breakdown of call counts, execution time, and thread usage. Developers can use this to focus optimization efforts on high-impact operations.

Conclusion

This post explored how to select AMD-based Amazon EC2 instance types that align with specific workload characteristics, and how to apply tuning techniques focused on CPU usage, thread placement, cache efficiency, and math library optimization. These approaches are especially relevant for compute-bound or latency-sensitive workloads where consistent performance is critical.

Ready to get started? Sign in to the AWS Management Console and launch AMD EPYC powered Amazon EC2 instances to begin optimizing your workloads today.

Picking Servers CPUs for Databases in 2025 is Still Complex

2025-09-05 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/picking-servers-cpus-for-databases-in-2025-is-still-complex-amd-oracle-microsoft/

Picking CPUs for databases is still a topic of great complexity in 2025 with CPU vendors making different chips catering to database licenses

The post Picking Servers CPUs for Databases in 2025 is Still Complex appeared first on ServeTheHome.

MiTAC G8825Z5 AMD Instinct MI325X 8-GPU Server Review

2025-09-01 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/mitac-g8825z5-amd-instinct-mi325x-8-gpu-server-review/

In our MiTAC G8825Z5 review, we see how this 8-GPU AMD Instinct MI325X server performs and how it was made in a very neat fashion

The post MiTAC G8825Z5 AMD Instinct MI325X 8-GPU Server Review appeared first on ServeTheHome.

AMD Dives Deep on CDNA 4 Architecture and MI350 Accelerator at Hot Chips 2025

2025-08-27 Ryan Smith

Post Syndicated from Ryan Smith original https://www.servethehome.com/amd-dives-deep-on-cdna-4-architecture-and-mi350-accelerator-at-hot-chips-2025/

The second big machine learning accelerator talk of the afternoon belongs to AMD. The company’s chip architects are at this year’s show to tell the audience all about the CDNA 4 architecture, which is powering AMD’s new MI350 family of accelerators. Like its MI300 predecessor, AMD is using 3D die stacking to build up a […]

The post AMD Dives Deep on CDNA 4 Architecture and MI350 Accelerator at Hot Chips 2025 appeared first on ServeTheHome.
