Tag Archives: EPYC

AMD EPYC Embedded 4005 Series Launched

2025-09-17 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/amd-epyc-embedded-4005-series-launched/

The AMD EPYC Embedded 4005 series was launched with a 16 core 3D V-Cache part that has 7 year availability

The post AMD EPYC Embedded 4005 Series Launched appeared first on ServeTheHome.

Picking Servers CPUs for Databases in 2025 is Still Complex

2025-09-05 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/picking-servers-cpus-for-databases-in-2025-is-still-complex-amd-oracle-microsoft/

Picking CPUs for databases is still a topic of great complexity in 2025 with CPU vendors making different chips catering to database licenses

The post Picking Servers CPUs for Databases in 2025 is Still Complex appeared first on ServeTheHome.

ASUS ESC A8A-E12U 8x AMD Instinct MI325X GPU Server Review

2025-08-21 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/asus-esc-a8a-e12u-8x-amd-instinct-mi325x-gpu-server-review/

In our ASUS ESC A8A-E12U review, we see how this neat 8x AMD Instinct MI325X system with a total of 2TB of HBM3e memory works

The post ASUS ESC A8A-E12U 8x AMD Instinct MI325X GPU Server Review appeared first on ServeTheHome.

Dell PowerEdge R6715 Review A Spiffy 1U AMD EPYC Server

2025-08-04 John Lee

Post Syndicated from John Lee original https://www.servethehome.com/dell-poweredge-r6715-liquid-cooled-1u-amd-epyc-review/

In our Dell PowerEdge R6715 review, we see what this liquid-cooled 1U AMD EPYC server offers, and it is a lot

The post Dell PowerEdge R6715 Review A Spiffy 1U AMD EPYC Server appeared first on ServeTheHome.

Gigabyte G893-ZX1-AAX2 AMD Instinct MI325X Server Review

2025-07-15 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/gigabyte-g893-zx1-aax2-amd-instinct-mi325x-server-review/

In our Gigabyte G893-ZX1-AAX2 review, we see how this 8x GPU AMD Instinct MI325X system (2TB of HBM3E!) with dual AMD EPYC CPUs works

The post Gigabyte G893-ZX1-AAX2 AMD Instinct MI325X Server Review appeared first on ServeTheHome.

ASRock Rack 2U12L2S-SIENA Review 2U AMD EPYC 8004 Siena Server

2025-06-30 John Lee

Post Syndicated from John Lee original https://www.servethehome.com/asrock-rack-2u12l2s-siena-review-2u-amd-epyc-8004-siena-server/

In our ASRock Rack 2U12L2S-SIENA review, we see how this AMD EPYC 8004 “Siena” server with one of our favorite motherboards fares

The post ASRock Rack 2U12L2S-SIENA Review 2U AMD EPYC 8004 Siena Server appeared first on ServeTheHome.

New HPE ProLiant DL325 Gen12 and DL345 Gen12 at HPE Discover with AMD EPYC

2025-06-26 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/new-hpe-proliant-dl325-gen12-and-dl345-gen12-at-hpe-discover-with-amd-epyc/

At Discover 2025, we saw the HPE ProLiant DL325 Gen12 and DL345 Gen12 systems that double the memory for HPE’s AMD EPYC 9005 Turin platforms

The post New HPE ProLiant DL325 Gen12 and DL345 Gen12 at HPE Discover with AMD EPYC appeared first on ServeTheHome.

CXL Paradigm Shift ASUS RS520QA-E13-RS8U 2U 4-Node Server Review

2025-06-09 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/cxl-paradigm-shift-asus-rs520qa-e13-rs8u-2u-4-node-amd-epyc-server-review/

We check out the ASUS RS520QA-E13-RS8U, a 2U 4-node AMD EPYC 9005 server with a twist, as each node has 8 additional DIMM slots via CXL

The post CXL Paradigm Shift ASUS RS520QA-E13-RS8U 2U 4-Node Server Review appeared first on ServeTheHome.

ASRock Rack EPYC4000D4U AMD EPYC 4005 Motherboard Review

2025-06-02 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/asrock-rack-epyc4000d4u-amd-epyc-4005-motherboard-review/

In our ASRock Rack EPYC4000D4U review, we see what makes this AMD EPYC 4000 and server motherboard different

The post ASRock Rack EPYC4000D4U AMD EPYC 4005 Motherboard Review appeared first on ServeTheHome.

Lenovo ThinkSystem ST45 V3 Review AMD EPYC 4000 Entry Server

2025-04-30 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/lenovo-thinksystem-st45-v3-review-amd-epyc-4000-entry-server/

In our Lenovo ThinkSystem ST45 V3 review, we see how this AMD EPYC 4000 tower server compares to the ST50 V3, offering a lot more performance

The post Lenovo ThinkSystem ST45 V3 Review AMD EPYC 4000 Entry Server appeared first on ServeTheHome.

Why One DIMM Per Channel or 1DPC Can Be Great for AMD EPYC 4004

2025-04-05 John Lee

Post Syndicated from John Lee original https://www.servethehome.com/why-one-dimm-per-channel-or-1dpc-can-be-great-for-amd-epyc-4004/

We show why you might prefer one DIMM per channel configuration using an AMD EPYC 4004 series server and 192GB of DDR5 ECC UDIMMs

The post Why One DIMM Per Channel or 1DPC Can Be Great for AMD EPYC 4004 appeared first on ServeTheHome.

Mapping Licensing for Virtualization is Cool Now

2025-03-26 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/mapping-licensing-for-virtualization-is-cool-now-amd-vmware-microsoft-windows-citrix-red-hat/

For a long time, we have been focusing a lot on the hardware costs of new processors but missing the virtualization license costs. Part of that is simply due to the number of virtualization licenses and support models. Recently, we purchased the most popular barebones server and the most popular server processor on Newegg to […]

The post Mapping Licensing for Virtualization is Cool Now appeared first on ServeTheHome.

ASUS ESC8000A-E13P Review an 8x PCIe GPU Server

2025-03-17 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/asus-esc8000a-e13p-review-an-8x-pcie-gpu-nvidia-amd-epyc-server/

The ASUS ESC8000A-E13P is an NVIDIA MGX design that we tested with 8x NVIDIA L40S GPUs, giving us 384GB of VRAM and 384 CPU cores

The post ASUS ESC8000A-E13P Review an 8x PCIe GPU Server appeared first on ServeTheHome.

HPE ProLiant DL145 Gen11 Review An AMD EPYC 8004 Edge Server

2025-03-07 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/hpe-proliant-dl145-gen11-review-an-amd-epyc-8004-edge-server-nvidia/

In our HPE ProLiant DL145 Gen11 edge server review, we see how this AMD EPYC 8004 server offers a lot in a low power envelope

The post HPE ProLiant DL145 Gen11 Review An AMD EPYC 8004 Edge Server appeared first on ServeTheHome.

ASRock Rack TURIN2D16-2T Review Dual AMD EPYC 9005 Motherboard

2025-02-28 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/asrock-rack-turin2d16-2t-review-dual-amd-epyc-9005-motherboard/

The ASRock Rack TURIN2D16-2T pushes the boundaries of the EEB form factor, squeezing two AMD EPYC 9004/9005 processors onto the motherboard

The post ASRock Rack TURIN2D16-2T Review Dual AMD EPYC 9005 Motherboard appeared first on ServeTheHome.

Compal SG720-2A AMD EPYC and AMD Instinct MI325X AI Server at SC24

2025-01-06 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/compal-sg720-2a-amd-epyc-and-amd-instinct-mi325x-ai-server-at-sc24/

The Compal SG720-2A is an AMD EPYC 9005 “Turin” GPU server that houses eight AMD Instinct MI325X or MI300X GPUs

The post Compal SG720-2A AMD EPYC and AMD Instinct MI325X AI Server at SC24 appeared first on ServeTheHome.

Kaytus KR2190V3 is a Cool 2U 2 Node Server at SC24

2024-11-24 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/kaytus-kr2190v3-is-a-cool-2u-2-node-server-at-sc24-intel-amd/

At SC24, we saw the Kaytus KR2190V3 which was not what we expected at all in a dual socket server making it a neat design

The post Kaytus KR2190V3 is a Cool 2U 2 Node Server at SC24 appeared first on ServeTheHome.

ASUS AMD EPYC CXL Memory Enabled Server AI and More OCP Summit 2024

2024-11-01 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/asus-amd-epyc-cxl-memory-enabled-server-ai-ocp-summit-2024/

We check out four AMD EPYC 9005 servers from ASUS including AI GPU servers and the 2U 4-node ASUS RS520QA-E13-RS8U with CXL memory expansion

The post ASUS AMD EPYC CXL Memory Enabled Server AI and More OCP Summit 2024 appeared first on ServeTheHome.

Meta Brings AMD EPYC Turin to Yosemite v4

2024-10-22 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/meta-brings-amd-epyc-turin-to-yosemite-v4/

It looks like the Meta Yosemite v4 platform will have an AMD EPYC 9005 Turin module that is CXL enabled for 2025 deployment

The post Meta Brings AMD EPYC Turin to Yosemite v4 appeared first on ServeTheHome.

Analysis of the EPYC 145% performance gain in Cloudflare Gen 12 servers

2024-10-15 JQ Lau

Post Syndicated from JQ Lau original https://blog.cloudflare.com/analysis-of-the-epyc-145-performance-gain-in-cloudflare-gen-12-servers

Cloudflare’s network spans more than 330 cities in over 120 countries, serving over 60 million HTTP requests per second and 39 million DNS queries per second on average. These numbers will continue to grow, and at an accelerating pace, as will Cloudflare’s infrastructure to support them. While we can continue to scale out by deploying more servers, it is also paramount for us to develop and deploy more performant and more efficient servers.

At the heart of each server is the processor (central processing unit, or CPU). Even though many aspects of a server rack can be redesigned to improve the cost to serve a request, CPU remains the biggest lever, as it is typically the primary compute resource in a server, and the primary enabler of new technologies.

Cloudflare’s 12th Generation server with AMD EPYC 9684X (codenamed Genoa-X) is 145% more performant and 63% more efficient. These are big numbers, but where do the performance gains come from? Cloudflare’s hardware system engineering team did a sensitivity analysis on three variants of the 4th generation AMD EPYC processor to understand the contributing factors.

For the 4th generation AMD EPYC Processors, AMD offers three architectural variants:

mainstream classic Zen 4 cores, codenamed Genoa
efficiency optimized dense Zen 4c cores, codenamed Bergamo
cache optimized Zen 4 cores with 3D V-cache, codenamed Genoa-X

Figure 1 (from left to right): AMD EPYC 9654 (Genoa), AMD EPYC 9754 (Bergamo), AMD EPYC 9684X (Genoa-X)

Key features common across the 4th Generation AMD EPYC processors:

Up to 12x Core Complex Dies (CCDs)
Each core has a private 1MB L2 cache
The CCDs connect to memory, I/O, and each other through an I/O die
Configurable Thermal Design Power (cTDP) up to 400W
Support up to 12 channels of DDR5-4800 1DPC
Support up to 128 lanes PCIe Gen 5

Classic Zen 4 Cores (Genoa):

Each Core Complex (CCX) has 8x Zen 4 Cores (16x Threads)
Each CCX has a shared 32 MB L3 cache (4 MB/core)
Each CCD has 1x CCX

Dense Zen 4c Cores (Bergamo):

Each CCX has 8x Zen 4c Cores (16x Threads)
Each CCX has a shared 16 MB L3 cache (2 MB/core)
Each CCD has 2x CCX

Classic Zen 4 Cores with 3D V-cache (Genoa-X):

Each CCX has 8x Zen 4 Cores (16x Threads)
Each CCX has a shared 96MB L3 cache (12 MB/core)
Each CCD has 1x CCX

For more information on 4th generation AMD EPYC Processors architecture, see: https://www.amd.com/system/files/documents/4th-gen-epyc-processor-architecture-white-paper.pdf

The following table is a summary of the specification of the AMD EPYC 7713 CPU in our Gen 11 server against the three CPU candidates, one from each variant of the 4th generation AMD EPYC Processors architecture:

CPU Model	AMD EPYC 7713	AMD EPYC 9654	AMD EPYC 9754	AMD EPYC 9684X
Series	Milan	Genoa	Bergamo	Genoa-X
# of CPU Cores	64	96	128	96
# of Threads	128	192	256	192
Base Clock	2.0 GHz	2.4 GHz	2.25 GHz	2.4 GHz
All Core Boost Clock	~2.7 GHz*	3.55 GHz	3.1 GHz	3.42 GHz
Total L3 Cache	256 MB	384 MB	256 MB	1152 MB
L3 cache per core	4 MB / core	4 MB / core	2 MB / core	12 MB / core
Maximum configurable TDP	240W	400W	400W	400W

* AMD EPYC 7713 all core boost clock is based on Cloudflare production data, not the official specification from AMD
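The "Total L3 Cache" row follows directly from the core count and the L3 cache per core. The short sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative only, and the numbers are copied from the table above.

```python
# Core counts and L3 cache per core (MB), copied from the specification table.
cpus = {
    "EPYC 7713 (Milan)": (64, 4),
    "EPYC 9654 (Genoa)": (96, 4),
    "EPYC 9754 (Bergamo)": (128, 2),
    "EPYC 9684X (Genoa-X)": (96, 12),
}


def total_l3_mb(cores, l3_per_core_mb):
    """Total L3 cache in MB, given core count and L3 cache per core."""
    return cores * l3_per_core_mb


totals = {name: total_l3_mb(*spec) for name, spec in cpus.items()}

for name, total in totals.items():
    print(f"{name}: {total} MB total L3")
```

Note that Bergamo ends up with the same total L3 as Milan despite having twice the cores, which is why its per-core share is halved.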

cf_benchmark

Readers may remember that Cloudflare introduced cf_benchmark when we evaluated Qualcomm’s ARM chips, using it as our first pass benchmark to shortlist AMD’s Rome CPU for our Gen 10 servers and to evaluate our chosen ARM CPU Ampere Altra Max against AWS Graviton 2. Likewise, we ran cf_benchmark against the three candidate CPUs for our 12th Gen servers: AMD EPYC 9654 (Genoa), AMD EPYC 9754 (Bergamo), and AMD EPYC 9684X (Genoa-X). The majority of cf_benchmark workloads are compute bound, and given more cores or higher CPU frequency, they score better. The graph and the table below show the benchmark performance comparison of the three CPU candidates with Genoa 9654 as the baseline, where > 1.00x indicates better performance.

	Genoa 9654 (baseline)	Bergamo 9754	Genoa-X 9684X
openssl_pki	1.00x	1.16x	1.01x
openssl_aead	1.00x	1.20x	1.01x
luajit	1.00x	0.86x	1.00x
brotli	1.00x	1.11x	0.98x
gzip	1.00x	0.87x	1.01x
go	1.00x	1.09x	1.00x

Bergamo 9754 with 128 cores scores better in openssl_pki, openssl_aead, brotli, and go benchmark suites, and performs less favorably in luajit and gzip benchmark suites. Genoa-X 9684X (with significantly more L3 cache) doesn’t offer a significant boost in performance for these compute-bound benchmarks.

These benchmarks are representative of some of the common workloads Cloudflare runs, and are useful in identifying software scaling issues, system configuration bottlenecks, and the impact of CPU design choices on workload-specific performance. However, the benchmark suite is not an exhaustive list of all workloads Cloudflare runs in production, and in reality, the workloads included in the benchmark suites are almost certainly not the exclusive workload running on the CPU. In short, though benchmark results can be informative, they do not represent a good indication of production performance when a mix of these workloads run on the same processor.

Performance simulation

To get an early indication of production performance, Cloudflare has an internal performance simulation tool that exercises our software stack to fetch a fixed asset repeatedly. The simulation tool can be configured to fetch a specified fixed-size asset and configured to include or exclude services like WAF or Workers in the request path. Below, we show the simulated performance between the three CPUs for an asset size of 10 KB, where >1.00x indicates better performance.

	Milan 7713	Genoa 9654	Bergamo 9754	Genoa-X 9684X
Lab simulation performance multiplier	1.00x	2.20x	1.95x	2.75x

Based on these results, Bergamo 9754, which has the highest core count, but smallest L3 cache per core, is least performant among the three candidates, followed by Genoa 9654. The Genoa-X 9684X with the largest L3 cache per core is the most performant. This data suggests that our software stack is very sensitive to L3 cache size, in addition to core count and CPU frequency. This is interesting and worth a deep dive into a sensitivity analysis of our workload against a few (high level) CPU design points, especially core scaling, frequency scaling, and L2/L3 cache sizes scaling.

Sensitivity analysis

Core sensitivity

Number of cores is the headline specification that practically everyone talks about, and one of the easiest improvements CPU vendors can make to increase performance per socket. The AMD Genoa 9654 has 96 cores, 50% more than the 64 cores available on the AMD Milan 7713 CPUs that we used in our Gen 11 servers. Is more always better? Does Cloudflare’s primary workload scale with core count and effectively utilize all available cores?

The figure and table below show the result of a core scaling experiment performed on an AMD Genoa 9654 configured with 96 cores, 80 cores, 64 cores, and 48 cores, which was done by incrementally disabling 2x CCD (8 cores/CCD) at each step. The result is GREAT, as Cloudflare’s simulated primary workload scales linearly with core count on AMD Genoa CPUs.

Core count	Core increase	Performance increase
48	1.00x	1.00x
64	1.33x	1.39x
80	1.67x	1.71x
96	2.00x	2.05x
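One way to read the table is as a scaling efficiency: the performance increase divided by the core increase, where 1.0 means perfectly linear scaling. The sketch below computes that ratio from the numbers above; the function name and the 48-core baseline constant are illustrative choices, not part of Cloudflare's tooling.

```python
# (core count, performance relative to the 48-core configuration), from the table.
core_scaling = [(48, 1.00), (64, 1.39), (80, 1.71), (96, 2.05)]
BASELINE_CORES = 48


def scaling_efficiency(cores, perf, base_cores=BASELINE_CORES):
    """Performance gain divided by core-count gain; 1.0 means linear scaling."""
    return perf / (cores / base_cores)


efficiencies = {cores: scaling_efficiency(cores, perf) for cores, perf in core_scaling}

for cores, eff in efficiencies.items():
    print(f"{cores} cores: scaling efficiency {eff:.3f}")
```

Every configuration lands at or slightly above 1.0, which is consistent with the linear scaling described above.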

TDP sensitivity

Thermal Design Power (TDP) is the maximum amount of heat generated by a CPU that the cooling system is designed to dissipate, but it more commonly refers to the power consumption of the processor under maximum theoretical load. AMD Genoa 9654’s default TDP is 360W, but can be configured up to 400W TDP. Is more always better? Does Cloudflare continue to see meaningful performance improvement up to 400W, or does performance stagnate at some point?

The chart below shows the result of sweeping the TDP of the AMD Genoa 9654 (in power determinism mode) from 240W to 400W. (Note: x-axis step size is not linear).

Cloudflare’s simulated primary workload continues to see incremental performance improvements up to the maximum configurable 400W, albeit at a less favorable perf/watt ratio.

Looking at TDP sensitivity data is a quick and easy way to identify if performance stagnates at some power point, but what does power sensitivity actually measure? There are several factors contributing to CPU power consumption, but let’s focus on one of the primary factors: dynamic power consumption. Dynamic power consumption is approximately CV²f, where C is the switched load capacitance, V is the regulated voltage, and f is the frequency. In modern processors like the AMD Genoa 9654, the CPU dynamically scales its voltage along with frequency, so theoretically, CPU dynamic power is loosely proportional to f³. In other words, measuring TDP sensitivity is measuring the frequency sensitivity of a workload. Does the data agree? Yes!

cTDP	All core boost frequency (GHz)	Perf (rps) / baseline
240	2.47	0.78x
280	2.75	0.87x
320	2.93	0.93x
340	3.13	0.97x
360	3.3	1.00x
380	3.4	1.03x
390	3.465	1.04x
400	3.55	1.05x
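To make the CV²f relationship concrete, here is a minimal sketch of the idealized model only. It assumes voltage scales in direct proportion to frequency, which is a simplification, and the function name is hypothetical. Configured package TDP also covers power that does not follow this model, so the sketch is not expected to reproduce the cTDP column above.

```python
def relative_dynamic_power(freq_ratio, voltage_tracks_frequency=True):
    """Idealized dynamic power ratio from P ~ C * V^2 * f.

    Assumption: when voltage_tracks_frequency is True, V scales 1:1 with f,
    so power scales with f cubed. With a fixed voltage it scales linearly.
    """
    voltage_ratio = freq_ratio if voltage_tracks_frequency else 1.0
    return voltage_ratio ** 2 * freq_ratio


for ratio in (1.05, 1.10, 1.20):
    power = relative_dynamic_power(ratio)
    print(f"{ratio:.2f}x frequency -> {power:.2f}x dynamic power")
```

Under this idealized model, a 10% frequency increase costs roughly a third more dynamic power, which is why the last few hundred MHz come at a less favorable perf/watt ratio.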

Frequency sensitivity

Instead of relying on an indirect measure through the TDP, let’s measure frequency sensitivity directly by sweeping the maximum boost frequency.

Above 3GHz, the data shows that Cloudflare’s primary workload sees roughly 2% incremental improvement for every 0.1GHz increase in all core average frequency. We hit the 400W power cap at 3.545GHz. This is notably higher than the typical all core boost frequency of Cloudflare Gen 11 servers with AMD Milan 7713, which run at 2.7GHz in production and at 2.4GHz in our performance simulation. That is amazing!
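As a rough cross-check, the cTDP table from the TDP sensitivity section can be used to estimate the same slope. This is only a sketch: it uses the cTDP sweep rather than the frequency sweep itself (whose raw data is not reproduced here), and the helper name and 3.0GHz cutoff parameter are illustrative.

```python
# (cTDP in W, all core boost frequency in GHz, perf relative to 360W), from the cTDP table.
tdp_sweep = [
    (240, 2.47, 0.78),
    (280, 2.75, 0.87),
    (320, 2.93, 0.93),
    (340, 3.13, 0.97),
    (360, 3.30, 1.00),
    (380, 3.40, 1.03),
    (390, 3.465, 1.04),
    (400, 3.55, 1.05),
]


def perf_gain_per_100mhz(points, min_freq_ghz=3.0):
    """Average perf gain per 0.1 GHz between the first and last point above min_freq_ghz."""
    selected = [(freq, perf) for _, freq, perf in points if freq >= min_freq_ghz]
    (f_lo, p_lo), (f_hi, p_hi) = selected[0], selected[-1]
    return (p_hi - p_lo) / ((f_hi - f_lo) / 0.1)


gain = perf_gain_per_100mhz(tdp_sweep)
print(f"Average gain above 3 GHz: {gain * 100:.1f}% per 0.1 GHz")
```

The estimate comes out close to the 2% per 0.1GHz figure, so the two sweeps tell a consistent story.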

L3 cache size sensitivity

What about L3 cache size sensitivity? L3 cache size is one of the primary design choices and major differences between the trio of Genoa, Bergamo, and Genoa-X. Genoa 9654 has 4 MB L3/core, Bergamo 9754 has 2 MB L3/core, and Genoa-X has 12 MB L3/core. L3 cache is the last and largest “memory” bank on-chip before having to access memory on DIMMs outside the chip that would take significantly more CPU cycles.

We ran an experiment on the Genoa 9654 to check how performance scales with L3 cache size. L3 cache size per core is reduced through MSR writes (but could also be done using Intel RDT) and L3 cache per core is increased by disabling physical cores in a CCD (which reduces the number of cores sharing the fixed size 32 MB L3 cache per CCD effectively growing the L3 cache per core). Below is the result of the experiment, where >1.00x indicates better performance:

L3 cache size increase vs baseline 4MB per core	0.25x	0.5x	0.75x	1x	1.14x	1.33x	1.60x	2.00x
rps/core / baseline	0.67x	0.78x	0.89x	1.00x	1.08x	1.15x	1.25x	1.31x
L3 cache miss rate per CCD	56.04%	39.15%	30.37%	23.55%	22.39%	19.73%	16.94%	14.28%

Even though the expectation was that the impact of a different L3 cache size gets diminished by the faster DDR5 and larger memory bandwidth, Cloudflare’s simulated primary workload is quite sensitive to L3 cache size. The L3 cache miss rate dropped from 56% with only 1 MB L3 per core, to 14.28% with 8 MB L3/core. Changing the L3 cache size by 25% affects the performance by approximately 11%, and we continue to see performance increase to 2x L3 cache size, though the performance increase starts to diminish when we get to 2x L3 cache per core.
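The ratios above 1x in the table line up with what you get by sharing a fixed 32 MB L3 cache among fewer active cores in a CCD. The sketch below shows that arithmetic. The specific active-core counts (7, 6, 5, and 4 per CCD) are an assumption inferred from the ratios, not something stated in the experiment description, and the names are illustrative.

```python
CCD_L3_MB = 32               # fixed L3 cache per CCD on Genoa 9654
BASELINE_L3_PER_CORE_MB = 4  # 32 MB shared by all 8 cores


def l3_ratio(active_cores_per_ccd):
    """L3 cache per active core, relative to the 4 MB/core baseline."""
    return (CCD_L3_MB / active_cores_per_ccd) / BASELINE_L3_PER_CORE_MB


# Assumed active-core counts per CCD; 8 is the stock configuration.
ratios = {cores: round(l3_ratio(cores), 2) for cores in (8, 7, 6, 5, 4)}

for cores, ratio in ratios.items():
    print(f"{cores} active cores/CCD -> {ratio:.2f}x L3 per core")
```

This also illustrates why the method changes more than one variable at a time: fewer active cores means more cache per core, but also fewer cores competing for it.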

Do we see the same behavior when comparing Genoa 9654, Bergamo 9754 and Genoa-X 9684X? We ran an experiment comparing the impact of L3 cache size, controlling for core count and all core boost frequency, and we also saw significant deltas. Halving the L3 cache size from 4 MB/core to 2 MB/core reduces performance by 24%, roughly matching the experiment above. However, increasing the cache 3x from 4 MB/core to 12 MB/core only increases performance by 25%, less than the previous experiment indicated. This is likely because part of the performance gain in the experiment above could be attributed to less cache contention, since that test setup reduced the number of active cores. Nevertheless, these are significant deltas!

L3/core	2MB/core	4MB/core	12MB/core
Perf (rps) / baseline	0.76x	1x	1.25x

Putting it all together

The table below summarizes how each factor from the sensitivity analysis above contributes to the overall performance gain. An additional 6% to 14% of performance improvement is not accounted for by these factors and is contributed by others, like larger L2 cache, higher memory bandwidth, and miscellaneous CPU architecture changes that improve IPC.

	Milan 7713	Genoa 9654	Bergamo 9754	Genoa-X 9684X
Lab simulation performance multiplier	1x	2.2x	1.95x	2.75x
Performance multiplier due to Core scaling	1x	1.5x	2x	1.5x
Performance multiplier due to Frequency scaling (Note: Milan 7713 all core frequency is ~2.4GHz when running simulated workload at 100% CPU utilization)	1x	1.32x	1.21x	1.29x
Performance multiplier due to L3 cache size scaling	1x	1x	0.76x	1.25x
Performance multiplier due to other factors like larger L2 cache, higher memory bandwidth, miscellaneous CPU architecture changes that improve IPC	1x	1.11x	1.06x	1.14x
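The rows in this table are consistent with the individual factors combining multiplicatively to give the lab simulation multiplier. The sketch below checks that; the multiplicative model is an assumption that the numbers happen to fit, and the dictionary keys are illustrative labels.

```python
from math import prod

# Per-factor multipliers relative to Milan 7713, copied from the table.
factors = {
    "Genoa 9654": {"core": 1.5, "frequency": 1.32, "l3": 1.00, "other": 1.11},
    "Bergamo 9754": {"core": 2.0, "frequency": 1.21, "l3": 0.76, "other": 1.06},
    "Genoa-X 9684X": {"core": 1.5, "frequency": 1.29, "l3": 1.25, "other": 1.14},
}

# Lab simulation performance multipliers from the same table.
lab_multiplier = {"Genoa 9654": 2.20, "Bergamo 9754": 1.95, "Genoa-X 9684X": 2.75}

# Assumption: the factors are independent and combine by multiplication.
combined = {cpu: prod(f.values()) for cpu, f in factors.items()}

for cpu, value in combined.items():
    print(f"{cpu}: product of factors {value:.2f}x vs lab {lab_multiplier[cpu]:.2f}x")
```

The products match the lab multipliers to within rounding, which also shows how Bergamo's 2x core advantage is largely offset by its lower frequency and smaller L3 cache per core.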

Performance evaluation in production

How do these CPU candidates perform with real-world traffic and an actual production workload mix? The table below summarizes the performance of the three CPUs in lab simulation and in production. Genoa-X 9684X continues to outperform in production.

In addition, the Gen 12 server equipped with Genoa-X offered outstanding performance while consuming only about 1.5x the power per system of our Gen 11 server with Milan 7713. In other words, we see a 63% increase in performance per watt. Genoa-X 9684X provides the best TCO improvement among the 3 options, and was ultimately chosen as the CPU for our Gen 12 server.

	Milan 7713	Genoa 9654	Bergamo 9754	Genoa-X 9684X
Lab simulation performance multiplier	1x	2.2x	1.95x	2.75x
Production performance multiplier	1x	2x	2.15x	2.45x
Production performance per watt multiplier	1x	1.33x	1.38x	1.63x
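The relative system power implied by this table can be recovered by dividing the production performance multiplier by the performance per watt multiplier. The sketch below does that; the function and variable names are illustrative, and the result is derived from rounded table values rather than measured power.

```python
# (production performance multiplier, performance per watt multiplier), from the table.
production = {
    "Genoa 9654": (2.00, 1.33),
    "Bergamo 9754": (2.15, 1.38),
    "Genoa-X 9684X": (2.45, 1.63),
}


def relative_power(perf, perf_per_watt):
    """System power relative to the Milan 7713 baseline, implied by the two multipliers."""
    return perf / perf_per_watt


power = {cpu: relative_power(*values) for cpu, values in production.items()}

for cpu, value in power.items():
    print(f"{cpu}: about {value:.2f}x the power of the Gen 11 baseline")
```

For Genoa-X this works out to roughly 1.5x, matching the power figure quoted above.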

The Gen 12 server with AMD Genoa-X 9684X is the most powerful and the most power efficient server Cloudflare has built to date. It serves as the underlying platform for all the incredible services that Cloudflare offers to our customers globally, and will help power the growth of Cloudflare infrastructure for the next several years with improved cost structure.

Hardware engineers at Cloudflare work closely with our infrastructure engineering partners and externally with our vendors to design and develop world-class servers to best serve our customers.

Come join us at Cloudflare to help build a better Internet!
