How Cloudflare is staying ahead of the AMD Zen vulnerability known as “Zenbleed”

Post Syndicated from Derek Chamorro original http://blog.cloudflare.com/zenbleed-vulnerability/

How Cloudflare is staying ahead of the AMD Zen vulnerability known as “Zenbleed”

How Cloudflare is staying ahead of the AMD Zen vulnerability known as “Zenbleed”

Google Project Zero revealed a new flaw in AMD's Zen 2 processors in a blog post today. The 'Zenbleed' flaw affects the entire Zen 2 product stack, from AMD's EPYC data center processors to the Ryzen 3000 CPUs, and can be exploited to steal sensitive data stored in the CPU, including encryption keys and login credentials. The attack can even be carried out remotely through JavaScript on a website, meaning that the attacker need not have physical access to the computer or server.

Cloudflare’s network includes servers using AMD’s Zen line of CPUs. We have patched our entire fleet of potentially impacted servers with AMD’s microcode to mitigate this potential vulnerability. While our network is now protected from this vulnerability, we will continue to monitor for any signs of attempted exploitation of the vulnerability and will report on any attempts we discover in the wild. To better understand the Zenbleed vulnerability, read on.

Background

Understanding how a CPU executes programs is crucial to comprehending the attack's workings. The CPU works with an arithmetic processing unit called the ALU. The ALU is used to perform mathematical tasks. Operations like addition, multiplication, and floating-point calculations fall under this category. The CPU's clock signal controls the application-specific digital circuitry that the ALU uses to carry out these functions.

For data to reach  the ALU, it has to pass through a series of storage systems. These include secondary memory, primary memory, cache memory, and CPU registers. Since the registers of the CPU are the target of this attack, we will go into a little more depth. Depending on the design of the computer, the CPU registers can store either 32 or 64 bits of information. The ALU can access the data in these registers and complete the operation.

As the demands on CPUs have increased, there has been a need for faster ways to perform calculations. Advanced Vector Extensions (or AVX) were developed to speed up the processing of large data sets by applications. AVX are extensions to the x86 instruction set architecture, which are relevant to x86-based CPUs from Intel and AMD. With the help of compatible software and the extra instruction set, compatible processors could handle more complex tasks. The primary motivation for developing this instruction set was to speed up operations associated with data compression, image processing, and cryptographic computations.

The vector data used by AVX instructions is stored in 16 YMM registers, each of which is 256 bits in size. The Y-register in the XMM register set is where the 128-bit values are stored, hence the name. Instructions from the arithmetic, logic, and trigonometry families of the AVX standard all make use of the YMM registers. They can also be used to keep masks, data that is used to filter out certain vector components.

Vectorized operations can be executed with great efficiency using the YMM registers. Applications that process large amounts of data stand to gain significantly from them, but they are increasingly the focus of malicious activity.

The attack

Speculative execution attacks have previously been used to compromise CPU registers. These are an attack variant that takes advantage of the speculative execution capabilities of modern CPUs. Computer processors use a method called speculative execution to speed up processing times. A CPU will execute an instruction speculatively if it has no way of knowing whether or not it will be executed. If it turns out that the CPU was unable to carry out the instruction, it will simply discard the data.

Because of their potential use for storing private information, AVX registers are especially susceptible to these kinds of attacks. Cryptographic keys and passwords, for instance, could be accessed by an attacker via a speculative execution attack on the AVX registers.

As mentioned above, Project Zero discovered a vulnerability in AMD's Zen 2-architecture-based CPUs, wherein data from another process and/or thread could be stored in the YMM registers, a 256-bit series of extended registers, potentially allowing an attacker access to sensitive information. This vulnerability is caused by a register not being written to 0 correctly under specific microarchitectural circumstances. Although this error is associated with speculative execution, it is not a side channel vulnerability.

This attack works by manipulating register files to force a mispredicted command. First, there is a trigger to XMM Register Merge Optimization2, which ironically is a hardware mitigation that can be used to protect against speculative execution attacks, followed by a register remapping (a technique used in computer processor design to resolve name conflicts between physical registers and logical registers) and then a mispredicted instruction call to vzeroupper, an instruction that is used to zero the upper half of the YMM and ZMM registers.

Since the register file is shared by all the processes running on the same physical core, this exploit can be used to eavesdrop on even the most fundamental system operations by monitoring the data being transferred between the CPU and the rest of the computer.

Fixing the bleed

Because of the exact timing for this to successfully execute, this vulnerability, CVE-2023-20593, is classified with a CVSS score of 6.5 (Medium). AMD's mitigation is implemented via the MSR register, which turns off a floating point optimization that otherwise would have allowed a move operation.

The following microcode updates have applied to our entire server fleet that contain potentially affected AMD Zen processors. We have seen no evidence of the bug being exploited and were able to patch our entire network within hours of the vulnerability’s disclosure. We will continue to monitor traffic across our network for any attempts to exploit the bug and report on our findings.