That title may be a bit challenging, but it is valid. With the UK-based Isambard 3 supercomputer NVIDIA Grace will have a 2.7PF supercomputer that NVIDIA is pointing out is one of the three greenest. Perhaps the bigger part of this announcement is that it is a vote of confidence for Grace. NVIDIA Notches a […]
This blog post is written by Ryan Doty Solutions Architect , AWS and Vishal Manan Sr. SSA, EC2 Graviton , AWS.
AWS customers recognize that Graviton-based EC2 instances deliver price-performance benefits but many are concerned about the effort to port existing applications. Porting code from one architecture to another can result in a substantial investment in time and effort. AWS has worked continuously to improve the migration process for customers. We recently introduced the Porting Advisor for Graviton as a tool to further simplify the migration process. In this blog, we’ll walk you through how to use Porting Advisor for Graviton so that you can learn how to use it.
Porting Advisor for Graviton is an open-source, command-line tool that analyzes source code and generates a report highlighting missing or outdated libraries and code constructs that may require modification and provides a user with alternative recommendations. It helps customers and developers accelerate their transition to Graviton-based Amazon EC2 instances by reducing the iterative process of identifying and resolving source code and library dependencies. This blog post will provide you with a step-by-step implementation on how to use Porting Advisor for Graviton. At the end of the blog, you will be able to run Porting Advisor for Graviton on your source code tree, generating findings that will help simplify the effort required to port your application.
Porting Advisor for Graviton scans for potentially unsupported or non-portable arm64 code in source code trees. The tool only scans source code files for the programming languages C/C++, Fortran, Go 1.11+, Java 8+, Python 3+, and dependency files such as project/requirements.txt file(s). Most importantly the Porting Advisor doesn’t make any code modifications, API-level recommendations, or send data back to AWS.
You can utilize Porting Advisor for Graviton either as a Python script or compiled into a binary and run on x86-64 or arm64 systems. Therefore, it can be easily implemented as part of your build processes.
Porting Advisor for Graviton reports the following issues:
Inline assembly with no corresponding arm64 inline assembly.
Assembly source files with no corresponding arm64 assembly source files.
Missing arm64 architecture detection in autoconf config.guess scripts.
Linking against libraries that aren’t available on the arm64 architecture.
Use architecture specific intrinsics.
Preprocessor errors that trigger when compiling on arm64.
Usages of old Visual C++ runtime (Windows specific).
Compiler specific code guarded by compiler specific pre-defined macros is detected, but not reported by default. The following cross-compile specific issues are detected, but not reported by default:
Architecture detection that depends on the host rather than the target.
Use of build artifacts in the build process.
Skillsets needed for using the tool
Porting Advisor for Graviton is designed to be easy to use. Users though should be versed in the following skills in order to take advantage of the recommendations the tool provides:
An understanding of the build system – project requirements and dependencies, versioning, etc.
Basic scripting language skills around Python, PowerShell/Bash.
Understanding hardware (inline assembly for C/C++) or compiler specific (intrinsic for C/C++) constructs when applicable.
The tool requires a minimum version of Python 3.10 and Java (8+ to be installed). The installation of Java is also optional and only required if you want to scan JAR files for native method calls. You can run the tool on a Windows/Linux/MacOS machine, or in an EC2 instance. I will show case usage on Windows and Amazon Linux 2(AL2) running on an EC2 instance. It supports both arm64 and x86-64 processors.
You don’t need to be on an arm64-based processor to run the tool.
You can run the tool as a Python script or an executable. The executable requires extra steps to build. However, it can be used on another machine without the need to install Python packages.
You must copy the complete “dist” folder for it to work, as there are libraries that are present in that folder along with the executable.
Porting Advisor for Graviton can be run as a binary or as a script. You must run the script to generate the binaries
The following shows how the tool would work on Windows when run as a script:
Running the Porting Advisor for Graviton on Linux as a Script
Output of the Porting Advisor for Graviton
The Porting Advisor for Graviton requires the directory parameter to point to the folder where your source code lives. If there are multiple locations, then you can run the tool as part of the script.
If no output file is supplied only standard output will be produced. The following is the output of the tool in HTML format. The first line shows the total files scanned.
If no issues are found, then you’ll see an output like the following:
With x86-64 specific intrinsics, such as _bswap64, you’ll see it flagged. arm64 specific intrinsics won’t be flagged. Therefore, if your code has arm64 specific intrinsics, then the porting advisor will only flag x86-64 to arm64 and not vice versa. The primary goal of the tool is determining the arm64 readiness of your code.
The scanner typically looks for source code files only, but it can also look for assembly files with *.s extensions. An example of a file with the C++ code with inline assembly code is as follows:
The tool has pointed out errors such as a preprocessor error and architecture-specific intrinsic usage errors.
If you don’t see any issues reported by the tool, then you are in good shape for doing the port. If no issues are reported it does not guarantee that it will work as port. Porting Advisor for Graviton is a tool used best as a helper. Regardless of the issues reported by the tool, we still highly suggest testing the application thoroughly.
As a good practice, we recommend that you use the latest version of your libraries.
Based on the issue, you can consider further actions. For Compiler intrinsic errors, we recommend studying Intel and arm64 intrinsics.
Once you’ve migrated your code and are using Gravition, you can start to look at taking steps to optimize your performance. If you’re interested in doing that please look at our Getting Started Guide.
How fast is the tool? The tool is able to scan 4048 files in 1.18 seconds.
On an arm64 Based Instance:
This tool scans all files in a source tree, regardless of whether they are included by the build system or not. Therefore, it may misreport issues in files that appear in the source tree but are excluded by the build system. Currently, the tool supports the following languages/dependencies: C/C++, Fortran, Go 1.11+, Java 8+, and Python 3+.
For example: You may have legacy code using Python version 2.7 that is in the source tree but isn’t being used. The tool will scan the code base and point out issues in that particular codebase even though you may not be using that piece of code. To mitigate, either remove that folder from the source code or ignore the error pointed by the tool.
I see mention of Ruby and .Net in the Open source tool, but they don’t work on my tool.
Ruby and .Net haven’t been implemented yet, but please consider contributing to it and open an issue requesting support. If you need support then see our CONTRIBUTING file.
Errors that you may encounter while building the tool binary:
PyInstaller needs a shared version of libraries.
Python 3.10+ not having shared libraries for use by the PyInstaller tool.
The fix for this is to build your version of Python(3.10+) with the right flags:
Porting Advisor for Graviton helps customers quantify the amount of work that is required to port an application. It accelerates your ability to transition to Graviton-based Amazon EC2 instances by reducing the iterative process of identifying and resolving source code and library dependencies.
Over the last few years, there has been a rise in the number of attacks that affect how a computer boots. Most modern computers use a specification called Unified Extensible Firmware Interface (UEFI) that defines a software interface between an operating system (e.g. Windows) and platform firmware (e.g. disk drives, video cards). There are security mechanisms built into UEFI that ensure that platform firmware can be cryptographically validated and boot securely through an application called a bootloader. This firmware is stored in non-volatile SPI flash memory on the motherboard, so it persists on the system even if the operating system is reinstalled and drives are replaced.
This creates a ‘trust anchor’ used to validate each stage of the boot process, but, unfortunately, this trust anchor is also a target for attack. In these UEFI attacks, malicious actions are loaded onto a compromised device early in the boot process. This means that malware can change configuration data, establish persistence by ‘implanting’ itself, and can bypass security measures that are only loaded at the operating system stage. So, while UEFI-anchored secure boot protects the bootloader from bootloader attacks, it does not protect the UEFI firmware itself.
Because of this growing trend of attacks, we began the process of cryptographically signing our UEFI firmware as a mitigation step. While our existing solution is platform specific to our x86 AMD server fleet, we did not have a similar solution to UEFI firmware signing for Arm. To determine what was missing, we had to take a deep dive into the Arm secure boot process.
Read on to learn about the world of Arm Trusted Firmware Secure Boot.
Arm Trusted Firmware Secure Boot
Arm defines a trusted boot process through an architecture called Trusted Board Boot Requirements (TBBR), or Arm Trusted Firmware (ATF) Secure Boot. TBBR works by authenticating a series of cryptographically signed binary images each containing a different stage or element in the system boot process to be loaded and executed. Every bootloader (BL) stage accomplishes a different stage in the initialization process:
BL1 defines the boot path (is this a cold boot or warm boot), initializes the architectures (exception vectors, CPU initialization, and control register setup), and initializes the platform (enables watchdog processes, MMU, and DDR initialization).
BL2 prepares initialization of the Arm Trusted Firmware (ATF), the stack responsible for setting up the secure boot process. After ATF setup, the console is initialized, memory is mapped for the MMU, and message buffers are set for the next bootloader.
The BL3 stage has multiple parts, the first being initialization of runtime services that are used in detecting system topology. After initialization, there is a handoff between the ATF ‘secure world’ boot stage to the ‘normal world’ boot stage that includes setup of UEFI firmware. Context is set up to ensure that no secure state information finds its way into the normal world execution state.
Each image is authenticated by a public key, which is stored in a signed certificate and can be traced back to a root key stored on the SoC in one time programmable (OTP) memory or ROM.
TBBR was originally designed for cell phones. This established a reference architecture on how to build a “Chain of Trust” from the first ROM executed (BL1) to the handoff to “normal world” firmware (BL3). While this creates a validated firmware signing chain, it has caveats:
SoC manufacturers are heavily involved in the secure boot chain, while the customer has little involvement.
A unique SoC SKU is required per customer. With one customer this could be easy, but most manufacturers have thousands of SKUs
The SoC manufacturer is primarily responsible for end-to-end signing and maintenance of the PKI chain. This adds complexity to the process requiring USB key fobs for signing.
Doesn’t scale outside the manufacturer.
What this tells us is what was built for cell phones doesn’t scale for servers.
If we were involved 100% in the manufacturing process, then this wouldn’t be as much of an issue, but we are a customer and consumer. As a customer, we have a lot of control of our server and block design, so we looked at design partners that would take some of the concepts we were able to implement with AMD Platform Secure Boot and refine them to fit Arm CPUs.
Amping it up
We partnered with Ampere and tested their Altra Max single socket rack server CPU (code named Mystique) that provides high performance with incredible power efficiency per core, much of what we were looking for in reducing power consumption. These are only a small subset of specs, but Ampere backported various features into the Altra Max notably, speculative attack mitigations that include Meltdown and Spectre (variants 1 and 2) from the Armv8.5 instruction set architecture, giving Altra the “+” designation in their ISA.
Ampere does implement a signed boot process similar to the ATF signing process mentioned above, but with some slight variations. We’ll explain it a bit to help set context for the modifications that we made.
Ampere Secure Boot
The diagram above shows the Arm processor boot sequence as implemented by Ampere. System Control Processors (SCP) are comprised of the System Management Processor (SMpro) and the Power Management Processor (PMpro). The SMpro is responsible for features such as secure boot and bmc communication while the PMpro is responsible for power features such as Dynamic Frequency Scaling and on-die thermal monitoring.
At power-on-reset, the SCP runs the system management bootloader from ROM and loads the SMpro firmware. After initialization, the SMpro spawns the power management stack on the PMpro and ATF threads. The ATF BL2 and BL31 bring up processor resources such as DRAM, and PCIe. After this, control is passed to BL33 BIOS.
At power on, the SMpro firmware reads Ampere’s public key (ROTPK) from the SMpro key certificate in SCP EEPROM, computes a hash and compares this to Ampere’s public key hash stored in eFuse. Once authenticated, Ampere’s public key is used to decrypt key and content certificates for SMpro, PMpro, and ATF firmware, which are launched in the order described above.
The SMpro public key will be used to authenticate the SMpro and PMpro images and ATF keys which in turn will authenticate ATF images. This cascading set of authentication that originates with the Ampere root key and stored in chip called an electronic fuse, or eFuse. An eFuse can be programmed only once, setting the content to be read-only and can not be tampered with nor modified.
This is the original hardware root of trust used for signing system, secure world firmware. When we looked at this, after referencing the signing process we had with AMD PSB and knowing there was a large enough one-time-programmable (OTP) region within the SoC, we thought: why can’t we insert our key hash in here?
Single Domain Secure Boot
Single Domain Secure Boot takes the same authentication flow and adds a hash of the customer public key (Cloudflare firmware signing key in this case) to the eFuse domain. This enables the verification of UEFI firmware by a hardware root of trust. This process is performed in the already validated ATF firmware by BL2. Our public key (dbb) is read from UEFI secure variable storage, a hash is computed and compared to the public key hash stored in eFuse. If they match, the validated public key is used to decrypt the BL33 content certificate, validating and launching the BIOS, and remaining boot items. This is the key feature added by SDSB. It validates the entire software boot chain with a single eFuse root of trust on the processor.
With a basic understanding of how Single Domain Secure Boot works, the next logical question is “How does it get implemented?”. We ensure that all UEFI firmware is signed at build time, but this process can be better understood if broken down into steps.
Ampere, our original device manufacturer (ODM), and we play a role in execution of SDSB. First, we generate certificates for a public-private key pair using our internal, secure PKI. The public key side is provided to the ODM as dbb.auth and dbu.auth in UEFI secure variable format. Ampere provides a reference Software Release Package (SRP) including the baseboard management controller, system control processor, UEFI, and complex programmable logic device (CPLD) firmware to the ODM, who customizes it for their platform. The ODM generates a board file describing the hardware configuration, and also customizes the UEFI to enroll dbb and dbu to secure variable storage on first boot.
Once this is done, we generate a UEFI.slim file using the ODM’s UEFI ROM image, Arm Trusted Firmware (ATF) and Board File. (Note: This differs from AMD PSB insofar as the entire image and ATF files are signed; with AMD PSB, only the first block of boot code is signed.) The entire .SLIM file is signed with our private key, producing a signature hash in the file. This can only be authenticated by the correct public key. Finally, the ODM packages the UEFI into .HPM format compatible with their platform BMC.
In parallel, we provide the debug fuse selection and hash of our DER-formatted public key. Ampere uses this information to create a special version of the SCP firmware known as Security Provisioning (SECPROV) .slim format. This firmware is run one time only, to program the debug fuse settings and public key hash into the SoC eFuses. Ampere delivers the SECPROV .slim file to the ODM, who packages it into a .hpm file compatible with the BMC firmware update tooling.
Fusing the keys
During system manufacturing, firmware is pre-programmed into storage ICs before placement on the motherboard. Note that the SCP EEPROM contains the SECPROV image, not standard SCP firmware. After a system is first powered on, an IPMI command is sent to the BMC which releases the Ampere processor from reset. This allows SECPROV firmware to run, burning the SoC eFuse with our public key hash and debug fuse settings.
Final manufacturing flow
Once our public key has been provisioned, manufacturing proceeds by re-programming the SCP EEPROM with its regular firmware. Once the system powers on, ATF detects there are no keys present in secure variable storage and allows UEFI firmware to boot, regardless of signature. Since this is the first UEFI boot, it programs our public key into secure variable storage and reboots. ATF is validated by Ampere’s public key hash as usual. Since our public key is present in dbb, it is validated against our public key hash in eFuse and allows UEFI to boot.
The first part of validation requires observing successful destruction of the eFuses. This imprints our public key hash into a dedicated, immutable memory region, not allowing the hash to be overwritten. Upon automatic or manual issue of an IPMI OEM command to the BMC, the BMC observes a signal from the SECPROV firmware, denoting eFuse programming completion. This can be probed with BMC commands.
When the eFuses have been blown, validation continues by observing the boot chain of the other firmware. Corruption of the SCP, ATF, or UEFI firmware obstructs boot flow and boot authentication and will cause the machine to fail booting to the OS. Once firmware is in place, happy path validation begins with booting the machine.
Upon first boot, firmware boots in the following order: BMC, SCP, ATF, and UEFI. The BMC, SCP, and ATF firmware can be observed via their respective serial consoles. The UEFI will automatically enroll the dbb and dbu files to the secure variable storage and trigger a reset of the system.
After observing the reset, the machine should successfully boot to the OS if the feature is executed correctly. For further validation, we can use the UEFI shell environment to extract the dbb file and compare the hash against the hash submitted to Ampere. After successfully validating the keys, we flash an unsigned UEFI image. An unsigned UEFI image causes authentication failure at bootloader stage BL3-2. The ATF firmware undergoes a boot loop as a result. Similar results will occur for a UEFI image signed with incorrect keys.
Updated authentication flow
On all subsequent boot cycles, the ATF will read secure variable dbb (our public key), compute a hash of the key, and compare it to the read-only Cloudflare public key hash in eFuse. If the computed and eFuse hashes match, our public key variable can be trusted and is used to authenticate the signed UEFI. After this, the system boots to the OS.
We were unable to get a machine without the feature enabled to demonstrate the set-up of the feature since we have the eFuse set at build time, but we can demonstrate what it looks like to go between an unsigned BIOS and a signed BIOS. What we would have observed with the set-up of the feature is a custom BMC command to instruct the SCP to burn the ROTPK into the SOC’s OTP fuses. From there, we would observe feedback to the BMC detailing whether burning the fuses was successful. Upon booting the UEFI image for the first time, the UEFI will write the dbb and dbu into secure storage.
As you can see, after flashing the unsigned BIOS, the machine fails to boot.
Despite the lack of visibility in failure to boot, there are a few things going on underneath the hood. The SCP (System Control Processor) still boots.
The SCP image holds a key certificate with Ampere’s generated ROTPK and the SCP key hash. SCP will calculate the ROTPK hash and compare it against the burned OTP fuses. In the failure case, where the hash does not match, you will observe a failure as you saw earlier. If successful, the SCP firmware will proceed to boot the PMpro and SMpro. Both the PMpro and SMpro firmware will be verified and proceed with the ATF authentication flow.
The conclusion of the SCP authentication is the passing of the BL1 key to the first stage bootloader via the SCP HOB(hand-off-block) to proceed with the standard three stage bootloader ATF authentication mentioned previously.
At BL2, the dbb is read out of the secure variable storage and used to authenticate the BL33 certificate and complete the boot process by booting the BL33 UEFI image.
Still more to do
In recent years, management interfaces on servers, like the BMC, have been the target of cyber attacks including ransomware, implants, and disruptive operations. Access to the BMC can be local or remote. With remote vectors open, there is potential for malware to be installed on the BMC via network interfaces. With compromised software on the BMC, malware or spyware could maintain persistence on the server. An attacker might be able to update the BMC directly using flashing tools such as flashrom or socflash without the same level of firmware resilience established at the UEFI level.
The future state involves using host CPU-agnostic infrastructure to enable a cryptographically secure host prior to boot time. We will look to incorporate a modular approach that has been proposed by the Open Compute Project’s Data Center Secure Control
Module Specification (DC-SCM) 2.0 specification. This will allow us to standardize our Root of Trust, sign our BMC, and assign physically unclonable function (PUF) based identity keys to components and peripherals to limit the use of OTP fusing. OTP fusing creates a problem with trying to “e-cycle” or reuse machines as you cannot truly remove a machine identity.
Cloudflare has millions of free customers. Not only is it something we’re incredibly proud of in the context of helping to build a better Internet — but it’s something that has made the Cloudflare service measurably better. One of the ways we’ve benefited is that it’s created a very strong imperative for Cloudflare to maintain a network that is as efficient as possible. There’s simply no other way to serve so many free customers.
In the spirit of this, we are very excited about the latest step in our energy-efficiency journey: turning to Arm for our server CPUs. It has been a long journey getting here — we started testing our first Arm CPUs all the way back in November 2017. It’s only recently, however, that the quantum of energy efficiency improvement from Arm has become clear. Our first Arm CPU was deployed in production earlier this month — July 2021.
Our most recently deployed generation of edge servers, Gen X, used AMD Rome CPUs. Compared with that, the newest Arm based CPUs process an incredible 57% more Internet requests per watt. While AMD has a sequel, Milan (and which Cloudflare will also be deploying), it doesn’t achieve the same degree of energy efficiency that the Arm processor does — managing only 39% more requests per watt than Rome CPUs in our existing fleet. As Arm based CPUs become more widely deployed, and our software is further optimized to take advantage of the Arm architecture, we expect further improvements in the energy efficiency of Arm servers.
Using Arm, Cloudflare can now securely process over ten times as many Internet requests for every watt of power consumed, than we did for servers designed in 2013.
(In the graphic below, for 2021, the perforated data point refers to x86 CPUs, whereas the bold data point refers to Arm CPUs)
As Arm server CPUs demonstrate their performance and become more widely deployed, we hope this will inspire x86 CPUs manufacturers (such as Intel and AMD) to urgently take energy efficiency more seriously. This is especially important since, worldwide, x86 CPUs continue to represent the vast majority of global data center energy consumption.
Together, we can reduce the carbon impact of Internet use. The environment depends on it.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.