Tag Archives: Cloudflare Network

Cloudflare treats SASE anxiety for VeloCloud customers

Post Syndicated from Brian Tokuyoshi original https://blog.cloudflare.com/treating-sase-anxiety


We understand that your VeloCloud deployment may be partially or even fully deployed. You may be experiencing discomfort from SASE anxiety. Symptoms include:

If you’re a VeloCloud customer, we are here to help you with your transition to Magic WAN, with planning, products and services. You’ve experienced the turbulence, and that’s why we are taking steps to help. First, it’s necessary to illustrate what’s fundamentally wrong with the architecture by acquisition model in order to define the right path forward. Second, we document the steps involved for making a transition from VeloCloud to Cloudflare. Third, we are offering a helping hand to help VeloCloud customers to get their SASE strategies back on track.

Architecture is the key to SASE

Your IT organization must deliver stability across your information systems, because the future of your business depends on the decisions that you make today. You need to make sure that your SASE journey is backed by vendors that you can depend on. Indecisive vendors and unclear strategies rarely inspire confidence, and it’s driving organizations to reconsider their relationship.

It’s not just VeloCloud that’s pivoting. Many vendors are chasing the brass ring to meet the requirement for Single Vendor SASE, and they’re trying to reduce their time to market by acquiring features on their checklist, rather than taking the time to build the right architecture for consistent management and user experience. It’s led to rapid consolidation of both startups and larger product stacks, but now we’re seeing many many instances of vendors having to rationalize their overlapping product lines. Strange days indeed.

But the thing is, Single Vendor SASE is not a feature checklist game. It’s not like shopping for PC antivirus software where the most attractive option was the one with the most checkboxes. It doesn’t matter if you acquire a large stack of product acronyms (ZTNA, SD-WAN, SWG, CASB, DLP,  FWaaS, SD-WAN to name but a few) if the results are just as convoluted as the technology it aims to replace.

If organizations are new to SASE, then it can be difficult to know what to look for. However, one clear sign of trouble is taking an SSE designed by one vendor and combining it with SD-WAN from another. Because you can’t get a converged platform out of two fundamentally incongruent technologies.

Why SASE Math Doesn’t Work

The conceptual model for SASE typically illustrates two half circles, with one consisting of cloud-delivered networking and the other being cloud-delivered security. With this picture in mind, it’s easy to see how one might think that combining an implementation of cloud-delivered networking (VeloCloud SD-WAN) and an implementation of cloud-delivered security (Symantec Network Protection – SSE) might satisfy the requirements. Does Single Vendor SASE = SD-WAN + SSE?

In practice, networking and network security do not exist in separate universes, but SD-WAN and SSE implementations do, especially when they were designed by different vendors. That’s why the math doesn’t work, because even with the requisite SASE functionality, the implementation of the functionality doesn’t fit. SD-WAN is designed for network connectivity between sites over the SD-WAN fabric, whereas SSE largely focuses on the enforcement of security policy for user->application traffic from remote users or traffic leaving (rather than traversing) the SD-WAN fabric. Therefore, to bring these two worlds together, you end up with security inconsistency, proxy chains which create a burden on latency, or implementing security at the edge rather than in the cloud.

Why Cloudflare is different

At Cloudflare, the basis for our approach to single vendor SASE starts from building a global network designed with private data centers, overprovisioned network and compute capacity, and a private backbone designed to deliver our customer’s traffic to any destination. It’s what we call any-to-any connectivity. It’s not using the public cloud for SASE services, because the public cloud was designed as a destination for traffic rather than being optimized for transit. We are in full control of the design of our data centers and network and we’re obsessed with making it even better every day.

It’s from this network that we deliver networking and security services. Conceptually, we implement a philosophy of composability, where the fundamental network connection between the customer’s site and the Cloudflare data center remains the same across different use cases. In practice, and unlike traditional approaches, it means no downtime for service insertion when you need more functionality — the connection to Cloudflare remains the same. It’s the services and the onboarding of additional destinations that changes as organizations expand their use of Cloudflare.

From the perspective of branch connectivity, use Magic WAN for the connectivity that ties your business together, no matter which way traffic passes. That’s because we don’t treat the directions of your network traffic as independent problems. We solve for consistency by on-ramping all traffic through one of Cloudflare’s 310+ anycasted data centers (whether inbound, outbound, or east-west) for enforcement of security policy. We solve for latency by eliminating the need to forward traffic to a compute location by providing full compute services in every data center. We implement SASE using a light edge / heavy cloud model, with services delivered within the Cloudflare connectivity cloud rather than on-prem.

How to transition from VeloCloud to Cloudflare

Start by contacting us to get a consultation session with our solutions architecture team. Our architects specialize in network modernization and can map your SASE goals across a series of smaller projects. We’ve worked with hundreds of organizations to achieve their SASE goals with the Cloudflare connectivity cloud and can build a plan that your team can execute on.

For product education, join one of our product workshops on Magic WAN to get a deep dive into how it’s built and how it can be rolled out to your locations. Magic WAN uses a light edge, heavy cloud model that has multiple network insertion models (whether a tunnel from an existing device, using our turnkey Magic WAN Connector, or deploying a virtual appliance) which can work in parallel or as a replacement for your branch connectivity needs, thus allowing you to migrate at your pace. Our specialist teams can help you mitigate transitionary hardware and license costs as you phase out VeloCloud and accelerate your rollout of Magic WAN.

The Magic WAN technical engineers have a number of resources to help you build product knowledge as well. This includes reference architectures and quick start guides that address your organization’s connectivity goals, whether sizing down your on-prem network in favor of the emerging “coffee shop networking” philosophy, retiring legacy SD-WAN, and full replacement of conventional MPLS.

For services, our customer success teams are ready to support your transition, with services that are tailored specifically for Magic WAN migrations both large and small.

Your next move

Interested in learning more? Contact us to get started, and we’ll help you with your SASE journey. Contact us to learn how to replace VeloCloud with Cloudflare Magic WAN and use our network as an extension of yours.

Armed to Boot: an enhancement to Arm’s Secure Boot chain

Post Syndicated from Derek Chamorro original https://blog.cloudflare.com/armed-to-boot/

Armed to Boot: an enhancement to Arm's Secure Boot chain

Armed to Boot: an enhancement to Arm's Secure Boot chain

Over the last few years, there has been a rise in the number of attacks that affect how a computer boots. Most modern computers use a specification called Unified Extensible Firmware Interface (UEFI) that defines a software interface between an operating system (e.g. Windows) and platform firmware (e.g. disk drives, video cards). There are security mechanisms built into UEFI that ensure that platform firmware can be cryptographically validated and boot securely through an application called a bootloader. This firmware is stored in non-volatile SPI flash memory on the motherboard, so it persists on the system even if the operating system is reinstalled and drives are replaced.

This creates a ‘trust anchor’ used to validate each stage of the boot process, but, unfortunately, this trust anchor is also a target for attack. In these UEFI attacks, malicious actions are loaded onto a compromised device early in the boot process. This means that malware can change configuration data, establish persistence by ‘implanting’ itself, and can bypass security measures that are only loaded at the operating system stage. So, while UEFI-anchored secure boot protects the bootloader from bootloader attacks, it does not protect the UEFI firmware itself.

Because of this growing trend of attacks, we began the process of cryptographically signing our UEFI firmware as a mitigation step. While our existing solution is platform specific to our x86 AMD server fleet, we did not have a similar solution to UEFI firmware signing for Arm. To determine what was missing, we had to take a deep dive into the Arm secure boot process.

Read on to learn about the world of Arm Trusted Firmware Secure Boot.

Arm Trusted Firmware Secure Boot

Arm defines a trusted boot process through an architecture called Trusted Board Boot Requirements (TBBR), or Arm Trusted Firmware (ATF) Secure Boot. TBBR works by authenticating a series of cryptographically signed binary images each containing a different stage or element in the system boot process to be loaded and executed. Every bootloader (BL) stage accomplishes a different stage in the initialization process:

BL1

BL1 defines the boot path (is this a cold boot or warm boot), initializes the architectures (exception vectors, CPU initialization, and control register setup), and initializes the platform (enables watchdog processes, MMU, and DDR initialization).

BL2

BL2 prepares initialization of the Arm Trusted Firmware (ATF), the stack responsible for setting up the secure boot process. After ATF setup, the console is initialized, memory is mapped for the MMU, and message buffers are set for the next bootloader.

BL3

The BL3 stage has multiple parts, the first being initialization of runtime services that are used in detecting system topology. After initialization, there is a handoff between the ATF ‘secure world’ boot stage to the ‘normal world’ boot stage that includes setup of UEFI firmware. Context is set up to ensure that no secure state information finds its way into the normal world execution state.

Each image is authenticated by a public key, which is stored in a signed certificate and can be traced back to a root key stored on the SoC in one time programmable (OTP) memory or ROM.

Armed to Boot: an enhancement to Arm's Secure Boot chain

TBBR was originally designed for cell phones. This established a reference architecture on how to build a “Chain of Trust” from the first ROM executed (BL1) to the handoff to “normal world” firmware (BL3). While this creates a validated firmware signing chain, it has caveats:

  1. SoC manufacturers are heavily involved in the secure boot chain, while the customer has little involvement.
  2. A unique SoC SKU is required per customer. With one customer this could be easy, but most manufacturers have thousands of SKUs
  3. The SoC manufacturer is primarily responsible for end-to-end signing and maintenance of the PKI chain. This adds complexity to the process  requiring USB key fobs for signing.
  4. Doesn’t scale outside the manufacturer.

What this tells us is what was built for cell phones doesn’t scale for servers.

Armed to Boot: an enhancement to Arm's Secure Boot chain

If we were involved 100% in the manufacturing process, then this wouldn’t be as much of an issue, but we are a customer and consumer. As a customer, we have a lot of control of our server and block design, so we looked at design partners that would take some of the concepts we were able to implement with AMD Platform Secure Boot and refine them to fit Arm CPUs.

Amping it up

We partnered with Ampere and tested their Altra Max single socket rack server CPU (code named Mystique) that provides high performance with incredible power efficiency per core, much of what we were looking for in reducing power consumption. These are only a small subset of specs, but Ampere backported various features into the Altra Max notably, speculative attack mitigations that include Meltdown and Spectre (variants 1 and 2) from the Armv8.5 instruction set architecture, giving Altra the “+” designation in their ISA.

Ampere does implement a signed boot process similar to the ATF signing process mentioned above, but with some slight variations. We’ll explain it a bit to help set context for the modifications that we made.

Ampere Secure Boot

Armed to Boot: an enhancement to Arm's Secure Boot chain

The diagram above shows the Arm processor boot sequence as implemented by Ampere. System Control Processors (SCP) are comprised of the System Management Processor (SMpro) and the Power Management Processor (PMpro). The SMpro is responsible for features such as secure boot and bmc communication while the PMpro is responsible for power features such as Dynamic Frequency Scaling and on-die thermal monitoring.

At power-on-reset, the SCP runs the system management bootloader from ROM and loads the SMpro firmware. After initialization, the SMpro spawns the power management stack on the PMpro and ATF threads. The ATF BL2 and BL31 bring up processor resources such as DRAM, and PCIe. After this, control is passed to BL33 BIOS.

Authentication flow

Armed to Boot: an enhancement to Arm's Secure Boot chain

At power on, the SMpro firmware reads Ampere’s public key (ROTPK) from the SMpro key certificate in SCP EEPROM, computes a hash and compares this to Ampere’s public key hash stored in eFuse. Once authenticated, Ampere’s public key is used to decrypt key and content certificates for SMpro, PMpro, and ATF firmware, which are launched in the order described above.

The SMpro public key will be used to authenticate the SMpro and PMpro images and ATF keys which in turn will authenticate ATF images. This cascading set of authentication that originates with the Ampere root key and stored in chip called an electronic fuse, or eFuse.  An eFuse can be programmed only once, setting the content to be read-only and can not be tampered with nor modified.

This is the original hardware root of trust used for signing system, secure world firmware. When we looked at this, after referencing the signing process we had with AMD PSB and knowing there was a large enough one-time-programmable (OTP) region within the SoC, we thought: why can’t we insert our key hash in here?

Single Domain Secure Boot

Armed to Boot: an enhancement to Arm's Secure Boot chain

Single Domain Secure Boot takes the same authentication flow and adds a hash of the customer public key (Cloudflare firmware signing key in this case) to the eFuse domain. This enables the verification of UEFI firmware by a hardware root of trust. This process is performed in the already validated ATF firmware by BL2. Our public key (dbb) is read from UEFI secure variable storage, a hash is computed and compared to the public key hash stored in eFuse. If they match, the validated public key is used to decrypt the BL33 content certificate, validating and launching the BIOS, and remaining boot items. This is the key feature added by SDSB. It validates the entire software boot chain with a single eFuse root of trust on the processor.

Building blocks

With a basic understanding of how Single Domain Secure Boot works, the next logical question is “How does it get implemented?”. We ensure that all UEFI firmware is signed at build time, but this process can be better understood if broken down into steps.

Armed to Boot: an enhancement to Arm's Secure Boot chain

Ampere, our original device manufacturer (ODM), and we play a role in execution of SDSB. First, we generate certificates for a public-private key pair using our internal, secure PKI. The public key side is provided to the ODM as dbb.auth and dbu.auth in UEFI secure variable format. Ampere provides a reference Software Release Package (SRP) including the baseboard management controller, system control processor, UEFI, and complex programmable logic device (CPLD) firmware to the ODM, who customizes it for their platform. The ODM generates a board file describing the hardware configuration, and also customizes the UEFI to enroll dbb and dbu to secure variable storage on first boot.

Armed to Boot: an enhancement to Arm's Secure Boot chain

Once this is done, we generate a UEFI.slim file using the ODM’s UEFI ROM image, Arm Trusted Firmware (ATF) and Board File. (Note: This differs from AMD PSB insofar as the entire image and ATF files are signed; with AMD PSB, only the first block of boot code is signed.) The entire .SLIM file is signed with our private key, producing a signature hash in the file. This can only be authenticated by the correct public key. Finally, the ODM packages the UEFI into .HPM format compatible with their platform BMC.

Armed to Boot: an enhancement to Arm's Secure Boot chain

In parallel, we provide the debug fuse selection and hash of our DER-formatted public key. Ampere uses this information to create a special version of the SCP firmware known as Security Provisioning (SECPROV) .slim format. This firmware is run one time only, to program the debug fuse settings and public key hash into the SoC eFuses. Ampere delivers the SECPROV .slim file to the ODM, who packages it into a .hpm file compatible with the BMC firmware update tooling.

Fusing the keys

Armed to Boot: an enhancement to Arm's Secure Boot chain

During system manufacturing, firmware is pre-programmed into storage ICs before placement on the motherboard. Note that the SCP EEPROM contains the SECPROV image, not standard SCP firmware. After a system is first powered on, an IPMI command is sent to the BMC which releases the Ampere processor from reset. This allows SECPROV firmware to run, burning the SoC eFuse with our public key hash and debug fuse settings.

Final manufacturing flow

Armed to Boot: an enhancement to Arm's Secure Boot chain

Once our public key has been provisioned, manufacturing proceeds by re-programming the SCP EEPROM with its regular firmware. Once the system powers on, ATF detects there are no keys present in secure variable storage and allows UEFI firmware to boot, regardless of signature. Since this is the first UEFI boot, it programs our public key into secure variable storage and reboots. ATF is validated by Ampere’s public key hash as usual. Since our public key is present in dbb, it is validated against our public key hash in eFuse and allows UEFI to boot.

Validation

Armed to Boot: an enhancement to Arm's Secure Boot chain

The first part of validation requires observing successful destruction of the eFuses. This imprints our public key hash into a dedicated, immutable memory region, not allowing the hash to be overwritten. Upon automatic or manual issue of an IPMI OEM command to the BMC, the BMC observes a signal from the SECPROV firmware, denoting eFuse programming completion. This can be probed with BMC commands.

When the eFuses have been blown, validation continues by observing the boot chain of the other firmware. Corruption of the SCP, ATF, or UEFI firmware obstructs boot flow and boot authentication and will cause the machine to fail booting to the OS. Once firmware is in place, happy path validation begins with booting the machine.

Upon first boot, firmware boots in the following order: BMC, SCP, ATF, and UEFI. The BMC, SCP, and ATF firmware can be observed via their respective serial consoles. The UEFI will automatically enroll the dbb and dbu files to the secure variable storage and trigger a reset of the system.

After observing the reset, the machine should successfully boot to the OS if the feature is executed correctly. For further validation, we can use the UEFI shell environment to extract the dbb file and compare the hash against the hash submitted to Ampere. After successfully validating the keys, we flash an unsigned UEFI image. An unsigned UEFI image causes authentication failure at bootloader stage BL3-2. The ATF firmware undergoes a boot loop as a result. Similar results will occur for a UEFI image signed with incorrect keys.

Updated authentication flow

Armed to Boot: an enhancement to Arm's Secure Boot chain

On all subsequent boot cycles, the ATF will read secure variable dbb (our public key), compute a hash of the key, and compare it to the read-only Cloudflare public key hash in eFuse. If the computed and eFuse hashes match, our public key variable can be trusted and is used to authenticate the signed UEFI. After this, the system boots to the OS.

Let’s boot!

We were unable to get a machine without the feature enabled to demonstrate the set-up of the feature since we have the eFuse set at build time, but we can demonstrate what it looks like to go between an unsigned BIOS and a signed BIOS. What we would have observed with the set-up of the feature is a custom BMC command to instruct the SCP to burn the ROTPK into the SOC’s OTP fuses. From there, we would observe feedback to the BMC detailing whether burning the fuses was successful. Upon booting the UEFI image for the first time, the UEFI will write the dbb and dbu into secure storage.

As you can see, after flashing the unsigned BIOS, the machine fails to boot.

Despite the lack of visibility in failure to boot, there are a few things going on underneath the hood. The SCP (System Control Processor) still boots.

  1. The SCP image holds a key certificate with Ampere’s generated ROTPK and the SCP key hash. SCP will calculate the ROTPK hash and compare it against the burned OTP fuses. In the failure case, where the hash does not match, you will observe a failure as you saw earlier. If successful, the SCP firmware will proceed to boot the PMpro and SMpro. Both the PMpro and SMpro firmware will be verified and proceed with the ATF authentication flow.
  2. The conclusion of the SCP authentication is the passing of the BL1 key to the first stage bootloader via the SCP HOB(hand-off-block) to proceed with the standard three stage bootloader ATF authentication mentioned previously.
  3. At BL2, the dbb is read out of the secure variable storage and used to authenticate the BL33 certificate and complete the boot process by booting the BL33 UEFI image.

Still more to do

In recent years, management interfaces on servers, like the BMC, have been the target of cyber attacks including ransomware, implants, and disruptive operations. Access to the BMC can be local or remote. With remote vectors open, there is potential for malware to be installed on the BMC via network interfaces. With compromised software on the BMC, malware or spyware could maintain persistence on the server. An attacker might be able to update the BMC directly using flashing tools such as flashrom or socflash without the same level of firmware resilience established at the UEFI level.

The future state involves using host CPU-agnostic infrastructure to enable a cryptographically secure host prior to boot time. We will look to incorporate a modular approach that has been proposed by the Open Compute Project’s Data Center Secure Control

Module Specification (DC-SCM) 2.0 specification. This will allow us to standardize our Root of Trust, sign our BMC, and assign physically unclonable function (PUF) based identity keys to components and peripherals to limit the use of OTP fusing. OTP fusing creates a problem with trying to “e-cycle” or reuse machines as you cannot truly remove a machine identity.

A more sustainable end-of-life for your legacy hardware appliances with Cloudflare and Iron Mountain

Post Syndicated from May Ma original https://blog.cloudflare.com/sustainable-end-of-life-hardware/

A more sustainable end-of-life for your legacy hardware appliances with Cloudflare and Iron Mountain

A more sustainable end-of-life for your legacy hardware appliances with Cloudflare and Iron Mountain

Today, as part of Cloudflare’s Impact Week, we’re excited to announce an opportunity for Cloudflare customers to make it easier to decommission and dispose of their used hardware appliances sustainably. We’re partnering with Iron Mountain to offer preferred pricing and discounts for Cloudflare customers that recycle or remarket legacy hardware through its service.

Replacing legacy hardware with Cloudflare’s network

Cloudflare’s products enable customers to replace legacy hardware appliances with our global network. Connecting to our network enables access to firewall (including WAF and Network Firewalls, Intrusion Detection Systems, etc), DDoS mitigation, VPN replacement, WAN optimization, and other networking and security functions that were traditionally delivered in physical hardware. These are served from our network and delivered as a service. This creates a myriad of benefits for customers including stronger security, better performance, lower operational overhead, and none of the headaches of traditional hardware like capacity planning, maintenance, or upgrade cycles. It’s also better for the Earth: our multi-tenant SaaS approach means more efficiency and a lower carbon footprint to deliver those functions.

But what happens with all that hardware you no longer need to maintain after switching to Cloudflare?

A more sustainable end-of-life for your legacy hardware appliances with Cloudflare and Iron Mountain

The life of a hardware box

The life of a hardware box begins on the factory line at the manufacturer. These are then packaged, shipped and installed at the destination infrastructure where they provide processing power to run front-end products or services, and routing network traffic. Occasionally, if the hardware fails to operate, or its performance declines over time, it will get fixed or will be returned for replacement under the warranty.

When none of these options work, the hardware box is considered end-of-life and it “dies”. This hardware must be decommissioned by being disconnected from the network, and then physically removed from the data center for disposal.

The useful lifespan of hardware depends on the availability of newer generations of processors which help realize critical efficiency improvements around cost, performance, and power. In general, the industry standard of hardware decommissioning timeline is between three and six years after installation. There are additional benefits to refreshing these physical assets at the lower end of the hardware lifespan spectrum, keeping your infrastructure at optimal performance.

In the instance where the hardware still works, but is replaced by newer technologies, it would be such a waste to discard this gear. Instead, there could be recoverable value in this outdated hardware. And simply tossing unwanted hardware into the trash indiscriminately, which will eventually become part of the landfill, causes devastating consequences as these electronic devices contain hazardous materials like lithium, palladium, lead, copper and cobalt or mercury, and those could contaminate the environment. Below, we explain sustainable alternatives and cost-beneficial practices one can pursue to dispose of your infrastructure hardware.

Option 1: Remarket / Reuse

For hardware that still works, the most sustainable route is to sanitize it of data, refurbish, and resell it in the second-hand market at a depreciated cost. Some IT asset disposition firms would also repurpose used hardware to maximize its market value. For example, harvesting components from a device to build part of another product and selling that at a higher price. For working parts that have very little resale value, companies can also consider reusing them to build a spare parts inventory for replacing failed parts later in the data centers.

The benefits of remarket and reuse are many. It helps maximize a hardware’s return of investment by including any reclaimed value at end-of-life stage, offering financial benefits to the business. And it reduces discarded electronics, or e-waste and their harmful efforts on our environment, helping socially responsible organizations build a more sustainable business. Lastly, it provides alternatives to individuals and organizations that cannot afford to buy new IT equipment.

Option 2: Recycle

For used hardware that is not able to be remarketed, it is recommended to engage an asset disposition firm to professionally strip it of any valuable and recyclable materials, such as precious metal and plastic, before putting it up for physical destruction. Similar to remarketing, recycling also reduces environmental impact, and cuts down the amount of raw materials needed to manufacture new products.

A key factor in hardware recycling is a secure chain of custody. Meaning, a supplier has the right certification, preferably its own fleet and secure facilities to properly and securely process the equipment.

Option 3: Destroy

From a sustainable point of view, this route should only be used as a last resort. When hardware does not operate as it is intended to, and has no remarketed nor recycled value, an asset disposition supplier would remove all the asset tags and information from it in preparation for a physical destruction. Depending on disposal policies, some companies would choose to sanitize and destroy all the data bearing hardware, such as SSD or HDD, for security reasons.

To further maximize recycling value and reduce e-waste, it is recommended to keep security policy up to date on discarded IT equipment and explore the option of reusing working devices after a professional data wiping as much as possible.

At Cloudflare, we follow an industry-standard capital depreciation timeline, which culminates in recycling actions through the engagement of IT asset disposition partners including Iron Mountain. Through these partnerships, besides data bearing hardware which follows the security policy to be sanitized and destroyed, approximately 99% of the rest decommissioned IT equipment from Cloudflare is sold or recycled.

Partnering with Iron Mountain to make sustainable goals more accessible

Hardware discomission can be a burden on a business, from operational strain to complex processes, a lack of streamlined execution to the risk of a data breach. Our experience shows that partnering with an established firm like Iron Mountain who is specialized in IT asset disposition would help kick-start one’s hardware recycling journey.

Iron Mountain has more than two decades of experience working with Hyperscale technology and data centers. A market leader in decommissioning, data security and remarketing capabilities. It has a wide footprint of facilities to support their customers’ sustainability goals globally.

Today, Iron Mountain has generated more than US$1.5 billion through value recovery and has been continually developing new ways to sell mass volumes of technology for their best use. Other than their end-to-end decommission offering, there are two additional value adding services that Iron Mountain provides to their customers that we find valuable. They offer a quarterly survey report which presents insights in the used market, and a sustainability report that measures the environmental impact based on total hardware processed with their customers.

Get started today

Get started today with Iron Mountain on your hardware recycling journey and sign up from here. After receiving the completed contact form, Iron Mountain will consult with you on the best solution possible. It has multiple programs to support including revenue share, fair market value, and guaranteed destruction with proper recycling. For example, when it comes to reselling used IT equipment, Iron Mountain would propose an appropriate revenue split, namely how much percentage of sold value will be shared with the customer, based on business needs. Iron Mountain’s secure chain of custody with added solutions such as redeployment, equipment retrieval programs, and onsite destruction can ensure it can tailor the solution that works best for your company’s security and environmental needs.

And in collaboration with Cloudflare, Iron Mountain offers additional two percent on your revenue share of the remarketed items and a five percent discount on the standard fees for other IT asset disposition services if you are new to Iron Mountain and choose to use these services via the link in this blog.

Cloudflare deployment in Guam

Post Syndicated from David Antunes original https://blog.cloudflare.com/cloudflare-deployment-in-guam/

Cloudflare deployment in Guam

Cloudflare deployment in Guam

Having fast Internet properties means being as few milliseconds as possible away from our customers and their users, no matter where they are on Earth. And because of the design of Cloudflare’s network we don’t just make Internet properties faster by being closer, we bring our Zero Trust services closer too. So whether you’re connecting to a public API, a website, a SaaS application, or your company’s internal applications, we’re close by.

This is possible by adding new cities, partners, capacity, and cables. And we have seen over and over again how making the Internet faster in a region also can have a clear impact on traffic: if the experience is quicker, people usually do more online.

Cloudflare’s network keeps increasing, and its global footprint does so accordingly. In April 2022 we announced that the Cloudflare network now spans 275 cities and the number keeps growing.

In this blog post we highlight the deployment of our data center in Hagatna, Guam.

Why a blog about Guam?

Guam is about 2,400 km from both Tokyo in the north and Manila in the west, and about 6,100 km from Honolulu in the east. Honolulu itself is the most remote major city in the US and one of the most remote in the world, the closest major city from it being San Francisco, California at 3,700 km. From here one can derive how far Guam is from the US to the west and from Asia to the east.

Cloudflare deployment in Guam
Figure 1: Guam Geographical Location.

Why is this relevant? As explained here, latency is the time it takes for data to pass from one point on a network to another. And one of the main reasons behind network latency is the distance between client devices — like a browser on a mobile phone — making requests and the servers responding to those requests. So, if we consider where Guam is geographically, we get a good picture about how Guam’s residents can be affected by the long distances their Internet requests, and responses, have to travel.

This is why every time Cloudflare adds a new location, we help make the Internet a bit faster. The reason is that every new location brings Cloudflare’s services closer to the users requested them. As part of Cloudflare’s mission, the Guam deployment is a perfect example of how we are going from being the most global network on Earth to the most local one as well.

Submarine cables

There are 486 active submarine cables and 1,306 landings that are currently active or under construction, running to an estimated 1.3 million km around the globe.

A closer look at specific submarine cables landing in Guam show us that the region is actually well served in terms of submarine cables, with several connections to the mainland such as Japan, Taiwan, Philippines, Australia, Indonesia and Hawaii, therefore making Guam more resilient to matters such as the one that affected Tonga in January 2022 due to the impact of a volcanic eruption on submarine cables – we wrote about it here.

Cloudflare deployment in Guam
Figure 2: Submarine Cables Landing in Guam (source: submarinecablemap.com)

The picture above also shows the relevance of Guam for other even more remote locations, such as the Federated States of Micronesia (FSM) or the Marshall Islands, which have an ‘extra-hop’ to cover when trying to reach the rest of the Internet. Palau also relies on Guam but, from a resilience point of view, has alternatives to locations such as the Philippines or to Australia.

Presence at Mariana Islands Internet Exchange

Cloudflare’s presence in Guam is through Mariana Islands Internet Exchange, or MARIIX, allowing Cloudflare to peer with participants such as:

  • AS 395400 – University of Guam
  • AS 9246 – GTA Teleguam
  • AS 3605 – DoCoMo Pacific
  • AS 7131 – IT&E
  • AS 17456 – PDS

As there are multiple participants, these are being added gradually. The first was AS 7131,  being served from April 2022, and the latest addition is AS 9246, from July 2022.

As some of these ASNs or ASs (autonomous systems — large networks or group of networks) have their own downstream customers, further ASs can leverage Cloudflare’s deployment at Guam, examples being AS 17893 – Palau National Communications Corp – or AS 21996 – Guam Cell.

Therefore, the Cloudflare deployment brings not only a better (and generally faster) Internet to Guam’s residents, but also to residents in nearby archipelagos that are anchored on Guam. In May 2022, according to UN’s forecasts, the covered resident population in the main areas in the region stands around 171k in Guam, 105k in FSM and 60k in the Marshall Islands.

For this deployment, Cloudflare worked with the skilled MARIIX personnel for the physical installations, provisioning and services turn-up. Despite the geographical distance and time zone differences (Hagatna is 9 hours ahead of GMT but only two hours ahead of the Cloudflare office in Singapore, so the time difference wasn’t a big challenge), all the logistics involved and communications went smoothly. A recent blog posted by APNIC, where we can see some personnel with whom Cloudflare worked, reiterates the valuable work being done locally and the increasing importance of Guam in the region.

Performance impact for local/regional habitants

Before Cloudflare’s deployment in Guam, customers of local ASs trying to reach Internet properties via Cloudflare’s network were redirected mainly to Cloudflare’s deployments in Tokyo and Seattle. This is due to the anycast routing used by Cloudflare — as described here; anycast typically routes incoming traffic to the nearest data center. In the case of Guam, and as previously described, these large distances to the nearest locations represents a distance of thousands of kilometers or, in other words, high latency thus affecting user experience badly.

With Cloudflare’s deployment in Guam, Guam’s and nearby archipelagos’ residents are no longer redirected to those faraway locations, instead they are served locally by the new Cloudflare servers. Although a decrease of a few milliseconds may not seem a lot, it actually represents a significant boost in user experience as latency is dramatically reduced. As the total distance between users and servers is reduced, load time is reduced as well. And as users very often quit waiting for a site to load when the load time is high, the opposite occurs as well, i.e., users are more willing to stay within a site if loading times are good. This improvement represents both a better user experience and higher use of the Internet.

In the case of Guam, we use AS 9246 as an example as it was previously served by Seattle but since around 23h00 UTC 14/July/2022 is served by Guam, as illustrated below:

Cloudflare deployment in Guam
Figure 3: Requests per Colo for AS 9246 Before vs After Cloudflare Deployment at Guam.

The following chart displays the median and the 90th percentile of the eyeball TCP RTT for AS 9246 immediately before and after AS 9246 users started to use the Guam deployment:

Cloudflare deployment in Guam
Figure 4: Eyeball TCP RTT for AS 9246 Before vs After Cloudflare Deployment at Guam.

From the chart above we can derive that the overall reduction for the eyeball TCP RTT immediately before and after Guam’s deployment was:

  • Median decreased from 136.3ms to 9.3ms, a 93.2% reduction;
  • P90 decreased from 188.7ms to 97.0ms, a 48.5% reduction.

When comparing the [12h00 ; 13h00] UTC period of the 14/July/2022 (therefore, AS 9246 still served by Seattle) vs the same hour but for the 15th/July/2022 (thus AS9246 already served by Guam), the differences are also clear. We pick this period as this is a busy hour period locally since local time equals UTC plus 10 hours:

Cloudflare deployment in Guam
Figure 5 – Median Eyeball TCP RTT for AS 9246 from Seattle vs Guam.

The median eyeball TCP RTT decreased from 146ms to 12ms, i.e., a massive 91.8% reduction and perhaps, due to already mentioned geographical specificities, one of Cloudflare’s deployments representing a larger reduction in latency for the local end users.

Impact on Internet traffic

We can actually see an increase in HTTP requests in Guam since early April, right when we were setting up our Guam data center. The impact of the deployment was more clear after mid-April, with a further step up in mid-June. Comparing March 8 with July 17, there was an 11.5% increase in requests, as illustrated below:

Cloudflare deployment in Guam
Figure 6: Trend in HTTP Requests per Second in Guam.

Edge Partnership Program

If you’re an ISP that is interested in hosting a Cloudflare cache to improve performance and reduce backhaul, get in touch on our Edge Partnership Program page. And if you’re a software, data, or network engineer – or just the type of person who is curious and wants to help make the Internet better – consider joining our team.

Debugging Hardware Performance on Gen X Servers

Post Syndicated from Yasir Jamal original https://blog.cloudflare.com/debugging-hardware-performance-on-gen-x-servers/

Debugging Hardware Performance on Gen X Servers

Debugging Hardware Performance on Gen X Servers

In Cloudflare’s global network, every server runs the whole software stack. Therefore, it’s critical that every server performs to its maximum potential capacity. In order to provide us better flexibility from a supply chain perspective, we buy server hardware from multiple vendors with the exact same configuration. However, after the deployment of our Gen X AMD EPYC Zen 2 (Rome) servers, we noticed that servers from one vendor (which we’ll call SKU-B) were consistently performing 5-10% worse than servers from second vendor (which we’ll call SKU-A).

The graph below shows the performance discrepancy between the two SKUs in terms of percentage difference. The performance is gauged on the metric of requests per second, and this data is an average of observations captured over 24 hours.

Debugging Hardware Performance on Gen X Servers
Machines before implementing performance improvements. The average RPS for SKU-B is approximately 10% below SKU-A.

Compute performance via DGEMM

The initial debugging efforts centered around the compute performance. We ran AMD’s DGEMM high performance computing tool to determine if CPU performance was the cause. DGEMM is designed to measure the sustained floating-point computation rate of a single server. Specifically, the code measures the floating point rate of execution of a real matrix–matrix multiplication with double precision. We ran a modified version of DGEMM equipped with specific AMD libraries to support the EPYC processor instruction set.

The DGEMM test brought out a few points related to the processor’s Thermal Design Power (TDP). TDP refers to the maximum power that a processor can draw for a thermally significant period while running a software application. The underperforming servers were only drawing 215 to 220 watts of power when fully stressed, whereas the max supported TDP on the processors is 240 watts. Additionally, their floating-point computation rate was ~100 gigaflops (GFLOPS) behind the better performing servers.

Screenshot from the DGEMM run of a good system:

Debugging Hardware Performance on Gen X Servers

Screenshot from an underperforming system:

Debugging Hardware Performance on Gen X Servers

To debug the issue, we first tried disabling idle power saving mode (also known as C-states) in the CPU BIOS configuration. The servers reported expected GFLOPS numbers and achieved max TDP. Believing that this could have been the root cause, the servers were moved back into the production test environment for data collection.

However, the performance delta was still there.

Further debugging

We started the debugging process all over again by comparing the BIOS settings logs of both SKU-A and SKU-B. Once this debugging option was exhausted, the focus shifted to the network interface. We ran the open source network interface tool iPerf to check if there were any bottlenecks — no issues were observed either.

During this activity, we noticed that the BIOS of both SKUs were not using the AMD Preferred I/O functionality, which allows devices on a single PCIe bus to obtain improved DMA write performance. We enabled the Preferred I/O option on SKU-B and tested the change in production. Unfortunately, there were no performance gains. After the focus on network activity, the team started looking into memory configuration and operating speed.

AMD HSMP tool & Infinity Fabric diagnosis

The Gen X systems are configured with DDR4 memory modules that can support a maximum of 2933 megatransfers per second (MT/s). With the BIOS configuration for memory clock Frequency set to Auto, the 2933 MT/s automatically configures the memory clock frequency to 1467 MHz. Double Data Rate (DDR) technology allows for the memory signal to be sampled twice per clock cycle: once on the rising edge and once on the falling edge of the clock signal. Because of this, the reported memory speed rate is twice the true memory clock frequency. Memory bandwidth was validated to be working quite well by running a Stream benchmark test.

Running out of debugging options, we reached out to AMD for assistance and were provided a debug tool called HSMP that lets users access the Host System Management Port. This tool provides a wide variety of processor management options, such as reading and changing processor TDP limits, reading processor and memory temperatures, and reading memory and processor clock frequencies. When we ran the HSMP tool, we discovered that the memory was being clocked at a different rate from AMD’s Infinity Fabric system — the architecture which connects the memory to the processor core. As shown below, the memory clock frequency was set to 1467 MHz while the Infinity Fabric clock frequency was set to 1333 MHz.

Debugging Hardware Performance on Gen X Servers

Ideally, the two clock frequencies need to be equal for optimized workloads sensitive to latency and throughput. Below is a snapshot for an SKU-A server where both memory and Infinity Fabric frequencies are equal.

Debugging Hardware Performance on Gen X Servers

Improved Performance

Since the Infinity Fabric clock setting was not exposed as a tunable parameter in the BIOS, we asked the vendor to provide a new BIOS that set the frequency to 1467 MHz during compile time. Once we deployed the new BIOS on the underperforming systems in production, we saw that all servers started performing to their expected levels. Below is a snapshot of the same set of systems with data captured and averaged over a 24 hours window on the same day of the week as the previous dataset. Now, the requests per second performance of SKU-B equals and in some cases exceeds the performance of SKU-A!

Debugging Hardware Performance on Gen X Servers
Internet Requests Per Second (RPS) for four SKU-A machines and four SKU-B machines after implementing the BIOS fix on SKU-B. The RPS of SKU-B now equals the RPS of SKU-A.

Hello, I am Yasir Jamal. I recently joined Cloudflare as a Hardware Engineer with the goal of helping provide a better Internet to everyone. If you share the same interest, come join us!

The Cloudflare network now spans 275 cities

Post Syndicated from Joanne Liew original https://blog.cloudflare.com/new-cities-april-2022-edition/

The Cloudflare network now spans 275 cities

The Cloudflare network now spans 275 cities

It was just last month that we announced our network had grown to over 270 cities globally. Today, we’re announcing that with recent additions we’ve reached 275 cities. With each new city we add, we help make the Internet faster, more reliable, and more secure. In this post, we’ll talk about the cities we added, the performance increase, and look closely at our network expansion in India.

The Cities

Here are the four new cities we added in the last month: Ahmedabad, India; Chandigarh, India; Jeddah, Saudi Arabia; and Yogyakarta, Indonesia.

A closer look at India

India is home to one of the largest and most rapidly growing bases of digital consumers. Recognising this, Cloudflare has increased its footprint in India in order to optimize reachability to users within the country.

Cloudflare’s expansion in India is facilitated through interconnections with several of the largest Internet Service Providers (ISPs), mobile network providers and Internet Exchange points (IXPs). At present, we are directly connected to the major networks that account for more than 95% of the country’s broadband subscribers. We are continuously working to not only expand the interconnection capacity and locations with these networks, but also establish new connections to the networks that we have yet to interconnect with.

In 2020, we were served through seven cities in the country. Since then, we have added our network presence in another five cities, totaling to 12 cities in India. In the case of one of our biggest partners, with whom we interconnect in these 12 cities, Cloudflare’s latency performance is better in comparison to other major platforms, as shown in the chart below.

The Cloudflare network now spans 275 cities
Response time (in ms) for the top network in India to Cloudflare and other platforms. Source: Cedexis

Helping make the Internet faster

Every time we add a new location, we help make the Internet a little bit faster. The reason is every new location brings our content and services closer to the person (or machine) that requested them. Instead of driving 25 minutes to the grocery story, it’s like one opened in your neighborhood.

In the case of Jeddah, Saudi Arabia, we already have six other locations in two different cities in Saudi Arabia. Still, by adding this new location, we were able to improve median performance (TCP RTT latency) by 26% from 81ms to 60ms. 20 milliseconds doesn’t sound like a lot, right? But this location is serving almost 10 million requests per day. That’s approximately 55 hours per day that someone (or something) wasn’t waiting for data.

The Cloudflare network now spans 275 cities

As we continue to put dots on the map, we’ll keep putting updates here on how Internet performance is improving. As we like to say, we’re just getting started.

If you’re an ISP that is interested in hosting a Cloudflare cache to improve performance and reduce backhaul, get in touch on our Edge Partnership Program page. And if you’re a software, data, or network engineer – or just the type of person who is curious and wants to help make the Internet better – consider joining our team.

New cities on the Cloudflare global network: March 2022 edition

Post Syndicated from Mike Conlow original https://blog.cloudflare.com/mid-2022-new-cities/

New cities on the Cloudflare global network: March 2022 edition

If you follow the Cloudflare blog, you know that we love to add cities to our global map. With each new city we add, we help make the Internet faster, more reliable, and more secure. Today, we are announcing the addition of 18 new cities in Africa, South America, Asia, and the Middle East, bringing our network to over 270 cities globally. We’ll also look closely at how adding new cities improves Internet performance, such as our new locations in Israel, which reduced median response time (latency) from 86ms to 29ms (a 66% improvement) in a matter of weeks for subscribers of one Israeli Internet service provider (ISP).

The Cities

Without further ado, here are the 18 new cities in 10 countries we welcomed to our global network: Accra, Ghana; Almaty, Kazakhstan; Bhubaneshwar, India; Chiang Mai, Thailand; Joinville, Brazil; Erbil, Iraq; Fukuoka, Japan; Goiânia, Brazil; Haifa, Israel; Harare, Zimbabwe; Juazeiro do Norte, Brazil; Kanpur, India; Manaus, Brazil; Naha, Japan; Patna, India; São José do Rio Preto, Brazil; Tashkent, Uzbekistan; Uberlândia, Brazil.

Cloudflare’s ISP Edge Partnership Program

But let’s take a step back and understand why and how adding new cities to our list helps make the Internet better. First, we should reintroduce the Cloudflare Edge Partnership Program. Cloudflare is used as a reverse proxy by nearly 20% of all Internet properties, which means the volume of ISP traffic trying to reach us can be significant. In some cases, as we’ll see in Israel, the distance data needs to travel can also be significant, adding to latency and reducing Internet performance for the user. Our solution is partnering with ISPs to embed our servers inside their network. Not only does the ISP avoid lots of back haul traffic, but their subscribers also get much better performance because the website is served on-net, and close to them geographically. It is a win-win-win.

Consider a large Israeli ISP we did not peer with locally in Tel Aviv. Last year, if a subscriber wanted to reach a website on the Cloudflare network, their request had to travel on the Internet backbone – the large carriers that connect networks together on behalf of smaller ISPs – from Israel to Europe before reaching Cloudflare and going back. The map below shows where they were able to find Cloudflare content before our deployment went live: 48% in Frankfurt, 33% in London, and 18% in Amsterdam. That’s a long way!

New cities on the Cloudflare global network: March 2022 edition

In January and March 2022, we turned up deployments with the ISP  in Tel Aviv and Haifa. Now live, these two locations serve practically all requests from their subscribers locally within Israel. Instead of traveling 3,000 km to reach one of the millions of websites on our network, most requests from Israel now travel 65 km, or less. The improvement has been dramatic: now we’re serving 66% of requests in under 50ms; before the deployment we couldn’t serve any in under 50ms because the distance was too great. Now, 85% are served in under 100ms; before, we served 66% of requests in under 100ms.

New cities on the Cloudflare global network: March 2022 edition

As we continue to put dots on the map, we’ll keep putting updates here on how Internet performance is improving. As we like to say, we’re just getting started.

If you’re an ISP that is interested in hosting a Cloudflare cache to improve performance and reduce back haul, get in touch on our Edge Partnership Program page. And if you’re a software, data, or network engineer – or just the type of person who is curious and wants to help make the Internet better – consider joining our team.

Protect all network traffic with Cloudflare

Post Syndicated from Annika Garbers original https://blog.cloudflare.com/protect-all-network-traffic/

Protect all network traffic with Cloudflare

Protect all network traffic with Cloudflare

Magic Transit protects customers’ entire networks—any port/protocol—from DDoS attacks and provides built-in performance and reliability. Today, we’re excited to extend the capabilities of Magic Transit to customers with any size network, from home networks to offices to large cloud properties, by offering Cloudflare-maintained and Magic Transit-protected IP space as a service.

What is Magic Transit?

Magic Transit extends the power of Cloudflare’s global network to customers, absorbing all traffic destined for your network at the location closest to its source. Once traffic lands at the closest Cloudflare location, it flows through a stack of security protections including industry-leading DDoS mitigation and cloud firewall. Detailed Network Analytics, alerts, and reporting give you deep visibility into all your traffic and attack patterns. Clean traffic is forwarded to your network using Anycast GRE or IPsec tunnels or Cloudflare Network Interconnect. Magic Transit includes load balancing and automatic failover across tunnels to steer traffic across the healthiest path possible, from everywhere in the world.

Protect all network traffic with Cloudflare
Magic Transit architecture: Internet BGP advertisement attracts traffic to Cloudflare’s network, where attack mitigation and security policies are applied before clean traffic is forwarded back to customer networks with an Anycast GRE tunnel or Cloudflare Network Interconnect.

The “Magic” is in our Anycast architecture: every server across our network runs every Cloudflare service, so traffic can be processed wherever it lands. This means the entire capacity of our network—121+Tbps as of this post—is available to block even the largest attacks. It also drives huge benefits for performance versus traditional “scrubbing center” solutions that route traffic to specialized locations for processing, and makes onboarding much easier for network engineers: one tunnel to Cloudflare automatically connects customer infrastructure to our entire network in over 250 cities worldwide.

What’s new?

Historically, Magic Transit has required customers to bring their own IP addresses—a minimum of a class C IP block, or /24—in order to use this service. This is because a /24 is the minimum prefix length that can be advertised via BGP on the public Internet, which is how we attract traffic for customer networks.

But not all customers have this much IP space; we’ve talked to many of you who want IP layer protection for a smaller network than we’re able to advertise to the Internet on your behalf. Today, we’re extending the availability of Magic Transit to customers with smaller networks by offering Magic Transit-protected, Cloudflare-managed IP space. Starting now, you can direct your network traffic to dedicated static IPs and receive all the benefits of Magic Transit including industry leading DDoS protection, visibility, performance, and resiliency.

Let’s talk through some new ways you can leverage Magic Transit to protect and accelerate any network.

Consistent cross-cloud security

Organizations adopting a hybrid or poly-cloud strategy have struggled to maintain consistent security controls across different environments. Where they used to manage a single firewall appliance in a datacenter, security teams now have a myriad of controls across different providers—physical, virtual, and cloud-based—all with different capabilities and control mechanisms.

Cloudflare is the single control plane across your hybrid cloud deployment, allowing you to manage security policies from one place, get uniform protection across your entire environment, and get deep visibility into your traffic and attack patterns.

Protect all network traffic with Cloudflare

Protecting branches of any size

As DDoS attack frequency and variety continues to grow, attackers are getting more creative with angles to target organizations. Over the past few years, we have seen a consistent rise in attacks targeted at corporate infrastructure including internal applications. As the percentage of a corporate network dependent on the Internet continues to grow, organizations need consistent protection across their entire network.

Now, you can get any network location covered—branch offices, stores, remote sites, event venues, and more—with Magic Transit-protected IP space. Organizations can also replace legacy hardware firewalls at those locations with our built-in cloud firewall, which filters bidirectional traffic and propagates changes globally within seconds.

Protect all network traffic with Cloudflare

Keeping streams alive without worrying about leaked IPs

Generally, DDoS attacks target a specific application or network in order to impact the availability of an Internet-facing resource. But you don’t have to be hosting anything in order to get attacked, as many gamers and streamers have unfortunately discovered. The public IP associated with a home network can easily be leaked, giving attackers the ability to directly target and take down a live stream.

As a streamer, you can now route traffic from your home network through a Magic Transit-protected IP. This means no more worrying about leaking your IP: attackers targeting you will have traffic blocked at the closest Cloudflare location to them, far away from your home network. And no need to worry about impact to your game: thanks to Cloudflare’s globally distributed and interconnected network, you can get protected without sacrificing performance.

Protect all network traffic with Cloudflare

Get started today

This solution is available today; learn more or contact your account team to get started.

Project Myriagon: Cloudflare Passes 10,000 Connected Networks

Post Syndicated from Ticiane Takami original https://blog.cloudflare.com/10000-networks-and-beyond/

Project Myriagon: Cloudflare Passes 10,000 Connected Networks

Project Myriagon: Cloudflare Passes 10,000 Connected Networks

During Speed Week, we’ve talked a lot about the products we’ve improved and the places we’ve expanded to. Today, we have a final exciting announcement: Cloudflare now connects with more than 10,000 other networks. Put another way, over 10,000 networks have direct on-ramps to the Cloudflare network.

This is the culmination of a special project we’ve been working on for the last few months dubbed Project Myriagon, a reference to the 10,000-sided polygon of the same name. In going about this project, we have learned a lot about the performance impact of adding more direct connections to our network — in one recent case, we saw a 90% reduction in median round-trip end-user latency.

But to really explain why this is such a big milestone, we first need to explain a bit about how the Internet works.

More roads leading to Rome

The Internet that all know and rely on is, on a basic level, an interconnected series of independently run local networks. Each network is defined as its own “autonomous system.” These networks are delineated numerically with Autonomous Systems Numbers, or ASNs. An ASN is like the Internet version of a zip code, a short number directly mapping to a distinct region of IP space using a clearly defined methodology. Network interconnection is all about bringing together different ASNs to exponentially multiply the number of possible paths between source and destination.

Most of us have home networks behind a modem and router, connecting your individual miniature network to your ISP. Your ISP then connects with other networks, to fetch the web pages or other Internet traffic you request. These networks in turn have connections to different networks, who in turn connect to interconnected networks, and so on, until your data reaches its destination. The fewer networks your request has to traverse, generally, the lower the end-to-end latency and odds that something will get lost along the way.

The average number of hops between any one network on the Internet to any other network is around 5.7 and 4.7, for the IPv4 and IPv6 networks respectively.

Project Myriagon: Cloudflare Passes 10,000 Connected Networks
Project Myriagon: Cloudflare Passes 10,000 Connected Networks
Source: https://blog.apnic.net/2020/01/14/bgp-in-2019-the-bgp-table/

How do ASNs work?

ASNs are a key part of the routing protocol that directs traffic along the Internet, BGP. Internet Assigned Numbers Authority (IANA), the global coordinator of the DNS Root, IP addressing, and other Internet protocol resources like AS Numbers, delegates ASN-making authority to Regional Internet Registries (RIRs), who in turn assign individual ASNs to network operators in line with their regional policies. The five RIRs are AFRINIC, APNIC, ARIN, LACNIC and RIPE, each entitled to assign and attribute ASN numbers in their respective appointed regions.

Cloudflare’s ASN is 13335, one of the approximately 70,000 ASNs advertised on the Internet. While we’d like to — and plan on — connecting to every one of these ASNs eventually, our team tries to prioritize those with the greatest impact on our overall breadth and improving the proximity to as many people on Earth as possible.

As enabling optimal routes is key to our core business and services, we continuously track how many ASNs we connect to (technically referred to as “adjacent networks”). With Project Myriagon, we aimed to speed up our rate of interconnection and pass 10,000 adjacent networks by the end of the year. By September 2021, we reached that milestone, bringing us from 8,300 at the start of 2020 to over 10,000 today.

As shown in the table below, that milestone is part of a continuous effort towards gradually hitting more of the total advertised ASNs on the Internet.

Project Myriagon: Cloudflare Passes 10,000 Connected Networks
The Regional Internet Registries and their Regions

Table 1: Cloudflare’s peer ASNs and their respective RIR

Project Myriagon: Cloudflare Passes 10,000 Connected Networks

Given that there are 70,000+ ASNs out there, you might be wondering: why is 10,000 a big deal? To understand this, we need to look deeply at BGP, the protocol that glues the Internet together. There are three different classes of ASNs:

  • Transit Only ASNs: these networks only provide connectivity to other networks. They don’t have any IP addresses inside their networks. These networks are quite rare, as it’s very unusual to not have any IP addresses inside your network. Instead, these networks are often used primarily for distinct management purposes within a single organization.
  • Origin Only ASNs: these are networks that do not provide connectivity to other networks. They are a stub network, and often, like your home network, only connected to a single ISP.
  • Mixed ASNs: these networks both have IP addresses inside their network, and provide connectivity to other networks.

Origin Only ASNs Mixed ASNs Transit Only ASNs
61,127 11,128 443

Source: https://bgp.potaroo.net/as6447/

One interesting fact: of the 61,127 origin only ASNs, nearly 43,000 of them are only connected to their ISP. As such, our direct connections to over 10,000 networks indicates that of the networks that connect more than one network, a very good percentage are now already connected to Cloudflare.

Cutting out the middle man

Directly connecting to a network — and eliminating the hops in between — can greatly improve performance in two ways. First, connecting with a network directly allows for Internet traffic to be exchanged locally rather than detouring through remote cities; and secondly, direct connections help avoid the congestion caused by bottlenecks that sometimes happen between networks.

To take a recent real-world example, turning up a direct peering session caused a 90% improvement in median end-user latency when turning up a peering session with a European network, from an average of 76ms to an average of 7ms.

Project Myriagon: Cloudflare Passes 10,000 Connected Networks
Immediate 90% improvement in median end-user latency after peering with a new network. 

By using our own on-ramps to other networks, we both ensure superior performance for our users and avoid adding load and causing congestion on the Internet at large.

And AS13335 is just getting started

Project Myriagon: Cloudflare Passes 10,000 Connected Networks

Cloudflare is an anycast network, meaning that the better connected we are, the faster and better-protected we are — obviating legacy concepts like scrubbing centers and slow origins. Hitting five digits of connected networks is something we’re really proud of as a step on our goal to helping to build a better Internet. As we’ve mentioned throughout the week, we’re all about high speed without having to pay a security or reliability cost.

There’s still work to do! While Project Myriagon has brought us, we believe, to be one of the top 5 most connected networks in the world, we estimate Google is connected to 12,000-15,000 networks. And so, today, we are kicking off Project CatchG. We won’t rest until we’re #1.

Interested in peering with us to help build a better Internet? Reach out to [email protected] with your request. More details on the locations we are present at can be found at http://as13335.peeringdb.com/.

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

Post Syndicated from Tanner Ryan original https://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane/

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

The Internet is an amazing place. It’s a communication superhighway, allowing people and machines to exchange exabytes of information every day. But it’s not without its share of issues: whether it’s DDoS attacks, route leaks, cable cuts, or packet loss, the components of the Internet do not always work as intended.

The reason Cloudflare exists is to help solve these problems. As we continue to grow our rapidly expanding global network in more than 250 cities, while directly connecting with more than 9,800 networks, it’s important that our network continues to help bring improved performance and resiliency to the Internet. To accomplish this, we built our own backbone. Other than improving redundancy, the immediate advantage to you as a Cloudflare user? It can reduce your website loading times by up to 45% — and you don’t have to do a thing.

The Cloudflare Backbone

We began building out our global backbone in 2018. It comprises a network of long-distance fiber optic cables connecting various Cloudflare data centers across North America, South America, Europe, and Asia. This also includes Cloudflare’s metro fiber network, directly connecting data centers within a metropolitan area.

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

Our backbone is a dedicated network, providing guaranteed network capacity and consistent latency between various locations. It gives us the ability to securely, reliably, and quickly route packets between our data centers, without having to rely on other networks.

This dedicated network can be thought of as a fast lane on a busy highway. When traffic in the normal lanes of the highway encounter slowdowns from congestion and accidents, vehicles can make use of a fast lane to bypass the traffic and get to their destination on time.

Our software-defined network is like a smart GPS device, as we’re always calculating the performance of routes between various networks. If a route on the public Internet becomes congested or unavailable, our network automatically adjusts routing preferences in real-time to make use of all routes we have available, including our dedicated backbone, helping to deliver your network packets to the destination as fast as we can.

Measuring backbone improvements

As we grow our global infrastructure, it’s important that we analyze our network to quantify the impact we’re having on performance.

Here’s a simple, real-world test we’ve used to validate that our backbone helps speed up our global network. We deployed a simple API service hosted on a public cloud provider, located in Chicago, Illinois. Once placed behind Cloudflare, we performed benchmarks from various geographic locations with the backbone disabled and enabled to measure the change in performance.

Instead of comparing the difference in latency our backbone creates, it is important that our experiment captures a real-world performance gain that an API service or website would experience. To validate this, our primary metric is measuring the average request time when accessing an API service from Miami, Seattle, San Jose, São Paulo, and Tokyo. To capture the response of the network itself, we disabled caching on the Cloudflare dashboard and sent 100 requests from each testing location, both while forcing traffic through our backbone, and through the public Internet.

Now, before we claim our backbone solves all Internet problems, you can probably notice that for some tests (Seattle, WA and San Jose, CA), there was actually an increase in response time when we forced traffic through the backbone. Since latency is directly proportional to the distance of fiber optic cables, and since we have over 9,800 direct connections with other Internet networks, there is a possibility that an uncongested path on the public Internet might be geographically shorter, causing this speedup compared to our backbone.

Luckily for us, we have technologies like Argo Smart Routing, Argo Tiered Caching, WARP+, and most recently announced Orpheus, which dynamically calculates the performance of each route at our data centers, choosing the fastest healthy route at that time. What might be the fastest path during this test may not be the fastest at the time you are reading this.

With that disclaimer out of the way, now onto the test.

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

With the backbone disabled, if a visitor from São Paulo performed a request to our service, they would be routed to our São Paulo data center via BGP Anycast. With caching disabled, our São Paulo data center forwarded the request over the public Internet to the origin server in Chicago. On average, the entire process to fetch data from the origin server and return to the response to the requesting user took 335.8 milliseconds.

Once the backbone was enabled and requests were created, our software performed tests to determine the fastest healthy route to the origin, whether it was a route on the public Internet or through our private backbone. For this test the backbone was faster, resulting in an average total request time of 230.2 milliseconds. Just by routing the request through our private backbone, we improved the average response time by 31%.

We saw even better improvement when testing from Tokyo. When routing the request over the public Internet, the request took an average of 424 milliseconds. By enabling our backbone which created a faster path, the request took an average of 234 milliseconds, creating an average response time improvement of 44%.

Visitor Location Distance to Chicago Avg. response time using public Internet (ms) Avg. response using backbone (ms) Change in response time
Miami, FL, US 1917 km 84 75 10.7% decrease
Seattle, WA, US 2785 km 118 124 5.1% increase
San Jose, CA, US 2856 km 122 132 8.2% increase
São Paulo, BR 8403 km 336 230 31.5% decrease
Tokyo JP 10129 km 424 234 44.8% decrease

We also observed a smaller deviation in the response time of packets routed through our backbone over larger distances.

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

Our next generation network

Cloudflare is built on top of lossy, unreliable networks that we do not have control over. It’s our software that turns these traditional tubes of the Internet into a smart, high performing, and reliable network Cloudflare customers get to use today. Coupled with our new, but rapidly expanding backbone, it is this software that produces significant performance gains over traditional Internet networks.

Whether you visit a website powered by Cloudflare’s Argo Smart Routing, Argo Tiered Caching, Orpheus, or use our 1.1.1.1 service with WARP+ to access the Internet, you get direct access to the Internet fast lane we call the Cloudflare backbone.

For Cloudflare, a better Internet means improving Internet security, reliability, and performance. The backbone gives us the ability to build out our network in areas that have typically lacked infrastructure investments by other networks. Even with issues on the public Internet, these initiatives allow us to be located within 50 milliseconds of 95% of the Internet connected population.

In addition to our growing global infrastructure providing 1.1.1.1, WARP, Roughtime, NTP, IPFS Gateway, Drand, and F-Root to the greater Internet, it’s important that we extend our services to those who are most vulnerable. This is why we extend all our infrastructure benefits directly to the community, through projects like Galileo, Athenian, Fair Shot, and Pangea.

And while these thousands of fiber optic connections are already fixing today’s Internet issues, we truly are just getting started.

Want to help build the future Internet? Networks that are faster, safer, and more reliable than they are today? The Cloudflare Infrastructure team is currently hiring!

If you operate an ISP or transit network and would like to bring your users faster and more reliable access to websites and services powered by Cloudflare’s rapidly expanding network, please reach out to our Edge Partnerships team at [email protected].

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

Post Syndicated from Jon Rolfe original https://blog.cloudflare.com/250-cities-is-just-the-start/

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

It feels like just the other week that we announced ten new cities and our expansion to 25+ cities in Brazil — probably because it was. Today, I have three speedy infrastructure updates: we’ve passed 250 on-network cities, more than tripled our external network capacity, and increased our long-haul internal backbone network by over 800% since the start of 2020.

Light only travels through fiber so fast and with so much bandwidth — and worse still over the copper or on mobile networks that make up most end-users’ connections to the Internet. At some point, there’s only so much software you can throw at the problem before you run into the fundamental problem that an edge network solves: if you want your users to see incredible performance, you have to have servers incredibly physically close. For example, over the past three months, we’ve added another 10 cities in Brazil.  Here’s how that lowered the connection time to Cloudflare. The red line shows the latency prior to the expansion, the blue shows after.

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

We’re exceptionally proud of all the teams at Cloudflare that came together to raise the bar for the entire industry in terms of global performance despite border closures, semiconductor shortages, and a sudden shift to working from home. 95% of the entire Internet-connected world is now within 50 ms of a Cloudflare presence, and 80% of the entire Internet-connected world is within 20ms (for reference, it takes 300-400 ms for a human to blink):

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

Today, when we ask ourselves what it means to have a fast website, it means having a server less than 0.05 seconds away from your user, no matter where on Earth they are. This is only possible by adding new cities, partners, capacity, and cables — so let’s talk about those.

New Cities

Cutting straight to the point, let’s start with cities and countries: in the last two-ish months, we’ve added another 17 cities (outside of mainland China) split across eight countries: Guayaquil, Ecuador; Dammam, Saudi Arabia; Algiers, Algeria; Surat Thani, Thailand; Hagåtña, Guam, United States; Krasnoyarsk, Russia; Cagayan, Philippines; and ten cities in Brazil: Caçador, Ribeirão Preto, Brasília, Florianópolis, Sorocaba, Itajaí, Belém, Americana, Blumenau, and Belo Horizonte.

Meanwhile, with our partner, JD Cloud and AI, we’re up to 37 cities in mainland China: Anqing and Huainan, Anhui; Beijing, Beijing; Fuzhou and Quanzhou, Fujian; Lanzhou, Gansu; Foshan, Guangzhou, and Maoming, Guangdong; Guiyang, Guizhou; Chengmai and Haikou, Hainan; Langfang and Qinhuangdao, Hebei; Zhengzhou, Henan; Shiyan and Yichang, Hubei; Changde and Yiyang, Hunan; Hohhot, Inner Mongolia; Changzhou, Suqian, and Wuxi, Jiangsu; Nanchang and Xinyu, Jiangxi; Dalian and Shenyang, Liaoning; Xining, Qinghai; Baoji and Xianyang, Shaanxi; Jinan and Qingdao, Shandong; Shanghai, Shanghai; Chengdu, Sichuan; Jinhua, Quzhou, and Taizhou, Zhejiang. These are subject to change: as we ramp up, we have been working with JD Cloud to “trial” cities for a few weeks or months to observe performance and tweak the cities to match.

More Capacity: What and Why?

In addition to all these new cities, we’re also proud to announce that we have seen a 3.5x increase in external network capacity from the start of 2020 to now. This is just as key to our network strategy as new cities: it wouldn’t matter if we were in every city on Earth if we weren’t interconnected with other networks. Last-mile ISPs will sometimes still “trombone” their traffic, but in general, end users will get faster Internet as we interconnect more.

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

This interconnection is spread far and wide, both to user networks and those of website hosts and other major cloud networks. This has involved a lot of middleman-removal: rather than run fiber optics from our routers through a third-party network to an origin or user’s network, we’re running more and more Private Network Interconnects (PNIs) and, better yet, Cloudflare Network Interconnects (CNIs) to our customers.

These PNIs and CNIs can not only reduce egress costs for our customers (particularly with our Bandwidth Alliance partners) but also increase the speed, reliability, and privacy of connections. The fewer networks and less distance your Internet traffic flows through, the better off everyone is. To put some numbers on that, only 30% of this newly doubled capacity was transit, leaving 70% flowing directly either physically over PNIs/CNIs or logically over peering sessions at Internet exchange points.

The Backbone

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone

At the same time as this increase in external capacity, we’ve quietly been adding hundreds of new segments to our backbone. Our backbone consists of dedicated fiber optic lines and reserved portions of wavelength that connect Cloudflare data centers together. This is split approximately 55/45 between “metro” capacity, which redundantly connects data centers in which we have a presence, and “long-haul” capacity, which connects Cloudflare data centers in different cities.

The backbone is used to increase the speed of our customer traffic, e.g., for Argo Smart Routing, Argo Tiered Caching, and WARP+. Our backbone is like a private highway connecting cities, while public Internet routing is like local roads: not only does the backbone directly connect two cities, but it’s reliably faster and sees fewer issues. We’ll dive into some benchmarks of the speed improvements of the backbone in a more comprehensive future blog post.

The backbone is also more secure. While Cloudflare signs all of its BGP routes with RPKI, pushes adjacent networks to use RPKI to avoid route hijacks, and encrypts external and internal traffic, the most secure and private way to safeguard our users’ traffic is to keep it on-network as much as possible.

Internal load balancing between cities has also been greatly improved, thanks to the use of the backbone for traffic management with a technology we call Plurimog (a reference to our in-colo Layer 4 load balancer, Unimog). A surge of traffic into Portland can be shifted instantaneously over diverse links to Seattle, Denver, or San Jose with a single hop, without waiting for changes to propagate over anycast or running the risk of an interim increase in errors.

From an expansion perspective, two key areas of focus have been our undersea North America to Europe (transatlantic) and Asia to North America (transpacific) backbone rings. These links use geographically diverse subsea cable systems and connect into diverse routers and data centers on both ends — four transatlantic cables from North America to Europe, three transamerican cables connecting South and North America, and three transpacific cables connecting Asia and North America. User traffic coming from Los Angeles could travel to an origin as west as Singapore or as east as Moscow without leaving our network.

This rate of growth has been enabled by improved traffic forecast modeling, rapid internal feedback loops on link utilization, and more broadly by growing our teams and partnerships. We are creating a global view of capacity, pricing, and desirability of backbone links in the same way that we have for transit and peering. The result is a backbone that doubled in long-haul capacity this year, increased more than 800% from the start of last year, and will continue to expand to intelligently crisscross the globe.

The backbone has taken on a huge amount of traffic that would otherwise go over external transit and peering connections, freeing up capacity for when it is explicitly needed (last-hop routes, failover, etc.) and avoiding any outages on other major global networks (e.g., CenturyLink, Verizon).

In Conclusion

Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone
A map of the world highlighting all 250+ cities in which Cloudflare is deployed.

More cities, capacity, and backbone are more steps as part of going from being the most global network on Earth to the most local one as well. We believe in providing security, privacy, and reliability for all — not just those who have the money to pay for something we consider fundamental Internet rights. We have seen the investment into our network pay huge dividends this past year.

Happy Speed Week!

Do you want to work on the future of a globally local network? Are you passionate about edge networks? Do you thrive in an exciting, rapid-growth environment? If so, good news: Cloudflare Infrastructure is hiring; check our open roles here!

Alternatively — if you work at an ISP we aren’t already deployed with and want to bring this level of speed and control to your users, we’re here to make that happen. Please reach out to our Edge Partnerships team at [email protected].

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Post Syndicated from Chris Howells original https://blog.cloudflare.com/the-epyc-journey-continues-to-rome-in-cloudflares-11th-generation-edge-server/

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

When I was interviewing to join Cloudflare’s in 2014 as a member of the SRE team, we had just introduced our generation 4 server, and I was excited about the prospects. Since then, Cloudflare, the industry and I have all changed dramatically. The best thing about working for a rapidly growing company like Cloudflare is that as the company grows, new roles open up to enable career development. And so, having left the SRE team last year, I joined the recently formed hardware engineering team, a team that simply didn’t exist in 2014.

We aim to introduce a new server platform to our edge network every 12 to 18 months or so, to ensure that we keep up with the latest industry technologies and developments. We announced the generation 9 server in October 2018 and we announced the generation 10 server in February 2020. We consider this length of cycle optimal: short enough to stay nimble and take advantage of the latest technologies, but long enough to offset the time taken by our hardware engineers to test and validate the entire platform. When we are shipping servers to over 200 cities around the world with a variety of regulatory standards, it’s essential to get things right the first time.

We continually work with our silicon vendors to receive product roadmaps and stay on top of the latest technologies. Since mid-2020, the hardware engineering team at Cloudflare has been working on our generation 11 server.

Requests per Watt is one of our defining characteristics when testing new hardware and we use it to identify how much more efficient a new hardware generation is than the previous generation. We continually strive to reduce our operational costs and power consumption reduction is one of the most important parts of this. It’s good for the planet and we can fit more servers into a rack, reducing our physical footprint.

The design of these Generation 11 x86 servers has been in parallel with our efforts to design next-generation edge servers using the Ampere Altra ARM architecture. You can read more about our tests in a blog post by my colleague Sung and will document our work on Arm at the edge in a subsequent blog post.

We evaluated Intel’s latest generation of “Ice Lake” Xeon processors. Although Intel’s chips were able to compete with AMD in terms of raw performance, the power consumption was several hundred watts higher per server – that’s enormous. This meant that Intel’s Performance per Watt was unattractive.

We previously described how we had deployed AMD EPYC 7642’s processors in our generation 10 server. This has 48 cores and is based on AMD’s 2nd generation EPYC architecture, code named Rome. For our generation 11 server, we evaluated 48, 56 and 64 core samples based on AMD’s 3rd generation EPYC architecture, code named Milan. We were interested to find that comparing the two 48 core processors directly, we saw a performance boost of several percent in the 3rd generation EPYC architecture. We therefore had high hopes for the 56 core and 64 core chips.

So, based on the samples we received from our vendors and our subsequent testing, hardware from AMD and Ampere made the shortlist for our generation 11 server. On this occasion, we decided that Intel did not meet our requirements. However, it’s healthy that Intel and AMD compete and innovate in the x86 space and we look forward to seeing how Intel’s next generation shapes up.

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Testing and validation process

Before we go on to talk about the hardware, I’d like to say a few words about the testing process we went through to test out generation 11 servers.

As we elected to proceed with AMD chips, we were able to use our generation 10 servers as our Engineering Validation Test platform, with the only changes being the new silicon and updated firmware. We were able to perform these upgrades ourselves in our hardware validation lab.

Cloudflare’s network is built with cheap commodity hardware and we source the hardware from multiple vendors, known as ODMs (Original Design Manufacturer) who build the servers to our specifications.

When you are working with bleeding edge silicon and experimental firmware, not everything is plain sailing. We worked with one of our ODMs to eliminate an issue which was causing the Linux kernel to panic on boot. Once resolved, we used a variety of synthetic benchmarking tools to verify the performance including cf_benchmark, as well as an internal tool which applies a synthetic load to our entire software stack.

Once we were satisfied, we ordered Design Validation Test samples, which were manufactured by our ODMs with the new silicon. We continued to test these and iron out the inevitable issues that arise when you are developing custom hardware. To ensure that performance matched our expectations, we used synthetic benchmarking to test the new silicon. We also began testing it in our production environment by gradually introducing customer traffic to them as confidence grew.

Once the issues were resolved, we ordered the Product Validation Test samples, which were again manufactured by our ODMs, taking into account the feedback obtained in the DVT phase. As these are intended to be production grade, we work with the broader Cloudflare teams to deploy these units like a mass production order.

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

CPU

Previously: AMD EPYC 7642 48-Core Processor
Now: AMD EPYC 7713 64-Core Processor

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

AMD EPYC 7642 AMD EPYC 7643 AMD EPYC 7663 AMD EPYC 7713
Status Incumbent Candidate Candidate Candidate
Core Count 48 48 56 64
Thread Count 96 96 112 128
Base Clock 2.3GHz 2.3GHz 2.0GHz 2.0GHz
Max Boost Clock 3.3GHz 3.6GHz 3.5GHz 3.675GHz
Total L3 Cache 256MB 256MB 256MB 256MB
Default TDP 225W 225W 240W 225W
Configurable TDP 240W 240W 240W 240W

In the above chart, TDP refers to Thermal Design Power, a measure of the heat dissipated. All of the above processors have a configurable TDP – assuming the cooling solution is capable – giving more performance at the expense of increased power consumption. We tested all processors configured at their highest supported TDP.

The 64 core processors have 33% more cores than the 48 core processors so you might hypothesize that we would see a corresponding 33% increase in performance, although our benchmarks saw slightly more modest gains. This can be explained because the 64 core processors have lower base clock frequencies to fit within the same 225W power envelope.

In production testing, we found that the 64 core EPYC 7713 gave us around a 29% performance boost over the incumbent, whilst having similar power consumption and thermal properties.

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Memory

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Previously: 256GB DDR4-2933
Now: 384GB DDR4-3200

Having made a decision about the processor, the next step was to determine the optimal amount of memory for our workload. We ran a series of experiments with our chosen EPYC 7713 processor and 256GB, 384GB and 512MB memory configurations. We started off by running synthetic benchmarks with tools such as STREAM to ensure that none of the configurations performed unexpectedly poorly and to generate a baseline understanding of the performance.

After the synthetic benchmarks, we proceeded to test the various configurations with production workloads to empirically determine the optimal quantity. We use Prometheus and Grafana to gather and display a rich set of metrics from all of our servers so that we can monitor and spot trends, and we re-used the same infrastructure for our performance analysis.

As well as measuring available memory, previous experience has shown us that one of the best ways to ensure that we have enough memory is to observe request latency and disk IO performance. If there is insufficient memory, we expect to see request latency and disk IO volume and latency to increase. The reason for this is that our core HTTP server uses memory to cache web assets and if there is insufficient memory the assets will be ejected from memory prematurely and more assets will be fetched from disk instead of memory, degrading performance.

Like most things in life, it’s a balancing act. We want enough memory to take advantage of the fact that serving web assets directly from memory is much faster than even the best NVMe disks. We also want to future proof our platform to enable the new features such as the ones that we recently announced in security week and developer week. However, we don’t want to spend unnecessarily on excess memory that will never be used. We found that the 512GB configuration did not provide a performance boost to justify the extra cost and settled on the 384GB configuration.

We also tested the performance impact of switching from DDR4-2933 to DDR4-3200 memory. We found that it provided a performance boost of several percent and the pricing has improved to the point where it is cost beneficial to make the change.

Disk

Previously: 3x Samsung PM983 x 960GB
Now: 2x Samsung PM9A3 x 1.92TB

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

We validated samples by studying the manufacturer’s data sheets and testing using fio to ensure that the results being obtained in our test environment were in line with the published specifications. We also developed an automation framework to help compare different drive models using fio. The framework helps us to restore the drives close to factory settings, precondition the drives, perform the sequential and random tests in our environment, and analyze the data results to evaluate the bandwidth and latency results. Since our SSD samples were arriving in our test center at different months, having an automated framework helped in dealing with speedy evaluations by reducing our time spent testing and doing analysis.

For Gen 11 we decided to move to a 2x 2TB configuration from the original 3x 1TB configuration giving us an extra 1 TB of storage. This also meant we could use the higher performance of a 2TB drive and save around 6W of power since there is one less SSD.

After analyzing the performances of various 2TB drives, their latencies and endurances, we chose Samsung’s PM9A3 SSDs as our Gen11 drives. The results we obtained below were consistent with the manufacturer’s claims.

Sequential performance:

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server
The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Random Performance:

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server
The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Compared to our previous generation drives, we could see a 1.5x – 2x improvement in read and write bandwidths. The higher values for the PM9A3 can be attributed to the fact that these are PCIe 4.0 drives, have more intelligent SSD controllers and an upgraded NAND architecture.

Network

Previously: Mellanox ConnectX-4 dual-port 25G
Now: Mellanox ConnectX-4 dual-port 25G

There is no change on the network front; the Mellanox ConnectX-4 is a solid performer which continues to meet our needs. We investigated higher speed Ethernet, but we do not currently see this as beneficial. Cloudflare’s network is built on cheap commodity hardware and the highly distributed nature of Cloudflare’s network means we don’t have discrete DDoS scrubbing centres. All points of presence operate as scrubbing centres. This means that we distribute the load across our entire network and do not need to employ higher speed and more expensive Ethernet devices.

Open source firmware

Transparency, security and integrity is absolutely critical to us at Cloudflare. Last year, we described how we had deployed Platform Secure Boot to create trust that we were running the software that we thought we were.

Now, we are pleased to announce that we are deploying open source firmware to our servers using OpenBMC. With access to the source code, we have been able to configure BMC features such as the fan PID controller, having BIOS POST codes recorded and accessible, and managing networking ports and devices. Prior to OpenBMC, requesting these features from our vendors led to varying results and misunderstandings of the scope and capabilities of the BMC. After working with the BMC source code much more directly, we have the flexibility to to work on features ourselves to our liking, or understand why the BMC is incapable of running our desired software.

Whilst our current BMC is an industry standard, we feel that OpenBMC better suits our needs and gives us advantages such as allowing us to deal with upstream security issues without a dependency on our vendors. Some opportunities with security include integration of desired authentication modules, usage of specific software packages, staying up to date with the latest Linux kernel, and controlling a variety of attack vectors. Because we have a kernel lockdown implemented, flashing tooling is difficult to use in our environment. With access to source code of the flashing tools, we have an understanding of what the tools need access to, and assess whether or not this meets our standard of security.

Summary

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

The jump between our generation 9 and generation 10 servers was enormous. To summarise, we changed from a dual-socket Intel platform to a single socket AMD platform. We upgraded the SATA SSDs to NVMe storage devices, and physically the multi-node chassis changed to a 1U form factor.

At the start of the generation 11 project we weren’t sure if we would be making such radical changes again. However, after a thorough testing of the latest chips and a review of how well the generation 10 server has performed in production for over a year, our generation 11 server built upon the solid foundations of generation 10 and ended up as a refinement rather than total revamp. Despite this, and bearing in mind that performance varies by time of day and geography, we are pleased that generation 11 is capable of serving approximately 29% more requests than generation 10 without an increase in power consumption.

The EPYC journey continues to Rome in Cloudflare’s 11th generation Edge Server

Thanks to Denny Mathew and Ryan Chow’s work on benchmarking and OpenBMC, respectively.

If you are interested in working with bleeding edge hardware, open source server firmware, solving interesting problems, helping to improve our performance, and are interested in helping us work on our generation 12 server platform (amongst many other things!), we’re hiring.

Expanding Cloudflare to 25+ Cities in Brazil

Post Syndicated from Jen Kim original https://blog.cloudflare.com/expanding-to-25-plus-cities-in-brazil/

Expanding Cloudflare to 25+ Cities in Brazil

Expanding Cloudflare to 25+ Cities in Brazil

Today, we are excited to announce an expansion we’ve been working on behind the scenes for the last two years: a 25+ city partnership with one of the largest ISPs in Brazil. This is one of the largest simultaneous single-country expansions we’ve done so far.

With this partnership, Brazilians throughout the country will see significant improvement to their Internet experience. Already, the 25th-percentile latency of non-bot traffic (we use that measure as an approximation of physical distance from our servers to end users) has dropped from the mid-20 millisecond range to sub-10 milliseconds. This benefit extends not only to the 25 million Internet properties on our network, but to the entire Internet with Cloudflare services like 1.1.1.1 and WARP. We expect that as we approach 25 cities in Brazil, latency will continue to drop while throughput increases.

Expanding Cloudflare to 25+ Cities in Brazil
25th percentile latency of non-bot traffic in Brazil has more than halved as new cities have gone live.
Expanding Cloudflare to 25+ Cities in Brazil

This partnership is part of our mission to help create a better Internet and the best development experience for all — not just those in major population centers or in Western markets — and we are excited to take this step on our journey to help build a better Internet. Whether you live in the heart of São Paulo or the outskirts of the Amazon rainforest in Manaus, expect an upgrade to your Internet experience soon.

We have already launched in Porto Alegre, Belo Horizonte, Brasília, Campinas, Curitiba, and Fortaleza, with additional presences coming soon to Manaus, São Paulo, Blumenau, Joinville, Florianópolis, Itajai, Belém, Goiânia, Salvador, São José do Rio Preto, Americana, and Sorocaba.

From there, we’re planning on adding presences in the following cities: Guarulhos, Mogi das Cruzes, São José dos Campos, Vitória, Londrina, Maringá, Campina Grande, Caxias do Sul, Cuiabá, Lajeado, Natal, Recife, Osasco, Santo André, and Rio. The result will be a net expansion of Cloudflare in Brazil by 12 to 16 times.

We celebrate the benefits that this partnership will bring to Latin America. Our President and Chief Operating Officer Michelle Zatlyn likes to say that “we’re just getting started”. In that spirit, expect more exciting news about the Cloudflare network not only in Latin America, but worldwide!

Do you work at an ISP who is interested in bringing a better Internet experience to your users and better control over your network? Please reach out to our Edge Partnerships team at [email protected].

Are you passionate about working to expand our network to make the best edge platform on the globe? Do you thrive in an exciting, rapid-growth environment? Check out open roles on the Infrastructure team here!

Cloudflare’s Network Doubles CPU Capacity and Expands Into Ten New Cities in Four New Countries

Post Syndicated from Jon Rolfe original https://blog.cloudflare.com/ten-new-cities-four-new-countries/

Cloudflare’s Network Doubles CPU Capacity and Expands Into Ten New Cities in Four New Countries

Cloudflare’s Network Doubles CPU Capacity and Expands Into Ten New Cities in Four New Countries

Cloudflare’s global network is always expanding, and 2021 has been no exception. Today, I’m happy to give a mid-year update: we’ve added ten new Cloudflare cities, with four new countries represented among them. And we’ve doubled our computational footprint since the start of pandemic-related lockdowns.

No matter what else we do at Cloudflare, constant expansion of our infrastructure to new places is a requirement to help build a better Internet. 2021, like 2020, has been a difficult time to be a global network — from semiconductor shortages to supply-chain disruptions — but regardless, we have continued to expand throughout the entire globe, experimenting with technologies like ARM, ASICs, and Nvidia all the way.

The Cities

Cloudflare’s Network Doubles CPU Capacity and Expands Into Ten New Cities in Four New Countries

Without further ado, here are the new Cloudflare cities: Tbilisi, Georgia; San José, Costa Rica; Tunis, Tunisia; Yangon, Myanmar; Nairobi, Kenya; Jashore, Bangladesh; Canberra, Australia; Palermo, Italy; and Salvador and Campinas, Brazil.

These deployments are spread across every continent except Antarctica.

We’ve solidified our presence in every country of the Caucuses with our first deployment in the country of Georgia in the capital city of Tbilisi. And on the other side of the world, we’ve also established our first deployment in Costa Rica’s capital of San José with NIC.CR, run by the Academia Nacional de Ciencias.

In the northernmost country in Africa comes another capital city deployment, this time in Tunis, bringing us one country closer towards fully circling the Mediterranean Sea. Wrapping up the new country docket is our first city in Myanmar with our presence in Yangon, the country’s capital and largest city.

Our second Kenyan city is the country’s capital, Nairobi, bringing our city count in sub-Saharan Africa to a total of fifteen. In Bangladesh, Jashore puts us in the capital of its namesake Jashore District and the third largest city in the country after Chittagong and Dhaka, both already Cloudflare cities.

In the land way down under, our Canberra deployment puts us in Australia’s capital city, located, unsurprisingly, in the Australian Capital Territory. In differently warm lands is Palermo, Italy, capital of the island of Sicily, which we already see boosting performance throughout Italy.

Cloudflare’s Network Doubles CPU Capacity and Expands Into Ten New Cities in Four New Countries
25th percentile latency of non-bot traffic in Italy, year-to-date.

Finally, we’ve gone live in Salvador (capital of the state of Bahia) and Campinas, Brazil, the only city announced today that isn’t a capital. These are some of the first few steps in a larger Brazilian expansion — watch this blog for more news on that soon.

This is in addition to the dozens of new cities we’ve added in Mainland China with our partner, JD Cloud, with whom we have been working closely to quickly deploy and test new cities since last year.

The Impact

While we’re proud of our provisioning process, the work with new cities begins, not ends, with deployment. Each city is not only a new source of opportunity, but risk: Internet routing is fickle, and things that should improve network quality don’t always do so. While we have always had a slew of ways to track performance, we’ve found that a significant, constant improvement in the 25th percentile latency of non-bot traffic to be an ideal approximation of latency impacted by only physical distance.

Using this metric, we can quickly see the improvement that comes from adding new cities. For example, in Kenya, we can see that the addition of our Nairobi presence improved real user performance:

Cloudflare’s Network Doubles CPU Capacity and Expands Into Ten New Cities in Four New Countries
25th percentile latency of non-bot Kenyan traffic, before and after Nairobi gained a Cloudflare point of presence.

Latency variations in general are expected on the Internet — particularly in countries with high amounts of Internet traffic originating from non-fixed connections, like mobile phones — but in aggregate, the more consistently low latency, the better. From this chart, you can clearly see that not only was there a reduction in latency, but also that there were fewer frustrating variations in user latency. We all get annoyed when a page loads quickly one second and slowly the next, and the lower jitter that comes with being closer to the server helps to eliminate it.

As a reminder, while these measurements are in thousandths of a second, they add up quickly. Popular sites often require hundreds of individual requests for assets, some of which are initiated serially, so the difference between 25 milliseconds and 5 milliseconds can mean the difference between single and multi-second page load times.

To sum things up, users in the cities or greater areas of these cities should expect an improved Internet experience when using everything from our free, private 1.1.1.1 DNS resolver to the tens of millions of Internet properties that trust Cloudflare with their traffic. We have dozens more cities in the works at any given time, including now. Watch this space for more!

Join Our Team

Like our network, Cloudflare continues to rapidly grow. If working at a rapidly expanding, globally diverse company interests you, we’re hiring for scores of positions, including in the Infrastructure group. Or, if you work at a global ISP and would like to improve your users’ experience and be part of building a better Internet, get in touch with our Edge Partners Program at [email protected] we’ll look into sending some servers your way!

Introducing the Cloudflare Data Localization Suite

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/introducing-the-cloudflare-data-localization-suite/

Introducing the Cloudflare Data Localization Suite

Introducing the Cloudflare Data Localization Suite

Today we’re excited to announce the Cloudflare Data Localization Suite, which helps businesses get the performance and security benefits of Cloudflare’s global network, while making it easy to set rules and controls at the edge about where their data is stored and protected.

The Data Localization Suite is available now as an add-on for Enterprise customers.

Cloudflare’s network is private and compliant by design. Preserving end-user privacy is core to our mission of helping to build a better Internet; we’ve never sold personal data about customers or end users of our network. We comply with laws like GDPR and maintain certifications such as ISO-27001.

Today, we’re announcing tools that make it simple for our customers to build the same rigor into their own applications. In this post, I’ll explain the different types of data that we process and how the Data Localization Suite keeps this data local.

We’ll also talk about how Cloudflare makes it possible to build applications that comply with data locality laws, while remaining fast, secure and scalable.

Why keep data local?

Cloudflare’s customers have increasing desire or face legal requirements for data locality: they want to control the geographic location where their data is handled. Many categories of data that our customers process (including healthcare, legal, or financial data) may be subject to obligations that specify the data be stored or processed in a specific location. The preference or requirement for data localization is growing across jurisdictions such as the EU, India, and Brazil; over time, we expect more customers in more places will be expected to keep data local.

Although “data locality” sounds like a simple concept, our conversations with Cloudflare customers make clear that there are a number of unique challenges they face in the attempt to move toward this goal.  The availability of information on their Internet properties will remain global–they don’t want to limit access to their websites to local jurisdictions–but they want to make sure data stays local. Variously, they are trying to figure out:

  • How do I build local requirements into my global online operations?
  • How do I make sure unencrypted traffic is only available locally?
  • How do I make sure personal data is handled according to localization obligations?
  • How do I make sure my applications only store data in certain locations?

The Cloudflare Data Localization Suite attempts to respond to these questions.

Until now, customers who wanted to localize their data had to choose to restrict their application to one data center, or to one cloud provider’s region. This is a fragile approach, fraught with performance, reliability, and security challenges. Cloudflare is creating a new paradigm: customers should be able to get the performance and security benefits of our global network, while effortlessly keeping their data local.

Encryption is the backbone of privacy

Before we go into data locality, we should discuss encryption. Privacy isn’t possible without strong encryption; otherwise, anyone could snoop your customers’ data, regardless of where it’s stored.

Data is often described as being “in transit” and “at rest”. It’s critically important that both are encrypted. Data “in transit” refers to just that—data while it’s moving about on the wire, whether a local network or the public Internet. “At rest” generally means stored on a disk somewhere, whether a spinning HDD or a modern SSD.

In transit, Cloudflare can enforce that all traffic to end-users uses modern TLS and gets the highest level of encryption possible. We can also enforce that all traffic back to customers’ origin servers is always encrypted. Communication between all our edge and core data centers is always encrypted.

Cloudflare encrypts all of the data we handle at rest, usually with disk-level encryption. From cached files on our edge network, to configuration state in databases in our core data centers—every byte is encrypted at rest.

Control where TLS private keys can be accessed

Given the importance of encryption, one of the most sensitive pieces of data that our customers trust us to protect are their cryptographic keys, which enable data to be decrypted. Cloudflare offers two ways for customers to ensure that their private keys are only accessible in locations they specify.

Keyless SSL allows a customer to store and manage their own SSL private keys for use with Cloudflare on any external infrastructure of their choosing. Customers can use a variety of systems for their keystore, including hardware security modules (“HSMs”), virtual servers, and hardware running Unix/Linux and Windows that is housed in environments customers control. Cloudflare never has access to the private key with Keyless SSL.

Geo Key Manager gives customers granular control over which locations should store their keys. For example, a customer can choose for the private keys required for inspection of traffic to only be accessible inside data centers located in the European Union.

Manage where HTTPS requests and responses are inspected

In order to deploy our WAF, or detect malicious bot traffic, Cloudflare must terminate TLS in our edge data centers and inspect HTTPS request and response payloads.

Regional Services gives organizations control over where their traffic is inspected. With Regional Services enabled, traffic is ingested on Cloudflare’s global Anycast network at the location closest to the client, where we can provide L3 and L4 DDoS protection. Instead of being inspected at the HTTP level at that data center, this traffic is securely transmitted to Cloudflare data centers inside the region selected by the customer and handled there.

Introducing the Cloudflare Data Localization Suite

Control the logs and analytics generated by your traffic

In addition to making our customers’ infrastructure and teams faster, more secure, and more reliable, we also provide insights into what our services do, and how customers can make better use of them. We gather metadata about the traffic that goes through our edge data centers, and use this to improve the operation of our own network: for example, by crafting WAF rules to block the latest attacks, or by developing machine learning models to detect malicious bots. We also make this data available to our customers in the form of logs and analytics.

This only requires a subset of the metadata to be processed in our core data centers in the US/EU. This data contains information about how many requests were served, how much data was sent, how long requests took, and other information that is essential for the operation of our network.

With Edge Log Delivery, customers can send logs directly from the edge to their partner of choice—for example, an Azure storage bucket in their preferred region, or an instance of Splunk that runs in an on-premise data center. With this option, customers can still get their complete logs in their preferred region, without these logs first flowing through either of our US or EU core data centers.

Introducing the Cloudflare Data Localization Suite

Edge Log Delivery is in early beta for Enterprise customers today—please visit our product page for more information.

Ultimately, we are working towards providing customers full control over where their metadata is stored, and for how long. In the coming year, we plan to allow customers to be able to choose exactly which fields are stored, and for how long, and in which location.

Building location-aware applications from the ground up

So far, we’ve discussed how Cloudflare’s products can offer global performance and security solutions for our customers, while keeping their existing keys, application data, and metadata local.

But we know that customers are also struggling to use existing, traditional cloud systems to manage their data locality needs. Existing platforms may allow code or data to be deployed to a specific region, but having copies of applications in each region, and managing state across each of them, can be challenging at best (or impossible at worst).

The ultimate promise of serverless has been to allow any developer to say “I don’t care where my code runs, just make it scale.” Increasingly, another promise will need to be “I do care where my code runs, and I need more control to satisfy my compliance department.” Cloudflare Workers allows you the best of both worlds, with instant scaling, locations that span more than 100 countries around the world, and the granularity to choose exactly what you need.

Introducing the Cloudflare Data Localization Suite

We are announcing a major improvement that lets customers control where their applications store data: Workers Durable Objects will support Jurisdiction Restrictions.  Durable Objects provide globally consistent state and coordination to serverless applications running on the Cloudflare Workers platform. Jurisdiction Restrictions will make it possible for users to ensure that their Durable Objects do not store data or run outside of a given jurisdiction—making it trivially simple to build applications that combine global performance with local compliance. With automatic migration of Durable Objects, adapting to new rules will be as simple as adding a tag to a set of Durable Objects.

Building for the long haul

The data localization landscape is constantly evolving. Since we began working on the Data Localization Suite, the European Data Protection Board has released new guidance about how data may be transferred between the EU and the US. And we know this is just the beginning — over time, more regions and more industries will have data localization requirements.

At Cloudflare, we stay on top of the latest developments around data protection so our customers don’t have to. The Data Localization Suite gives our customers the tools to set rules and controls at the edge about where their data is stored and protected, while taking advantage of our global network.

Anchoring Trust: A Hardware Secure Boot Story

Post Syndicated from Derek Chamorro original https://blog.cloudflare.com/anchoring-trust-a-hardware-secure-boot-story/

Anchoring Trust: A Hardware Secure Boot Story

Anchoring Trust: A Hardware Secure Boot Story

As a security company, we pride ourselves on finding innovative ways to protect our platform to, in turn, protect the data of our customers. Part of this approach is implementing progressive methods in protecting our hardware at scale. While we have blogged about how we address security threats from application to memory, the attacks on hardware, as well as firmware, have increased substantially. The data cataloged in the National Vulnerability Database (NVD) has shown the frequency of hardware and firmware-level vulnerabilities rising year after year.

Technologies like secure boot, common in desktops and laptops, have been ported over to the server industry as a method to combat firmware-level attacks and protect a device’s boot integrity. These technologies require that you create a trust ‘anchor’, an authoritative entity for which trust is assumed and not derived. A common trust anchor is the system Basic Input/Output System (BIOS) or the Unified Extensible Firmware Interface (UEFI) firmware.

While this ensures that the device boots only signed firmware and operating system bootloaders, does it protect the entire boot process? What protects the BIOS/UEFI firmware from attacks?

The Boot Process

Before we discuss how we secure our boot process, we will first go over how we boot our machines.

Anchoring Trust: A Hardware Secure Boot Story

The above image shows the following sequence of events:

  • After powering on the system (through a baseboard management controller (BMC) or physically pushing a button on the system), the system unconditionally executes the UEFI firmware residing on a flash chip on the motherboard.
  • UEFI performs some hardware and peripheral initialization and executes the Preboot Execution Environment (PXE) code, which is a small program that boots an image over the network and usually resides on a flash chip on the network card.
  • PXE sets up the network card, and downloads and executes a small program bootloader through an open source boot firmware called iPXE.
  • iPXE loads a script that automates a sequence of commands for the bootloader to know how to boot a specific operating system (sometimes several of them). In our case, it loads our Linux kernel, initrd (this contains device drivers which are not directly compiled into the kernel), and a standard Linux root filesystem. After loading these components, the bootloader executes and hands off the control to the kernel.
  • Finally, the Linux kernel loads any additional drivers it needs and starts applications and services.

UEFI Secure Boot

Our UEFI secure boot process is fairly straightforward, albeit customized for our environments. After loading the UEFI firmware from the bootloader, an initialization script defines the following variables:

Platform Key (PK): It serves as the cryptographic root of trust for secure boot, giving capabilities to manipulate and/or validate the other components of the secure boot framework.

Trusted Database (DB): Contains a signed (by platform key) list of hashes of all PCI option ROMs, as well as a public key, which is used to verify the signature of the bootloader and the kernel on boot.

These variables are respectively the master platform public key, which is used to sign all other resources, and an allow list database, containing other certificates, binary file hashes, etc. In default secure boot scenarios, Microsoft keys are used by default. At Cloudflare we use our own, which makes us the root of trust for UEFI:

Anchoring Trust: A Hardware Secure Boot Story

But, by setting our trust anchor in the UEFI firmware, what attack vectors still exist?

UEFI Attacks

As stated previously, firmware and hardware attacks are on the rise. It is clear from the figure below that firmware-related vulnerabilities have increased significantly over the last 10 years, especially since 2017, when the hacker community started attacking the firmware on different platforms:

Anchoring Trust: A Hardware Secure Boot Story

This upward trend, coupled with recent malware findings in UEFI, shows that trusting firmware is becoming increasingly problematic.

By tainting the UEFI firmware image, you poison the entire boot trust chain. The ability to trust firmware integrity is important beyond secure boot. For example, if you can’t trust the firmware not to be compromised, you can’t trust things like trusted platform module (TPM) measurements to be accurate, because the firmware is itself responsible for doing these measurements (e.g a TPM is not an on-path security mechanism, but instead it requires firmware to interact and cooperate with). Firmware may be crafted to extend measurements that are accepted by a remote attestor, but that don’t represent what’s being locally loaded. This could cause firmware to have a questionable measured boot and remote attestation procedure.

Anchoring Trust: A Hardware Secure Boot Story

If we can’t trust firmware, then hardware becomes our last line of defense.

Hardware Root of Trust

Early this year, we made a series of blog posts on why we chose AMD EPYC processors for our Gen X servers. With security in mind, we started turning on features that were available to us and set forth the plan of using AMD silicon as a Hardware Root of Trust (HRoT).

Platform Secure Boot (PSB) is AMD’s implementation of hardware-rooted boot integrity. Why is it better than UEFI firmware-based root of trust? Because it is intended to assert, by a root of trust anchored in the hardware, the integrity and authenticity of the System ROM image before it can execute. It does so by performing the following actions:

  • Authenticates the first block of BIOS/UEFI prior to releasing x86 CPUs from reset.
  • Authenticates the System Read-Only Memory (ROM) contents on each boot, not just during updates.
  • Moves the UEFI Secure Boot trust chain to immutable hardware.

This is accomplished by the AMD Platform Security Processor (PSP), an ARM Cortex-A5 microcontroller that is an immutable part of the system on chip (SoC). The PSB consists of two components:

On-chip Boot ROM

  • Embeds a SHA384 hash of an AMD root signing key
  • Verifies and then loads the off-chip PSP bootloader located in the boot flash

Off-chip Bootloader

  • Locates the PSP directory table that allows the PSP to find and load various images
  • Authenticates first block of BIOS/UEFI code
  • Releases CPUs after successful authentication
Anchoring Trust: A Hardware Secure Boot Story
  1. The PSP secures the On-chip Boot ROM code, loads the off-chip PSP firmware into PSP static random access memory (SRAM) after authenticating the firmware, and passes control to it.
  2. The Off-chip Bootloader (BL) loads and specifies applications in a specific order (whether or not the system goes into a debug state and then a secure EFI application binary interface to the BL)
  3. The system continues initialization through each bootloader stage.
  4. If each stage passes, then the UEFI image is loaded and the x86 cores are released.

Now that we know the booting steps, let’s build an image.

Build Process

Public Key Infrastructure

Before the image gets built, a public key infrastructure (PKI) is created to generate the key pairs involved for signing and validating signatures:

Anchoring Trust: A Hardware Secure Boot Story

Our original device manufacturer (ODM), as a trust extension, creates a key pair (public and private) that is used to sign the first segment of the BIOS (private key) and validates that segment on boot (public key).

On AMD’s side, they have a key pair that is used to sign (the AMD root signing private key) and certify the public key created by the ODM. This is validated by AMD’s root signing public key, which is stored as a hash value (RSASSA-PSS: SHA-384 with 4096-bit key is used as the hashing algorithm for both message and mask generation) in SPI-ROM.

Private keys (both AMD and ODM) are stored in hardware security modules.

Because of the way the PKI mechanisms are built, the system cannot be compromised if only one of the keys is leaked. This is an important piece of the trust hierarchy that is used for image signing.

Certificate Signing Request

Once the PKI infrastructure is established, a BIOS signing key pair is created, together with a certificate signing request (CSR). Creating the CSR uses known common name (CN) fields that many are familiar with:

  • countryName
  • stateOrProvinceName
  • localityName
  • organizationName

In addition to the fields above, the CSR will contain a serialNumber field, a 32-bit integer value represented in ASCII HEX format that encodes the following values:

  • PLATFORM_VENDOR_ID: An 8-bit integer value assigned by AMD for each ODM.
  • PLATFORM_MODEL_ID: A 4-bit integer value assigned to a platform by the ODM.
  • BIOS_KEY_REVISION_ID: is set by the ODM encoding a 4-bit key revision as unary counter value.
  • DISABLE_SECURE_DEBUG: Fuse bit that controls whether secure debug unlock feature is disabled permanently.
  • DISABLE_AMD_BIOS_KEY_USE: Fuse bit that controls if the BIOS, signed by an AMD key, (with vendor ID == 0) is permitted to boot on a CPU with non-zero Vendor ID.
  • DISABLE_BIOS_KEY_ANTI_ROLLBACK: Fuse bit that controls whether BIOS key anti-rollback feature is enabled.
Anchoring Trust: A Hardware Secure Boot Story

Remember these values, as we’ll show how we use them in a bit. Any of the DISABLE values are optional, but recommended based on your security posture/comfort level.

AMD, upon processing the CSR, provides the public part of the BIOS signing key signed and certified by the AMD signing root key as a RSA Public Key Token file (.stkn) format.

Putting It All Together

The following is a step-by-step illustration of how signed UEFI firmware is built:

Anchoring Trust: A Hardware Secure Boot Story
  1. The ODM submits their public key used for signing Cloudflare images to AMD.
  2. AMD signs this key using their RSA private key and passes it back to ODM.
  3. The AMD public key and the signed ODM public key are part of the final BIOS SPI image.
  4. The BIOS source code is compiled and various BIOS components (PEI Volume, Driver eXecution Environment (DXE) volume, NVRAM storage, etc.) are built as usual.
  5. The PSP directory and BIOS directory are built next. PSP directory and BIOS directory table points to the location of various firmware entities.
  6. The ODM builds the signed BIOS Root of Trust Measurement (RTM) signature based on the blob of BIOS PEI volume concatenated with BIOS Directory header, and generates the digital signature of this using the private portion of ODM signing key. The SPI location for signed BIOS RTM code is finally updated with this signature blob.
  7. Finally, the BIOS binaries, PSP directory, BIOS directory and various firmware binaries are combined to build the SPI BIOS image.

Enabling Platform Secure Boot

Platform Secure Boot is enabled at boot time with a PSB-ready firmware image. PSB is configured using a region of one-time programmable (OTP) fuses, specified for the customer. OTP fuses are on-chip non-volatile memory (NVM) that permits data to be written to memory only once. There is NO way to roll the fused CPU back to an unfused one.

Enabling PSB in the field will go through two steps: fusing and validating.

  • Fusing: Fuse the values assigned in the serialNumber field that was generated in the CSR
  • Validating: Validate the fused values and the status code registers

If validation is successful, the BIOS RTM signature is validated using the ODM BIOS signing key, PSB-specific registers (MP0_C2P_MSG_37 and MP0_C2P_MSG_38) are updated with the PSB status and fuse values, and the x86 cores are released

If validation fails, the registers above are updated with the PSB error status and fuse values, and the x86 cores stay in a locked state.

Let’s Boot!

With a signed image in hand, we are ready to enable PSB on a machine. We chose to deploy this on a few machines that had an updated, unsigned AMI UEFI firmware image, in this case version 2.16. We use a couple of different firmware update tools, so, after a quick script, we ran an update to change the firmware version from 2.16 to 2.18C (the signed image):

. $sudo ./UpdateAll.sh
Bin file name is ****.218C

BEGIN

+---------------------------------------------------------------------------+
|                 AMI Firmware Update Utility v5.11.03.1778                 |      
|                 Copyright (C)2018 American Megatrends Inc.                |                       
|                         All Rights Reserved.                              |
+---------------------------------------------------------------------------+
Reading flash ............... done
FFS checksums ......... ok
Check RomLayout ........ ok.
Erasing Boot Block .......... done
Updating Boot Block ......... done
Verifying Boot Block ........ done
Erasing Main Block .......... done
Updating Main Block ......... done
Verifying Main Block ........ done
Erasing NVRAM Block ......... done
Updating NVRAM Block ........ done
Verifying NVRAM Block ....... done
Erasing NCB Block ........... done
Updating NCB Block .......... done
Verifying NCB Block ......... done

Process completed.

After the update completed, we rebooted:

Anchoring Trust: A Hardware Secure Boot Story

After a successful install, we validated that the image was correct via the sysfs information provided in the dmidecode output:

Anchoring Trust: A Hardware Secure Boot Story

Testing

With a signed image installed, we wanted to test that it worked, meaning: what if an unauthorized user installed their own firmware image? We did this by downgrading the image back to an unsigned image, 2.16. In theory, the machine shouldn’t boot as the x86 cores should stay in a locked state. After downgrading, we rebooted and got the following:

Anchoring Trust: A Hardware Secure Boot Story

This isn’t a powered down machine, but the result of booting with an unsigned image.

Anchoring Trust: A Hardware Secure Boot Story

Flashing back to a signed image is done by running the same flashing utility through the BMC, so we weren’t bricked. Nonetheless, the results were successful.

Naming Convention

Our standard UEFI firmware images are alphanumeric, making it difficult to distinguish (by name) the difference between a signed and unsigned image (v2.16A vs v2.18C), for example. There isn’t a remote attestation capability (yet) to probe the PSB status registers or to store these values by means of a signature (e.g. TPM quote). As we transitioned to PSB, we wanted to make this easier to determine by adding a specific suffix: -sig  that we could query in userspace. This would allow us to query this information via Prometheus. Changing the file name alone wouldn’t do it, so we had to make the following changes to reflect a new naming convention for signed images:

  • Update filename
  • Update BIOS version for setup menu
  • Update post message
  • Update SMBIOS type 0 (BIOS version string identifier)

Signed images now have a -sig suffix:

~$ sudo dmidecode -t0
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
	Vendor: American Megatrends Inc.
	Version: V2.20-sig
	Release Date: 09/29/2020
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 16 MB

Conclusion

Finding weaknesses in firmware is a challenge that many attackers have taken on. Attacks that physically manipulate the firmware used for performing hardware initialization during the booting process can invalidate many of the common secure boot features that are considered industry standard. By implementing a hardware root of trust that is used for code signing critical boot entities, your hardware becomes a ‘first line of defense’ in ensuring that your server hardware and software integrity can derive trust through cryptographic means.

What’s Next?

While this post discussed our current, AMD-based hardware platform, how will this affect our future hardware generations? One of the benefits of working with diverse vendors like AMD and Ampere (ARM) is that we can ensure they are baking in our desired platform security by default (which we’ll speak about in a future post), making our hardware security outlook that much brighter 😀.

How our network powers Cloudflare One

Post Syndicated from Annika Garbers original https://blog.cloudflare.com/our-network-cloudflare-one/

How our network powers Cloudflare One

Earlier this week, we announced Cloudflare One™, a unified approach to solving problems in enterprise networking and security. With Cloudflare One, your organization’s data centers, offices, and devices can all be protected and managed in a single control plane. Cloudflare’s network is central to the value of all of our products, and today I want to dive deeper into how our network powers Cloudflare One.

Over the past ten years, Cloudflare has encountered the same challenges that face every organization trying to grow and protect a global network: we need to protect our infrastructure and devices from attackers and malicious outsiders, but traditional solutions aren’t built for distributed networks and teams. And we need visibility into the activity across our network and applications, but stitching together logging and analytics tools across multiple solutions is painful and creates information gaps.

How our network powers Cloudflare One

We’ve architected our network to meet these challenges, and with Cloudflare One, we’re extending the advantages of these decisions to your company’s network to help you solve them too.

Distribution

Enterprises and some small organizations alike have team members around the world. Legacy models of networking forced traffic back through central choke points, slowing down users and constraining network scale. We keep hearing from our customers who want to stop buying appliances and expensive MPLS links just to try and outpace the increased demand their distributed teams place on their network.

Wherever your users are, we are too

Global companies have enough of a challenge managing widely distributed corporate networks, let alone the additional geographic dispersity introduced as users are enabled to work from home or from anywhere. Because Cloudflare has data centers close to Internet users around the world, all traffic can be processed close to its source (your users), regardless of their location. This delivers performance benefits across all of our products.

We built our network to meet users where they are. Today, we have data centers in over 200 cities and over 100 countries. As the geographical reach of Cloudflare’s network has expanded, so has our capacity, which currently tops 42 Tbps. This reach and capacity is extended to your enterprise with Cloudflare One.

The same Cloudflare, everywhere

Traditional solutions for securing enterprise networks often involve managing a plethora of regional providers with different capabilities. This means that traffic from two users in different parts of the world may be treated completely differently, for example, with respect to quality of DDoS attack detection. With Cloudflare One, you can manage security for your entire global network from one place, consolidating and standardizing control.

Capacity for the good & the bad

With 42 Tbps of network capacity, you can rest assured that Cloudflare can handle all of your traffic – the clean, legitimate traffic you want, and the malicious and attack traffic you don’t.

Scalability

Every product on every server

All of Cloudflare’s services are standardized across our entire network. Every service runs on every server, which means that traffic through all of the products you use can be processed close to its source, rather than being sent around to different locations for different services. This also means that as our network continues to grow, all products benefit: new data centers will automatically process traffic for every service you use.

For example, your users who connect to the Internet through Cloudflare Gateway in South America connect to one of our data centers in the region, rather than backhauling to another location. When those users need to reach an origin located on the other side of the world, we can also route them over our private backbone to get them there faster.

Commodity hardware, software-based functions

We built our network using commodity hardware, which allows us to scale quickly without relying on one single vendor or getting stuck in supply chain bottlenecks. And the services that process your traffic are software-based – no specialized, third-party hardware performing specific functions. This means that the development, maintenance, and support for the products you use all lives within Cloudflare, reducing the complexity of getting help when you need it.

This approach also lets us build efficiency into our network. We use that efficiency to serve customers on our free plan and deliver a more cost-effective platform to our larger customers.

Connectivity

Cloudflare interconnects with over 8,800 networks globally, including major ISPs, cloud services, and enterprises. Because we’ve built one of the most interconnected networks in the world, Cloudflare One can deliver a better experience for your users and applications, regardless of your network architecture or connectivity/transit vendors.

Broad interconnectivity with eyeball networks

Because of our CDN product (among others), being close to end users (“eyeballs”) has always been critical for our network. Now that more people than ever are working from home, eyeball → datacenter connectivity is more crucial than ever. We’ve spoken to customers who, since transitioning to a work-from-home model earlier this year, have had congestion issues with providers who are not well-connected with eyeball networks. With Cloudflare One, your employees can do their jobs from anywhere with Cloudflare smoothly keeping their traffic (and your infrastructure) secure.

Extensive presence in peering facilities

Earlier this year, we announced Cloudflare Network Interconnect (CNI), the ability for you to connect your network with Cloudflare’s via a secure physical or virtual connection. Using CNI means more secure, reliable traffic to your network through Cloudflare One. With our highly-connected network, there’s a good chance we’re colocated with your organization in at least one peering facility, making CNI setup a no-brainer. We’ve also partnered with five interconnect platforms to provide even more flexibility with virtual (software-defined layer 2) connections with Cloudflare. Finally, we peer with major cloud providers all over the world, providing even more flexibility for organizations at any stage of hybrid/cloud transition.

Making the Internet smarter

Traditional approaches to creating secure and reliable network connectivity involve relying on expensive MPLS links to provide point to point connection. Cloudflare is built from the ground-up on the Internet, relying on and improving the same Internet links that customers use today. We’ve built software and techniques that help us be smarter about how we use the Internet to deliver better performance and reliability to our customers. We’ve also built the Cloudflare Global Private Backbone to help us even further enhance our software and techniques to deliver even more performance and reliability where it’s needed the most.

This approach allows us to use the variety of connectivity options in our toolkit intelligently, building toward a more performant network than what we could accomplish with a traditional MPLS solution. And because we use transit from a wide variety of providers, chances are that whoever your ISP is, you already have high-quality connectivity to Cloudflare’s network.

Insight

Diverse traffic workload yields attack intelligence

We process all kinds of traffic thanks to our network’s reach and the diversity of our customer base. That scale gives us unique insight into the Internet. We can analyze trends and identify new types of attacks before they hit the mainstream, allowing us to better prepare and protect customers as the security landscape changes.

We also provide you with visibility into these network and threat intelligence insights with tools like Cloudflare Radar and Cloudflare One Intel. Earlier this week, we launched a feature to block DNS tunneling attempts. We analyze a tremendous number of DNS queries and have built a model of what they should look like. We use that model to block suspicious queries which might leak data from devices.

Unique network visibility enables Smart Routing

In addition to attacks and malicious traffic across our network, we’re paying attention to the state of the Internet. Visibility across carriers throughout the world allows us to identify congestion and automatically route traffic along the fastest and most reliable paths. Contrary to the experience delivered by traditional scrubbing providers, Magic Transit customers experience minimal latency and sometimes even performance improvements with Cloudflare in path, thanks to our extensive connectivity and transit diversity.

Argo Smart Routing, powered by our extensive network visibility, improves performance for web assets by 30% on average; we’re excited to bring these benefits to any traffic through Cloudflare One with Argo Smart Routing for Magic Transit (coming soon!).

What’s next?

Cloudflare’s network is the foundation of the value and vision for Cloudflare One. With Cloudflare One, you can put our network between the Internet and your entire enterprise, gaining the powerful benefits of our global reach, scalability, connectivity, and insight. All of the products we’ve launched this week, like everything we’ve built so far, benefit from the unique advantages of our network.

We’re excited to see these effects multiply as organizations adopt Cloudflare One to protect and accelerate all of their traffic. And we’re just getting started: we’re going to continue to expand our network, and the products that run on it, to deliver an even faster, more secure, more reliable experience across all of Cloudflare One.

Delivering HTTP/2 upload speed improvements

Post Syndicated from Junho Choi original https://blog.cloudflare.com/delivering-http-2-upload-speed-improvements/

Delivering HTTP/2 upload speed improvements

Delivering HTTP/2 upload speed improvements

Cloudflare recently shipped improved upload speeds across our network for clients using HTTP/2. This post describes our journey from troubleshooting an issue to fixing it and delivering faster upload speeds to the global Internet.

We launched speed.cloudflare.com in May 2020 to give our users insight into how well their networks perform. The test provides download, upload and latency tests. Soon after release, we received reports from a small number of users that sometimes upload speeds were underreported. Our investigation determined that it seemed to happen with end users that had high upload bandwidth available (several hundreds Mbps class cable modem or fiber service). Our speed tests are performed via browser JavaScript, and most browsers use HTTP/2 by default. We found that HTTP/2 upload speeds were sometimes much slower than HTTP/1.1 (assuming all TLS) when the user had high available upload bandwidth.

Upload speed is more important than ever, especially for people using home broadband connections. As many people have been forced to work from home they’re using their broadband connections differently than before. Prior to the pandemic broadband traffic was very asymmetric (you downloaded way more than you uploaded… think listening to music, or streaming a movie), but now we’re seeing an increase in uploading as people video conference from home or create content from their home office.

Initial Tests

User reports were focused on particularly fast home networks. I set up a dummynet network simulator to test upload speed in a controlled environment. I launched a linux VM running our code inside my Macbook Pro and set up a dummynet between the VM and Mac host.  Measuring upload speed is simple – create a file and upload using curl to an endpoint which accepts a request body. I ran the same test 20 times and took a median upload speed (Mbps).

% dd if=/dev/urandom of=test.dat bs=1M count=10
% curl --http1.1 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint
% curl --http2 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint

Stepping up to uploading a 10MB object over a network which has 200Mbps available bandwidth and 40ms RTT, the result was surprising. Using our production configuration, HTTP/2 upload speed tested at almost half of the same test conditions using HTTP/1.1 (higher is better).

Delivering HTTP/2 upload speed improvements

The result may differ depending on your network, but the gap is bigger when the network is fast. On a slow network, like my home cable connection (5Mbps upload and 20ms RTT), HTTP/2 upload speed was almost identical to the performance observed with HTTP/1.1.

Receiver Flow Control

Before we get into more detail on this topic, my intuition suggested the issue was related to receiver flow control. Usually the client (browser or any HTTP client) is the receiver of data, but in the case the client is uploading content to the server, the server is the receiver of data. And the receiver needs some type of flow control of the receive buffer.

How we handle receiver flow control differs between HTTP/1.1 and HTTP/2. For example, HTTP/1.1 doesn’t define protocol-level receiver flow control since there is no multiplexing of requests in the connection and it’s up to the TCP layer which handles receiving data. Note that most of the modern OS TCP stacks have auto tuning of the receive buffer (we will revisit that later) and they tune based on the current BDP (bandwidth-delay product).

In the case of HTTP/2, there is a stream-level flow control mechanism because the protocol supports multiplexing of streams. Each HTTP/2 stream has its own flow control window and there is connection level flow control for all streams in the connection. If it’s too tight, the sender will be blocked by the flow control. If it’s too loose we may end up wasting memory for buffering. So keeping it optimal is important when implementing flow control and the most optimal strategy is to keep the receive buffer matching the current BDP. BDP represents the maximum bytes in flight in the given network and can be used as an optimal buffer size.

How NGINX handles the request body buffer

Initially I tried to find a parameter which controls NGINX upload buffering and tried to see if tuning the values improved the result. There are a couple of parameters which are related to uploading a request body.

And this is HTTP/2 specific:

Cloudflare does not use the proxy_buffering directive, so it can be immediately discounted.  client_body_buffer_size is the size of the request body buffer which is used regardless of the protocol, so this one applies to HTTP/1.1 and HTTP/2 as well.

When looking into the code, here is how it works:

Delivering HTTP/2 upload speed improvements

  • HTTP/1.1: use client_body_buffer_size buffer as a buffer between upstream and the client, simply repeating reading and writing using this buffer.
  • HTTP/2: since we need a flow control window update for the HTTP/2 DATA frame, there are two parameters:
    • http2_body_preread_size: it specifies the size of the initial request body read before it starts to send to the upstream.
    • client_body_buffer_size: it specifies the size of the request body buffer.
    • Those two parameters are used for allocating a request body buffer during uploading. Here is a brief summary of how unbuffered upload works:
      • Allocate a single request body buffer which size is a maximum of http2_body_preread_size and client_body_buffer_size. This means if http2_body_preread_size is 64KB and client_body_buffer_size is 128KB, then a 128KB buffer is allocated. We use 128KB for client_body_buffer_size.
      • HTTP/2 Settings INITIAL_WINDOW_SIZE of the stream is set to http2_body_preread_size and we use 64KB as a default (the RFC7540 default value).
      • HTTP/2 module reads up to http2_body_preread_size before sending it to upstream.
      • After flushing the preread buffer, keep reading from the client and write to upstream and send WINDOW_UPDATE frame back to the client when necessary until the request body is fully received.

To summarise what this means: HTTP/1.1 simply uses a single buffer, so TCP socket buffers do the flow control. However with HTTP/2, the application layer also has receiver flow control and NGINX uses a fixed size buffer for the receiver. This limits upload speed when the current link has a BDP larger than the current request body buffer size. So the bottleneck is HTTP/2 flow control when the buffer size is too tight.

We’re going to need a bigger buffer?

In theory, bigger buffer sizes should avoid upload bottlenecks, so I tried a few out by running my tests again. The previous chart result is now labelled “prod” and plotted alongside HTTP/2 tests with client_body_buffer_size set to 256KB, 512KB and 1024KB:

Delivering HTTP/2 upload speed improvements

It appears 512KB is an optimal value for client_body_buffer_size.

What if I test with some other network parameter? Here is when RTT is 10ms, in this case, 256KB looks optimal.

Delivering HTTP/2 upload speed improvements

Both cases look much better than current 128KB and get a similar performance to HTTP/1.1 or even better. However, it seems like the optimal buffer size is a moving target and furthermore having too large a buffer size can hurt the performance: we need a smart way to find the optimal buffer size.

Autotuning request body buffer size

One of the ideas that can help this kind of situation is autotuning. For example, modern TCP stacks autotune their receive buffers automatically. In production, our edge also has TCP receiver buffer autotuning enabled by default.

net.ipv4.tcp_moderate_rcvbuf = 1

But in case of HTTP/2, TCP buffer autotuning is not very effective because the HTTP/2 layer is doing its own flow control and the existing 128KB was too small for a high BDP link. At this point, I decided to pursue autotuning HTTP/2 receive buffer sizing as well, similar to what TCP does.

The basic idea is that NGINX doubles the size of HTTP/2 request body buffer size based on its BDP. Here is an algorithm currently implemented in our version of NGINX:

  • Allocate a request body buffer as explained above.
  • For every RTT (using linux tcp_info), update the current BDP.
  • Double the request body buffer size when the current BDP > (receiver window / 4).

Test Result

Lab Test

Here is a test result when HTTP/2 autotune upload is enabled (still using client_body_buffer_size 128KB). You can see “h2 autotune” is doing pretty well – similar or even slightly faster than HTTP/1.1 speed (that’s the initial goal). It might be slightly worse than a hand-picked optimal buffer size for given conditions, but you can see now NGINX picks up optimal buffer size automatically based on network conditions.

Delivering HTTP/2 upload speed improvements
Delivering HTTP/2 upload speed improvements

Production test

After we deployed this feature, I ran similar tests against our production edge, uploading a 10MB file from well connected client nodes to our edge. I created a Linux VM instance in Google Cloud and ran the upload test where the network is very fast (a few Gbps) and low latency (<10ms).

Here is when I run the test in the Google Cloud Belgium region to our CDG (Paris) PoP which has 7ms RTT. This looks very good with almost 3x improvement.

Delivering HTTP/2 upload speed improvements

I also tested between the Google Cloud Tokyo region and our NRT (Tokyo) PoP, which had a 2.3ms RTT. Although this is not realistic for home users, the results are interesting. A 128KB fixed buffer performs well, but HTTP/2 with buffer autotune outperforms even HTTP/1.1.

Delivering HTTP/2 upload speed improvements

Summary

HTTP/2 upload buffer autotuning is now fully deployed in the Cloudflare edge. Customers should now benefit from improved upload performance for all HTTP/2 connections, including speed tests on speed.cloudflare.com. Autotuning upload buffer size logic works well for most cases, and now HTTP/2 upload is much faster than before! When we think about the performance we usually tend to think about download speed or latency reduction, but faster uploading can also help users working from home when they need a large amount of upload, such as photo/video sharing apps, content creation, video conferencing or self broadcasting.

Many thanks to Lucas Pardue and Rustam Lalkaka for great feedback and suggestions on the article.