As the COVID-19 emergency continues to affect countries and territories around the world, the Internet has been a key factor in providing information to the public. As businesses, organizations and government agencies adjust to this new normal, we recognize the strain that this pandemic has put on the groups working to assist in virus mitigation and provide accurate information to the general public on the state of the pandemic.
At Cloudflare, this means ensuring that these entities have the necessary tools and resources available to them in these extenuating circumstances. On March 13, we announced our Cloudflare for Team products will be free until September 1, 2020, to ensure Cloudflare users and prospective users have the tools they need to support secure and efficient remote work. Additionally, we have removed usage caps for existing Cloudflare for Teams users and are also providing onboarding sessions so these groups can continue business in this new normal.
As a company, we believe we can do more and have been thinking about ways we can support organizations and businesses that are at the forefront of the pandemic such as health officials and those providing relief to the public. Many organizations have reached out to us with COVID-19 related initiatives including the creation of symptom tracking websites, medical resource donations, and websites focused on providing updates on COVID-19 cases in specific regions.
During this time, we have seen an increase in applications for Project Galileo, an initiative we started in 2014 to provide free services to organizations on the Internet including humanitarian organizations, media sites and voices of political dissent. Project Galileo was started to ensure these groups stay online, as they are repeatedly targeted due to the work they do. Since March 16, we have seen a 40% increase in applications for the project of organizations related to COVID-19 relief efforts and information. We are happy to assist other organizations that have started initiatives such as these with ensuring the accessibility and resilience of their web infrastructure and internal team.
Risks faced to Government Agencies Web Infrastructure due to COVID-19 pandemic
As COVID-19 has disrupted our lives, the Internet has allowed many aspects of our life to adapt and carry on. From health care, to academia, to sales, a working Internet infrastructure is essential for business continuity and the dissemination of information. At Cloudflare, we’ve witnessed the effects of this transition to online interaction. In the last two months, we have seen both a massive increase in Internet traffic and a shift in the type of content users access online. Government agencies have seen a 100% increase in traffic to their websites during the pandemic.
This unexpected shift in traffic patterns can come with a cost. Essential websites that provide crucial information and updates on this pandemic may not have configured their systems to handle the massive surges in traffic they are currently seeing. Government agencies providing essential health information to citizens on the COVID-19 pandemic have temporarily gone offline due to increased traffic. We’ve also seen examples of public service announcements and the sites of local governments providing unemployment resources unable to serve their traffic. In New Jersey, New York and Ohio, websites that provide unemployment benefits and health insurance options for people who have recently been laid off have crashed due to large amounts of traffic and unprecedented demand.
During the spread of COVID-19, government agencies have also experienced cyberattacks.
The Australian government’s digital platform for providing welfare services for Australian citizens, known as Mygov, was slow and inaccessible for a short period of time. Although a DDoS attack was suspected, the problems were actually the result of 95,000 legitimate requests to access unemployment benefits, as the country recently doubled these benefits to help those impacted by the pandemic.
COVID-19 Government Package
Cloudflare has helped improve the security and performance of many vulnerable entities on the Internet with Project Galileo and ensured the security of government related election agencies with the Athenian Project. Our services are designed not only to prevent malicious actors from disrupting a website, but also to protect large influxes of legitimate traffic. In light of recent events, we want to help state and local government agencies stay online and provide essential information to the public without worrying their site can be taken down by malicious or unexpected spikes in traffic.
Therefore, we are excited to provide a free package of services to state and local governments worldwide until September 1, 2020, to ensure they have the tools needed to secure their web infrastructure and internal teams.
This package of free services includes the following features:
Cloudflare Business Level services: Includes unmetered mitigation of DDoS attacks, web application firewall (WAF) with up to 25 custom rulesets, and ability to upload custom SSL certificates.
Rate limiting: Rate Limiting allows users to rate limit, shape or block traffic based on the rate of requests per client IP address, cookie, authentication token, or other attributes of the request.
Cloudflare for Teams: A suite of tools to help ensure that those working from home can ensure continuity.
Access: To ensure the security of internal teams, Cloudflare Access, allows for organizations to secure, authenticate, and monitor user access to any domain, application, or path on Cloudflare, without using a VPN.
Gateway: Uses DNS filtering to help protect users from phishing scams or malware sites at multiple locations.
We are also making this offer available for Cloudflare channel partners around the world to help support government agencies in their respective countries during this challenging time for the global community. If you are a partner and would like information on how to provide Cloudflare for Teams, a Business Plan and Rate Limiting at no charge, please contact your Cloudflare Partner Representative or email [email protected].
The news of COVID-19 has transformed every part of our lives. During this difficult time, the Internet has allowed us to stay connected with friends, family, and provide resources to those in need. At Cloudflare, we are committed to helping businesses, organizations and government agencies stay online to ensure that everyone has access to authoritative information.
Last April 1 we announced WARP — an option within the 220.127.116.11 iOS and Android app to secure and speed up Internet connections. Today, millions of users have secured their mobile Internet connections with WARP.
While WARP started as an option within the 18.104.22.168 app, it’s really a technology that can benefit any device connected to the Internet. In fact, one of the most common requests we’ve gotten over the last year is support for WARP for macOS and Windows. Today we’re announcing exactly that: the start of the WARP beta for macOS and Windows.
What’s The Same: Fast, Secure, and Free
We always wanted to build a WARP client for macOS and Windows. We started with mobile because it was the hardest challenge. And it turned out to be a lot harder than we anticipated. While we announced the beta of 22.214.171.124 with WARP on April 1, 2019 it took us until late September before we were able to open it up to general availability. We don’t expect the wait for macOS and Windows WARP to be nearly as long.
The WARP client for macOS and Windows relies on the same fast, efficient Wireguard protocol to secure Internet connections and keep them safe from being spied on by your ISP. Also, just like WARP on the 126.96.36.199 mobile app, the basic service will be free on macOS and Windows.
WARP+ Gets You There Faster
We plan to add WARP+ support in the coming months to allow you to leverage Cloudflare’s Argo network for even faster Internet performance. We will provide a plan option for existing WARP+ subscribers to add additional devices at a discount. In the meantime, existing WARP+ users will be among the first to be invited to try WARP for macOS and Windows. If you are a WARP+ subscriber, check your 188.8.131.52 app over the coming weeks for a link to an invitation to try the new WARP for macOS and Windows clients.
If you’re not a WARP+ subscriber, you can add yourself to the waitlist by signing up on the page linked below. We’ll email as soon as it’s ready for you to try.
We haven’t forgotten about Linux. About 10% of Cloudflare’s employees run Linux on their desktops. As soon as we get the macOS and Windows clients out we’ll turn our attention to building a WARP client for Linux.
Thank you to everyone who helped us make WARP fast, efficient, and reliable on mobile. It’s incredible how far it’s come over the last year. If you tried it early in the beta last year but aren’t using it now, I encourage you to give it another try. We’re looking forward to bringing WARP speed and security to even more devices.
The Cloudflare Load Balancer was introduced over three years ago to provide our customers with a powerful, easy to use tool to intelligently route traffic to their origins across the world. During the initial design process, one of the questions we had to answer was ‘where do we send traffic if all pools are down?’ We did not think it made sense just to drop the traffic, so we used the concept of a ‘fallback pool’ to send traffic to a ‘pool of last resort’ in the case that no pools were detected as available. While this may still result in an error, it gave an eyeball request a chance at being served successfully in case the pool was still up.
As a brief reminder, a load balancer helps route traffic across your origin servers to ensure your overall infrastructure stays healthy and available. Load Balancers are made up of pools, which can be thought of as collections of servers in a particular location.
Over the past three years, we’ve made many updates to the dashboard. The new designs now support the fallback pool addition to the dashboard UI. The use of a fallback pool is incredibly helpful in a tight spot, but not having it viewable in the dashboard led to confusion around which pool was set as the fallback. Was there a fallback pool set at all? We want to be sure you have the tools to support your day-to-day work, while also ensuring our dashboard is usable and intuitive.
You can now check which pool is set as the fallback in any given Load Balancer, along with being able to easily designate any pool in the Load Balancer as the fallback. If no fallback pool is set, then the last pool in the list will automatically be chosen. We made the decision to auto-set a pool to be sure that customers are always covered in case the worst scenario happens. You can access the fallback pool within the Traffic App of the Cloudflare dashboard when creating or editing a Load Balancer.
Load Balancing UI Improvements
Not only did we add the fallback pool to the UI, but we saw this as an opportunity to update other areas of the Load Balancing app that have caused some confusion in the past.
Facelift and De-modaling
As a start, we gave the main Load Balancing page a face lift as well as de-modaling (moving content out of a smaller modal screen into a larger area) the majority of the Load Balancing UI. We felt moving this content out of a small web element would allow users to more easily understand the content on the page and allow us to better use the larger available space rather than being limited to the small area of a modal. This change has been applied when you create or edit a Load Balancer and manage monitors and/or pools.
The updated UI has combined the health status and icon to declutter the available space and make it clear at a glance what the status is for a particular Load Balancer or Pool. We have also updated to a smaller toggle button across the Load Balancing UI, which allows us to update the action buttons with the added margin space gained. Now that we are utilizing the page surface area more efficiently, we moved forward to add more information in our tables so users are more aware of the shared aspects of their Load Balancer.
Shared Objects and Editing
Shared objects have caused some level of concern for companies who have teams across the world – all leveraging the Cloudflare dashboard.
Some of the shared objects, Monitors and Pools, have a new column added outlining which Pools or Load Balancers are currently in use by a particular Monitor or Pool. This brings more clarity around what will be affected by any changes made by someone from your organization. This supports users to be more autonomous and confident when they make an update in the dashboard. If someone from team A wants to update a monitor for a production server, they can do so without the worry of monitoring for another pool possibly breaking or have to speak to team B first. The time saved and empowerment to make updates as things change in your business is incredibly valuable. It supports velocity you may want to achieve while maintaining a safe environment to operate in. The days of having to worry about unforeseen consequences that could crop up later down the road are swiftly coming to a close.
This helps teams understand the impact of a given change and what else would be affected. But, we did not feel this was enough. We want to be sure that everyone is confident in the changes they are making. On top of the additional columns, we added in a number of confirmation modals to drive confidence about a particular change. Also, a list in the modal of the other Load Balancers or Pools that would be impacted. We really wanted to drive the message home around which objects are shared: we made a final change to allow edits of monitors to take place only within the Manage Monitors page. We felt that having users navigate to the manage page in itself gives more understanding that these items are shared. For example, allowing edits to a Monitor in the same view of editing a Load Balancer can make it seem like those changes are only for that Load Balancer, which is not always the case.
Manage Monitors before:
Manage monitors after:
Lastly, when users would expand the Manage Load Balancer table to view more details about their Pools or Origins within that specific Load Balancer, they would click the large X icon in the top right of that expanded card to close it – seems reasonable in the expanded context.
But, the X icon did not close the expanded card, but rather deleted the Load Balancer altogether. This is dangerous and we want to prevent users from making mistakes. With the added space we gained from de-modaling large areas of the UI, we have updated these buttons to be clickable text buttons that read ‘Edit’ or ‘Delete’ instead of the icon buttons. The difference is providing clearly defined text around the action that will take place, rather than leaving it up to a users interpretation of what the icon on the button means and the action it would result in. We felt this was much clearer to users and not be met with unwanted changes.
We are very excited about the updates to the Load Balancing dashboard and look forward to improving day in and day out.
This email was sent to all Cloudflare customers a short while ago
From: Matthew Prince Date: Thu, Mar 12, 2020 at 4:20 PM Subject: Cloudflare During the Coronavirus Emergency
We know that organizations and individuals around the world depend on Cloudflare and our network. I wanted to send you a personal note to let you know how Cloudflare is dealing with the Coronavirus emergency.
First, the health and safety of our employees and customers is our top priority. We have implemented a number of sensible policies to this end, including encouraging many employees to work from home. This, however, hasn’t slowed our operations. Our network operations center (NOC), security operations center (SOC), and customer support teams will remain fully operational and can do their jobs entirely remote as needed.
Second, we are tracking Internet usage patterns globally. As more people work from home, peak traffic in impacted regions has increased, on average, approximately 10%. In Italy, which has imposed a nationwide quarantine, peak Internet traffic is up 30%. Traffic patterns have also shifted so peak traffic is occurring earlier in the day in impacted regions. None of these traffic changes raise any concern for us. Cloudflare’s network is well provisioned to handle significant spikes in traffic. We have not seen, and do not anticipate, any impact to our network’s performance, reliability, or security globally.
Third, we are monitoring for any changes in cyberthreats. While we have seen more phishing attacks using the Coronavirus as a lure, we have not seen any significant increase in attack traffic or new threats. Again, our SOC remains fully operational and is continuously monitoring for any new security threats that may emerge.
Finally, we recognize that this emergency has put strain on the infrastructure of companies around the world as more employees work from home. On Monday, I wrote about how we are making our Cloudflare for Teams product, which helps support secure and efficient remote work, free for small businesses for at least the next six months:
As the severity of the emergency has become clearer over the course of this week, we decided to extend this offer to help any business, regardless of size. The healthy functioning of our economy globally depends on work continuing to get done, even as people need to do that work remotely. If Cloudflare can do anything to help ensure that happens, I believe it is our duty to do so.
If you are already a Cloudflare for Teams customer, we have removed the caps on usage during this emergency so you can scale to whatever number of seats you need without additional cost. If you are not yet using Cloudflare for Teams, and if you or your employer are struggling with limits on the capacity of your existing VPN or Firewall, we stand ready to help and have removed the limits on the free trials of our Access and Gateway products for at least the next six months. Cloudflare employees around the world have volunteered to run no-cost onboarding sessions so you can get set up quickly and ensure your business’ continuity.
Thank you for being a Cloudflare customer. These are challenging times but I want you to know that we stand ready to help however we can. We understand the critical role we play in the functioning of the Internet and we are continually humbled by the trust you place in us. Together, we can get through this.
In the past, we didn’t have the opportunity to evaluate as many CPUs as we do today. The hardware ecosystem was simple – Intel had consistently delivered industry leading processors. Other vendors could not compete with them on both performance and cost. Recently it all changed: AMD has been challenging the status quo with their 2nd Gen EPYC processors.
This is not the first time that Intel has been challenged; previously there was Qualcomm, and we worked with AMD and considered their 1st Gen EPYC processors and based on the original Zen architecture, but ultimately, Intel prevailed. AMD did not give up and unveiled their 2nd Gen EPYC processors codenamed Rome based on the latest Zen 2 architecture.
This made many improvements over its predecessors. Improvements include a die shrink from 14nm to 7nm, a doubling of the top end core count from 32 to 64, and a larger L3 cache size. Let’s emphasize again on the size of that L3 cache, which is 32 MiB L3 cache per Core Complex Die (CCD).
This time around, we have taken steps to understand our workloads at the hardware level through the use of hardware performance counters and profiling tools. Using these specialized CPU registers and profilers, we collected data on the AMD 2nd Gen EPYC and Intel Skylake-based Xeon processors in a lab environment, then validated our observations in production against other generations of servers from the past.
We evaluated several Intel Cascade Lake and AMD 2nd Gen EPYC processors, trading off various factors between power and performance; the AMD EPYC 7642 CPU came out on top. The majority of Cascade Lake processors have 1.375 MiB L3 cache per core shared across all cores, a common theme that started with Skylake. On the other hand, the 2nd Gen EPYC processors start at 4 MiB per core. The AMD EPYC 7642 is a unique SKU since it has 256 MiB of L3 cache shared across its 48 cores. Having a cache this large or approximately 5.33 MiB sitting right next to each core means that a program will spend fewer cycles fetching data from RAM with the capability to have more data readily available in the L3 cache.
Traditional cache layout has also changed with the introduction of 2nd Gen EPYC, a byproduct of AMD using a multi-chip module (MCM) design. The 256 MiB L3 cache is formed by 8 individual dies or Core Complex Die (CCD) that is formed by 2 Core Complexes (CCX) with each CCX containing 16 MiB of L3 cache.
Our production traffic shares many characteristics of a sustained workload which typically does not induce large variation in operating frequencies nor enter periods of idle time. We picked out a simulated traffic pattern that closely resembled our production traffic behavior which was the Cached 10KiB png via HTTPS. We were interested in assessing the CPU’s maximum throughput or requests per second (RPS), one of our key metrics. With that being said, we did not disable Intel Turbo Boost or AMD Precision Boost, nor matched the frequencies clock-for-clock while measuring for requests per second, instructions retired per second (IPS), L3 cache miss rate, and sustained operating frequency.
The 1P AMD 2nd Gen EPYC 7642 powered server took the lead and processed 50% more requests per second compared to our Gen 9’s 2P Intel Xeon Platinum 6162 server.
We are running a sustained workload, so we should end up with a sustained operating frequency that is higher than base clock. The AMD EPYC 7642 operating frequency or the number cycles that the processor had at its disposal was approximately 20% greater than the Intel Xeon Platinum 6162, so frequency alone was not enough to explain the 50% gain in requests per second.
Taking a closer look, the number of instructions retired over time was far greater on the AMD 2nd Gen EPYC 7642 server, thanks to its low L3 cache miss rate.
Our most predominant bottleneck appears to be the cache memory and we saw significant improvement in requests per second as well as time to process a request due to low L3 cache miss rate. The data we present in this section was collected at a point of presence that spanned between Gen 7 to Gen 9 servers. We also collected data from a secondary region to gain additional confidence that the data we present here was not unique to one particular environment. Gen 9 is the baseline just as we have done in the previous section.
We put the 2nd Gen EPYC-based Gen X into production with hopes that the results would mirror closely to what we have previously seen in the lab. We found that the requests per second did not quite align with the results we had hoped, but the AMD EPYC server still outperformed all previous generations including outperforming the Intel Gen 9 server by 36%.
Sustained operating frequency was nearly identical to what we have seen back in the lab.
Due to the lower than expected requests per second, we also saw lower instructions retired over time and higher L3 cache miss rate but maintained a lead over Gen 9, with 29% better performance.
The single AMD EPYC 7642 performed very well during our lab testing, beating our Gen 9 server with dual Intel Xeon Platinum 6162 with the same total number of cores. Key factors we noticed were its large L3 cache, which led to a low L3 cache miss rate, as well as a higher sustained operating frequency. The AMD 2nd Gen EPYC 7642 did not have as big of an advantage in production, but nevertheless still outperformed all previous generations. The observation we made in production was based on a PoP that could have been influenced by a number of other factors such as but not limited to ambient temperature, timing, and other new products that will shape our traffic patterns in the future such as WebAssembly on Cloudflare Workers. The AMD EPYC 7642 opens up the possibility for our upcoming Gen X server to maintain the same core count while processing more requests per second than its predecessor.
Got a passion for hardware? I think we should get in touch. We are always looking for talented and curious individuals to join our team. The data presented here would not have been possible if it was not for the teamwork between many different individuals within Cloudflare. As a team, we strive to work together to create highly performant, reliable, and secure systems that will form the pillars of our rapidly growing network that spans 200 cities in more than 90 countries and we are just getting started.
More than 1 billion unique IP addresses pass through the Cloudflare Network each day, serving on average 11 million HTTP requests per second and operating within 100ms of 95% of the Internet-connected population globally. Our network spans 200 cities in more than 90 countries, and our engineering teams have built an extremely fast and reliable infrastructure.
We’re extremely proud of our work and are determined to help make the Internet a better and more secure place. Cloudflare engineers who are involved with hardware get down to servers and their components to understand and select the best hardware to maximize the performance of our stack.
Our software stack is compute intensive and is very much CPU bound, driving our engineers to work continuously at optimizing Cloudflare’s performance and reliability at all layers of our stack. With the server, a straightforward solution for increasing computing power is to have more CPU cores. The more cores we can include in a server, the more output we can expect. This is important for us since the diversity of our products and customers has grown over time with increasing demand that requires our servers to do more. To help us drive compute performance, we needed to increase core density and that’s what we did. Below is the processor detail for servers we’ve deployed since 2015, including the core counts:
Start of service
Intel Xeon E5-2630 v3
Intel Xeon E5-2630 v4
Intel Xeon Silver 4116
Intel Xeon Platinum 6162
2 x 8
2 x 10
2 x 12
2 x 24
2 x 85W
2 x 85W
2 x 85W
2 x 150W
TDP per Core
In 2018, we made a big jump in total number of cores per server with Gen 9. Our physical footprint was reduced by 33% compared to Gen 8, giving us increased capacity and computing power per rack. Thermal Design Power (TDP aka typical power usage) are mentioned above to highlight that we’ve also been more power efficient over time. Power efficiency is important to us: first, because we’d like to be as carbon friendly as we can; and second, so we can better utilize our provisioned power supplied by the data centers. But we know we can do better.
Our main defining metric is Requests per Watt. We can increase our Requests per Second number with more cores, but we have to stay within our power budget envelope. We are constrained by the data centers’ power infrastructure which, along with our selected power distribution units, leads us to power cap for each server rack. Adding servers to a rack obviously adds more power draw increasing power consumption at the rack level. Our Operational Costs significantly increase if we go over a rack’s power cap and have to provision another rack. What we need is more compute power inside the same power envelope which will drive a higher (better) Requests per Watt number – our key metric.
As you might imagine, we look at power consumption carefully in the design stage. From the above you can see that it’s not worth the time for us to deploy more power-hungry CPUs if TDP per Core is higher than our current generation which would hurt our Requests per Watt metric. As we started looking at production ready systems to power our Gen X solution, we took a long look at what is available to us in the market today and we’ve made our decision. We’re moving on from Gen 9’s 48-core setup of dual socket Intel® Xeon® Platinum 6162‘s to a 48-core single socket AMD EPYC™ 7642.
Xeon Platinum 6162
2 x 24
L3 Cache / socket
24 x 1.375MiB
16 x 16MiB
Memory / socket
6 channels, up to DDR4-2400
8 channels, up to DDR4-3200
2 x 150W
PCIe / socket
From the specs, we see that with the AMD chip we get to keep the same amount of cores and lower TDP. Gen 9’s TDP per Core was 6.25W, Gen X’s will be 4.69W… That’s a 25% decrease. With higher frequency, and perhaps going to a more simplified setup of single socket, we can speculate that the AMD chip will perform better. We’re walking through a series of tests, simulations, and live production results in the rest of this blog to see how much better AMD performs.
As a side note before we go further, TDP is a simplified metric from the manufacturers’ datasheets that we use in the early stages of our server design and CPU selection process. A quick Google search leads to thoughts that AMD and Intel define TDP differently, which basically makes the spec unreliable. Actual CPU power draw, and more importantly server system power draw, are what we really factor in our final decisions.
At the beginning of our journey to choose our next CPU, we got a variety of processors from different vendors that could fit well with our software stack and services, which are written in C, LuaJIT, and Go. More details about benchmarking for our stack were explained when we benchmarked Qualcomm’s ARM® chip in the past. We’re going to go through the same suite of tests as Vlad’s blog this time around since it is a quick and easy “sniff test”. This allows us to test a bunch of CPUs within a manageable time period before we commit to spend more engineering effort and need to apply our software stack.
We tried a variety of CPUs with different number of cores, sockets, and frequencies. Since we’re explaining how we chose the AMD EPYC 7642, all of the graphs in this blog focus on how AMD compares with our Gen 9’s Intel Xeon Platinum 6162 CPU as a baseline.
Our results correspond to server node for both CPUs tested; meaning the numbers pertain to 2x 24-core processors for Intel, and 1x 48-core processor for AMD – a two socket Intel based server and a one socket AMD EPYC powered server. Before we started our testing, we changed the Cloudflare lab test servers’ BIOS settings to match our production server settings. This gave us CPU frequencies yields for AMD at 3.03 Ghz and Intel at 2.50 Ghz on average with very little variation. With gross simplification, we expect that with the same amount of cores AMD would perform about 21% better than Intel. Let’s start with our crypto tests.
Looking promising for AMD. In public key cryptography, it does 18% better. Meanwhile for symmetric key, AMD loses on AES-128-GCM but it’s comparable overall.
We do a lot of compression at the edge to save bandwidth and help deliver content faster. We go through both zlib and brotli libraries written in C. All tests are done on blog.cloudflare.com HTML file in memory.
AMD wins by an average of 29% using gzip across all qualities. It does even better with brotli with tests lower than quality 7, which we use for dynamic compression. There’s a throughput cliff starting brotli-9 which Vlad’s explanation is that Brotli consumes lots of memory and thrashes cache. Nevertheless, AMD wins by a healthy margin.
A lot of our services are written in Go. In the following graphs we’re redoing the crypto and compression tests in Go along with RegExp on 32KB strings and the strings library.
AMD performs better in all of our Go benchmarks except for ECDSA P256 Sign losing by 38%, which is peculiar since with the test in C it does 24% better. It’s worth investigating what’s going on here. Other than that, AMD doesn’t win by as much of a margin but it still proves to be better.
We rely a lot on LuaJIT in our stack. As Vlad said, it’s the glue that holds Cloudflare together. We’re glad to show that AMD wins here as well.
Overall our tests show a single EPYC 7642 to be more competitive than two Xeon Platinum 6162. While there are a couple of tests where AMD loses out such as OpenSSL AES-128-GCM and Go OpenSSL ECDSA-P256 Sign, AMD wins in all the others. By scanning quickly and treating all tests equally, AMD does on average 25% better than Intel.
After our ‘sniff’ tests, we put our servers through another series of emulations which apply synthetic workloads simulating our edge software stack. Here we are simulating workloads of scenarios with different types of requests we see in production. Types of requests vary from asset size, whether they go through HTTP or HTTPS, WAF, Workers, or one of many additional variables. Below shows the throughput comparison between the two CPUs of the types of requests we see most typically.
The results above are ratios using Gen 9’s Intel CPUs as the baseline normalized at 1.0 on the X-axis. For example, looking at simple requests of 10KiB assets over HTTPS, we see that AMD does 1.50x better than Intel in Requests per Second. On average for the tests shown on the graph above, AMD performs 34% better than Intel. Considering that the TDP for the single AMD EPYC 7642 is 225W, when compared to two Intel’s being 300W, we’re looking at AMD delivering up to 2.0x better Requests per Watt vs. the Intel CPUs!
By this time, we were already leaning heavily toward a single socket setup with AMD EPYC 7642 as our CPU for Gen X. We were excited to see exactly how well AMD EPYC servers would do in production, so we immediately shipped a number of the servers out to some of our data centers.
Step one of course was to get all our test servers set up for a production environment. All of our machines in the fleet are loaded with the same processes and services which makes for a great apples-to-apples comparison. Like data centers everywhere, we have multiple generations of servers deployed and we deploy our servers in clusters such that each cluster is pretty homogeneous by server generation. In some environments this can lead to varying utilization curves between clusters. This is not the case for us. Our engineers have optimized CPU utilization across all server generations so that no matter if the machine’s CPU has 8 cores or 24 cores, CPU usage is generally the same.
As you can see above and to illustrate our ‘similar CPU utilization’ comment, there is no significant difference in CPU usage between Gen X AMD powered servers and Gen 9 Intel based servers. This means both test and baseline servers are equally loaded. Good. This is exactly what we want to see with our setup, to have a fair comparison. The 2 graphs below show the comparative number of requests processed at the CPU single core and all core (server) level.
We see that AMD does on average about 23% more requests. That’s really good! We talked a lot about bringing more muscle in the Gen 9 blog. We have the same number of cores, yet AMD does more work, and does it with less power. Just by looking at the specs for number of cores and TDP in the beginning, it’s really nice to see that AMD also delivers significantly more performance with better power efficiency.
But as we mentioned earlier, TDP isn’t a standardized spec across manufacturers so let’s look at real power usage below. Measuring server power consumption along with requests per second (RPS) yields the graph below:
Observing our servers request rate over their power consumption, the AMD Gen X server performs 28% better. While we could have expected more out of AMD since its TDP is 25% lower, keep in mind that TDP is very ambiguous. In fact, we saw that AMD actual power draw ran nearly at spec TDP with its much higher than base frequency; Intel was far from it. Another reason why TDP is becoming a less reliable estimate of power draw. Moreover, CPU is just one component contributing to the overall power of the system. Let’s remind that Intel CPUs are integrated in a multi-node system as described in the Gen 9 blog, while AMD is in a regular 1U form-factor machine. That actually doesn’t favor AMD since multi-node systems are designed for high density capabilities at lower power per node, yet it still outperformed the Intel system on a power per node basis anyway.
Through the majority of comparisons from the datasheets, test simulations, and live production performance, the 1P AMD EPYC 7642 configuration performed significantly better than the 2P Intel Xeon 6162. We’ve seen in some environments that AMD can do up to 36% better in live production and we believe we can achieve that consistently with some optimization on both our hardware and software.
So that’s it. AMD wins.
The additional graphs below show the median and p99 NGINX processing mostly on-CPU time latencies between the two CPUs throughout 24 hours. On average, AMD processes about 25% faster. At p99, it does about 20-50% depending on the time of day.
Hardware and Performance engineers at Cloudflare do significant research and testing to figure out the best server configuration for our customers. Solving big problems like this is why we love working here, and we’re also helping solving yours with our services like serverless edge compute and the array of security solutions such as Magic Transit, Argo Tunnel, and DDoS protection. All of our servers on the Cloudflare Network are designed to make our products work reliably, and we strive to make each new generation of our server design better than its predecessor. We believe the AMD EPYC 7642 is the answer for our Gen X’s processor question.
With Cloudflare Workers, developers have enjoyed deploying their applications to our Network, which is ever expanding across the globe. We’ve been proud to empower our customers by letting them focus on writing their code while we are managing the security and reliability in the cloud. We are now even more excited to say that their work will be deployed on our Gen X servers powered by 2nd Gen AMD EPYC processors.
Thanks to AMD, using the EPYC 7642 allows us to increase our capacity and expand into more cities easier. Rome wasn’t built in one day, but it will be very close to many of you.
In the last couple of years, we’ve been experimenting with many Intel and AMD x86 chips along with ARM CPUs. We look forward to having these CPU manufacturers partner with us for future generations so that together we can help build a better Internet.
We designed and built Cloudflare’s network to be able to grow capacity quickly and inexpensively; to allow every server, in every city, to run every service; and to allow us to shift customers and traffic across our network efficiently. We deploy standard, commodity hardware, and our product developers and customers do not need to worry about the underlying servers. Our software automatically manages the deployment and execution of our developers’ code and our customers’ code across our network. Since we manage the execution and prioritization of code running across our network, we are both able to optimize the performance of our highest tier customers and effectively leverage idle capacity across our network.
An alternative approach might have been to run several fragmented networks with specialized servers designed to run specific features, such as the Firewall, DDoS protection or Workers. However, we believe that approach would have resulted in wasted idle resources and given us less flexibility to build new software or adopt the newest available hardware. And a single optimization target means we can provide security and performance at the same time.
We use Anycast to route a web request to the nearest Cloudflare data center (from among 200 cities), improving performance and maximizing the surface area to fight attacks.
Once a datacenter is selected, we use Unimog, Cloudflare’s custom load balancing system, to dynamically balance requests across diverse generations of servers. We load balance at different layers: between cities, between physical deployments located across a city, between external Internet ports, between internal cables, between servers, and even between logical CPU threads within a server.
As demand grows, we can scale out by simply adding new servers, points of presence (PoPs), or cities to the global pool of available resources. If any server component has a hardware failure, it is gracefully de-prioritized or removed from the pool, to be batch repaired by our operations team. This architecture has enabled us to have no dedicated Cloudflare staff at any of the 200 cities, instead relying on help for infrequent physical tasks from the ISPs (or data centers) hosting our equipment.
Gen X: Intel Not Inside
We recently turned up our tenth generation of servers, “Gen X”, already deployed across major US cities, and in the process of being shipped worldwide. Compared with our prior server (Gen 9), it processes as much as 36% more requests while costing substantially less. Additionally, it enables a ~50% decrease in L3 cache miss rate and up to 50% decrease in NGINX p99 latency, powered by a CPU rated at 25% lower TDP (thermal design power) per core.
Notably, for the first time, Intel is not inside. We are not using their hardware for any major server components such as the CPU, board, memory, storage, network interface card (or any type of accelerator). Given how critical Intel is to our industry, this would until recently have been unimaginable, and is in contrast with priorgenerations which made extensive use of their hardware.
This time, AMD is inside.
We were particularly impressed by the 2nd Gen AMD EPYC processors because they proved to be far more efficient for our customers’ workloads. Since the pendulum of technology leadership swings back and forth between providers, we wouldn’t be surprised if that changes over time. However, we were happy to adapt quickly to the components that made the most sense for us.
CPU efficiency is very important to our server design. Since we have a compute-heavy workload, our servers are typically limited by the CPU before other components. Cloudflare’s software stack scales quite well with additional cores. So, we care more about core-count and power-efficiency than dimensions such as clock speed.
We selected the AMD EPYC 7642 processor in a single-socket configuration for Gen X. This CPU has 48-cores (96 threads), a base clock speed of 2.4 GHz, and an L3 cache of 256 MB. While the rated power (225W) may seem high, it is lower than the combined TDP in our Gen 9 servers and we preferred the performance of this CPU over lower power variants. Despite AMD offering a higher core count option with 64-cores, the performance gains for our software stack and usage weren’t compelling enough.
We have deployed the AMD EPYC 7642 in half a dozen Cloudflare data centers; it is considerably more powerful than a dual-socket pair of high-core count Intel processors (Skylake as well as Cascade Lake) we used in the last generation.
Readers of our blog might remember our excitement around ARM processors. We even ported the entirety of our software stack to run on ARM, just as it does with x86, and have been maintaining that ever since even though it calls for slightly more work for our software engineering teams. We did this leading up to the launch of Qualcomm’s Centriq server CPU, which eventually got shuttered. While none of the off-the-shelf ARM CPUs available this moment are interesting to us, we remain optimistic about high core count offerings launching in 2020 and beyond, and look forward to a day when our servers are a mix of x86 (Intel and AMD) and ARM.
We aim to replace servers when the efficiency gains enabled by new equipment outweigh their cost.
The performance we’ve seen from the AMD EPYC 7642 processor has encouraged us to accelerate replacement of multiple generations of Intel-based servers.
Compute is our largest investment in a server. Our heaviest workloads, from the Firewall to Workers (our serverless offering), often require more compute than other server resources. Also, the average size in kilobytes of a web request across our network tends to be small, influenced in part by the relative popularity of APIs and mobile applications. Our approach to server design is very different than traditional content delivery networks engineered to deliver large object video libraries, for whom servers focused on storage might make more sense, and re-architecting to offer serverless is prohibitively capital intensive.
Our Gen X server is intentionally designed with an “empty” PCIe slot for a potential add on card, if it can perform some functions more efficiently than the primary CPU. Would that be a GPU, FPGA, SmartNIC, custom ASIC, TPU or something else? We’re intrigued to explore the possibilities.
In accompanying blog posts over the next few days, our hardware engineers will describe how AMD 7642 performed against the benchmarks we care about. We are thankful for their hard work.
Memory, Storage & Network
Since we are typically limited by CPU, Gen X represented an opportunity to grow components such as RAM and SSD more slowly than compute.
For memory, we continued to use 256GB of RAM, as in our prior generation, but rated higher at 2933MHz. For storage, we continue to have ~3TB, but moved to 3x1TB form factor using NVME flash (instead of SATA) with increased available IOPS and higher endurance, which enables full disk encryption using LUKS without penalty. For the network card, we continue to use Mellanox 2x25G NIC.
We moved from our multi-node chassis back to a simple 1U form factor, designed to be lighter and less error prone during operational work at the data center. We also added multiple new ODM partners to diversify how we manufacture our equipment and to take advantage of additional global warehousing.
Our newest generation of servers give us the flexibility to continue to build out our network even closer to every user on Earth. We’re proud of the hard work from across engineering teams on Gen X, and are grateful for the support of our partners. Be on the lookout for more blogs about these servers in the coming days.
During the past year, we saw nearly 2 billion global citizens go to the polls to vote in democratic elections. There were major elections in more than 50 countries, including India, Nigeria, and the United Kingdom, as well as elections for the European Parliament. In 2020, we will see a similar number of elections in countries from Peru to Myanmar. In November, U.S citizens will cast their votes for the 46th President, 435 seats in the U.S House of Representatives, 35 of the 100 seats in the U.S. Senate, and many state and local elections.
Recognizing the importance of maintaining public access to election information, Cloudflare launched the Athenian Project in 2017, providing U.S. state and local government entities with the tools needed to secure their election websites for free. As we’ve seen, however, political parties and candidates for office all over the world are also frequent targets for cyberattack. Cybersecurity needs for campaign websites and internal tools are at an all time high.
Although Cloudflare has helped improve the security and performance of political parties and candidates for office all over the world for years, we’ve long felt that we could do more. So today, we’re announcing Cloudflare for Campaigns, a suite of Cloudflare services tailored to campaign needs. Cloudflare for Campaigns is designed to make it easier for all political campaigns and parties, especially those with small teams and limited resources, to get access to cybersecurity services.
Both because of our services to state and local government election websites through the Athenian Project and because a significant number of political parties and candidates for office use our services, Cloudflare has seen many attacks on election infrastructure and political campaigns firsthand.
During the 2020 U.S. election cycle, Cloudflare has provided services to 18 major presidential campaigns, as well as a range of congressional campaigns. On a typical day, Cloudflare blocks 400,000 attacks against political campaigns, and, on a busy day, Cloudflare blocks more than 40 million attacks against campaigns.
What is Cloudflare for Campaigns?
Cloudflare for Campaigns is a suite of Cloudflare products focused on the needs of political campaigns, particularly smaller campaigns that don’t have the resources to bring significant cybersecurity resources in house. To ensure the security of a campaign website, the Cloudflare for Campaigns package includes Business-level service, as well as security tools particularly helpful for political campaigns websites, such as the web application firewall, rate limiting, load balancing, Enterprise level “I am Under Attack Support”, bot management, and multi-user account enablement.
To ensure the security of internal campaign teams, the Cloudflare for Campaigns service will also provide tools for campaigns to ensure the security of their internal teams with Cloudflare Access, allowing for campaigns to secure, authenticate, and monitor user access to any domain, application, or path on Cloudflare, without using a VPN. Along with Access, we will be providing Cloudflare Gateway with DNS-based filtering at multiple locations to protect campaign staff as they navigate the Internet by keeping malicious content off the campaign’s network using DNS filtering, helping prevent users from running into phishing scams or malware sites. Campaigns can use Gateway after the product’s public release.
Cloudflare for Campaigns also includes Cloudflare reliability and security guide, which lists a best practice guide for political campaigns to maintain their campaign site and secure their internal teams.
Although there is widespread agreement that campaigns and political parties face threats of cyberattack, there is less consensus on how best to get political campaigns the help they need. Many political campaigns and political parties operate under resource constraints, without the technological capability and financial resources to dedicate to cybersecurity. At the same time, campaigns around the world are the subject of a variety of different regulations intended to prevent corruption of democratic processes. As a practical matter, that means that, although campaigns may not have the resources needed to access cybersecurity services, donation of cybersecurity services to campaigns may not always be allowed.
In the U.S., campaign finance regulations prohibit corporations from providing any contributions of either money or services to federal candidates or political party organizations. These rules prevent companies from offering free or discounted services if those services are not provided on the same terms and conditions to similarly situated members of the general public. The Federal Elections Commission (FEC), which enforces U.S. campaign finance laws, has struggled with the issue of how best to apply those rules to the provision of free or discounted cybersecurity services to campaigns. In consideration of a number of advisory opinions, they have publicly wrestled with the competing priorities of securing campaigns from cyberattack while not opening a backdoor to donation of goods services that are intended to curry favors with particular candidates.
The FEC has issued two advisory opinions to tech companies seeking to provide free or discounted cybersecurity services to campaigns. In 2018, the FEC approved a request by Microsoft to offer a package of enhanced online account security protections for “election-sensitive” users. The FEC reasoned that Microsoft was offering the services to its paid users “based on commercial rather than political considerations, in the ordinary course of its business and not merely for promotional consideration or to generate goodwill.” In July 2019, the FEC approved a request by a cybersecurity company to provide low-cost anti-phishing services to campaigns because those services would be provided in the ordinary course of business and on the same terms and conditions as offered to similarly situated non-political clients.
In September 2018, a month after Microsoft submitted its request, Defending Digital Campaigns (DDC), a nonprofit established with the mission to “secure our democratic campaign process by providing eligible campaigns and political parties, committees, and related organizations with knowledge, training, and resources to defend themselves from cyber threats,” submitted a request to the FEC to offer free or reduced-cost cybersecurity services, including from technology corporations, to federal candidates and parties. Over the following months, the FEC issued and requested comment on multiple draft opinions on whether the donation was permissible and, if so, on what basis. As described by the FEC, to support its position, DDC represented that “federal candidates and parties are singularly ill-equipped to counteract these threats.” The FEC’s advisory opinion to DDC noted:
“You [DDC] state that presidential campaign committees and national party committees require expert guidance on cybersecurity and you contend that the ‘vast majority of campaigns’ cannot afford full-time cybersecurity staff and that ‘even basic cybersecurity consulting software and services’ can overextend the budgets of most congressional campaigns. AOR004. For instance, you note that a congressional candidate in California reported a breach to the Federal Bureau of Investigation (FBI) in March of this year but did not have the resources to hire a professional cybersecurity firm to investigate the attack, or to replace infected computers. AOR003.”
In May 2019, the FEC approved DDC’s request to partner with technology companies to provide free and discounted cybersecurity services “[u]nder the unusual and exigent circumstances” presented by the request and “in light of the demonstrated, currently enhanced threat of foreign cyberattacks against party and candidate committees.”
All of these opinions demonstrate the FEC’s desire to allow campaigns to access affordable cybersecurity services because of the heightened threat of cyberattack, while still being cautious to ensure that those services are offered transparently and consistent with the goals of campaign finance laws.
Partnering with DDC to Provide Free Services to US Candidates
We share the view of both DDC and the FEC that political campaigns — which are central to our democracy — must have the tools to protect themselves against foreign cyberattack. Cloudflare is therefore excited to announce a new partnership with DDC to provide Cloudflare for Campaigns for free to candidates and parties that meet DDC’s criteria.
To receive free services under DDC, political campaigns must meet the following criteria, as the DDC laid out to the FEC:
A House candidate’s committee that has at least $50,000 in receipts for the current election cycle, and a Senate candidate’s committee that has at least $100,000 in receipts for the current election cycle;
A House or Senate candidate’s committee for candidates who have qualified for the general election ballot in their respective elections; or
Any presidential candidate’s committee whose candidate is polling above five percent in national polls.
Although political campaigns are regulated differently all around the world, Cloudflare believes that the integrity of all political campaigns should be protected against powerful adversaries. With this in mind, Cloudflare will therefore also be offering Cloudflare for Campaigns as a paid service, designed to help campaigns all around the world as we attempt to address regulatory hurdles. For more information on how to sign up for the Cloudflare election package, please visit cloudflare.com/campaigns.
Friday the 13th is a lucky day for Cloudflare for many reasons. On December 13, 2019 Tommy Pauly, co-chair of the IETF HTTP Working Group, announced the adoption of the “Extensible Prioritization Scheme for HTTP” -- a new approach to HTTP prioritization.
Web pages are made up of many resources that must be downloaded before they can be presented to the user. The role of HTTP prioritization is to load the right bytes at the right time in order to achieve the best performance. This is a collaborative process between client and server, a client sends priority signals that the server can use to schedule the delivery of response data. In HTTP/1.1 the signal is basic, clients order requests smartly across a pool of about 6 connections. In HTTP/2 a single connection is used and clients send a signal per request, as a frame, which describes the relative dependency and weighting of the response. HTTP/3 tried to use the same approach but dependencies don’t work well when signals can be delivered out of order.
HTTP/3 is being standardised as part of the QUIC effort. As a Working Group (WG) we’ve been trying to fix the problems that non-deterministic ordering poses for HTTP priorities. However, in parallel some of us have been working on an alternative solution, the Extensible Prioritization Scheme, which fixes problems by dropping dependencies and using an absolute weighting. This is signalled in an HTTP header field meaning it can be backported to work with HTTP/2 or carried over HTTP/1.1 hops. The alternative proposal is documented in the Individual-Draft draft-kazuho-httpbis-priority-04, co-authored by Kazuho Oku (Fastly) and myself. This has now been adopted by the IETF HTTP WG as the basis of further work; It’s adopted name will be draft-ietf-httpbis-priority-00.
To some extent document adoption is the end of one journey and the start of the next; sometimes the authors of the original work are not the best people to oversee the next phase. However, I’m pleased to say that Kazuho and I have been selected as co-editors of this new document. In this role we will reflect the consensus of the WG and help steward the next chapter of HTTP prioritization standardisation. Before the next journey begins in earnest, I wanted to take the opportunity to share my thoughts on the story of developing the alternative prioritization scheme through 2019.
I’d love to explain all the details of this new approach to HTTP prioritization but the truth is I expect the standardization process to refine the design and for things to go stale quickly. However, it doesn’t hurt to give a taste of what’s in store, just be aware that it is all subject to change.
A recap on priorities
The essence of HTTP prioritization comes down to trying to download many things over constrained connectivity. To borrow some text from Pat Meenan: Web pages are made up of dozens (sometimes hundreds) of separate resources that are loaded and assembled by a browser into the final displayed content. Since it is not possible to download everything immediately, we prefer to fetch more important things before less important ones. The challenge comes in signalling the importance from client to server.
In HTTP/2, every connection has a priority tree that expresses the relative importance between requests. Servers use this to determine how to schedule sending response data. The tree starts with a single root node and as requests are made they either depend on the root or each other. Servers may use the tree to decide how to schedule sending resources but clients cannot force a server to behave in any particular way.
To illustrate, imagine a client that makes three simple GET requests that all depend on root. As the server receives each request it grows its view of the priority tree:
Once all requests are received, the server determines all requests have equal priority and that it should send response data using round-robin scheduling: send some fraction of response 1, then a fraction of response 2, then a fraction of response 3, and repeat until all responses are complete.
A single HTTP/2 request-response exchange is made up of frames that are sent on a stream. A simple GET request would be sent using a single HEADERS frame:
Each region of a frame is a named field, a ‘?’ indicates the field is optional and the value in parenthesis is the length in bytes with ‘*’ meaning variable length. The Header Block Fragment field holds compressed HTTP header fields (using HPACK), Pad Length and Padding relate to optional padding, and E, Stream Dependency and Weight combined are the priority signal that controls the priority tree.
The HEADERS frame E field is the interesting bit (pun intended). A request with the field set to 1 (true) means that the dependency is exclusive and nothing else can depend on the indicated node. To illustrate, imagine a client that sends three requests which set the E field to 1. As the server receives each request, it interprets this as an exclusive dependency on the root node. Because all requests have the same dependency on root, the tree has to be shuffled around to satisfy the exclusivity rules.
The final version of the tree looks very different from our previous example. The server would schedule all of response 3, then all of response 2, then all of response 1. This could help load all of an HTML file before an image and thus improve the visual load behaviour.
In reality, clients load a lot more than three resources and use a mix of priority signals. To understand the priority of any single request, we need to understand all requests. That presents some technological challenges, especially for servers that act like proxies such as the Cloudflare edge network. Some servers have problems applying prioritization effectively.
Because not all clients send the most optimal priority signals we were motivated to develop Cloudflare’s Enhanced HTTP/2 Prioritization, announced last May during Speed Week. This was a joint project between the Speed team (Andrew Galloni, Pat Meenan, Kornel Lesiński) and Protocols team (Nick Jones, Shih-Chiang Chien) and others. It replaces the complicated priority tree with a simpler scheme that is well suited to web resources. Because the feature is implemented on the server side, we avoid requiring any modification of clients or the HTTP/2 protocol itself. Be sure to check out my colleague Nick’s blog post that details some of the technical challenges and changes needed to let our servers deliver smarter priorities.
The Extensible Prioritization Scheme proposal
The scheme specified in draft-kazuho-httpbis-priority-04, defines a way for priorities to be expressed in absolute terms. It replaces HTTP/2’s dependency-based relative prioritization, the priority of a request is independent of others, which makes it easier to reason about and easier to schedule.
Rather than send the priority signal in a frame, the scheme defines an HTTP header -- tentatively named “Priority” -- that can carry an urgency on a scale of 0 (highest) to 7 (lowest). For example, a client could express the priority of an important resource by sending a request with:
And a less important background resource could be requested with:
While Kazuho and I are the main authors of this specification, we were inspired by several ideas in the Internet community, and we have incorporated feedback or direct input from many of our peers in the Internet community over several drafts. The text today reflects the efforts-so-far of cross-industry work involving many engineers and researchers including organizations such Adobe, Akamai, Apple, Cloudflare, Fastly, Facebook, Google, Microsoft, Mozilla and UHasselt. Adoption in the HTTP Working Group means that we can help improve the design and specification by spending some IETF time and resources for broader discussion, feedback and implementation experience.
I work in Cloudflare’s Protocols team which is responsible for terminating HTTP at the edge. We deal with things like TCP, TLS, QUIC, HTTP/1.x, HTTP/2 and HTTP/3 and since joining the company I’ve worked with Alessandro Ghedini, Junho Choi and Lohith Bellad to make QUIC and HTTP/3 generally available last September.
Working on emerging standards is fun. It involves an eclectic mix of engineering, meetings, document review, specification writing, time zones, personalities, and organizational boundaries. So while working on the codebase of quiche, our open source implementation of QUIC and HTTP/3, I am also mulling over design details of the protocols and discussing them in cross-industry venues like the IETF.
Because of HTTP/3’s lineage, it carries over a lot of features from HTTP/2 including the priority signals and tree described earlier in the post.
One of the key benefits of HTTP/3 is that it is more resilient to the effect of lossy network conditions on performance; head-of-line blocking is limited because requests and responses can progress independently. This is, however, a double-edged sword because sometimes ordering is important. In HTTP/3 there is no guarantee that the requests are received in the same order that they were sent, so the priority tree can get out of sync between client and server. Imagine a client that makes two requests that include priority signals stating request 1 depends on root, request 2 depends on request 1. If request 2 arrives before request 1, the dependency cannot be resolved and becomes dangling. In such a case what is the best thing for a server to do? Ambiguity in behaviour leads to assumptions and disappointment. We should try to avoid that.
This is just one example where things get tricky quickly. Unfortunately the WG kept finding edge case upon edge case with the priority tree model. We tried to find solutions but each additional fix seemed to create further complexity to the HTTP/3 design. This is a problem because it makes it hard to implement a server that handles priority correctly.
Arguably HTTP/2 prioritization never lived up to its hype. However, replacing it with something else in HTTP/3 is a challenge because the QUIC WG charter required us to try and maintain parity between the protocols. Mark Nottingham, co-chair of the HTTP and QUIC WGs responded with a good summary of the situation. To quote part of that response:
My sense is that people know that we need to do something about prioritisation, but we’re not yet confident about any particular solution. Experimentation with new schemes as HTTP/2 extensions would be very helpful, as it would give us some data to work with. If you’d like to propose such an extension, this is the right place to do it.
And so started a very interesting year of cross-industry discussion on the future of HTTP prioritization.
A year of prioritization
The following is an account of my personal experiences during 2019. It’s been a busy year and there may be unintentional errors or omissions, please let me know if you think that is the case. But I hope it gives you a taste of the standardization process and a look behind the scenes of how new Internet protocols that benefit everyone come to life.
Pat’s email came at the same time that I was attending the QUIC WG Tokyo interim meeting hosted at Akamai (thanks to Mike Bishop for arrangements). So I was able to speak to a few people face-to-face on the topic. There was a bit of mailing list chatter but it tailed off after a few days.
February to April
Things remained quiet in terms of prioritization discussion. I knew the next best opportunity to get the ball rolling would be the HTTP Workshop 2019 held in April. The workshop is a multi-day event not associated with a standards-defining-organization (even if many of the attendees also go to meetings such as the IETF or W3C). It is structured in a way that allows the agenda to be more fluid than a typical standards meeting and gives plenty of time for organic conversation. This sometimes helps overcome gnarly problems, such as the community finding a path forward for WebSockets over HTTP/2 due to a productive discussion during the 2017 workshop. HTTP prioritization is a gnarly problem, so I was inspired to pitch it as a talk idea. It was selected and you can find the full slide deck here.
During the presentation I recounted the history of HTTP prioritization. The great thing about working on open standards is that many email threads, presentation materials and meeting materials are publicly archived. It’s fun digging through this history. Did you know: HTTP/2 is based on SPDY and inherited its weight-based prioritization scheme, the tree-based scheme we are familiar with today was only introduced in draft-ietf-httpbis-http2-11? One of the reasons for the more-complicated tree was to help HTTP intermediaries (a.k.a. proxies) implement clever resource management. However, it became clear during the discussion that no intermediaries implement this, and none seem to plan to. I also explained a bit more about Pat’s alternative scheme and Nick described his implementation experiences. Despite some interesting discussion around the topic however, we didn’t come to any definitive solution. There were a lot of other interesting topics to discover that week.
In early May, Ian Swett (Google) restarted interest in Pat’s mailing list thread. Unfortunately he was not present at the HTTP Workshop so had some catching up to do. A little while later Ian submitted a Pull Request to the HTTP/3 specification called “Strict Priorities”. This incorporated Pat’s proposal and attempted to fix a number of those prioritization edge cases that I mentioned earlier.
In late May, another QUIC WG interim meeting was held in London at the new Cloudflare offices, here is the view from the meeting room window. Credit to Alessandro for handling the meeting arrangements.
Mike, the editor of the HTTP/3 specification presented some of the issues with prioritization and we attempted to solve them with the conventional tree-based scheme. Ian, with contribution from Robin Marx (UHasselt), also presented an explanation about his “Strict Priorities” proposal. I recommend taking a look at Robin’s priority tree visualisations which do a great job of explaining things. From that presentation I particularly liked “The prioritization spectrum”, it’s a concise snapshot of the state of things at that time:
June and July
Following the interim meeting, the prioritization “debate” continued electronically across GitHub and email. Some time in June Kazuho started work on a proposal that would use a scheme similar to Pat and Ian’s absolute priorities. The major difference was that rather than send the priority signal in an HTTP frame, it would use a header field. This isn’t a new concept, Roy Fielding proposed something similar at IETF 83.
In HTTP/2 and HTTP/3 requests are made up of frames that are sent on streams. Using a simple GET request as an example: a client sends a HEADERS frame that contains the scheme, method, path, and other request header fields. A server responds with a HEADERS frame that contains the status and response header fields, followed by DATA frame(s) that contain the payload.
To signal priority, a client could also send a PRIORITY frame. In the tree-based scheme the frame carries several fields that express dependencies and weights. Pat and Ian’s proposals changed the contents of the PRIORITY frame. Kazuho’s proposal encodes the priority as a header field that can be carried in the HEADERS frame as normal metadata, removing the need for the PRIORITY frame altogether.
I liked the simplification of Kazuho’s approach and the new opportunities it might create for application developers. HTTP/2 and HTTP/3 implementations (in particular browsers) abstract away a lot of connection-level details such as stream or frames. That makes it hard to understand what is happening or to tune it.
The lingua franca of the Web is HTTP requests and responses, which are formed of header fields and payload data. In browsers, APIs such as Fetch and Service Worker allow handling of these primitives. In servers, there may be ways to interact with the primitives via configuration or programming languages. As part of Enhanced HTTP/2 Prioritization, we have exposed prioritization to Cloudflare Workers to allow rich behavioural customization. If a Worker adds the “cf-priority” header to a response, Cloudflare’s edge servers use the specified priority to serve the response. This might be used to boost the priority of a resource that is important to the load time of a page. To help inform this decision making, the incoming browser priority signal is encapsulated in the request object passed to a Worker’s fetch event listener (request.cf.requestPriority).
Standardising approaches to problems is part of helping to build a better Internet. Because of the resonance between Cloudflare’s work and Kazuho’s proposal, I asked if he would consider letting me come aboard as a co-author. He kindly accepted and on July 8th we published the first version as an Internet-Draft.
Meanwhile, Ian was helping to drive the overall prioritization discussion and proposed that we use time during IETF 105 in Montreal to speak to a wider group of people. We kicked off the week with a short presentation to the HTTP WG from Ian, and Kazuho and I presented our draft in a side-meeting that saw a healthy discussion. There was a realization that the concepts of prioritization scheme, priority signalling and server resource scheduling (enacting prioritization) were conflated and made effective communication and progress difficult. HTTP/2’s model was seen as one aspect, and two different I-Ds were created to deprecate it in some way (draft-lassey-priority-setting, draft-peon-httpbis-h2-priority-one-less). Martin Thomson (Mozilla) also created a Pull Request that simply removed the PRIORITY frame from HTTP/3.
To round off the week, in the second HTTP session it was decided that there was sufficient interest in resolving the prioritization debate via the creation of a design team. I joined the team led by Ian Swett along with others from Adobe, Akamai, Apple, Cloudflare, Fastly, Facebook, Google, Microsoft, and UHasselt.
August to October
Martin’s PR generated a lot of conversation. It was merged under proviso that some solution be found before the HTTP/3 specification was finalized. Between May and August we went from something very complicated (e.g. Orphan placeholder, with PRIORITY only on control stream, plus exclusive priorities) to a blank canvas. The pressure was now on!
The design team held several teleconference meetings across the months. Logistics are a bit difficult when you have team members distributed across West Coast America, East Coast America, Western Europe, Central Europe, and Japan. However, thanks to some late nights and early mornings we managed to all get on the call at the same time.
In October most of us travelled to Cupertino, CA to attend another QUIC interim meeting hosted at Apple’s Infinite Loop (Eric Kinnear helping with arrangements). The first two days of the meeting were used for interop testing and were loosely structured, so the design team took the opportunity to hold the first face-to-face meeting. We made some progress and helped Ian to form up some new slides to present later in the week. Again, there was some useful discussion and signs that we should put some time in the agenda in IETF 106.
The design team came to agreement that draft-kazuho-httpbis-priority was a good basis for a new prioritization scheme. We decided to consolidate the various I-Ds that had sprung up during IETF 105 into the document, making it a single source that was easier for people to track progress and open issues if required. This is why, even though Kazuho and I are the named authors, the document reflects a broad input from the community. We published draft 03 in November, just ahead of the deadline for IETF 106 in Singapore.
Many of us travelled to Singapore ahead of the actual start of IETF 106. This wasn’t to squeeze in some sightseeing (sadly) but rather to attend the IETF Hackathon. These are events where engineers and researchers can really put the concept of “running code” to the test. I really enjoy attending and I’m grateful to Charles Eckel and the team that organised it. If you’d like to read more about the event, Charles wrote up a nice blog post that, through some strange coincidence, features a picture of me, Kazuho and Robin talking at the QUIC table.
The design team held another face-to-face during a Hackathon lunch break and decided that we wanted to make some tweaks to the design written up in draft 03. Unfortunately the freeze was still in effect so we could not issue a new draft. Instead, we presented the most recent thinking to the HTTP session on Monday where Ian put forward draft-kazuho-httpbis-priority as the group’s proposed design solution. Ian and Robin also shared results of prioritization experiments. We received some great feedback in the meeting and during the week pulled out all the stops to issue a new draft 04 before the next HTTP session on Thursday. The question now was: Did the WG think this was suitable to adopt as the basis of an alternative prioritization scheme? I think we addressed a lot of the feedback in this draft and there was a general feeling of support in the room. However, in the IETF consensus is declared via mailing lists and so Tommy Pauly, co-chair of the HTTP WG, put out a Call for Adoption on November 21st.
The HTTP priorities team played the waiting game and watched the mailing list discussion. On the whole people supported the concept but there was one topic that divided opinion. Some people loved the use of headers to express priorities, some people didn’t and wanted to stick to frames.
On December 13th Tommy announced that the group had decided to adopt our document and assign Kazuho and I as editors. The header/frame divide was noted as something that needed to be resolved.
The next step of the journey
Just because the document has been adopted does not mean we are done. In some ways we are just getting started. Perfection is often the enemy of getting things done and so sometimes adoption occurs at the first incarnation of a “good enough” proposal.
Today HTTP/3 has no prioritization signal. Without priority information there is a small danger that servers pick a scheduling strategy that is not optimal, that could cause the web performance of HTTP/3 to be worse than HTTP/2. To avoid that happening we’ll refine and complete the design of the Extensible Priority Scheme. To do so there are open issues that we have to resolve, we’ll need to square the circle on headers vs. frames, and we’ll no doubt hit unknown unknowns. We’ll need the input of the WG to make progress and their help to document the design that fits the need, and so I look forward to continued collaboration across the Internet community.
2019 was quite a ride and I’m excited to see what 2020 brings.
If working on protocols is your interest and you like what Cloudflare is doing, please visit our careers page. Our journey isn’t finished, in fact far from it.
One of the more interesting features introduced by TLS 1.3, the latest revision of the TLS protocol, was the so called “zero roundtrip time connection resumption”, a mode of operation that allows a client to start sending application data, such as HTTP requests, without having to wait for the TLS handshake to complete, thus reducing the latency penalty incurred in establishing a new connection.
The basic idea behind 0-RTT connection resumption is that if the client and server had previously established a TLS connection between each other, they can use information cached from that session to establish a new one without having to negotiate the connection’s parameters from scratch. Notably this allows the client to compute the private encryption keys required to protect application data before even talking to the server.
However, in the case of TLS, “zero roundtrip” only refers to the TLS handshake itself: the client and server are still required to first establish a TCP connection in order to be able to exchange TLS data.
Zero means zero
QUIC goes a step further, and allows clients to send application data in the very first roundtrip of the connection, without requiring any other handshake to be completed beforehand.
Unfortunately, 0-RTT connection resumption is not all smooth sailing, and it comes with caveats and risks, which is why Cloudflare does not enable 0-RTT connection resumption by default. Users should consider the risks involved and decide whether to use this feature or not.
For starters, 0-RTT connection resumption does not provide forward secrecy, meaning that a compromise of the secret parameters of a connection will trivially allow compromising the application data sent during the 0-RTT phase of new connections resumed from it. Data sent after the 0-RTT phase, meaning after the handshake has been completed, would still be safe though, as TLS 1.3 (and QUIC) will still perform the normal key exchange algorithm (which is forward secret) for data sent after the handshake completion.
More worryingly, application data sent during 0-RTT can be captured by an on-path attacker and then replayed multiple times to the same server. In many cases this is not a problem, as the attacker wouldn’t be able to decrypt the data, which is why 0-RTT connection resumption is useful, but in some cases this can be dangerous.
For example, imagine a bank that allows an authenticated user (e.g. using HTTP cookies, or other HTTP authentication mechanisms) to send money from their account to another user by making an HTTP request to a specific API endpoint. If an attacker was able to capture that request when 0-RTT connection resumption was used, they wouldn’t be able to see the plaintext and get the user’s credentials, because they wouldn’t know the secret key used to encrypt the data; however they could still potentially drain that user’s bank account by replaying the same request over and over:
Of course this problem is not specific to banking APIs: any non-idempotent request has the potential to cause undesired side effects, ranging from slight malfunctions to serious security breaches.
In order to help mitigate this risk, Cloudflare will always reject 0-RTT requests that are obviously not idempotent (like POST or PUT requests), but in the end it’s up to the application sitting behind Cloudflare to decide which requests can and cannot be allowed with 0-RTT connection resumption, as even innocuous-looking ones can have side effects on the origin server.
To help origins detect and potentially disallow specific requests, Cloudflare also follows the techniques described in RFC8470. Notably, Cloudflare will add the Early-Data: 1 HTTP header to requests received during 0-RTT resumption that are forwarded to origins.
Origins able to understand this header can then decide to answer the request with the 425 (Too Early) HTTP status code, which will instruct the client that originated the request to retry sending the same request but only after the TLS or QUIC handshake have fully completed, at which point there is no longer any risk of replay attacks. This could even be implemented as part of a Cloudflare Worker.
This makes it possible for origins to allow 0-RTT requests for endpoints that are safe, such as a website’s index page which is where 0-RTT is most useful, as that is typically the first request a browser makes after establishing a connection, while still protecting other endpoints such as APIs and form submissions. But if an origin does not provide any of those non-idempotent endpoints, no action is required.
One stop shop for all your 0-RTT needs
Just like we previously did for TLS 1.3, we now support 0-RTT resumption for QUIC as well. In honor of this event, we have dusted off the user-interface controls that allow Cloudflare users to enable this feature for their websites, and introduced a dedicated toggle to control whether 0-RTT connection resumption is enabled or not, which can be found under the “Network” tab on the Cloudflare dashboard:
When TLS 1.3 and/or QUIC (via the HTTP/3 toggle) are enabled, 0-RTT connection resumption will be automatically offered to clients that support it, and the replay mitigation mentioned above will also be applied to the connections making use of this feature.
In addition, if you are a user of our open-source HTTP/3 patch for NGINX, after updating the patch to the latest version, you’ll be able to enable support for 0-RTT connection resumption in your own NGINX-based HTTP/3 deployment by using the built-in “ssl_early_data” option, which will work for both TLS 1.3 and QUIC+HTTP/3.
It was a scorching Monday on July 22 as temperatures soared above 37°C (99°F) in Austin, TX, the live music capital of the world. Only hours earlier, the last crowds dispersed from the historic East 6th Street entertainment district. A few blocks away, Cloudflarians were starting to make their way to the office. Little did those early arrivers know that they would soon be unknowingly participating in a Cloudflare time honored tradition of dogfooding new services before releasing them to the wild.
6th East Street, Austin Texas
Dogfooding is when an organization uses its own products. In this case, we dogfed our newest cloud service, Magic Transit, which both protects and accelerates our customers’ entire network infrastructure—not just their web properties or TCP/UDP applications. With Magic Transit, Cloudflare announces your IP prefixes via BGP, attracts (routes) your traffic to our global network edge, blocks bad packets, and delivers good packets to your data centers via Anycast GRE.
We decided to use Austin’s network because we wanted to test the new service on a live network with real traffic from real people and apps. With the target identified, we began onboarding the Austin office in an always-on routing topology.
In an always-on routing mode, Cloudflare data centers constantly advertise Austin’s prefix, resulting in faster, almost immediate mitigation. As opposed to traditional on-demand scrubbing center solutions with limited networks, Cloudflare operates within 100 milliseconds of 99% of the Internet-connected population in the developed world. For our customers, this means that always-on DDoS mitigation doesn’t sacrifice performance due to suboptimal routing. On the contrary, Magic Transit can actually improve your performance due to our network’s reach.
Cloudflare’s Global Network
Now that we’ve completed onboarding Austin to Magic Transit, all we needed was a motivated attacker to launch a DDoS attack. Luckily, we found more than a few willing volunteers on our Site Reliability Engineering (SRE) team to execute the attack. While the teams were still assembling in multiple locations around the world, our SRE volunteer started firing packets at our target from an undisclosed location.
Without Magic Transit, the Austin office would’ve been hit directly with the packet flood. Two things could have happened in this case (not mutually exclusive):
Austin’s on-premise equipment (routers, firewalls, servers, etc.) would have been overwhelmed and failed
Austin’s service providers would have dropped packets that exceeded its bandwidth allowance
Both cases would result in a very bad day for everyone.
Cloudflare DDoS Mitigation
Instead, when our SRE attacker launched the flood the packets were automatically routed via BGP to Cloudflare’s network. The packets reached the closest data center via Anycast and encountered multiple defenses in the form of XDP, eBPF and iptables. Those defenses are populated with pre-configured static firewall rules as well as dynamic rules generated by our DDoS mitigation systems.
Static rules can vary from straightforward IP blocking and rate-limiting to more sophisticated expressions that match against specific packet attributes. Dynamic rules, on the other hand, are generated automatically in real-time. To play fair with our attacker, we didn’t pre-configure any special rules against the attack. We wanted to give our attacker a fair opportunity to take Austin down. Although due to our multi-layered protection approach, the odds were never actually in their favor.
Generating Dynamic Rules
As part of our multi-layered protection approach, Dynamic Rules are generated on-the-fly by analyzing the packets that route through our network. While the packets are being routed, flow data is asynchronously sampled, collected, and analyzed by two main detection systems. The first is called Gatebot and runs across the entire Cloudflare network; the second is our newly deployed DoSD (denial of service daemon) which operates locally within each data center. DoSD is an exciting improvement that we’ve just recently rolled out and we look forward to writing more about its technical details here soon. DoSD samples at a much faster rate (1/100 packets) versus Gatebot which samples at a lower rate (~1/8000 packets), allowing it to detect even more attacks and block them faster.
The asynchronous attack detection lifecycle is represented as the dotted lines in the diagram below. Attacks are detected out of path to assure that we don’t add any latency, and mitigation rules are pushed in line and removed as needed.
Multiple packet attributes and correlations are taken into consideration during analysis and detection. Gatebot and DoSD search for both new network anomalies and already known attacks. Once an attack is detected, rules are automatically generated, propagated, and applied in the optimal location within 10 seconds or less. Just to give you an idea of the scale, we’re talking about hundreds of thousands of dynamic rules that are applied and removed every second across the entire Cloudflare network.
One of the beauties of Gatebot and DoSD is that they don’t require a traffic learning period. Once a customer is onboarded, they’re protected immediately. They don’t need to sample traffic for weeks before kicking in. While we can always apply specific firewall rules if requested by the customer, no manual configuration is required by the customer or our teams. It just works.
What this mitigation process looks like in practice
Let’s look at what happened in Austin when one of our SREs tried to DDoS Austin and failed. During one of the first attempts, before DoSD had rolled out globally, a degradation in audio and video quality was noticed for Austin employees on video calls for a few seconds before Gatebot kicked in. However, as soon as Gatebot kicked in, the quality was immediately restored. If we hadn’t had Magic Transit in-line, the degradation of service would’ve worsened until the point of full denial of service. Austin would have been offline and our Austin colleagues wouldn’t have had a very productive day.
On a subsequent attack attempt which took place after DoSD was deployed, our SRE launched a SYN flood on Austin. The attack targeted multiple IP addresses in Austin’s prefix and peaked just above 250,000 packets per second. DoSD detected the attack and blocked it in approximately 3 seconds. DoSD’s quick response resulted in no degradation of service for the Austin team.
What We Learned
Dogfooding Magic Transit served as a valuable experiment for us with lots of lessons learned both from the engineering and procedural aspects. From the engineering aspect, we fine-tuned our mitigations and optimized routings. From the procedural aspects, we drilled members of multiple teams including the Security Operations Center and Solution Engineering teams to help refine our run-books. By doing so, we reduced the onboarding duration to hours instead of days in order to assure a quick and smooth onboarding experience for our customers.
Want To Learn More?
Request a demo and learn how you can protect and accelerate your network with Cloudflare.
This week we celebrated Cloudflare’s 9th birthday by launching a variety of new offerings that support our mission: to help build a better Internet. Below is a summary recap of how we celebrated Birthday Week 2019.
Every day Cloudflare protects over 20 million Internet properties from malicious bots, and this week you were invited to join in the fight! Now you can enable “bot fight mode” in the Firewall settings of the Cloudflare Dashboard and we’ll start deploying CPU intensive code to traffic originating from malicious bots. This wastes the bots’ CPU resources and makes it more difficult and costly for perpetrators to deploy malicious bots at scale. We’ll also share the IP addresses of malicious bot traffic with our Bandwidth Alliance partners, who can help kick malicious bots offline. Join us in the battle against bad bots – and, as you can read here – you can help the climate too!
Speed matters, and if you manage a website or app, you want to make sure that you’re delivering a high performing website to all of your global end users. Now you can enable Browser Insights in the Speed section of the Cloudflare Dashboard to analyze website performance from the perspective of your users’ web browsers.
Several months ago we announced WARP, a free mobile app purpose-built to address the security and performance challenges of the mobile Internet, while also respecting user privacy. After months of testing and development, this week we (finally) rolled out WARP to approximately 2 million wait-list customers. We also enabled Warp Plus, a WARP experience that uses Argo routing technology to route your mobile traffic across faster, less-congested, routes through the Internet. Warp and Warp Plus (Warp+) are now available in the iOS and Android App stores and we can’t wait for you to give it a try!
Last year we announced early support for QUIC, a UDP based protocol that aims to make everything on the Internet work faster, with built-in encryption. The IETF subsequently decided that QUIC should be the foundation of the next generation of the HTTP protocol, HTTP/3. This week, Cloudflare was the first to introduce support for HTTP/3 in partnership with Google Chrome and Mozilla.
Finally, to wrap up our birthday week announcements, we announced Workers Sites. The Workers serverless platform continues to grow and evolve, and every day we discover new and innovative ways to help developers build and optimize their applications. Workers Sites enables developers to easily deploy lightweight static sites across Cloudflare’s global cloud platform without having to build out the traditional backend server infrastructure to support these sites.
We look forward to Birthday Week every year, as a chance to showcase some of our exciting new offerings — but we all know building a better Internet is about more than one week. It’s an effort that takes place all year long, and requires the help of our partners, employees and especially you — our customers. Thank you for being a customer, providing valuable feedback and helping us stay focused on our mission to help build a better Internet.
Can’t get enough of this week’s announcements, or want to learn more? Register for next week’s Birthday Week Recap webinar to get the inside scoop on every announcement.
Today, after a longer than expected wait, we’re opening WARP and WARP Plus to the general public. If you haven’t heard about it yet, WARP is a mobile app designed for everyone which uses our global network to secure all of your phone’s Internet traffic.
We announced WARP on April 1 of this year and expected to roll it out over the next few months at a fairly steady clip and get it released to everyone who wanted to use it by July. That didn’t happen. It turned out that building a next generation service to secure consumer mobile connections without slowing them down or burning battery was… harder than we originally thought.
Before today, there were approximately two million people on the waitlist to try WARP. That demand blew us away. It also embarrassed us. The common refrain is consumers don’t care about their security and privacy, but the attention WARP got proved to us how wrong that assumption actually is.
This post is an explanation of why releasing WARP took so long, what we’ve learned along the way, and an apology for those who have been eagerly waiting. It also talks briefly about the rationale for why we built WARP as well as the privacy principles we’ve committed to. However, if you want a deeper dive on those last two topics, I encourage you to read our original launch announcement.
And, if you just want to jump in and try it, you can download and start using WARP on your iOS or Android devices for free through the following links:
If you’ve already installed the 184.108.40.206 App on your device, you may need to update to the latest version in order to get the option to enable Warp.
Let me start with the apology. We are sorry making WARP available took far longer than we ever intended. As a way of hopefully making amends, for everyone who was on the waitlist before today, we’re giving 10 GB of WARP Plus — the even faster version of WARP that uses Cloudflare’s Argo network — to those of you who have been patiently waiting.
For people just signing up today, the basic WARP service is free without bandwidth caps or limitations. The unlimited version of WARP Plus is available for a monthly subscription fee. WARP Plus is the even faster version of WARP that you can optionally pay for. The fee for WARP Plus varies by region and is designed to approximate what a McDonald’s Big Mac would cost in the region. On iOS, the WARP Plus pricing as of the publication of this post is still being adjusted on a regional basis, but that should settle out in the next couple days.
WARP Plus uses Cloudflare’s virtual private backbone, known as Argo, to achieve higher speeds and ensure your connection is encrypted across the long haul of the Internet. We charge for it because it costs us more to provide. However, in order to help spread the word about WARP, you can earn 1GB of WARP Plus for every friend you refer to sign up for WARP. And everyone you refer gets 1GB of WARP Plus for free to get started as well.
Okay, Thanks, That’s Nice, But What Took You So Long?
So what took us so long?
WARP is an ambitious project. We set out to secure Internet connections from mobile devices to the edge of Cloudflare’s network. In doing so, however, we didn’t want to slow devices down or burn excess battery. We wanted it to just work. We also wanted to bet on the technology of the future, not the technology of the past. Specifically, we wanted to build not around legacy protocols like IPsec, but instead around the hyper-efficient WireGuard protocol.
At some level, we thought it would be easy. We already had the 220.127.116.11 App that was securing DNS requests running on millions of mobile devices. That worked great. How much harder could securing all the rest of the requests on a device be? Right??
It turns out, a lot. Zack Bloom has written up a great technical post describing many of the challenges we faced and the solutions we had to invent to deal with them. If you’re interested, I encourage you to check it out.
Apple threw us a curveball by releasing iOS 12.2 just days before the April 1 planned roll out. The new version of iOS significantly changed the underlying network stack implementation in a way that made some of what we were doing to implement WARP unstable. Ultimately we had to find work-arounds in our networking code, costing us valuable time.
We had a version of the WARP app that (kind of) worked on April 1. But, when we started to invite people from outside of Cloudflare to use it, we quickly realized that the mobile Internet around the world was far more wild and varied than we’d anticipated. The Internet is made up of diverse network components which do not always play nicely, we knew that. What we didn’t expect was how much more pain is introduced by the diversity of mobile carriers, mobile operating systems, and mobile device models.
And, while phones in our testbed were relatively stationary, phones in the real world move around — a lot. When they do, their network settings can change wildly. While that doesn’t matter much for stateless, simple DNS queries, for the rest of Internet traffic that makes things complex. Keeping WireGuard fast requires long-lived sessions between your phone and a server in our network, maintaining that for hours and days was very complex. Even beyond that, we use a technology called Anycast to route your traffic to our network. Anycast meant your traffic could move not just between machines, but between entire data centers. That made things very complex.
But there is a huge difference between hard and impossible. From long before the announcement, the team has been hard at work and I’m deeply proud of what they’ve accomplished. We changed our roll out plan to focus on iOS and solidify the shared underpinnings of the app to ensure it would work even with future network stack upgrades. We invited beta users not in the order of when they signed up, but instead based on networks where we didn’t yet have information to help us discover as many corner cases as possible. And we invented new technologies to keep session state even when the wild west of mobile networks and Anycast routing collide.
I’ve been running WARP on my phone since April 1. The first few months were… rough. Really rough. But, today, WARP has blended into the background of my mobile. And I sleep better knowing that my Internet connections from my phone are secure. Using my phone is as fast, and in some cases faster, than without WARP. In other words, WARP today does what we set out to accomplish: securing your mobile Internet connection and otherwise getting out of the way.
There Will Be Bugs
While WARP is a lot better than it was when we first announced it, we know there are still bugs. The most common bug we’re seeing these days is when WARP is significantly slower than using the mobile Internet without WARP. This is usually due to traffic being misrouted. For instance, we discovered a network in Turkey earlier this week that was being routed to London rather than our local Turkish facility. Once we’re aware of these routing issues we can typically fix them quickly.
Other common bugs involved captive portals — the pages where you have to enter information, for instance, when connecting to a hotel WiFI. We’ve fixed a lot of them but we haven’t had WARP users connecting to every hotel WiFi yet, so there will inevitably still be some that are broken.
We’ve made it easy to report issues that you discover. From the 18.104.22.168 App you can click on the little bug icon near the top of the screen, or just shake your phone with the app open, and quickly send us a report. We expect, over the weeks ahead, we’ll be squashing many of the bugs that you report.
Even Faster With Plus
WARP is not just a product, it’s a testbed for all of the Internet-improving technology we have spent years developing. One dream was to use our Argo routing technology to allow all of your Internet traffic to use faster, less-congested, routes through the Internet. When used by Cloudflare customers for the past several years Argo has improved the speed of their websites by an average of over 30%. Through some hard work of the team we are making that technology available to you as WARP Plus.
The WARP Plus technology is not without cost for us. Routing your traffic over our network often costs us more than if we release it directly to the Internet. To cover those costs we charge a monthly fee — $4.99/month or less — for WARP Plus. The fee depends on the region that you’re in and is intended to approximate what a Big Mac would cost in the same region.
Basic WARP is free. Our first priority is not to make money off of WARP however, we want to grow it to secure every single phone. To help make that happen, we wanted to give you an incentive to share WARP with your friends. You can earn 1GB of free WARP Plus for every person you share WARP with. And everyone you refer also gets 1GB of WARP Plus for free as well. There is no limit on how much WARP Plus data you can earn by sharing.
The free consumer security space has traditionally not been the most reputable. Many other companies that have promised to keep consumers’ data safe but instead built businesses around selling it or using it help target you with advertising. We think that’s disgusting. That is not Cloudflare’s business model and it never will be. WARP continues all the strong privacy protections that 22.214.171.124 launched with including:
We don’t write user-identifiable log data to disk;
We will never sell your browsing data or use it in any way to target you with advertising data;
Don’t need to provide any personal information — not your name, phone number, or email address — in order to use WARP or WARP Plus; and
We will regularly work with outside auditors to ensure we’re living up to these promises.
What WARP Is Not
From a technical perspective, WARP is a VPN. But it is designed for a very different audience than a traditional VPN. WARP is not designed to allow you to access geo-restricted content when you’re traveling. It will not hide your IP address from the websites you visit. If you’re looking for that kind of high-security protection then a traditional VPN or a service like Tor are likely better choices for you.
WARP, instead, is built for the average consumer. It’s built to ensure that your data is secured while it’s in transit. So the networks between you and the applications you’re using can’t spy on you. It will help protect you from people sniffing your data while you’re at a local coffee shop. It will also help ensure that your ISP isn’t hoovering up data on your browsing patterns to sell to advertisers.
WARP isn’t designed for the ultra-techie who wants to specify exactly what server their traffic will be routed through. There’s basically only one button in the WARP interface: ON or OFF. It’s simple on purpose. It’s designed for my mom and dad who ask me every holiday dinner what they can do to be a bit safer online. I’m excited this year to have something easy for them to do: install the 126.96.36.199 App, enable WARP, and rest a bit easier.
How Fast Is It?
Once we got WARP to a stable place, this was my first question. My initial inclination was to go to one of the many Speed Test sites and see the results. And the results were… weird. Sometimes much faster, sometimes much slower. Overall, they didn’t make a lot of sense. The reason why is that these sites are designed to measure the speed of your ISP. WARP is different, so these test sites don’t give particularly accurate readings.
The better test is to visit common sites around the Internet and see how they load, in real conditions, on WARP versus off. We’ve built a tool that does this. Generally, in our tests, WARP is around the same speed as non-WARP connections when you’re on a high performance network. As network conditions get worse, WARP will often improve performance more. But your experience will depend on the particular conditions of your network.
We plan, in the next few weeks, to expose the test tool within the 188.8.131.52 App so you can see how your device loads a set of popular sites without WARP, with WARP, and with WARP Plus. And, again, if you’re seeing particularly poor performance, please report it to us. Our goal is to provide security without slowing you down or burning excess battery. We can already do that for many networks and devices and we won’t rest until we can do it for everyone.
Here’s to a More Secure, Fast Internet
Cloudflare’s mission is to help build a better Internet. We’ve done that by securing and making more performance millions of Internet properties since we launched almost exactly 9 years ago. WARP furthers Cloudflare’s mission by extending our network to help make every consumer’s mobile device a bit more secure. Our team is proud of what we’ve built with WARP — albeit a bit embarrassed it took us so long to get into your hands. We hope you’ll forgive us for the delay, give WARP a try, and let us know what you think.
The good news is that modern web browsers expose lots of performance data that can help you understand how your web page performs. With the launch of Browser Insights today, we can analyze the performance from the perspective of the web browser and what the end user actually experiences. In this post, we’ll dive into how we think about performance and utilize the timing APIs in the web browser.
How web browsers measure performance
In the old days, the only way for a developer to profile performance was to intercept requests and measure the time from the beginning of the page load until the end of the load event.
Today, we can use Web APIs that are supported by modern browsers. This is part of the web standard called the Performance API. The Performance API consists of a collection of individual APIs that include:
Navigation Timing (for timing information relates to the page and navigation)
Resource Timing (for timing data regarding the loading of an application’s resources)
Paint Timing (that provides timing information about paint operation during the page construction)
In this blog post, we will primarily focus on the Navigation Timing API.
Inside the Performance API
To see what’s collected with the Performance API, you can open the Developer Tools in Chrome browser and type ‘performance’ in the console tab (or type in performance.timing to get direct access to the PerformanceTiming attribute).
Try expanding the Performance object by clicking on the arrow before the label and again expand the ‘timing’. This is called PerformanceTiming, which includes all the timings that relate to the current page load as UNIX epoch timestamp (milliseconds). The timing attributes shown are not in the order of the load. So for better understanding, let’s look at the illustration provided by the W3C.
As we can see from the diagram shown, each element (represented as a box above) in the order from left-to-right, represents the navigation flow of the page load. Each element has an attribute from the starting point to the end (and some have multiple attributes!) so that we can measure the elapsed time for each element. For example, to get the Request Time, you could type in the command like shown below in the console which appears to be 60 milliseconds.
How Cloudflare uses performance data
The metrics we surface are:
DNS (domainLookupEnd – domainLookupStart): How long the DNS query takes. This could appear as zero if the connection is reused or the content was stored in the local cache (memory or disk).
TCP (connectEnd – connectStart): How long it takes to establish a TCP connection the server. If HTTPS, this process includes TLS negotiation time.
Request (responseStart – requestStart): The time elapsed between making an HTTP request and receiving the first byte of the response.
Response (responseEnd – responseStart): The time elapsed between the first byte and the last byte of the response received. You can think of this as a resource download time.
Processing (domComplete – domLoading): How long it took to render the page. If this number is big, you can optimize your document architecture, resource size, or configure settings under Speed page such as Auto Minify the source code. This document process can be drill down more with domInteractive, domContentLoadedEventStart, domContentLoadedEventEnd, and domComplete. We plan to provide more detailed analytics on this later on.
Load Event (loadEventEnd – loadEventStart): When the browser finishes loading its document and resources, it triggers a `load` event. This duration may be helpful to you if you have additional functions or any logic for the load event.
Total Time: Sum of each timing metrics shown on the graph.
If you are seeing any spikes or unusual form of a line in the stacked line chart, you could start investigating on each element to see what is causing the problem.
Speed matters. We know that when your website or app gets faster, users have a better experience and you get more conversions and more revenue. At Cloudflare, we spend our days obsessing about speed and building new features to squeeze out as much performance as possible.
But to improve speed, you first need to measure it. That’s why we’re launching Browser Insights: a new tool that measures the performance of your website from the perspective of your users. Browser Insights lets you dive in to understand where, when, and why web pages are slow. And you can enable it today, for free, with one click.
Why did we build Browser Insights?
Let’s say you run an e-commerce site, and you want to make your conversion rates better. You’ve noticed that there’s a lot of traffic from visitors in Peru, but they have worse conversion than users in North America. Maybe you theorize that it takes a long time to load your checkout page, which causes customers to drop off before checking out. How would you verify that this is happening?
There are a few ways you could do this: you could check your server logs to look at timing information, or you could load the page a few times in your browser to see what’s slow.
These approaches have a few downsides though:
If you only look at server-side data, you miss factors that impact the end-user experience — how long did it take for the web browser to load all the necessary scripts, execute them, and paint the page?
If you only measure from one computer (or a small number of them), you miss the diversity of the computing population — for example, “how does this work on a phone on a 3G connection?”
To solve these problems, we use Real User Monitoring. This gives us the best of both worlds: we can run a timer inside real web browsers. This timer captures how long it takes web pages to load, from your actual users.
How does it work?
Browser Insights can be enabled with the flip of a switch in the “Speed” section of the dashboard:
There’s a lot of info this graph! At a high level, there are two main types of metrics
Request-level metrics like TCP connection time, or Request time. These metrics are counted on every page load and are impacted by Internet infrastructure, like the mobile network of your end users, or the speed of your servers.
In addition to seeing several metrics about your web page performance, it’s helpful to drill into the dimensions that impact performance like URL and Country. This means you can filter down to the performance of a specific page (like your home page or checkout page), and you can see the locations where your site loads the fastest and slowest.
Going back to our example above, we want to see how performance in Peru compares to North America:
Sure enough, we can confirm that there’s significant traffic from Peru, but web pages take about 13s to load on average — compared with just 4.2 seconds for users in the US. Theory confirmed! Now we can filter all of our metrics to just Peru to understand what’s happening better:
Note that “Processing” has increased the most, all the way to 12 seconds. Request times are higher as well, likely because we are connecting to an origin server in the US. Web pages are made of many individual requests, so it makes sense that, when combined, they lead to slower load times. In this example, caching faster content would probably lead to significantly page loads.
What’s coming next?
Our launch today is just the tip of the iceberg for Browser Insights. In the near future we want to add much more information that will help you understand exactly what’s slowing down your website, and what you can do to make it faster. We plan to add:
More metrics and dimensions, including page-level metrics like Time to First Paint and more dimensions like browser and network type
Subresource analytics. The average web page loads over 100 subresources, and we can provide a waterfall chart to show exactly which one is slow.
A/B testing, to show you how potential configuration changes will impact the performance of your own traffic
Alerting so that you know when performance falls below a pre-defined threshold
Insights powered by Cloudflare that tell you why something might be slow – for example, how your cache hit ratio impacts page load time
Protecting user privacy
Cloudflare’s mission to help build a better Internet is based on the importance we place on establishing trust with our customers, our customers’ end users, and the Internet community globally. We have a transparent business model that aligns with the interests of our customers — we make money from protecting and speeding up our customers’ Internet properties. We do not sell our customers’ (or their end users’) data.
Browser Insights requires that end users’ browsers report timing information back to Cloudflare. We designed Browser Insights so that it reports only the bare-minimum information needed to show our customers how their websites are performing. The only metrics Browser Insights collects are about timing. We do not track individual end users across our customers’ Internet properties. We encourage you to open up the Inspector in your favorite web browser to see what we’re sending back!
Try Browser Insights today
Last May we announced the all-new Speed Page. Our mission with the Speed page is to show you how fast your website is, and what you can do to make it faster. Today, we’re excited to announce that the new Speed Page is available for everyone!
Browser Insights will be available on the Speed page in early access and we’ll be working hard to bring it to everyone as soon as possible in the coming weeks. Watch this space for updates!
Today we’re excited to announce Cloudflare Magic Transit. Magic Transit provides secure, performant, and reliable IP connectivity to the Internet. Out-of-the-box, Magic Transit deployed in front of your on-premise network protects it from DDoS attack and enables provisioning of a full suite of virtual network functions, including advanced packet filtering, load balancing, and traffic management tools.
Magic Transit is built on the standards and networking primitives you are familiar with, but delivered from Cloudflare’s global edge network as a service. Traffic is ingested by the Cloudflare Network with anycast and BGP, announcing your company’s IP address space and extending your network presence globally. Today, our anycast edge network spans 193 cities in more than 90 countries around the world.
Once packets hit our network, traffic is inspected for attacks, filtered, steered, accelerated, and sent onward to the origin. Magic Transit will connect back to your origin infrastructure over Generic Routing Encapsulation (GRE) tunnels, private network interconnects (PNI), or other forms of peering.
Enterprises are often forced to pick between performance and security when deploying IP network services. Magic Transit is designed from the ground up to minimize these trade-offs: performance and security are better together. Magic Transit deploys IP security services across our entire global network. This means no more diverting traffic to small numbers of distant “scrubbing centers” or relying on on-premise hardware to mitigate attacks on your infrastructure.
We’ve been laying the groundwork for Magic Transit for as long as Cloudflare has been in existence, since 2010. Scaling and securing the IP network Cloudflare is built on has required tooling that would have been impossible or exorbitantly expensive to buy. So we built the tools ourselves! We grew up in the age of software-defined networking and network function virtualization, and the principles behind these modern concepts run through everything we do.
When we talk to our customers managing on-premise networks, we consistently hear a few things: building and managing their networks is expensive and painful, and those on-premise networks aren’t going away anytime soon.
Traditionally, CIOs trying to connect their IP networks to the Internet do this in two steps:
Source connectivity to the Internet from transit providers (ISPs).
Purchase, operate, and maintain network function specific hardware appliances. Think hardware load balancers, firewalls, DDoS mitigation equipment, WAN optimization, and more.
Each of these boxes costs time and money to maintain, not to mention the skilled, expensive people required to properly run them. Each additional link in the chain makes a network harder to manage.
This all sounded familiar to us. We had an aha! moment: we had the same issues managing our datacenter networks that power all of our products, and we had spent significant time and effort building solutions to those problems. Now, nine years later, we had a robust set of tools we could turn into products for our own customers.
Magic Transit aims to bring the traditional datacenter hardware model into the cloud, packaging transit with all the network “hardware” you might need to keep your network fast, reliable, and secure. Once deployed, Magic Transit allows seamless provisioning of virtualized network functions, including routing, DDoS mitigation, firewalling, load balancing, and traffic acceleration services.
Magic Transit is your network’s on-ramp to the Internet
Magic Transit delivers its connectivity, security, and performance benefits by serving as the “front door” to your IP network. This means it accepts IP packets destined for your network, processes them, and then outputs them to your origin infrastructure.
Connecting to the Internet via Cloudflare offers numerous benefits. Starting with the most basic, Cloudflare is one of the most extensively connected networks on the Internet. We work with carriers, Internet exchanges, and peering partners around the world to ensure that a bit placed on our network will reach its destination quickly and reliably, no matter the destination.
An example deployment: Acme Corp
Let’s walk through how a customer might deploy Magic Transit. Customer Acme Corp. owns the IP prefix 203.0.113.0/24, which they use to address a rack of hardware they run in their own physical datacenter. Acme currently announces routes to the Internet from their customer-premise equipment (CPE, aka a router at the perimeter of their datacenter), telling the world 203.0.113.0/24 is reachable from their autonomous system number, AS64512. Acme has DDoS mitigation and firewall hardware appliances on-premise.
Acme wants to connect to the Cloudflare Network to improve the security and performance of their own network. Specifically, they’ve been the target of distributed denial of service attacks, and want to sleep soundly at night without relying on on-premise hardware. This is where Cloudflare comes in.
Deploying Magic Transit in front of their network is simple:
Cloudflare uses Border Gateway Protocol (BGP) to announce Acme’s 203.0.113.0/24 prefix from Cloudflare’s edge, with Acme’s permission.
Cloudflare begins ingesting packets destined for the Acme IP prefix.
Magic Transit applies DDoS mitigation and firewall rules to the network traffic. After it is ingested by the Cloudflare network, traffic that would benefit from HTTPS caching and WAF inspection can be “upgraded” to our Layer 7 HTTPS pipeline without incurring additional network hops.
Acme would like Cloudflare to use Generic Routing Encapsulation (GRE) to tunnel traffic back from the Cloudflare Network back to Acme’s datacenter. GRE tunnels are initiated from anycast endpoints back to Acme’s premise. Through the magic of anycast, the tunnels are constantly and simultaneously connected to hundreds of network locations, ensuring the tunnels are highly available and resilient to network failures that would bring down traditionally formed GRE tunnels.
Cloudflare egresses packets bound for Acme over these GRE tunnels.
Let’s dive deeper on how the DDoS mitigation included in Magic Transit works.
Magic Transit protects networks from DDoS attack
Customers deploying Cloudflare Magic Transit instantly get access to the same IP-layer DDoS protection system that has protected the Cloudflare Network for the past 9 years. This is the same mitigation system that stopped a 942Gbps attack dead in its tracks, in seconds. This is the same mitigation system that knew how to stop memcached amplification attacks days before a 1.3Tbps attack took down Github, which did not have Cloudflare watching its back. This is the same mitigation we trust every day to protect Cloudflare, and now it protects your network.
Cloudflare has historically protected Layer 7 HTTP and HTTPS applications from attacks at all layers of the OSI Layer model. The DDoS protection our customers have come to know and love relies on a blend of techniques, but can be broken into a few complementary defenses:
Anycast and a network presence in 193 cities around the world allows our network to get close to users and attackers, allowing us to soak up traffic close to the source without introducing significant latency.
30+Tbps of network capacity allows us to soak up a lot of traffic close to the source. Cloudflare’s network has more capacity to stop DDoS attacks than that of Akamai Prolexic, Imperva, Neustar, and Radware — combined.
Our HTTPS reverse proxy absorbs L3 (IP layer) and L4 (TCP layer) attacks by terminating connections and re-establishing them to the origin. This stops most spurious packet transmissions from ever getting close to a customer origin server.
Layer 7 mitigations and rate limiting stop floods at the HTTPS application layer.
Looking at the above description carefully, you might notice something: our reverse proxy servers protect our customers by terminating connections, but our network and servers still get slammed by the L3 and 4 attacks we stop on behalf of our customers. How do we protect our own infrastructure from these attacks?
Gatebot is a suite of software running on every one of our servers inside each of our datacenters in the 193 cities we operate, constantly analyzing and blocking attack traffic. Part of Gatebot’s beauty is its simple architecture; it sits silently, in wait, sampling packets as they pass from the network card into the kernel and onward into userspace. Gatebot does not have a learning or warm-up period. As soon as it detects an attack, it instructs the kernel of the machine it is running on to drop the packet, log its decision, and move on.
Historically, if you wanted to protect your network from a DDoS attack, you might have purchased a specialized piece of hardware to sit at the perimeter of your network. This hardware box (let’s call it “The DDoS Protection Box”) would have been fantastically expensive, pretty to look at (as pretty as a 2U hardware box could get), and required a ton of recurring effort and money to stay on its feet, keep its licence up to date, and keep its attack detection system accurate and trained.
For one thing, it would have to be carefully monitored to make sure it was stopping attacks but not stopping legitimate traffic. For another, if an attacker managed to generate enough traffic to saturate your datacenter’s transit links to the Internet, you were out of luck; no box sitting inside your datacenter can protect you from an attack generating enough traffic to congest the links running from the outside world to the datacenter itself.
Early on, Cloudflare considered buying The DDoS Protection Box(es) to protect our various network locations, but ruled them out quickly. Buying hardware would have incurred substantial cost and complexity. In addition, buying, racking, and managing specialized pieces of hardware makes a network hard to scale. There had to be a better way. We set out to solve this problem ourselves, starting from first principles and modern technology.
To make our modern approach to DDoS mitigation work, we had to invent a suite of tools and techniques to allow us to do ultra-high performance networking on a generic x86 server running Linux.
At the core of our network data plane is the eXpress Data Path (XDP) and the extended Berkeley Packet Filter (eBPF), a set of APIs that allow us to build ultra-high performance networking applications in the Linux kernel. My colleagues have written extensively about how we use XDP and eBPF to stop DDoS attacks:
At the end of the day, we ended up with a DDoS mitigation system that:
Is delivered by our entire network, spread across 193 cities around the world. To put this another way, our network doesn’t have the concept of “scrubbing centers” — every single one of our network locations is always mitigating attacks, all the time. This means faster attack mitigation and minimal latency impact for your users.
Has exceptionally fast times to mitigate, with most attacks mitigated in 10s or less.
Was built in-house, giving us deep visibility into its behavior and the ability to rapidly develop new mitigations as we see new attack types.
Is deployed as a service, and is horizontally scalable. Adding x86 hardware running our DDoS mitigation software stack to a datacenter (or adding another network location) instantly brings more DDoS mitigation capacity online.
Gatebot is designed to protect Cloudflare infrastructure from attack. And today, as part of Magic Transit, customers operating their own IP networks and infrastructure can rely on Gatebot to protect their own network.
Magic Transit puts your network hardware in the cloud
We’ve covered how Cloudflare Magic Transit connects your network to the Internet, and how it protects you from DDoS attack. If you were running your network the old-fashioned way, this is where you’d stop to buy firewall hardware, and maybe another box to do load balancing.
With Magic Transit, you don’t need those boxes. We have a long track record of delivering common network functions (firewalls, load balancers, etc.) as services. Up until this point, customers deploying our services have relied on DNS to bring traffic to our edge, after which our Layer 3 (IP), Layer 4 (TCP & UDP), and Layer 7 (HTTP, HTTPS, and DNS) stacks take over and deliver performance and security to our customers.
Magic Transit is designed to handle your entire network, but does not enforce a one-size-fits-all approach to what services get applied to which portion of your traffic. To revisit Acme, our example customer from above, they have brought 203.0.113.0/24 to the Cloudflare Network. This represents 256 IPv4 addresses, some of which (eg 203.0.113.8/30) might front load balancers and HTTP servers, others mail servers, and others still custom UDP-based applications.
Each of these sub-ranges may have different security and traffic management requirements. Magic Transit allows you to configure specific IP addresses with their own suite of services, or apply the same configuration to large portions (or all) of your block.
Taking the above example, Acme may wish that the 203.0.113.8/30 block containing HTTP services fronted by a traditional hardware load balancer instead deploy the Cloudflare Load Balancer, and also wants HTTP traffic analyzed with Cloudflare’s WAF and content cached by our CDN. With Magic Transit, deploying these network functions is straight-forward — a few clicks in our dashboard or API calls will have your traffic handled at a higher layer of network abstraction, with all the attendant goodies applying application level load balancing, firewall, and caching logic bring.
This is just one example of a deployment customers might pursue. We’ve worked with several who just want pure IP passthrough, with DDoS mitigation applied to specific IP addresses. Want that? We got you!
Magic Transit runs on the entire Cloudflare Global Network. Or, no more scrubs!
When you connect your network to Cloudflare Magic Transit, you get access to the entire Cloudflare network. This means all of our network locations become your network locations. Our network capacity becomes your network capacity, at your disposal to power your experiences, deliver your content, and mitigate attacks on your infrastructure.
How expansive is the Cloudflare Network? We’re in 193 cities worldwide, with more than 30Tbps of network capacity spread across them. Cloudflare operates within 100 milliseconds of 98% of the Internet-connected population in the developed world, and 93% of the Internet-connected population globally (for context, the blink of an eye is 300-400 milliseconds).
Just as we built our own products in house, we also built our network in house. Every product runs in every datacenter, meaning our entire network delivers all of our services. This might not have been the case if we had assembled our product portfolio piecemeal through acquisition, or not had completeness of vision when we set out to build our current suite of services.
The end result for customers of Magic Transit: a network presence around the globe as soon you come on board. Full access to a diverse set of services worldwide. All delivered with latency and performance in mind.
We’ll be sharing a lot more technical detail on how we deliver Magic Transit in the coming weeks and months.
Magic Transit lowers total cost of ownership
Traditional network services don’t come cheap; they require high capital outlays up front, investment in staff to operate, and ongoing maintenance contracts to stay functional. Just as our product aims to be disruptive technically, we want to disrupt traditional network cost-structures as well.
Magic Transit is delivered and billed as a service. You pay for what you use, and can add services at any time. Your team will thank you for its ease of management; your management will thank you for its ease of accounting. That sounds pretty good to us!
Magic Transit is available today
We’ve worked hard over the past nine years to get our network, management tools, and network functions as a service into the state they’re in today. We’re excited to get the tools we use every day in customers’ hands.
So that brings us to naming. When we showed this to customers the most common word they used was ‘whoa.’ When we pressed what they meant by that they almost all said: ‘It’s so much better than any solution we’ve seen before. It’s, like, magic!’ So it seems only natural, if a bit cheesy, that we call this product what it is: Magic Transit.
Today we announced Cloudflare Magic Transit, which makes Cloudflare’s network available to any IP traffic on the Internet. Up until now, Cloudflare has primarily operated proxy services: our servers terminate HTTP, TCP, and UDP sessions with Internet users and pass that data through new sessions they create with origin servers. With Magic Transit, we are now also operating at the IP layer: in addition to terminating sessions, our servers are applying a suite of network functions (DoS mitigation, firewalling, routing, and so on) on a packet-by-packet basis.
Over the past nine years, we’ve built a robust, scalable global network that currently spans 193 cities in over 90 countries and is ever growing. All Cloudflare customers benefit from this scale thanks to two important techniques. The first is anycast networking. Cloudflare was an early adopter of anycast, using this routing technique to distribute Internet traffic across our data centers. It means that any data center can handle any customer’s traffic, and we can spin up new data centers without needing to acquire and provision new IP addresses. The second technique is homogeneous server architecture. Every server in each of our edge data centers is capable of running every task. We build our servers on commodity hardware, making it easy to quickly increase our processing capacity by adding new servers to existing data centers. Having no specialty hardware to depend on has also led us to develop an expertise in pushing the limits of what’s possible in networking using modern Linux kernel techniques.
Magic Transit is built on the same network using the same techniques, meaning our customers can now run their network functions at Cloudflare scale. Our fast, secure, reliable global edge becomes our customers’ edge. To explore how this works, let’s follow the journey of a packet from a user on the Internet to a Magic Transit customer’s network.
Putting our DoS mitigation to work… for you!
In the announcement blog post we describe an example deployment for Acme Corp. Let’s continue with this example here. When Acme brings their IP prefix 203.0.113.0/24 to Cloudflare, we start announcing that prefix to our transit providers, peers, and to Internet exchanges in each of our data centers around the globe. Additionally, Acme stops announcing the prefix to their own ISPs. This means that any IP packet on the Internet with a destination address within Acme’s prefix is delivered to a nearby Cloudflare data center, not to Acme’s router.
Let’s say I want to access Acme’s FTP server on 203.0.113.100 from my computer in Cloudflare’s office in Champaign, IL. My computer generates a TCP SYN packet with destination address 203.0.113.100 and sends it out to the Internet. Thanks to anycast, that packet ends up at Cloudflare’s data center in Chicago, which is the closest data center (in terms of Internet routing distance) to Champaign. The packet arrives on the data center’s router, which uses ECMP (Equal Cost Multi-Path) routing to select which server should handle the packet and dispatches the packet to the selected server.
Once at the server, the packet flows through our XDP- and iptables-based DoS detection and mitigation functions. If this TCP SYN packet were determined to be part of an attack, it would be dropped and that would be the end of it. Fortunately for me, the packet is permitted to pass.
So far, this looks exactly like any other traffic on Cloudflare’s network. Because of our expertise in running a global anycast network we’re able to attract Magic Transit customer traffic to every data center and apply the same DoS mitigation solution that has been protecting Cloudflare for years. Our DoS solution has handled some of the largest attacks ever recorded, including a 942Gbps SYN flood in 2018. Below is a screenshot of a recent SYN flood of 300M packets per second. Our architecture lets us scale to stop the largest attacks.
Network namespaces for isolation and control
The above looked identical to how all other Cloudflare traffic is processed, but this is where the similarities end. For our other services, the TCP SYN packet would now be dispatched to a local proxy process (e.g. our nginx-based HTTP/S stack). For Magic Transit, we instead want to dynamically provision and apply customer-defined network functions like firewalls and routing. We needed a way to quickly spin up and configure these network functions while also providing inter-network isolation. For that, we turned to network namespaces.
Namespaces are a collection of Linux kernel features for creating lightweight virtual instances of system resources that can be shared among a group of processes. Namespaces are a fundamental building block for containerization in Linux. Notably, Docker is built on Linux namespaces. A network namespace is an isolated instance of the Linux network stack, including its own network interfaces (with their own eBPF hooks), routing tables, netfilter configuration, and so on. Network namespaces give us a low-cost mechanism to rapidly apply customer-defined network configurations in isolation, all with built-in Linux kernel features so there’s no performance hit from userspace packet forwarding or proxying.
When a new customer starts using Magic Transit, we create a brand new network namespace for that customer on every server across our edge network (did I mention that every server can run every task?). We built a daemon that runs on our servers and is responsible for managing these network namespaces and their configurations. This daemon is constantly reading configuration updates from Quicksilver, our globally distributed key-value store, and applying customer-defined configurations for firewalls, routing, etc, inside the customer’s namespace. For example, if Acme wants to provision a firewall rule to allow FTP traffic (TCP ports 20 and 21) to 203.0.113.100, that configuration is propagated globally through Quicksilver and the Magic Transit daemon applies the firewall rule by adding an nftables rule to the Acme customer namespace:
Getting the customer’s traffic to their network namespace requires a little routing configuration in the default network namespace. When a network namespace is created, a pair of virtual ethernet (veth) interfaces is also created: one in the default namespace and one in the newly created namespace. This interface pair creates a “virtual wire” for delivering network traffic into and out of the new network namespace. In the default network namespace, we maintain a routing table that forwards Magic Transit customer IP prefixes to the veths corresponding to those customers’ namespaces. We use iptables to mark the packets that are destined for Magic Transit customer prefixes, and we have a routing rule that specifies that these specially marked packets should use the Magic Transit routing table.
(Why go to the trouble of marking packets in iptables and maintaining a separate routing table? Isolation. By keeping Magic Transit routing configurations separate we reduce the risk of accidentally modifying the default routing table in a way that affects how non-Magic Transit traffic flows through our edge.)
Network namespaces provide a lightweight environment where a Magic Transit customer can run and manage network functions in isolation, letting us put full control in the customer’s hands.
GRE + anycast = magic
After passing through the edge network functions, the TCP SYN packet is finally ready to be delivered back to the customer’s network infrastructure. Because Acme Corp. does not have a network footprint in a colocation facility with Cloudflare, we need to deliver their network traffic over the public Internet.
This poses a problem. The destination address of the TCP SYN packet is 203.0.113.100, but the only network announcing the IP prefix 203.0.113.0/24 on the Internet is Cloudflare. This means that we can’t simply forward this packet out to the Internet—it will boomerang right back to us! In order to deliver this packet to Acme we need to use a technique called tunneling.
Tunneling is a method of carrying traffic from one network over another network. In our case, it involves encapsulating Acme’s IP packets inside of IP packets that can be delivered to Acme’s router over the Internet. There are a number of common tunneling protocols, but Generic Routing Encapsulation (GRE) is often used for its simplicity and widespread vendor support.
GRE tunnel endpoints are configured both on Cloudflare’s servers (inside of Acme’s network namespace) and on Acme’s router. Cloudflare servers then encapsulate IP packets destined for 203.0.113.0/24 inside of IP packets destined for a publicly-routable IP address for Acme’s router, which decapsulates the packets and emits them into Acme’s internal network.
Now, I’ve omitted an important detail in the diagram above: the IP address of Cloudflare’s side of the GRE tunnel. Configuring a GRE tunnel requires specifying an IP address for each side, and the outer IP header for packets sent over the tunnel must use these specific addresses. But Cloudflare has thousands of servers, each of which may need to deliver packets to the customer through a tunnel. So how many Cloudflare IP addresses (and GRE tunnels) does the customer need to talk to? The answer: just one, thanks to the magic of anycast.
Cloudflare uses anycast IP addresses for our GRE tunnel endpoints, meaning that any server in any data center is capable of encapsulating and decapsulating packets for the same GRE tunnel. How is this possible? Isn’t a tunnel a point-to-point link? The GRE protocol itself is stateless—each packet is processed independently and without requiring any negotiation or coordination between tunnel endpoints. While the tunnel is technically bound to an IP address it need not be bound to a specific device. Any device that can strip off the outer headers and then route the inner packet can handle any GRE packet sent over the tunnel. Actually, in the context of anycast the term “tunnel” is misleading since it implies a link between two fixed points. With Cloudflare’s Anycast GRE, a single “tunnel” gives you a conduit to every server in every data center on Cloudflare’s global edge.
One very powerful consequence of Anycast GRE is that it eliminates single points of failure. Traditionally, GRE-over-Internet can be problematic because an Internet outage between the two GRE endpoints fully breaks the “tunnel”. This means reliable data delivery requires going through the headache of setting up and maintaining redundant GRE tunnels terminating at different physical sites and rerouting traffic when one of the tunnels breaks. But because Cloudflare is encapsulating and delivering customer traffic from every server in every data center, there is no single “tunnel” to break. This means Magic Transit customers can enjoy the redundancy and reliability of terminating tunnels at multiple physical sites while only setting up and maintaining a single GRE endpoint, making their jobs simpler.
Our scale is now your scale
Magic Transit is a powerful new way to deploy network functions at scale. We’re not just giving you a virtual instance, we’re giving you a global virtual edge. Magic Transit takes the hardware appliances you would typically rack in your on-prem network and distributes them across every server in every data center in Cloudflare’s network. This gives you access to our global anycast network, our fleet of servers capable of running your tasks, and our engineering expertise building fast, reliable, secure networks. Our scale is now your scale.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.