Tag Archives: Better Internet

Zero Trust WARP: tunneling with a MASQUE

Post Syndicated from Dan Hall original https://blog.cloudflare.com/zero-trust-warp-with-a-masque


Slipping on the MASQUE

In June 2023, we told you that we were building a new protocol, MASQUE, into WARP. MASQUE is a fascinating protocol that extends the capabilities of HTTP/3 and leverages the unique properties of the QUIC transport protocol to efficiently proxy IP and UDP traffic without sacrificing performance or privacy.

At the same time, we’ve seen a rising demand from Zero Trust customers for features and solutions that only MASQUE can deliver. All customers want WARP traffic to look like HTTPS to avoid detection and blocking by firewalls, while a significant number of customers also require FIPS-compliant encryption. We have something good here, and it’s been proven elsewhere (more on that below), so we are building MASQUE into Zero Trust WARP and will be making it available to all of our Zero Trust customers — at WARP speed!

This blog post highlights some of the key benefits our Cloudflare One customers will realize with MASQUE.

Before the MASQUE

Cloudflare is on a mission to help build a better Internet. And it is a journey we’ve been on with our device client and WARP for almost five years. The precursor to WARP was the 2018 launch of 1.1.1.1, the Internet’s fastest, privacy-first consumer DNS service. WARP was introduced in 2019 with the announcement of the 1.1.1.1 service with WARP, a high performance and secure consumer DNS and VPN solution. Then in 2020, we introduced Cloudflare’s Zero Trust platform and the Zero Trust version of WARP to help any IT organization secure their environment, featuring a suite of tools we first built to protect our own IT systems. Zero Trust WARP with MASQUE is the next step in our journey.

The current state of WireGuard

WireGuard was the perfect choice for the 1.1.1.1 with WARP service in 2019. WireGuard is fast, simple, and secure. It was exactly what we needed at the time to guarantee our users’ privacy, and it has met all of our expectations. If we went back in time to do it all over again, we would make the same choice.

But the other side of the simplicity coin is a certain rigidity. We find ourselves wanting to extend WireGuard to deliver more capabilities to our Zero Trust customers, but WireGuard is not easily extended. Capabilities such as better session management, advanced congestion control, or simply the ability to use FIPS-compliant cipher suites are not options within WireGuard; these capabilities would have to be added on as proprietary extensions, if that were even possible.

Plus, while WireGuard is popular in VPN solutions, it is not standards-based, and therefore not treated like a first-class citizen in the world of the Internet, where non-standard traffic can be blocked, sometimes intentionally, sometimes not. WireGuard uses a non-standard port, port 51820, by default. Zero Trust WARP changes this to use port 2408 for the WireGuard tunnel, but it’s still a non-standard port. For our customers who control their own firewalls, this is not an issue; they simply allow that traffic. But many public Wi-Fi operators, and many of the approximately 7,000 ISPs in the world, don’t know anything about WireGuard and block these ports. We’ve also faced situations where the ISP does know what WireGuard is and blocks it intentionally.

This can play havoc with roaming Zero Trust WARP users at their local coffee shop, in hotels, on planes, or in other places with captive portals or public Wi-Fi access, and even sometimes with their local ISP. The user is expecting reliable access with Zero Trust WARP, and is frustrated when their device is blocked from connecting to Cloudflare’s global network.

Now we have another proven technology — MASQUE — which uses and extends HTTP/3 and QUIC. Let’s do a quick review of these to better understand why Cloudflare believes MASQUE is the future.

Unpacking the acronyms

HTTP/3 and QUIC are among the most recent advancements in the evolution of the Internet, enabling faster, more reliable, and more secure connections to endpoints like websites and APIs. Cloudflare worked closely with industry peers through the Internet Engineering Task Force on the development of RFC 9000 for QUIC and RFC 9114 for HTTP/3. The technical background on the basic benefits of HTTP/3 and QUIC is reviewed in our 2019 blog post, where we announced QUIC and HTTP/3 availability on Cloudflare’s global network.

Most relevant for Zero Trust WARP, QUIC delivers better performance on high-latency or high packet loss networks, thanks to packet coalescing and multiplexing. QUIC packets in separate contexts during the handshake can be coalesced into the same UDP datagram, thus reducing the number of receive and system interrupts. With multiplexing, QUIC can carry multiple HTTP sessions within the same QUIC connection. Zero Trust WARP also benefits from QUIC’s high level of privacy, with TLS 1.3 designed into the protocol.

MASQUE unlocks QUIC’s potential for proxying by providing the application layer building blocks to support efficient tunneling of TCP and UDP traffic. In Zero Trust WARP, MASQUE will be used to establish a tunnel over HTTP/3, delivering the same capability as WireGuard tunneling does today. In the future, we’ll be in position to add more value using MASQUE, leveraging Cloudflare’s ongoing participation in the MASQUE Working Group. This blog post is a good read for those interested in digging deeper into MASQUE.
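
For readers who want to peek under the hood, here is a rough sketch of the extended CONNECT request a MASQUE client sends to open a UDP tunnel, following the connect-udp scheme of RFC 9298. This is our illustration rather than the actual WARP client code, and the proxy and target names are placeholders:

```python
# A sketch (ours, not the WARP client) of the extended CONNECT request a
# MASQUE client sends to open a UDP tunnel over HTTP/3, per RFC 9298.
# "proxy.example.com" and the target address are illustrative placeholders.

def connect_udp_headers(proxy: str, target_host: str, target_port: int):
    """Pseudo-headers for an RFC 9298 "connect-udp" tunnel request."""
    return [
        (b":method", b"CONNECT"),
        (b":protocol", b"connect-udp"),  # extended CONNECT (RFC 9220)
        (b":scheme", b"https"),
        (b":authority", proxy.encode()),
        # Default URI template: /.well-known/masque/udp/{host}/{port}/
        (b":path", f"/.well-known/masque/udp/{target_host}/{target_port}/".encode()),
        (b"capsule-protocol", b"?1"),  # UDP payloads then flow as HTTP Datagrams
    ]

for name, value in connect_udp_headers("proxy.example.com", "192.0.2.1", 443):
    print(name.decode(), value.decode())
```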

OK, so Cloudflare is going to use MASQUE for WARP. What does that mean to you, the Zero Trust customer?

Proven reliability at scale

Cloudflare’s network today spans more than 310 cities in over 120 countries and interconnects with over 13,000 networks globally. HTTP/3 and QUIC were introduced to the Cloudflare network in 2019, the HTTP/3 standard was finalized in 2022, and by 2023 HTTP/3 represented about 30% of all HTTP traffic on our network.

We are also using MASQUE for iCloud Private Relay and other Privacy Proxy partners. The services that power these partnerships, from our Rust-based proxy framework to our open source QUIC implementation, are already deployed globally in our network and have proven to be fast, resilient, and reliable.

Cloudflare is already operating MASQUE, HTTP/3, and QUIC reliably at scale. So we want you, our Zero Trust WARP users and Cloudflare One customers, to benefit from that same reliability and scale.

Connect from anywhere

Employees need to be able to connect from anywhere that has an Internet connection. But that can be a challenge as many security engineers will configure firewalls and other networking devices to block all ports by default, and only open the most well-known and common ports. As we pointed out earlier, this can be frustrating for the roaming Zero Trust WARP user.

We want to fix that for our users, and remove that frustration. HTTP/3 and QUIC deliver the perfect solution. QUIC is carried on top of UDP (IP protocol number 17), and HTTP/3 uses UDP port 443, the same port as standard encrypted web traffic. Both are well known, widely used, and very unlikely to be blocked.

We want our Zero Trust WARP users to reliably connect wherever they might be.

Compliant cipher suites

MASQUE leverages TLS 1.3 with QUIC, which provides a number of cipher suite choices. WireGuard also uses standard cipher suites. But some standards are more, let’s say, standard than others.

NIST, the National Institute of Standards and Technology and part of the US Department of Commerce, does a tremendous amount of work across the technology landscape. Of interest to us is the NIST research into network security that results in FIPS 140-2 and similar publications. NIST studies individual cipher suites and publishes lists of those they recommend for use, recommendations that become requirements for US Government entities. Many other customers, both government and commercial, use these same recommendations as requirements.

Our first MASQUE implementation for Zero Trust WARP will use TLS 1.3 and FIPS-compliant cipher suites.
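
To make that concrete, here is a small Python sketch (our illustration, not Cloudflare’s actual configuration) that checks which TLS 1.3 cipher suites a client offers and which of those are FIPS-approved. Of the standard TLS 1.3 suites, the two AES-GCM ones are FIPS-approved, while ChaCha20-Poly1305 is not:

```python
import ssl

# Of the three TLS 1.3 cipher suites commonly enabled, the AES-GCM pair
# is FIPS-approved, while ChaCha20-Poly1305 is not.
TLS13_SUITES = {
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
}
FIPS_TLS13_SUITES = {"TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"}

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_3
ctx.load_default_certs()

offered = {c["name"] for c in ctx.get_ciphers()} & TLS13_SUITES
print("TLS 1.3 suites offered:", sorted(offered))
print("FIPS-approved subset: ", sorted(offered & FIPS_TLS13_SUITES))
```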

How can I get Zero Trust WARP with MASQUE?

Cloudflare engineers are hard at work implementing MASQUE for the mobile apps, the desktop clients, and the Cloudflare network. Progress has been good, and we will open this up for beta testing early in the second quarter of 2024 for Cloudflare One customers. Your account team will be reaching out with participation details.

Continuing the journey with Zero Trust WARP

Cloudflare launched WARP five years ago, and we’ve come a long way since. This introduction of MASQUE to Zero Trust WARP is a big step, one that will immediately deliver the benefits noted above. But there will be more — we believe MASQUE opens up new opportunities to leverage the capabilities of QUIC and HTTP/3 to build innovative Zero Trust solutions. And we’re also continuing to work on other new capabilities for our Zero Trust customers.
Cloudflare is committed to continuing our mission to help build a better Internet, one that is more private and secure, scalable, reliable, and fast. And if you would like to join us in this exciting journey, check out our open positions.

Making home Internet faster has little to do with “speed”

Post Syndicated from Mike Conlow original https://blog.cloudflare.com/making-home-internet-faster/

More than ten years ago, researchers at Google published a paper with the seemingly heretical title “More Bandwidth Doesn’t Matter (much)”. We published our own blog showing that it is faster to fly 1TB of data from San Francisco to London than it is to upload it on a 100 Mbps connection. Unfortunately, things haven’t changed much. When you make purchasing decisions about home Internet plans, you probably consider the bandwidth of the connection when evaluating Internet performance. More bandwidth is faster speed, or so the marketing goes. In this post, we’ll use real-world data to show that both bandwidth and – spoiler alert! – latency impact the speed of an Internet connection. By the end, we think you’ll understand why Cloudflare is so laser focused on reducing latency everywhere we can find it.
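
The arithmetic behind that flight comparison is easy to check yourself (the roughly 11-hour flight time is our assumption):

```python
# Checking the claim: 1 TB over a 100 Mbps uplink vs. an ~11-hour
# San Francisco-to-London flight (flight time is our rough assumption).
bits = 1e12 * 8       # 1 TB in bits
uplink_bps = 100e6    # 100 Mbps

upload_hours = bits / uplink_bps / 3600
print(f"Upload: {upload_hours:.1f} h, flight: ~11 h")  # ~22.2 h vs ~11 h
```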

First, we should quickly define bandwidth and latency. Bandwidth is the amount of data that can be transmitted at any single time. It’s the maximum throughput, or capacity, of the communications link between two servers that want to exchange data. Usually, the bottleneck – the place in the network where the connection is constrained by the amount of bandwidth available – is in the “last mile”, either the wire that connects a home, or the modem or router in the home itself.

If the Internet is an information superhighway, bandwidth is the number of lanes on the road. The wider the road, the more traffic can fit on the highway at any time. Bandwidth is useful for downloading large files like operating system updates and big game updates. While bandwidth, throughput and capacity are synonyms, confusingly “speed” has come to mean bandwidth when talking about Internet plans. More on this later.

We use bandwidth when streaming video, though probably less than you think. Netflix recommends 15 Mbps of bandwidth to watch a stream in 4K/Ultra HD. A 1 Gbps connection could stream more than 60 Netflix shows in 4K at the same time!

Latency, on the other hand, is how fast data is moving through the Internet. To extend our analogy, latency is the speed at which traffic is moving on the highway. If traffic is moving quickly, you’ll get to your destination faster. Latency is measured in the number of milliseconds that it takes a packet of data to go from a client (such as your laptop computer) to a server, and then back to the client. If you’re practicing tennis against a wall, round-trip latency is the time the ball was in the air. On the Internet “backbone” data is traveling at almost 200,000 kilometers per second as it bounces off the glass on the inside of fiber optic wires. That’s fast!
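
A quick back-of-the-envelope calculation shows what that speed means in practice; the distance below is an approximate San Francisco-to-London figure, and real paths are longer:

```python
# Physics puts a floor under latency: light in fiber travels at roughly
# 200,000 km/s, so distance alone bounds the round-trip time. The 8,600 km
# figure is an approximate great-circle distance; real fiber paths are
# longer and add processing and queuing delay on top.
FIBER_KM_PER_S = 200_000
distance_km = 8_600

min_rtt_ms = 2 * distance_km / FIBER_KM_PER_S * 1000
print(f"Best-case RTT: {min_rtt_ms:.0f} ms")  # ~86 ms
```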

While we can’t make light travel through glass much faster, we can improve latency by moving the content closer to users, shortening the distance data needs to travel. That’s the effect of our presence in more than 285 cities globally: when you’re on the Internet superhighway trying to reach Cloudflare, we want to be just off the next exit.

We know that low-latency (low delay; high speed) connections are important for gaming, where tiny bits of data, such as the change in position of players in a game, need to reach another computer quickly. And increasingly, we’re becoming aware of high latency when it makes our videoconferencing choppy and unpleasant.

But we don’t use the Internet only to play games or to watch streaming video. We do both of those, and we visit a lot of normal web pages in between.

In the 2010 paper from Google, the author simulated loading web pages while varying the throughput and latency of the connection. He found that above about 5 Mbps, the page doesn’t load much faster: increasing bandwidth from 1 Mbps to 2 Mbps yields almost a 40 percent improvement in page load time, while going from 5 Mbps to 6 Mbps yields less than a 5 percent improvement.

When he varied the latency (the Round Trip Time, or RTT), something interesting happened: page load time improved linearly. For every 20 milliseconds of latency removed, page load time improved by about 10%.
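
A toy model makes the shape of both curves intuitive. The parameters below are invented for illustration, not taken from the paper: a page load is modeled as a number of serialized round trips plus the time to move the bytes themselves:

```python
# A toy model (ours, with invented parameters, not Google's simulator):
# a page load is many small, partly serialized fetches, so total time is
# roughly "round trips waiting" plus "bytes moving". Bandwidth only helps
# the second term; latency multiplies through the first.
def page_load_ms(mbps: float, rtt_ms: float,
                 page_mb: float = 2.0, serial_rtts: int = 40) -> float:
    transfer_ms = page_mb * 8 / mbps * 1000    # time spent moving bytes
    return serial_rtts * rtt_ms + transfer_ms  # plus time spent waiting

for mbps in (1, 2, 5, 6, 20, 100):  # diminishing returns past a few Mbps
    print(f"{mbps:>3} Mbps @ 40 ms RTT: {page_load_ms(mbps, 40):7.0f} ms")
for rtt in (100, 80, 60, 40):  # every 20 ms of RTT saved pays off linearly
    print(f"10 Mbps @ {rtt} ms RTT: {page_load_ms(10, rtt):7.0f} ms")
```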

Let’s see what this looks like in real life with empirical data. Below is a chart from an excellent recent paper by two researchers from MIT. Using data from the FCC’s Measuring Broadband America program, these researchers produced a chart remarkably similar to the 2010 simulation. Though the point of diminishing returns to more bandwidth has moved higher – to about 20 Mbps – the overall trend is exactly the same.

[Chart: page load time vs. bandwidth, with diminishing returns above roughly 20 Mbps]

For the latency relationship, we use aggregate and anonymized data from Cloudflare’s own delivery network. It’s a familiar pattern. For every 200 milliseconds of latency we can save, we cut the page load time by over 1 second. That relationship applies when the latency is 950 milliseconds. And it applies when the latency is 50 milliseconds.

[Chart: page load time vs. latency, a roughly linear relationship]

Let’s try to understand why this relationship exists. When you connect to a website, the first thing your browser does is establish a secure connection: it sets up the underlying TCP connection, encrypts the channel, and verifies that the site you want to access really is the site you think you’re accessing. This process comprises the TCP handshake and the TLS handshake. Before you even see a pixel of content, your browser and the website will exchange information up to seven times: three messages to perform the TCP handshake, and up to four more to perform the TLS handshake.

[Diagram: round trips of the TCP and TLS handshakes before the first byte of content]
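
Put in numbers, and assuming a fresh connection with a classic TCP plus TLS 1.2-style handshake as in the diagram, connection setup alone costs several round trips before any content moves:

```python
# Counting the cost of connection setup (a sketch; assumes a fresh
# connection and a classic TCP + TLS 1.2-style handshake, which is what
# the seven-message exchange above describes).
def setup_ms(rtt_ms: float, tcp_rtts: float = 1.0, tls_rtts: float = 2.0) -> float:
    """Round trips spent before the first HTTP request can be sent."""
    return (tcp_rtts + tls_rtts) * rtt_ms

for rtt in (20, 50, 100):
    print(f"RTT {rtt:>3} ms: ~{setup_ms(rtt):.0f} ms before any content moves")
```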

On top of that, when we load a webpage after we establish encryption and verify website authenticity, we might be asking the browser to load hundreds of different files across dozens of different domains. Some of these files can be loaded in parallel, but others need to be loaded sequentially. As the browser races to compile all these different files, it’s the speed at which it can get to the server and back that determines how fast it can put the page together. The files are often quite small, but there’s a lot of them.

The chart below shows the beginning of what the browser does when it loads cnn.com. After a 301 redirect, the browser loads the main HTML page in step two. Only then, more than 1 second into the load, does it know about all the JavaScript files it needs to render the page. Files 3-19 become unblocked once the browser has the HTML file in step two, but we see more blocking later on. Files 20-27 are all blocked on earlier files. They can’t start until the browser has the previous file back from the server. There are 650 assets in this page load, and the blocking happens all the way through the page load. Here’s why this matters: better latency makes every file load faster, which in turn unblocks other files faster, and so on.
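
Here is that chain effect in miniature, with an invented dependency depth standing in for the real waterfall:

```python
# The chain effect in miniature (depth and per-level time are invented):
# each serialized dependency level waits one round trip on its parent,
# so an RTT saving is collected once per level, not once per page.
def waterfall_ms(rtt_ms: float, depth: int = 5, per_level_ms: float = 30.0) -> float:
    return depth * (rtt_ms + per_level_ms)

saved = waterfall_ms(100) - waterfall_ms(80)
print(f"Shaving 20 ms of RTT saves {saved:.0f} ms here")  # 100 ms: 5x the saving
```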

[Chart: the beginning of the cnn.com load waterfall, showing blocked requests]

The browser will do its best to use all the bandwidth available, but often uses just a fraction of it. It’s no wonder, then, that adding more bandwidth doesn’t speed up the page load, but better latency does. While developments like Early Hints help by allowing browsers to pre-fetch files and content that don’t need to be serialized, this is still a problem for many websites on the Internet today.

Recently, Internet researchers have turned their attention to using our understanding of the relationship between throughput and latency to improve Internet Quality of Experience (QoE). A paper from the Broadband Internet Technical Advisory Group (BITAG) summarizes:

But we now recognize that it is not just greater throughput that matters, but also consistently low latency. Unfortunately, the way that we’ve historically understood and characterized latency was flawed, and our latency measurements and metrics were not aligned with end-user QoE.

There’s a difference between latency on an idle Internet connection and latency measured in working conditions, which we call “working latency” or “responsiveness”. Since responsiveness is what the user experiences as the speed of their Internet connection, it’s important to understand and measure this particular latency.

An Internet connection can suffer from poor responsiveness (even if it has good idle latency) when data is delayed in buffers. If you download a large file, for example an operating system update, the server sending the file might send the file with higher throughput than the Internet connection can accept. That’s ok. Extra bits of the file will sit in a buffer until it’s their turn to go through the funnel. Adding extra lanes to the highway allows more cars to pass through, and is a good strategy if we aren’t particularly concerned with the speed of the traffic.

Say, for example, Christabel is watching a stream of the news while on a video meeting. When Christabel starts watching the video, her browser fetches a bunch of content and stores it in various buffers on the way from the content host to the browser. Those same buffers also contain data packets pertaining to the video meeting Christabel is currently in. If the data generated as part of a video conference sits in the same buffer as the video files, the video files will fill up the buffer and cause delay for the video meeting packets as well. That type of buffering delay is called bufferbloat, and leads to jittery video calls and delays where people speak over each other and get frustrated.
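
The delay involved is easy to estimate. With a hypothetical buffer size and link rate, the latency added by a full buffer is just its size divided by the drain rate:

```python
# Bufferbloat in numbers (buffer size and link rate are hypothetical):
# a packet arriving behind a full buffer waits for the whole queue to
# drain, and that wait is pure added latency for the video call.
buffer_mb = 5     # hypothetical router buffer filled by the video stream
link_mbps = 50    # rate at which the buffer drains

queue_delay_ms = buffer_mb * 8 / link_mbps * 1000
print(f"Added latency while the buffer is full: {queue_delay_ms:.0f} ms")  # 800 ms
```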

To help users understand the strengths and weaknesses of their connection, we recently added Aggregated Internet Measurement (AIM) scores to our own “Speed” Test. These scores set aside the technical metrics and give users a real-world, plain-English understanding of what their connection will be good at, and where it might struggle. We’d also like to collect more data from our speed test to help track Page Load Times (PLT) and see how they correlate with lower working latency. You’ll start seeing those numbers on our speed test soon!

We all use our Internet connections in slightly different ways, but we share the desire for our connections to be as fast as possible. As more and more services move into the cloud – documents, music, websites, communications, etc. – the speed at which we can access those services becomes critical. While bandwidth plays a part, the latency of the connection – the real Internet “speed” – is more important.

At Cloudflare, we’re working every day to help build a more performant Internet. Want to help? Apply for one of our open engineering roles here.

Closing out 2022 with our latest Impact Report

Post Syndicated from Andie Goodwin original https://blog.cloudflare.com/impact-report-2022/

To conclude Impact Week, which has been filled with announcements about new initiatives and features that we are thrilled about, today we are publishing our 2022 Impact Report.

In short, the Impact Report is an annual summary highlighting how we are helping build a better Internet and the progress we are making on our environmental, social, and governance priorities. It is where we showcase successes from Cloudflare Impact programs, celebrate awards and recognitions, and explain our approach to fundamental values like transparency and privacy.

We believe that a better Internet is principled, for everyone, and sustainable; these are the three themes around which we constructed the report. The Impact Report also serves as our repository for disclosures consistent with our commitments for the Global Reporting Initiative (GRI), Sustainability Accounting Standards Board (SASB), and UN Global Compact (UNGC).

Check out the full report to:

  • Explore how we are expanding the value and scope of our Cloudflare Impact programs
  • Review our latest diversity statistics — and our newest employee resource group
  • Understand how we are supporting humanitarian and human rights causes
  • Read quick summaries of Impact Week announcements
  • Examine how we calculate and validate emissions data

As fantastic as 2022 has been for scaling up Cloudflare Impact and making strides toward a better Internet, we are aiming even higher in 2023. To keep up with developments throughout the year, follow us on Twitter and LinkedIn, and keep an eye out for updates on our Cloudflare Impact page.

How Cloudflare advocates for a better Internet

Post Syndicated from Christiaan Smits original https://blog.cloudflare.com/how-cloudflare-advocates-for-a-better-internet/

We mean a lot of things when we talk about helping to build a better Internet. Sometimes it’s about democratizing technologies that were previously only available to the wealthiest and most technologically savvy companies; sometimes it’s about protecting the most vulnerable groups from cyber attacks and online persecution. And the Internet does not exist in a vacuum.

As a global company, we see the way that the future of the Internet is affected by governments, regulations, and people. If we want to help build a better Internet, we have to make sure that we are in the room, sharing Cloudflare’s perspective in the many places where important conversations about the Internet are happening. And that is why we believe strongly in the value of public policy.

We thought this week would be a great opportunity to share Cloudflare’s principles and our theories behind policy engagement. Because at its core, a public policy approach needs to reflect who the company is through its actions and rhetoric. And as a company, we believe there is real value in helping governments understand how companies work, and helping our employees understand how governments and lawmakers work. Especially now, during a time in which many jurisdictions are passing far-reaching laws that shape the future of the Internet, from laws on content moderation to new and more demanding regulations on cybersecurity.

Principled, Curious, Transparent

At Cloudflare, we have three core company values: we are Principled, Curious, and Transparent. By principled, we mean thoughtful, consistent, and long-term oriented about what the right course of action is. By curious, we mean taking on big challenges and understanding the why and how behind things. Finally, by transparent, we mean being clear on why and how we decide to do things both internally and externally.

Our approach to public policy aims to integrate these three values into our engagement with stakeholders. We are thoughtful when choosing the right issues to prioritize, and are consistent once we have chosen to take a position on a particular topic. We are curious about the important policy conversations that governments and institutions around the world are having about the future of the Internet, and want to understand the different points of view in that debate. And we aim to be as transparent as possible when talking about our policy stances, by, for example, writing blogs, submitting comments to public consultations, or participating in conversations with policymakers and our peers in the industry. And, for instance with this blog, we also aim to be transparent about our actual advocacy efforts.

What makes Cloudflare different?

With approximately 20 percent of websites using our service, including those who use our free tier, Cloudflare protects a wide variety of customers from cyberattack. Our business model relies on economies of scale, and customers choosing to add products and services to our entry-level cybersecurity protections. This means our policy perspective can be broad: we are advocating for a better Internet for our customers who are Fortune 1000 companies, as well as for individual developers with hobby blogs or small business websites. It also means that our perspective is distinct: we have a business model that is unique, and therefore a perspective that often isn’t represented by others.

Strategy

We are not naive: we do not believe that a growing company can command the same attention as some of the Internet giants, or has the capacity to engage on as many issues as those bigger companies. So how do we prioritize? What’s our rule of thumb on how and when we engage?

Our starting point is to think about the policy developments that have the largest impact on our own activities. Which issues could force us to change our model? Cause significant (financial) impact? Skew incentives for stronger cybersecurity?

Then we do the exercise again, this time thinking about whether our perspective on that policy issue is dramatically different from those of other companies in the industry. Is it important to us, but we share the same perspective as other cybersecurity, infrastructure, or cloud companies? We pass. For example, while changing corporate tax rates could have a significant financial impact on our business, we don’t exactly have a unique perspective on that. So that’s off the list. But privacy? There we think we have a distinct perspective, as a company that practices privacy by design, and supports and develops standards that help ensure privacy on the Internet. And crucially: we think privacy will be critical to the future of the Internet. So on public policy ideas related to privacy, we engage.

And then there is our unique vantage point, derived from our global network. This often gives us important insight and data, which we can use to educate policymakers on relevant issues.

Our engagement channels

Our Public Policy team includes people who have worked in government, law firms and the tech industry before they joined Cloudflare. The informal networks, professional relationships, and expertise that they have built over the course of their careers are instrumental in ensuring that Cloudflare is involved in important policy conversations about the Internet. We do not have a Political Action Committee, and we do not make political contributions.

As mentioned, we try to focus on the issues where we can make a difference, where we have a unique interest, perspective and expertise. Nonetheless, there are many policies and regulations that could affect not only us at Cloudflare, but the entire Internet ecosystem. In order to track policy developments worldwide, and ensure that we are able to share information, we are members of a number of associations and coalitions.

Some of these advocacy groups represent a particular industry, such as software companies, or US based technology firms, and engage with lawmakers on a wide variety of relevant policy issues for their particular sector. Other groups, in contrast, focus their advocacy on a more specific policy issue.

In addition to formal trade association memberships, we will occasionally join coalitions of companies or civil society organizations assembled for particular advocacy purposes. For example, we periodically engage with the Stronger Internet coalition, to share information about policies around encryption, privacy, and free expression around the world.

It almost goes without saying that, given our commitment to transparency as a company and entirely in line with our own ethics code and legal compliance, we fully comply with all relevant rules around advocacy in jurisdictions across the world. You can also find us in transparency registers of governmental entities, where these exist. Because we want to be transparent about how we advocate for a better Internet, today we have published an overview of the organizations we work with on our website.

Helping build a safer Internet by measuring BGP RPKI Route Origin Validation

Post Syndicated from Carlos Rodrigues original https://blog.cloudflare.com/rpki-updates-data/

The Border Gateway Protocol (BGP) is the glue that keeps the entire Internet together. However, despite its vital function, BGP wasn’t originally designed to protect against malicious actors or routing mishaps. It has since been updated to account for this shortcoming with the Resource Public Key Infrastructure (RPKI) framework, but can we declare it to be safe yet?

If the question needs asking, you might suspect we can’t. There is a shortage of reliable data on how much of the Internet is protected from preventable routing problems. Today, we’re releasing a new method to measure exactly that: what percentage of Internet users are protected by their Internet Service Provider from these issues. We find that there is a long way to go before the Internet is protected from routing problems, though it varies dramatically by country.

Why RPKI is necessary to secure Internet routing

The Internet is a network of independently-managed networks, called Autonomous Systems (ASes). To achieve global reachability, ASes interconnect with each other and determine the feasible paths to a given destination IP address by exchanging routing information using BGP. BGP enables routers with only local network visibility to construct end-to-end paths based on the arbitrary preferences of each administrative entity that operates that equipment. Typically, Internet traffic between a user and a destination traverses multiple AS networks using paths constructed by BGP routers.

BGP, however, lacks built-in security mechanisms to protect the integrity of the exchanged routing information and to provide authentication and authorization of the advertised IP address space. Because of this, AS operators must implicitly trust that the routing information exchanged through BGP is accurate. As a result, the Internet is vulnerable to the injection of bogus routing information, which cannot be mitigated by security measures at the client or server level of the network.

An adversary with access to a BGP router can inject fraudulent routes into the routing system, which can be used to execute an array of attacks, including:

  • Denial-of-Service (DoS) through traffic blackholing or redirection,
  • Impersonation attacks to eavesdrop on communications,
  • Machine-in-the-Middle exploits to modify the exchanged data, and subvert reputation-based filtering systems.

Additionally, local misconfigurations and fat-finger errors can be propagated well beyond the source of the error and cause major disruption across the Internet.

Such an incident happened on June 24, 2019. Millions of users were unable to access Cloudflare address space when a regional ISP in Pennsylvania accidentally advertised routes to Cloudflare through their capacity-limited network. This was effectively the Internet equivalent of routing an entire freeway through a neighborhood street.

Traffic misdirections like these, either unintentional or intentional, are not uncommon. The Internet Society’s MANRS (Mutually Agreed Norms for Routing Security) initiative estimated that in 2020 alone there were over 3,000 route leaks and hijacks, and new occurrences can be observed every day through Cloudflare Radar.

The most prominent proposals to secure BGP routing, standardized by the IETF, focus on validating the origin of the advertised routes using Resource Public Key Infrastructure (RPKI) and verifying the integrity of the paths with BGPsec. Specifically, RPKI (defined in RFC 7115) relies on a Public Key Infrastructure to validate that an AS advertising a route to a destination (an IP address space) is the legitimate owner of those IP addresses.

RPKI has been defined for a long time but lacks adoption. It requires network operators to cryptographically sign their prefixes, and routing networks to perform an RPKI Route Origin Validation (ROV) on their routers. This is a two-step operation that requires coordination and participation from many actors to be effective.

The two phases of RPKI adoption: signing origins and validating origins

RPKI has two phases of deployment: first, an AS that wants to protect its own IP prefixes can cryptographically sign Route Origin Authorization (ROA) records thereby attesting to be the legitimate origin of that signed IP space. Second, an AS can avoid selecting invalid routes by performing Route Origin Validation (ROV, defined in RFC 6483).

With ROV, a BGP route received by a neighbor is validated against the available RPKI records. A route that is valid or missing from RPKI is selected, while a route with RPKI records found to be invalid is typically rejected, thus preventing the use and propagation of hijacked and misconfigured routes.
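
The decision logic itself is compact. Below is a simplified Python sketch of RFC 6811-style origin validation against a toy ROA table; real routers work from compiled validated ROA payloads, but the outcome per route is the same three-way verdict:

```python
import ipaddress

# A simplified sketch of the ROV decision (RFC 6811 semantics).
# The ROA table below is a toy example, not real data.
ROAS = [  # (authorized prefix, max length, authorized origin AS)
    ("192.0.2.0/24", 24, 64500),
    ("198.51.100.0/22", 24, 64501),
]

def rov(prefix: str, origin_as: int) -> str:
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in ROAS:
        roa = ipaddress.ip_network(roa_prefix)
        if route.version == roa.version and route.subnet_of(roa):
            covered = True  # at least one ROA covers this announcement
            if origin_as == roa_as and route.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "unknown"

print(rov("192.0.2.0/24", 64500))    # valid   -> accepted
print(rov("192.0.2.0/24", 64666))    # invalid -> typically rejected
print(rov("203.0.113.0/24", 64502))  # unknown -> accepted
```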

One issue with RPKI is the fact that implementing ROA is meaningful only if other ASes implement ROV, and vice versa. Therefore, securing BGP routing requires a united effort, and a lack of broader adoption disincentivizes ASes from committing the resources to validate their own routes. Conversely, increasing RPKI adoption can lead to network effects and accelerate RPKI deployment. Projects like MANRS and Cloudflare’s isbgpsafeyet.com are promoting good Internet citizenship among network operators and making the benefits of RPKI deployment known to the Internet. You can check whether your own ISP is being a good Internet citizen by testing it on isbgpsafeyet.com.

Measuring the extent to which both ROA (signing of addresses by the network that controls them) and ROV (filtering of invalid routes by ISPs) have been implemented is important to evaluating the impact of these initiatives, developing situational awareness, and predicting the impact of future misconfigurations or attacks.

Measuring ROAs is straightforward since ROA data is readily available from RPKI repositories. Querying RPKI repositories for publicly routed IP prefixes (e.g. prefixes visible in the RouteViews and RIPE RIS routing tables) allows us to estimate the percentage of addresses covered by ROA objects. Currently, there are 393,344 IPv4 and 86,306 IPv6 ROAs in the global RPKI system, covering about 40% of the globally routed prefix-AS origin pairs[1].

Measuring ROV, however, is significantly more challenging given it is configured inside the BGP routers of each AS, not accessible by anyone other than each router’s administrator.

Measuring ROV deployment

Although we do not have direct access to the configuration of everyone’s BGP routers, it is possible to infer the use of ROV by comparing the reachability of RPKI-valid and RPKI-invalid prefixes from measurement points within an AS[2].

Consider the following toy topology as an example, where an RPKI-invalid origin is advertised through AS0 to AS1 and AS2. If AS1 filters and rejects RPKI-invalid routes, a user behind AS1 would not be able to connect to that origin. By contrast, if AS2 does not reject RPKI invalids, a user behind AS2 would be able to connect to that origin.

While occasionally a user may be unable to access an origin due to transient network issues, if multiple users act as vantage points for a measurement system, we would be able to collect a large number of data points to infer which ASes deploy ROV.

[Figure: toy topology with the RPKI-invalid origin advertised through AS0 to AS1 and AS2]

If, in the figure above, AS0 filters invalid RPKI routes, then vantage points in both AS1 and AS2 would be unable to connect to the RPKI-invalid origin, making it hard to distinguish if ROV is deployed at the ASes of our vantage points or in an AS along the path. One way to mitigate this limitation is to announce the RPKI-invalid origin from multiple locations of an anycast network, taking advantage of its direct interconnections to the measurement vantage points, as shown in the figure below. As a result, an AS that does not itself deploy ROV is less likely to observe the benefits of upstream ASes using ROV, and we would be able to accurately infer ROV deployment per AS[3].

Note that it’s also important that the IP address of the RPKI-invalid origin should not be covered by a less specific prefix for which there is a valid or unknown RPKI route, otherwise even if an AS filters invalid RPKI routes its users would still be able to find a route to that IP.

[Figure: the RPKI-invalid origin announced via anycast, directly interconnected with the vantage point ASes]

The measurement technique described here is the one implemented by Cloudflare’s isbgpsafeyet.com website, allowing end users to assess whether or not their ISPs have deployed BGP ROV.

The isbgpsafeyet.com website itself doesn’t submit any data back to Cloudflare, but recently we started measuring whether end users’ browsers can successfully connect to invalid RPKI origins when ROV is present. We use the same mechanism as is used for global performance data[4]. In particular, every measurement session (an individual end user at some point in time) attempts a request to both valid.rpki.cloudflare.com, which should always succeed as it’s RPKI-valid, and invalid.rpki.cloudflare.com, which is RPKI-invalid and should fail when the user’s ISP uses ROV.
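
For the curious, the same check can be approximated outside the browser. This is a simplified stand-in for the measurement described above, not the actual test code:

```python
import urllib.request

# A command-line stand-in for the browser measurement (the real test runs
# as fetches inside the page): if the RPKI-valid origin loads and the
# RPKI-invalid one does not, invalid routes are being dropped on your path.
def reachable(url: str, timeout: float = 5.0) -> bool:
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except OSError:
        return False

valid_ok = reachable("https://valid.rpki.cloudflare.com")
invalid_ok = reachable("https://invalid.rpki.cloudflare.com")

if valid_ok and not invalid_ok:
    print("RPKI-invalid routes are filtered on this path (ROV)")
elif valid_ok and invalid_ok:
    print("RPKI-invalid routes are reachable (no ROV on this path)")
else:
    print("Inconclusive: the control request itself failed")
```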

This allows us to have continuous and up-to-date measurements from hundreds of thousands of browsers on a daily basis, and develop a greater understanding of the state of ROV deployment.

The state of global ROV deployment

The figure below shows the raw number of ROV probe requests per hour during October 2022 to valid.rpki.cloudflare.com and invalid.rpki.cloudflare.com. In total, we observed 69.7 million successful probes from 41,531 ASNs.

[Chart: hourly probe requests to valid.rpki.cloudflare.com and invalid.rpki.cloudflare.com, October 2022]

Based on APNIC’s estimates of the number of end users per ASN, our weighted[5] analysis covers 96.5% of the world’s Internet population. As expected, the number of requests follows a diurnal pattern which reflects established user behavior in daily and weekly Internet activity[6].

We can also see that the number of successful requests to valid.rpki.cloudflare.com (gray line) closely follows the number of sessions that issued at least one request (blue line), which works as a smoke test for the correctness of our measurements.

As we don’t store the IP addresses that contribute measurements, we don’t have any way to count individual clients and large spikes in the data may introduce unwanted bias. We account for that by capturing those instants and excluding them.

Overall, we estimate that out of the four billion Internet users, only 261 million (6.5%) are protected by BGP Route Origin Validation, but the true state of global ROV deployment is more subtle than this.

The following map shows the fraction of dropped RPKI-invalid requests from ASes with over 200 probes over the month of October. It depicts how far along each country is in adopting ROV but doesn’t necessarily represent the fraction of protected users in each country, as we will discover.

[Map: fraction of dropped RPKI-invalid requests by country, October 2022]

Sweden and Bolivia appear to be the countries with the highest level of adoption (over 80%), while only a few other countries have crossed the 50% mark (e.g. Finland, Denmark, Chad, Greece, the United States).

ROV adoption may be driven by a few ASes hosting large user populations, or by many ASes hosting small user populations. To understand such disparities, the map below plots the contrast between overall adoption in a country (as in the previous map) and median adoption over the individual ASes within that country. Countries with stronger reds have relatively few ASes deploying ROV with high impact, while countries with stronger blues have more ASes deploying ROV but with lower impact per AS.

[Map: overall country-level ROV adoption vs. median adoption across the ASes within each country]

In the Netherlands, Denmark, Switzerland, or the United States, adoption appears mostly driven by their larger ASes, while in Greece or Yemen it’s the smaller ones that are adopting ROV.

The following histogram summarizes the worldwide level of adoption for the 6,765 ASes covered by the previous two maps.

[Histogram: RPKI-invalid drop rate across the 6,765 measured ASes]

Most ASes either don’t validate at all, or have close to 100% adoption, which is what we’d intuitively expect. However, it’s interesting to observe that there are small numbers of ASes all across the scale. ASes that exhibit partial RPKI-invalid drop rate compared to total requests may either implement ROV partially (on some, but not all, of their BGP routers), or appear as dropping RPKI invalids due to ROV deployment by other ASes in their upstream path.

To estimate the number of users protected by ROV we only considered ASes with an observed adoption above 95%, as an AS with an incomplete deployment still leaves its users vulnerable to route leaks from its BGP peers.

If we take the previous histogram and summarize by the number of users behind each AS, the green bar on the right corresponds to the 261 million users currently protected by ROV according to the above criteria (686 ASes).

[Histogram: adoption levels weighted by users per AS; the green bar shows the 261 million protected users]
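
The aggregation behind that green bar looks roughly like this sketch, where the ASNs, user counts, and drop rates are invented for illustration:

```python
# A sketch of the aggregation above: an AS's users count as protected
# only if its observed drop rate for RPKI-invalid requests is at least
# 95%. ASNs, user counts, and drop rates below are invented.
PER_AS = [  # (ASN, estimated users behind it, RPKI-invalid drop rate)
    (64500, 30_000_000, 0.99),  # comprehensive ROV: users protected
    (64501, 12_000_000, 0.40),  # partial ROV: users still exposed
    (64502,  5_000_000, 0.00),  # no ROV
]

THRESHOLD = 0.95
protected = sum(users for _, users, drop in PER_AS if drop >= THRESHOLD)
total = sum(users for _, users, _ in PER_AS)
print(f"Protected: {protected:,} of {total:,} users ({protected / total:.1%})")
```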

Looking back at the country adoption map one would perhaps expect the number of protected users to be larger. But worldwide ROV deployment is still mostly partial, lacking larger ASes, or both. This becomes even more clear when compared with the next map, plotting just the fraction of fully protected users.

[Map: fraction of users behind comprehensive ROV deployments, by country]

To wrap up our analysis, we look at two world economies chosen for their contrasting, almost symmetrical, stages of deployment: the United States and the European Union.

[Charts: ROV deployment in the United States vs. the European Union]

112 million Internet users are protected by 111 ASes from the United States with comprehensive ROV deployments. Conversely, more than twice as many ASes from countries making up the European Union have fully deployed ROV, but end up covering only half as many users. This can be reasonably explained by end user ASes being more likely to operate within a single country rather than span multiple countries.

Conclusion

Probe requests were performed from end user browsers and very few measurements were collected from transit providers (which have few end users, if any). Also, paths between end user ASes and Cloudflare are often very short (a nice outcome of our extensive peering) and don’t traverse upper-tier networks that they would otherwise use to reach the rest of the Internet.

In other words, the methodology used focuses on ROV adoption by end user networks (e.g. ISPs) and isn’t meant to reflect the eventual effect of indirect validation from (perhaps validating) upper-tier transit networks. While indirect validation may limit the “blast radius” of (malicious or accidental) route leaks, it still leaves non-validating ASes vulnerable to leaks coming from their peers.

As with indirect validation, an AS remains vulnerable until its ROV deployment reaches a sufficient level of completion. We chose to only consider AS deployments above 95% as truly comprehensive, and Cloudflare Radar will soon begin using this threshold to track ROV adoption worldwide, as part of our mission to help build a better Internet.

When considering only comprehensive ROV deployments, some countries such as Denmark, Greece, Switzerland, Sweden, or Australia, already show an effective coverage above 50% of their respective Internet populations, with others like the Netherlands or the United States slightly above 40%, mostly driven by few large ASes rather than many smaller ones.

Worldwide we observe a very low effective coverage of just 6.5% over the measured ASes, corresponding to 261 million end users currently safe from (malicious and accidental) route leaks, which means there’s still a long way to go before we can declare BGP to be safe.

[1] https://rpki.cloudflare.com/
[2] Gilad, Yossi, Avichai Cohen, Amir Herzberg, Michael Schapira, and Haya Shulman. "Are we there yet? On RPKI’s deployment and security." Cryptology ePrint Archive (2016).
[3] Geoff Huston. "Measuring ROAs and ROV". https://blog.apnic.net/2021/03/24/measuring-roas-and-rov/
[4] Measurements are issued stochastically when users encounter 1xxx error pages from default (non-customer) configurations.
[5] Probe requests are weighted by AS size as calculated from Cloudflare’s worldwide HTTP traffic.
[6] Quan, Lin, John Heidemann, and Yuri Pradkin. "When the Internet sleeps: Correlating diurnal networks with external factors." In Proceedings of the 2014 Conference on Internet Measurement Conference, pp. 87-100. 2014.

Introducing Cloudflare’s Third Party Code of Conduct

Post Syndicated from Andria Jones original https://blog.cloudflare.com/introducing-cloudflares-third-party-code-of-conduct/

Cloudflare is on a mission to help build a better Internet, and we are committed to doing this with ethics and integrity in everything that we do. This commitment extends beyond our own actions, to third parties acting on our behalf. Cloudflare has the same expectations of ethics and integrity of our suppliers, resellers, and other partners as we do of ourselves.

Our new code of conduct for third parties

We first shared publicly our Code of Business Conduct and Ethics during Cloudflare’s initial public offering in September 2019. All Cloudflare employees take legal training as part of their onboarding process, as well as an annual refresher course, which includes the topics covered in our Code, and they sign an acknowledgement of our Code and related policies as well.

While our Code of Business Conduct and Ethics applies to all directors, officers and employees of Cloudflare, it has not extended to third parties. Today, we are excited to share our Third Party Code of Conduct, specifically formulated with our suppliers, resellers, and other partners in mind. It covers such topics as:

  • Human Rights
  • Fair Labor
  • Environmental Sustainability
  • Anti-Bribery and Anti-Corruption
  • Trade Compliance
  • Anti-Competition
  • Conflicts of Interest
  • Data Privacy and Security
  • Government Contracting

But why have another Code?

We work with a wide range of third parties in all parts of the world, including countries with a high risk of corruption, potential for political unrest, and also countries that are simply not governed by the same laws that we may see as standard in the United States. We wanted a Third Party Code of Conduct that serves as a statement of Cloudflare’s core values and commitments, and a call to third parties who share those same values.

The following are just a few examples of how we want to ensure our third parties act with ethics and integrity on our behalf, even when we aren’t watching:

We want to ensure that the servers and other equipment in our supply chain are sourced responsibly, from manufacturers who respect human rights — free of forced or child labor, with environmental sustainability at the forefront.

We want to provide our products and services to customers based on the quality of Cloudflare, not because a third party reseller may bribe a customer to enter into an agreement.

We want to ensure there are no conflicts of interest with our third parties, that might give someone an unfair advantage.

As a government contractor, we want to ensure that we do not have telecommunications or video surveillance equipment, systems, or services from prohibited parties in our supply chain to protect national security interests.

Having a Third Party Code of Conduct is also industry standard. As Cloudflare garners an increasing number of Fortune 500 and other enterprise customers, we find ourselves reviewing and committing to their Third Party Codes of Conduct as well.

How it works

Our Third Party Code of Conduct is not meant to replace our terms of service or other contractual agreements. Rather, it is meant to supplement them, highlighting Cloudflare’s ethical commitments and encouraging our suppliers, resellers, and other partners to commit to the same. We will be cascading this new Code to all existing third parties, and include it at onboarding for all new third parties going forward. A violation of the Code, or any contractual agreements between Cloudflare and our third parties, may result in termination of the relationship.

This Third Party Code of Conduct is only one facet of Cloudflare’s third party due diligence program, and it complements the other work that Cloudflare does in this area. Cloudflare rigorously screens and vets our suppliers and partners at onboarding, and we continue to routinely monitor and audit them over time. We are always looking for ways to communicate with, educate, and learn from our third parties as well.

Join our mission

Are you a supplier, reseller or other partner who shares these values of ethics and integrity? Come work with us and join Cloudflare on its mission to help build a better, more ethical Internet.

The unintended consequences of blocking IP addresses

Post Syndicated from Alissa Starzak original https://blog.cloudflare.com/consequences-of-ip-blocking/

In late August 2022, Cloudflare’s customer support team began to receive complaints about sites on our network being down in Austria. Our team immediately went into action to try to identify the source of what looked from the outside like a partial Internet outage in Austria. We quickly realized that it was an issue with local Austrian Internet Service Providers.

But the service disruption wasn’t the result of a technical problem. As we later learned from media reports, what we were seeing was the result of a court order. Without any notice to Cloudflare, an Austrian court had ordered Austrian Internet Service Providers (ISPs) to block 11 of Cloudflare’s IP addresses.

In an attempt to block 14 websites that copyright holders argued were violating copyright, the court-ordered IP block rendered thousands of websites inaccessible to ordinary Internet users in Austria over a two-day period. What did the thousands of other sites do wrong? Nothing. They were a temporary casualty of the failure to build legal remedies and systems that reflect the Internet’s actual architecture.

Today, we are going to dive into a discussion of IP blocking: why we see it, what it is, what it does, who it affects, and why it’s such a problematic way to address content online.

Collateral effects, large and small

The craziest thing is that this type of blocking happens on a regular basis, all around the world. But unless that blocking happens at the scale of what happened in Austria, or someone decides to highlight it, it is typically invisible to the outside world. Even Cloudflare, with deep technical expertise and understanding about how blocking works, can’t routinely see when an IP address is blocked.

For Internet users, it’s even more opaque. They generally don’t know why they can’t connect to a particular website, where the connection problem is coming from, or how to address it. They simply know they cannot access the site they were trying to visit. And that can make it challenging to document when sites have become inaccessible because of IP address blocking.

Blocking practices are also widespread. In its Freedom on the Net report, Freedom House recently reported that 40 of the 70 countries it examined – ranging from Russia, Iran, and Egypt to Western democracies like the United Kingdom and Germany – engaged in some form of website blocking. Although the report doesn’t delve into exactly how those countries block, many of them use forms of IP blocking, with the same potential for the kind of partial Internet shutdown that we saw in Austria.

Although it can be challenging to assess the amount of collateral damage from IP blocking, we do have examples where organizations have attempted to quantify it. In conjunction with a case before the European Court of Human Rights, the European Information Society Institute, a Slovakia-based nonprofit, reviewed Russia’s regime for website blocking in 2017. Russia exclusively used IP addresses to block content. The European Information Society Institute concluded that IP blocking led to “collateral website blocking on a massive scale” and noted that as of June 28, 2017, “6,522,629 Internet resources had been blocked in Russia, of which 6,335,850 – or 97% – had been blocked collaterally, that is to say, without legal justification.”

In the UK, overbroad blocking prompted the non-profit Open Rights Group to create the website Blocked.org.uk. The website has a tool enabling users and site owners to report on overblocking and request that ISPs remove blocks. The group also has hundreds of individual stories about the effect of blocking on those whose websites were inappropriately blocked, from charities to small business owners. Although it’s not always clear what blocking methods are being used, the fact that the site is necessary at all conveys the amount of overblocking. Imagine a dressmaker, watchmaker or car dealer looking to advertise their services and potentially gain new customers with their website. That doesn’t work if local users can’t access the site.

One reaction might be, “Well, just make sure there are no restricted sites sharing an address with unrestricted sites.” But as we’ll discuss in more detail, this ignores the large difference between the number of possible domain names and the number of available IP addresses, and runs counter to the very technical specifications that empower the Internet. Moreover, the definitions of restricted and unrestricted differ across nations, communities, and organizations. Even if it were possible to know all the restrictions, the designs of the protocols — of the Internet, itself — mean that it is simply infeasible, if not impossible, to satisfy every agency’s constraints.

Overblocking websites is not only a problem for users; it has legal implications. Because of the effect it can have on ordinary citizens looking to exercise their rights online, government entities (both courts and regulatory bodies) have a legal obligation to make sure that their orders are necessary and proportionate, and don’t unnecessarily affect those who are not contributing to the harm.

It would be hard to imagine, for example, that a court in response to alleged wrongdoing would blindly issue a search warrant or an order based solely on a street address without caring if that address was for a single family home, a six-unit condo building, or a high rise with hundreds of separate units. But those sorts of practices with IP addresses appear to be rampant.

In 2020, the European Court of Human Rights (ECHR) – the court overseeing the implementation of the Council of Europe’s European Convention on Human Rights – considered a case involving a website that was blocked in Russia not because it had been targeted by the Russian government, but because it shared an IP address with a blocked website. The website owner brought suit over the block. The ECHR concluded that the indiscriminate blocking was impermissible, ruling that the block on the lawful content of the site “amounts to arbitrary interference with the rights of owners of such websites.” In other words, the ECHR ruled that it was improper for a government to issue orders that resulted in the blocking of sites that were not targeted.

Using Internet infrastructure to address content challenges

Ordinary Internet users don’t think a lot about how the content they are trying to access online is delivered to them. They assume that when they type a domain name into their browser, the content will automatically pop up. And if it doesn’t, they tend to assume the website itself is having problems unless their entire Internet connection seems to be broken. But those basic assumptions ignore the reality that connections to a website are often used to limit access to content online.

Why do countries block connections to websites? Maybe they want to limit their own citizens from accessing what they believe to be illegal content – like online gambling or explicit material – that is permissible elsewhere in the world. Maybe they want to prevent the viewing of a foreign news source that they believe to be primarily disinformation. Or maybe they want to support copyright holders seeking to block access to a website to limit viewing of content that they believe infringes their intellectual property.

To be clear, blocking access is not the same thing as removing content from the Internet. There are a variety of legal obligations and authorities designed to permit actual removal of illegal content. Indeed, the legal expectation in many countries is that blocking is a matter of last resort, after attempts have been made to remove content at the source.

Blocking just prevents certain viewers – those whose Internet access depends on the ISP that is doing the blocking – from being able to access websites. The site itself continues to exist online and is accessible by everyone else. But when the content originates from a different place and can’t be easily removed, a country may see blocking as their best or only approach.

We recognize the concerns that sometimes drive countries to implement blocking. But fundamentally, we believe it’s important for users to know when the websites they are trying to access have been blocked, and, to the extent possible, who has blocked them from view and why. And it’s critical that any restrictions on content should be as limited as possible to address the harm, to avoid infringing on the rights of others.

Brute force IP address blocking doesn’t allow for those things. It’s fully opaque to Internet users. The practice has unintended, unavoidable consequences on other content. And the very fabric of the Internet means that there is no good way to identify what other websites might be affected either before or during an IP block.

To understand what happened in Austria and what happens in many other countries around the world that seek to block content with the bluntness of IP addresses, we have to understand what is going on behind the scenes. That means diving into some technical details.

Identity is attached to names, never addresses

Before we even get started describing the technical realities of blocking, it’s important to stress that the first and best option to deal with content is at the source. A website owner or hosting provider has the option of removing content at a granular level, without having to take down an entire website. On the more technical side, a domain name registrar or registry can potentially withdraw a domain name, and therefore a website, from the Internet altogether.

But how do you block access to a website, if for whatever reason the content owner or content source is unable or unwilling to remove it from the Internet?  There are only three possible control points.

The first is via the Domain Name System (DNS), which translates domain names into IP addresses so that the site can be found. Instead of returning a valid IP address for a domain name, the DNS resolver could lie and respond with the code NXDOMAIN, meaning “there is no such name.” A better approach would be to use one of the honest error codes standardized in 2020: error 15 for blocked, 16 for censored, 17 for filtered, and 18 for prohibited, although these are not yet widely used.
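
To make the mechanics concrete, here is a minimal TypeScript sketch of the decision an honest blocking resolver could make. The types and the blocklist are hypothetical illustrations, not a real resolver implementation:

    // RFC 8914 extended DNS error codes relevant to blocking.
    const EDE = { Blocked: 15, Censored: 16, Filtered: 17, Prohibited: 18 } as const;

    type Answer =
      | { rcode: 'NOERROR'; addresses: string[] }
      | { rcode: 'REFUSED'; extendedError: number }; // honest: says why

    // Hypothetical blocklist mapping restricted names to a reason code.
    const blocklist = new Map<string, number>([
      ['restricted.example', EDE.Prohibited],
    ]);

    function answer(name: string, lookup: (n: string) => string[]): Answer {
      const reason = blocklist.get(name);
      if (reason !== undefined) {
        // Instead of lying with NXDOMAIN, tell the client access was denied.
        return { rcode: 'REFUSED', extendedError: reason };
      }
      return { rcode: 'NOERROR', addresses: lookup(name) };
    }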

Interestingly, the precision and effectiveness of DNS as a control point depends on whether the DNS resolver is private or public. Private or ‘internal’ DNS resolvers are operated by ISPs and enterprise environments for their own known clients, which means that operators can be precise in applying content restrictions. By contrast, that level of precision is unavailable to open or public resolvers, not least because routing and addressing on the Internet are global and ever-changing, in stark contrast to the fixed addresses and routes of a postal or street map. For example, private DNS resolvers may be able to block access to websites within specified geographic regions with at least some level of accuracy in a way that public DNS resolvers cannot, which becomes profoundly important given the disparate (and inconsistent) blocking regimes around the world.

The second approach is to block individual connection requests to a restricted domain name. When a user or client wants to visit a website, a connection is initiated from the client to a server name, i.e. the domain name. If a network or on-path device is able to observe the server name, then the connection can be terminated. Unlike DNS, there is no mechanism to communicate to the user that access to the server name was blocked, or why.
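
As a sketch of why the server name is observable, consider this Node.js snippet; the IP address and names are documentation placeholders. The client itself writes the server name, unencrypted, into the TLS ClientHello:

    import tls from 'node:tls';

    // The ClientHello carries 'blocked.example' in cleartext (SNI), so an
    // on-path device can match the name and reset the connection before
    // any encrypted data flows.
    const socket = tls.connect(
      { host: '203.0.113.7', port: 443, servername: 'blocked.example' },
      () => {
        console.log('connected with SNI:', socket.servername);
        socket.end();
      }
    );

    socket.on('error', (err) => {
      // A blocked connection surfaces only as a reset or timeout; nothing
      // tells the user that it was blocked, or why.
      console.error('connection failed:', err.message);
    });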

The third approach is to block access to an IP address where the domain name can be found. This is a bit like blocking the delivery of all mail to a physical address. Consider, for example, if that address is a skyscraper with its many unrelated and independent occupants. Halting delivery of mail to the address of the skyscraper causes collateral damage by invariably affecting all parties at that address. IP addresses work the same way.

Notably, the IP address is the only one of the three options that has no attachment to the domain name. The website domain name is not required for routing and delivery of data packets; in fact it is fully ignored. A website can be available on any IP address, or even on many IP addresses, simultaneously. And the set of IP addresses that a website is on can change at any time. The set of IP addresses cannot definitively be known by querying DNS, which has been able to return any valid address at any time for any reason, since 1995.

The idea that an address is representative of an identity is anathema to the Internet’s design, because the decoupling of address from name is deeply embedded in the Internet standards and protocols, as is explained next.

The Internet is a set of protocols, not a policy or perspective

Many people still incorrectly assume that an IP address represents a single website. We’ve previously stated that the association between names and addresses is understandable given that the earliest connected components of the Internet appeared as one computer, one interface, one address, and one name. This one-to-one association was an artifact of the ecosystem in which the Internet Protocol was deployed, and satisfied the needs of the time.

Despite the one-to-one naming practice of the early Internet, it has always been possible to assign more than one name to a server (or ‘host’). For example, a server was (and still is) often configured with names that reflect its service offerings, such as mail.example.com and www.example.com, but these shared a base domain name. There were few reasons to host completely different domain names until the need arose to colocate completely different websites onto a single server. That practice was made easier in 1997 by the Host header in HTTP/1.1, and preserved for encrypted connections by the SNI field in a TLS extension in 2003.
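
A small Node.js sketch shows the mechanism that makes colocation work; the IP address and site names below are illustrative placeholders. Two requests go to the same address, and only the Host header tells the server which website is wanted:

    import http from 'node:http';

    // Same IP address, two different websites: the Host header decides.
    for (const name of ['site-a.example', 'site-b.example']) {
      const req = http.request(
        { host: '203.0.113.7', port: 80, path: '/', headers: { Host: name } },
        (res) => console.log(`${name} -> HTTP ${res.statusCode}`)
      );
      req.on('error', (err) => console.error(name, err.message));
      req.end();
    }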

Throughout these changes, the Internet Protocol and, separately, the DNS protocol, have not only kept pace, but have remained fundamentally unchanged. They are the very reason that the Internet has been able to scale and evolve, because they are about addresses, reachability, and arbitrary name to IP address relationships.

The designs of IP and DNS are also entirely independent, which only reinforces that names are separate from addresses. A closer inspection of the protocols’ design elements illuminates the misperceptions of policies that lead to today’s common practice of controlling access to content by blocking IP addresses.

By design, IP is for reachability and nothing else

Much like large public civil engineering projects rely on building codes and best practice, the Internet is built using a set of open standards and specifications informed by experience and agreed by international consensus. The Internet standards that connect hardware and applications are published by the Internet Engineering Task Force (IETF) in the form of “Requests for Comment” or RFCs — so named not to suggest incompleteness, but to reflect that standards must be able to evolve with knowledge and experience. The IETF and its RFCs are cemented in the very fabric of communications, for example, with the first RFC 1 published in 1969. The Internet Protocol (IP) specification reached RFC status in 1981.

Alongside the standards organizations, the Internet’s success has been helped by a core idea known as the end-to-end (e2e) principle, codified also in 1981, based on years of trial and error experience. The end-to-end principle is a powerful abstraction that, despite taking many forms, manifests a core notion of the Internet Protocol specification: the network’s only responsibility is to establish reachability, and every other possible feature has a cost or a risk.

The idea of “reachability” in the Internet Protocol is also enshrined in the design of IP addresses themselves. Looking at the Internet Protocol specification, RFC 791, the following excerpt from Section 2.3 is explicit about IP addresses having no association with names, interfaces, or anything else.

Addressing

    A distinction is made between names, addresses, and routes [4].   A
    name indicates what we seek.  An address indicates where it is.  A
    route indicates how to get there.  The internet protocol deals
    primarily with addresses.  It is the task of higher level (i.e.,
    host-to-host or application) protocols to make the mapping from
    names to addresses.   The internet module maps internet addresses to
    local net addresses.  It is the task of lower level (i.e., local net
    or gateways) procedures to make the mapping from local net addresses
    to routes.
                            [ RFC 791, 1981 ]

Just like postal addresses for skyscrapers in the physical world, IP addresses are no more than street addresses written on a piece of paper. And just like a street address on paper, one can never be confident about the entities or organizations that exist behind an IP address. In a network like Cloudflare’s, any single IP address represents thousands of servers, and can have even more websites and services — in some cases numbering into the millions — expressly because the Internet Protocol is designed to enable it.

Here’s an interesting question: could we, or any content service provider, ensure that every IP address matches to one and only one name? The answer is an unequivocal no, and here too, because of a protocol design — in this case, DNS.

The number of names in DNS always exceeds the available addresses

A one-to-one relationship between names and addresses is impossible given the Internet specifications for the same reasons that it is infeasible in the physical world. Ignore for a moment that people and organizations can change addresses. Fundamentally, the number of people and organizations on the planet exceeds the number of postal addresses. We not only want, but need for the Internet to accommodate more names than addresses.

The difference in magnitude between names and addresses is also codified in the specifications. IPv4 addresses are 32 bits, and IPv6 addresses are 128 bits. The size of a domain name that can be queried by DNS is as many as 253 octets, or 2,024 bits (from Section 2.3.4 in RFC 1035, published 1987). The table below helps to put those differences into perspective:

[Table: name and address size limits in the protocol specifications]
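
A few lines of TypeScript make the gap tangible. The name-space figure below is a deliberately loose lower bound, counting only a single maximal DNS label:

    // Address spaces fixed by the specifications.
    const ipv4 = 2n ** 32n;   // RFC 791: 32-bit addresses (~4.3 billion)
    const ipv6 = 2n ** 128n;  // 128-bit addresses

    // DNS allows names up to 253 octets (RFC 1035). Counting only a single
    // 63-character label over [a-z0-9-] already exceeds the IPv6 space:
    const namesLowerBound = 37n ** 63n;

    console.log(ipv4.toString());        // 4294967296
    console.log(namesLowerBound > ipv6); // true: more names than addresses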

On November 15, 2022, the United Nations announced that the population of the Earth had surpassed eight billion people. Intuitively, we know that there cannot be anywhere near as many postal addresses. The number of possible names, on the planet and similarly on the Internet, does and must exceed the number of available addresses.

The proof is in the names!

With those two principles from the international standards understood (IP addresses and domain names serve distinct purposes, and there is no one-to-one relationship between the two), a recent case of content blocking using IP addresses helps show why the practice is problematic. Take, for example, the IP blocking incident in Austria in late August 2022. The goal was to restrict access to 14 target domains by blocking 11 IP addresses (source: RTR.Telekom.Post via the Internet Archive). The mismatch between those two numbers should have been a warning flag that IP blocking might not have the desired effect.

Analogies and international standards may explain the reasons that IP blocking should be avoided, but we can see the scale of the problem by looking at Internet-scale data. To better understand and explain the severity of IP blocking, we decided to generate a global view of domain names and IP addresses (thanks are due to a PhD research intern, Sudheesh Singanamalla, for the effort). In September 2022, we used the authoritative zone files for the top-level domains (TLDs) .com, .net, .info, and .org, together with top-1M website lists, to find a total of 255,315,270 unique names. We then queried DNS from each of five regions and recorded the set of IP addresses returned. The table below summarizes our findings:

[Table: unique domain names and IP addresses observed per query region]

The table above makes clear that it takes no more than 10.7 million addresses to reach all 255,315,270 names from any region on the planet, and the total set of IP addresses for those names from everywhere is about 16 million. The ratio of names to IP addresses is nearly 24x in Europe and 16x globally.

There is one more worthwhile detail about the numbers above: The IP addresses are the combined totals of both IPv4 and IPv6 addresses, meaning that far fewer addresses are needed to reach all 255M websites.

We’ve also inspected the data a few different ways and found some interesting observations. For example, the figure below shows the cumulative distribution (CDF) of the proportion of websites that can be visited with each additional IP address. On the y-axis is the proportion of websites that can be reached given some number of IP addresses. On the x-axis, the 16M IP addresses are ranked from the most domains on the left to the least domains on the right. Note that any IP address in this set is a response from DNS, so it must have at least one domain name, and the densest IP addresses in the set each serve domains numbering in the tens of millions.

[Figure: CDF of the proportion of websites reachable as each additional IP address is included]
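
For readers who want to reproduce the shape of this curve, here is a sketch of the computation. It assumes a precomputed map of IP address to domain count and, for simplicity, ignores domains served from multiple addresses:

    // Cumulative share of domains reachable via the top-k addresses.
    function coverageCdf(domainsPerIp: Map<string, number>): number[] {
      const counts = [...domainsPerIp.values()].sort((a, b) => b - a);
      const total = counts.reduce((sum, c) => sum + c, 0);
      const cdf: number[] = [];
      let running = 0;
      for (const count of counts) {
        running += count;
        cdf.push(running / total);
      }
      return cdf;
    }

    // Toy data: one dense address and two single-domain addresses.
    const toy = new Map([['192.0.2.1', 8], ['192.0.2.2', 1], ['192.0.2.3', 1]]);
    console.log(coverageCdf(toy)); // [0.8, 0.9, 1]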

By looking at the CDF there are a few eye-watering observations:

  • Fewer than 10 IP addresses are needed to reach 20% of, or approximately 51 million, domains in the set;
  • 100 IPs are enough to reach almost 50% of domains;
  • 1000 IPs are enough to reach 60% of domains;
  • 10,000 IPs are enough to reach 80%, or about 204 million, domains.

In fact, from the total set of 16 million addresses, fewer than half, 7.1M (43.7%), of the addresses in the dataset had only one name. On this ‘one’ point we must be additionally clear: we are unable to ascertain whether there were truly no other names on those addresses, because there are many more domain names than those contained in .com, .net, .info, and .org alone. There might very well be other names on those addresses.

In addition to hosting a number of domains on a single IP address, any of those domains may change its IP address over time. Changing IP addresses periodically can help with certain security and performance needs, and can improve reliability for websites. One common example in use by many operators is load balancing. This means DNS queries may return different IP addresses over time, or in different places, for the same websites. This is a further, and separate, reason why blocking based on IP addresses will not serve its intended purpose over time.
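
This is easy to observe directly. The Node.js sketch below queries the same name through two public resolvers; for load-balanced or geo-aware sites the answers routinely differ, and can change from one query to the next:

    import { Resolver } from 'node:dns/promises';

    async function compare(name: string): Promise<void> {
      for (const server of ['1.1.1.1', '8.8.8.8']) {
        const resolver = new Resolver();
        resolver.setServers([server]);
        // The same name may map to different addresses per resolver,
        // region, or moment in time.
        console.log(server, await resolver.resolve4(name));
      }
    }

    compare('example.com').catch(console.error);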

Ultimately, there is no reliable way to know the number of domains on an IP address without inspecting all names in the DNS, from every location on the planet, at every moment in time — an entirely infeasible proposition.

Any action on an IP address must, by the very definitions of the protocols that rule and empower the Internet, be expected to have collateral effects.

Lack of transparency with IP blocking

So if we have to expect that the blocking of an IP address will have collateral effects, and it’s generally agreed that it’s inappropriate or even legally impermissible to overblock by blocking IP addresses that have multiple domains on them, why does it still happen? That’s hard to know for sure, so we can only speculate. Sometimes it reflects a lack of technical understanding about the possible effects, particularly from entities like judges who are not technologists. Sometimes governments just ignore the collateral damage – as they do with Internet shutdowns – because they see the blocking as in their interest. And when there is collateral damage, it’s not usually obvious to the outside world, so there can be very little external pressure to have it addressed.

It’s worth stressing that point. When an IP is blocked, a user just sees a failed connection. They don’t know why the connection failed, or who caused it to fail. On the other side, the server acting on behalf of the website doesn’t even know it’s been blocked until it starts getting complaints about the fact that it is unavailable. There is virtually no transparency or accountability for the overblocking. And it can be challenging, if not impossible, for a website owner to challenge a block or seek redress for being inappropriately blocked.

Some governments, including Austria, do publish active block lists, which is an important step for transparency. But for all the reasons we’ve discussed, publishing an IP address does not reveal all the sites that may have been blocked unintentionally. And it doesn’t give those affected a means to challenge the overblocking. Again, in the physical world example, it’s hard to imagine a court order on a skyscraper that wouldn’t be posted on the door, but we often seem to jump over such due process and notice requirements in virtual space.

We think talking about the problematic consequences of IP blocking is more important than ever as an increasing number of countries push to block content online. Unfortunately, ISPs often use IP blocks to implement those requirements. It may be that the ISP is newer or less robust than larger counterparts, but larger ISPs engage in the practice, too, and understandably so because IP blocking takes the least effort and is readily available in most equipment.

And as more and more domains are included on the same number of IP addresses, the problem is only going to get worse.

Next steps

So what can we do?

We believe the first step is to improve transparency around the use of IP blocking. Although we’re not aware of any comprehensive way to document the collateral damage caused by IP blocking, we believe there are steps we can take to expand awareness of the practice. We are committed to working on new initiatives that highlight those insights, as we’ve done with the Cloudflare Radar Outage Center.

We also recognize that this is a whole Internet problem, and therefore has to be part of a broader effort. The significant likelihood that blocking by IP address will result in restricting access to a whole series of unrelated (and untargeted) domains should make it a non-starter for everyone. That’s why we’re engaging with civil society partners and like-minded companies to lend their voices to challenge the use of blocking IP addresses as a way of addressing content challenges and to point out collateral damage when they see it.

To be clear, to address the challenges of illegal content online, countries need legal mechanisms that enable the removal or restriction of content in a rights-respecting way. We believe that addressing the content at the source is almost always the best and the required first step. Laws like the EU’s new Digital Services Act or the Digital Millennium Copyright Act provide tools that can be used to address illegal content at the source, while respecting important due process principles. Governments should focus on building and applying legal mechanisms in ways that least affect other people’s rights, consistent with human rights expectations.

Very simply, these needs cannot be met by blocking IP addresses.

We’ll continue to look for new ways to talk about network activity and disruption, particularly when it results in unnecessary limitations on access. Check out Cloudflare Radar for more insights about what we see online.

The Montgomery, Alabama Internet Exchange is making the Internet faster. We’re happy to be there.

Post Syndicated from Mike Conlow original https://blog.cloudflare.com/montgomery-alabama-ix/

Part of the magic of the Internet is in tens of thousands of networks connecting to each other all across the world in an effort to share information more efficiently. Cloudflare is a member of 279 Internet Exchanges (IX for short), but today we want to highlight one such dot on the global Internet map: the Montgomery, Alabama Internet Exchange, called MGMix. Thanks to the hard work of local leaders and the participation of dozens of networks (including Cloudflare), the Internet in Alabama works better today than it did before the IX launched.

Understanding IXs

Before we talk more about Alabama in particular, let’s take a step back to understand the critical role that Internet Exchanges play in our global Internet. In a simple model of exchanging Internet traffic, one person is on their laptop and requests content on a website, uses a video conferencing application, or wants to securely connect to their workplace from home. The person, or “client” in technical terms, is generally using a traditional Internet Service Provider, who they pay to access everything on the Internet. On the other hand, whatever the user is trying to reach – the website, API endpoint, or security service – or “server” in technical terms, is usually on a different network. How the data gets from the client’s network to the server’s network is not something Internet users think much about, but at Cloudflare, we think about it a lot.

One way that a network can reach another network is by paying a 3rd party network to deliver the traffic. This is called “transit” and it’s an appealing option because it’s simple. One “Tier 1” transit provider can reach the entire Internet. Of course, the tradeoff is that convenience comes at a cost – networks pay transit providers based on the quantity of traffic passed over the connection.

At the other end, larger networks often connect directly with what are called Private Network Interconnections (PNI). If one network is consistently sending large volumes of traffic to another network, it will be less expensive to use a PNI than to send the traffic over a transit provider. In this case, the two networks string a fiber cable across the ceiling of a data center where both networks have a presence, from one network’s cage to the other’s.

Right in the Goldilocks zone between transit providers and PNIs are Internet Exchanges. An IX brings networks together in one place, and lets them freely exchange traffic. Sometimes they’re literally called “meeting rooms”. Once a network joins an IX, they might be able to reach hundreds of other networks without incurring 3rd party transit fees. Thriving IX communities are a power-up for the Internet: they reduce the cost of delivering Internet traffic, incentivizing more networks to join, while making the Internet faster through better interconnection.
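
The economics are simple enough to sketch in a few lines. The prices below are invented purely for illustration, since real transit rates and port fees vary widely:

    // Assumed prices, for illustration only.
    const transitPerMbps = 0.5; // $/Mbps/month, billed on 95th percentile
    const ixPortFee = 500;      // $/month, flat fee for an exchange port

    // Above this traffic level, exchanging at the IX is cheaper than transit.
    const breakEvenMbps = ixPortFee / transitPerMbps;
    console.log(`IX port pays for itself above ${breakEvenMbps} Mbps`); // 1000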

Montgomery Internet Exchange (MGMix)

Back to Alabama. Unfortunately, Alabama, and the “Deep South” in general, has some of the worst performing Internet in the country. In Alabama, 15% of locations don’t have access to home Internet with at least 25 Mbps download and 3 Mbps upload throughput, according to the latest FCC data. In Mississippi, it’s 20%. The national average is 7%. In terms of latency, which is how we measure the speed of the Internet, the Deep South is also well above average.

[Chart: 50th percentile TCP Connect Time (ms) to major content delivery networks]

One of the reasons for the poor performance is that requests for content often travel to Atlanta, Dallas, or other Internet hubs even farther away before coming all the way back to the user in Alabama or Mississippi. That’s why an IX in Montgomery is so exciting: if networks can exchange traffic in Montgomery, the data doesn’t need to travel as far, and the Internet will be faster.

A few years ago, local leaders in Montgomery started to build up the Montgomery Internet Exchange (MGMix). With the support of the mayor, the help of city staff, and a cooperative that included the city, county, state, and a nearby Air Force base, they launched the IX in 2016. Later they formed a technical committee and upgraded to 100 Gbps of capacity.

With a donated switch from Packet Clearing House, MGMix estimated their initial costs at $1,000 per month for data center space and connection to the Internet. At their core, an IX is just a Layer 2 switch where all the networks plug in and advertise their presence to each other. That’s not to say it’s easy. One of the hardest parts is the work to attract networks.

IXs have a hard chicken-and-egg problem. The first network at an IX doesn’t have anyone to exchange traffic with. Conversely, once there are a lot of networks at an IX, it becomes easy to attract new ones. Additionally, networks like Cloudflare need certain types of networks – transits – to be present. In almost all cases, Cloudflare doesn’t actually host the website or service an Internet user is trying to reach; we protect them, but aren’t the original source. To get content from the original source, we need access to transit networks. The City of Montgomery did the hard work of building up the IX network by network.

MGMix now has a who’s-who of the Internet in Alabama as members. Some are ISPs like Charter, Wide Open West, Uniti Fiber, and Troy Cablevision. Some are big institutions like the State of Alabama, Alabama State University, and the City of Montgomery. And still others are the providers of content and services, like Cloudflare, Meta, and Akamai.

From Cloudflare’s perspective, it was an easy decision to join MGMix. We followed the development closely, and joined soon after it opened. After all, it means better Internet performance for a group of southern states that have been historically underserved. Now that it’s established, it’s essentially maintenance-free. It’s set-it-and-forget-it for better Internet performance.

Below is a chart of our traffic through MGMix over the course of November. We see daily spikes in traffic outbound from Cloudflare to other networks that are members of the IX. Interestingly, the traffic is lower from the 20th of November through the 27th of November which is the week of Thanksgiving in the US. It looks like Internet users in Alabama were enjoying a restful week with their families and not using the Internet (as much as usual).

[Chart: Cloudflare traffic through MGMix during November]

It has apparently been going so well that MGMix just announced they’re expanding to Auburn, Alabama.

Steven Reed, the current mayor of Montgomery, said of the expansion: “This is a step forward to achieving digital equity across the region, benefiting individuals who live in underserved rural communities. By extending our network fabric to a datacenter in Auburn, the MGMix will improve the efficiency and resiliency of the Internet for the Montgomery area, colleges and businesses along the I-85 corridor, and the entire River Region.”

We couldn’t have said it better. IXs are a critical part of a strong Internet interconnection ecosystem. We’re proud members of the MGMix, and will continue to join IXs globally where we can reach Internet users more efficiently and effectively.

Cloudflare expands Project Pangea to connect and protect (even) more community networks

Post Syndicated from Ben Ritter original https://blog.cloudflare.com/project-pangea-expansion/

In July 2021, Cloudflare announced Project Pangea to help underserved community networks get access to the Internet for free. Today, as part of Impact Week, we’re excited to expand this program to support even more communities by relaxing the technical requirements to participate.

Previously, in order to be eligible for Project Pangea, participants would need to bring at least a /24 block of IP space for Cloudflare to advertise on their behalf (referred to as “Bring Your Own IP”). But everyone should have secure, fast, and reliable access to the Internet, without being gated by costly network resources like IPv4 space. Starting now, participants no longer need to bring a /24 in order to access Pangea services: Internet connectivity, DDoS protection, network firewalling, traffic acceleration, and more, are available for free for eligible networks.

How is Project Pangea helping community networks?

The Internet Society, or ISOC, describes community networks as “when people come together to build and maintain the necessary infrastructure for Internet connection.” Most often, community networks emerge from need, and in response to the lack or absence of available Internet connectivity.

Cloudflare’s global network, which spans more than 275 cities across the world, provides us with the unique opportunity to help community networks of all shapes and sizes. Cloudflare offers community networks secure, fast, and reliable Internet access through Magic Transit, and frees up time for community network operators by mitigating malicious traffic. This empowers operators to focus more on managing the last mile connections to network users.

When a community network is placed behind Cloudflare with Magic Transit, it is automatically protected against Distributed Denial of Service attacks, which often overwhelm network and security devices or undersized Internet connections. Beyond mitigating DDoS attacks, Cloudflare also offers Magic Firewall through Project Pangea. Magic Firewall is a firewall as a service, and enables operators to remove physical firewalls and still enforce network level firewall rules. Implementing Magic Firewall in place of a physical firewall removes a single point of failure, and another device which needs to be upgraded during a maintenance window.

As community networks grow to support more users, the bandwidth required and the exposure to attack traffic also grow. One challenge with growing a network and providing security is that on-premises firewalls need to be replaced or upgraded when they hit specific bandwidth limitations. The security appliance is often an expensive bottleneck to upgrade, preventing networks from helping more users. One unique benefit of using Cloudflare for network connectivity is that, unlike an on-premises network firewall, operators never need to upgrade Cloudflare. Incoming traffic is distributed across hundreds of locations, allowing Cloudflare to provide security services and block attacks across the whole Cloudflare network.

[Diagram: One of several possible deployment models Pangea participants can use to get connected]

Pangea participant highlight: Ayva Networks

Ayva Networks is a not-for-profit Wireless Internet Service Provider that provides backbone and Internet services to approximately 400 households in the rural mountain areas west of Boulder, Colorado. In 2023, they will grow their network to provide more gigabit network access. Nick Wilson from Ayva Networks explains that “reliable Internet in our community isn’t a privilege, it’s an essential utility, and often provides the only means of communication for many homes in our region as cellular service is generally rare.”

After connecting through Magic Transit, Nick shared “speeds are noticeably better on Magic Transit, especially for those who work with cloud resources” and that “our firewalls deal with a lot less background noise” due to all the attack traffic mitigated by Cloudflare.

Colorado’s environment can be pretty extreme, and presents many challenges to running a Wireless Internet Service Provider. Ayva Networks responds to 100+ mph wind, massive hail, blizzards, flooding, insects, lightning, and fire. Magic Transit lets Ayva Networks “engineer traffic flows much more granularly than we otherwise are able to with BGP alone, and has become an essential tool for us in mitigating and responding to outages.”

What have we learned since launching Project Pangea?

We’ve been privileged to help a lot of great organizations like Ayva Networks connect more people to the Internet. Many community networks are passion projects, and are run by volunteers who want to make a difference in their community. Volunteers often only have limited time to contribute, and this has emphasized how simple we need to make it for organizations of any size to get up and running behind Cloudflare.

Another challenge we did not foresee is that many community networks do not have their own network IP address space. IP addresses are needed by all computers to communicate on the Internet. Until today, Magic Transit and Magic Firewall required that community networks provide their own IP addresses. We recently extended Magic Transit to support customers without their own IP address space with Magic Transit with Cloudflare IPs, and we’re excited to bring this functionality to community networks via Project Pangea.

How can my community network get involved?

Check out our landing page to learn more and apply for Project Pangea today.

The US government is working on an “Internet for all” plan. We’re on board.

Post Syndicated from Mike Conlow original https://blog.cloudflare.com/internet-for-all-us/

Recently, the United States Department of Commerce announced that all 50 states and every eligible territory had signed on to the “Internet for All” initiative. Internet for All is the US government’s $65 billion initiative to close the Digital Divide once and for all through new broadband deployment and digital equity programs. Cloudflare is on a mission to help build a better Internet, and we support initiatives like this because we want more people using the Internet on high-throughput, low-latency, resilient and affordable Internet connections. It’s been written often since the start of the pandemic because it’s true: it isn’t acceptable that students need to go to a Taco Bell parking lot to do their homework, and a good Internet connection is increasingly important for doing adult jobs as well.

The Internet for All initiative is the result of $65 billion in broadband-related funding appropriated by the US Congress as part of the Infrastructure Investment and Jobs Act (IIJA). It’s been called a “once in a generation” funding opportunity, and compared with the Rural Electrification Act which brought power lines to rural America in the 1930s. The components of the broadband portion of the Infrastructure bill are:

  • $42.5 billion for broadband deployment – new wires and wireless radios in places that don’t have them – called the Broadband Equity, Access, and Deployment Program (BEAD).
  • $14.2 billion to make permanent a $30 per month subsidy for low-income families to purchase a home Internet subscription.
  • $2.75 billion to establish a grant program that will improve digital equity, which means teaching Americans how to make the most of the Internet and their home connection.
  • $2 billion for new connectivity on tribal lands.
  • $1 billion to establish new “middle-mile” capacity, which will connect rural communities to the Internet “backbone”.

The US should be applauded for making this kind of investment in broadband infrastructure. By appropriating federal funds, the government is able to ensure the money is used as it’s intended. For example, federal rules will require that areas with no infrastructure and disadvantaged urban areas will receive priority funding. Individual states will have the option of adding their own rules.

There’s significant work to do. According to the latest numbers from the Federal Communications Commission, 12% of Americans lack access to home broadband with throughput of at least 100 Mbps download and 20 Mbps upload.

There’s another way to think about access to broadband. A wire running near your house doesn’t do any good if the residents can’t afford it, or don’t know how to use the Internet. According to Pew Research, 23% of Americans say they don’t have an Internet connection at home. Those aren’t just rural areas without broadband infrastructure, it’s also urban areas where the connection is too expensive.

Cloudflare isn’t a disinterested observer. When Internet users don’t have access to good broadband, their experience with our services – the websites, APIs and security products we offer – won’t work as well as they should. In the map below, we use the Resource Timing API to measure the latency between Internet users and the major Content Delivery Networks (CDNs), including Cloudflare. We see rural and southern states have worse performance than the northeastern United States, with Hawaii and Alaska being off the charts in terms of their poor speed.

[Chart: 50th percentile TCP Connect Time (ms) to major content delivery networks. Alaska and Hawaii have TCP Connect times of 263 ms and 160 ms respectively.]
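
For the curious, connect time is straightforward to observe from a browser using the same Resource Timing API mentioned above. A small TypeScript sketch of the measurement:

    // Runs in a browser: derive connection setup time per fetched resource.
    const entries = performance.getEntriesByType(
      'resource'
    ) as PerformanceResourceTiming[];

    for (const entry of entries) {
      // connectEnd - connectStart spans transport setup, including the TLS
      // handshake for secure connections.
      const connectMs = entry.connectEnd - entry.connectStart;
      if (connectMs > 0) {
        console.log(`${entry.name}: ${connectMs.toFixed(1)} ms`);
      }
    }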

Access technology, which is how Internet users connect to the Internet (cable, fiber, DSL, wireless, satellite), is one important part of the overall quality of their connection, but there are other, less talked about factors. Another factor is how close geographically the user is to the content and services they are accessing. Midwestern states where requests for data need to travel to Internet hubs in Chicago or Dallas are going to be slower than requests for data from Washington, DC, served by the giant Internet hub around Ashburn, Virginia. To be as close as possible to users geographically, Cloudflare has servers in 51 locations across 28 states in the US, and is still growing.

Programs that provide funding for deployment are one piece of the puzzle, but there are important non-financial initiatives as well. For example, the IIJA directed the Federal Communications Commission to come up with “broadband nutrition labels” that will be shown to consumers at the point of purchase for any Internet service. Just a few weeks ago, the FCC announced their implementation. Cloudflare filed comments with the FCC with our suggestions for how to make these labels informative, future-proof, and easy for consumers to understand. We also wrote about it here.

We’d be remiss to not also mention our own contribution to digital divide initiatives – Project Pangea. For community and non-profit networks that have invested in last-mile infrastructure but need a connection to the Internet – “transit” in industry terms – the network can connect to Cloudflare, and we’ll provide that Internet transit at no charge to the network. It’s one piece of the puzzle, and we’re always looking for additional ways to help.

One thing everyone can do is help the FCC build the most accurate broadband map possible by going to the map, entering your address, and verifying the data. The map will show your individual location and all ISPs that claim to serve your address. If there’s a problem – and there can be, it’s a new map and new process – you can file a challenge right from the FCC’s mapping site.

It’s laudable that the US government is stepping up with billions of dollars in funding for broadband networks and digital equity programs. In the shared project of helping build a better Internet, this is an important and big step.

The challenges of sanctioning the Internet

Post Syndicated from Laura Klick original https://blog.cloudflare.com/the-challenges-of-sanctioning-the-internet/

Following Russia’s invasion of Ukraine, governments around the world, including the US, UK, and EU, announced sweeping sanctions targeting the Russian and Belarusian economies. These sanctions prohibit specified categories of economic activity in an effort to use economic pressure to punish the targeted countries. Almost overnight, we saw unprecedented restrictions put in place for multinational companies doing business in Russia or Belarus.

Separately, recent events in Iran led the US government to authorize additional Internet/communications activities, which were being used widely by average Iranians protesting against the government. This was done by expanding some existing licenses, or exceptions, to sanctions the US has imposed on Iran.

While the use of sanctions as a tool for responding to foreign relations crises is nothing new, the wide-ranging multilateral sanctions that have been imposed on Russia and the recent authorizations in Iran are significant and provide fresh examples of how sanctions can affect access to a free and open global Internet.

Balancing interests in sanctions policy

Cloudflare is committed to complying with all applicable sanctions, including US, UK, and EU sanctions, and we have put in place programs to ensure that compliance. At the same time, we recognize the important role we and other Internet infrastructure companies play in protecting a key human right and principle also supported by the US, UK, and EU governments: free expression online.

One overarching principle of sanctions policy is that sanctions are intended to increase the cost of violating international norms and ultimately force authoritarian regimes and malicious actors to change behavior. The purpose of sanctions is not to punish or isolate ordinary citizens of a particular country or region. In fact, ordinary citizens can be powerful catalysts for the policy changes that sanctions are seeking to achieve. However, as we’ve seen over and over again, changes in policy, particularly in countries that have authoritarian regimes, do not happen overnight, and they often depend on the ability of individuals to communicate with each other and with the rest of the world. For example, in Iran, we’ve witnessed the important role that social media has played in helping support and spread the protest movement sparked by the killing of Mahsa Amini. Similarly, in the wake of Russia’s invasion of Ukraine, ordinary Russians continue to look for ways to access non-Russian news sources via private Internet access tools and VPNs.

It’s a tricky balance to impose costs on bad actors while maintaining open lines of communication for ordinary citizens, but it’s a balance that we’ve seen the US Government take a leading role in preserving, even in areas where most other transactions/activities might otherwise be prohibited. For example, the key US law authorizing the executive branch to deploy sanctions exempts “any postal, telegraphic, telephonic or other personal communication, which does not involve a transfer of anything of value.” The US government also has a long tradition of issuing authorizations, also known as General Licenses, permitting additional telecommunications and Internet-related activities, including in Cuba, Iran, Russia, Syria, and certain restricted regions of Ukraine. This means that US companies, like Cloudflare, can continue to provide many products and services that support free and secure Internet communications.

Although these exemptions and licenses can help the US Government establish the policy goal of supporting Internet freedom, they are only effective if private sector companies make use of them. That may be easier said than done. Because of the financial and reputational penalties that can be imposed if a company violates sanctions, even inadvertently, companies often have an incentive to take a simple and blunt approach to sanctions compliance without trying to do the nuanced thing and availing themselves of the exceptions in the General Licenses. Companies have to invest significant time and money into understanding the legal requirements and applicable exemptions and licenses when deciding whether to provide services in high risk countries. Cloudflare has made these investments because they align with our goal of helping build a better Internet and making a free and secure Internet accessible to all.

As governments continue to use sanctions as a foreign policy tool, we think it’s important that Internet infrastructure companies discuss how the legal framework is impacting their ability to support a global Internet. Described below are some of the key issues we’ve identified and ways that regulators can help balance the policy goals of sanctions with the need to support the free flow of communications for ordinary citizens around the world.

There are two broad categories of sanctions: (1) country-/region-based, and (2) individual/entity list-based. Sanctions also vary across jurisdictions: US sanctions look different from EU and UK sanctions, and the differences can be significant. Companies that operate around the world have to pay close attention to individual rules and regulations to ensure compliance.

Country-/region-based sanctions

With respect to country-/region-based sanctions, the US government has imposed comprehensive sanctions on doing business in Cuba, Iran, North Korea, Syria, and certain restricted regions of Ukraine (Crimea, Luhansk, and Donetsk). The purpose of comprehensive sanctions is to impose severe punishments on state actors in these countries by denying them access to valuable US goods/services. You might think that this means that Internet companies are therefore barred from providing services to these countries/regions, but that’s where things get complicated. The US government has issued General Licenses, which authorize US companies to engage in certain Internet- and telecommunications-related activities.

While these General Licenses are helpful in that they may authorize peering services, VPN, SSL certificates, and other services incident to the exchange of communications over the Internet, the activities authorized vary across sanctioned jurisdictions. In some countries/regions (e.g., Cuba, Iran, and the Donetsk and Luhansk regions), except for government parties, some free and paid services are authorized, but in other instances (e.g., Crimea and Syria), all authorized services must be available at no cost to the user. Along the same lines, some General Licenses list specific types of services/products that may be provided, while others leave it up to a company to make their own determination whether a product/service is authorized by the terms of the license. Neither the UK nor the EU has issued any Internet-related General Licenses, which has become a particular issue in the context of Russia where there are now significant restrictions in place.

With respect to Iran, the US government recently issued a new General License that broadens the products/services authorized and provides other clarifications to make it easier for companies to provide Internet services to ordinary Iranians. The new General License is encouraging for companies, like Cloudflare, that would like to help support access to the broader Internet for ordinary Iranian citizens. But as with any new policy, it takes time for companies to understand the changes and make decisions about whether to invest additional time and resources to expand services offerings in a high risk country like Iran. Given the significant restrictions that have been imposed on doing business in Iran over the years, there are a number of logistical challenges with seeking to enter a market where so many activities remain prohibited. Moreover, there is always a risk that sanctions policies can change, so companies will take this into account when weighing whether to deploy expensive hardware/equipment or make other long-term investments.

Party-based sanctions

Apart from country-based sanctions, many governments, including the US, UK, and EU maintain list-based sanctions, which prohibit dealings with specific listed parties. Like many multinational companies, Cloudflare screens customers and other third parties to identify links to sanctioned parties. We do not engage in any transactions with or provide services to any parties that have been listed on applicable sanctions lists or any parties that are owned or controlled by such parties and our Terms of Service prohibit sanctioned parties from using our services.

Over the years, the US government has continued to add parties to its sanctions list. Notably, when the US government adds a party to the sanctions list, it will include corresponding identifying information, including possible aliases, physical address, as well as email address and domain names to the extent they are known. The UK has also started adding domains and email addresses, but those domains and email addresses do not always align with what is on the US list, creating further complexities for multinational companies in this space.

While there are a number of sanctions screening providers that will help companies conduct due diligence on third parties they are considering doing business with, email addresses and domains are not automatically screened. This can be challenging for Internet infrastructure companies for whom email addresses and domain names are critical pieces of data when onboarding a customer. With limited automated solutions, companies must invest significant time and resources building proprietary tools that block sanctioned domains and email addresses from signing up for their services.
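
Here is a minimal sketch of the kind of in-house check this requires, assuming a hypothetical denylist; real systems also handle aliases, punycode, and ownership analysis:

    // Hypothetical denylist of sanctioned domains.
    const sanctionedDomains = new Set(['blocked-entity.example']);

    function normalizeDomain(input: string): string {
      return input.trim().toLowerCase().replace(/\.$/, '');
    }

    // Works for bare domains and for the domain part of an email address.
    function isSanctioned(emailOrDomain: string): boolean {
      const domain = normalizeDomain(emailOrDomain.split('@').pop() ?? '');
      for (const listed of sanctionedDomains) {
        if (domain === listed || domain.endsWith('.' + listed)) return true;
      }
      return false;
    }

    console.log(isSanctioned('user@Blocked-Entity.example')); // true
    console.log(isSanctioned('shop.example.com'));            // false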

Cloudflare may also receive abuse reports alleging that domains are operated by sanctioned parties. However, unless a domain is listed on a sanctions list, it can be challenging to determine if a domain is subject to sanctions. Without clear guidance from regulators, companies must develop their own processes for reviewing these reports. While it is important that companies terminate services to domains owned or operated by a sanctioned party, it’s also critical that they do so in a way that is fair and consistent.

Implications for a free and open Internet

Sanctions are an important tool for responding to geopolitical challenges, and they can help impose economic costs on parties that violate international norms, including human rights. However, sanctions can also have unintended consequences when they are not properly deployed. While regulators have learned a number of lessons over the years when imposing sanctions on more traditional sanctions targets, like the financial and energy sectors, the global Internet remains a complicated area that has only recently become a more prominent focus of sanctions. With the technology constantly evolving and a number of different parties involved in maintaining a secure and reliable Internet, it is critical that regulators are clear about their expectations and seek to minimize any chilling effects.

Key stakeholders involved in maintaining a free and open Internet are likely to continue exiting sensitive markets in the absence of clear guidance from regulators. This will only lead to further fragmentation of the global Internet and open the door for authoritarian governments to monitor and control global communications – an outcome that clearly undermines the policy goals of the sanctions. These are complicated issues, and we don’t pretend to have all the answers. But, there are things that regulators can do to mitigate unintended consequences of sanctions policies and promote a free and open Internet. Here are a few key points that we advocate to policymakers:

  • Continue partnering with stakeholders to understand practical implications before imposing new sanctions and determine where additional clarifying guidance might be helpful.
  • Apply a consistent and coordinated approach to exemptions/authorizations to make it easier for multinational companies to provide services in challenging jurisdictions.
  • Provide clear guidelines for Internet-related companies as to when a domain or user may be subject to sanctions (i.e., adding domain names and email addresses to applicable sanctions lists) and ensure consistency across jurisdictions.

Looking forward

An integral part of Cloudflare’s mission to help build a better Internet involves making sure that ordinary individuals have access to a free and secure Internet. While global sanctions will continue to present challenges to Internet infrastructure companies, like Cloudflare, we are committed to both compliance with applicable sanctions and helping to maintain open lines of communication around the world–and we will continue to advocate for policies that do the same.

Project A11Y: how we upgraded Cloudflare’s dashboard to adhere to industry accessibility standards

Post Syndicated from Emily Flannery original https://blog.cloudflare.com/project-a11y/

At Cloudflare, we believe the Internet should be accessible to everyone. And today, we’re happy to announce a more inclusive Cloudflare dashboard experience for our users with disabilities. Recent improvements mean our dashboard now adheres to industry accessibility standards, including Web Content Accessibility Guidelines (WCAG) 2.1 AA and Section 508 of the Rehabilitation Act.

Over the past several months, the Cloudflare team and our partners have been hard at work to make the Cloudflare dashboard as accessible as possible for every single one of our current and potential customers. This means incorporating accessibility features that comply with the latest Web Content Accessibility Guidelines (WCAG) and Section 508 of the US’s federal Rehabilitation Act. We are invested in working to meet or exceed these standards; to demonstrate that commitment and share openly about the state of accessibility on the Cloudflare dashboard, we have completed the Voluntary Product Accessibility Template (VPAT), a document used to evaluate our level of conformance today.

Conformance with a technical and legal spec is a bit abstract–but for us, accessibility simply means that as many people as possible can be successful users of the Cloudflare dashboard. This is important because each day, more and more individuals and businesses rely upon Cloudflare to administer and protect their websites.

For individuals with disabilities who work on technology, we believe that an accessible Cloudflare dashboard could mean improved economic and technical opportunities, safer websites, and equal access to tools that are shaping how we work and build on the Internet.

For designers and developers at Cloudflare, our accessibility remediation project has resulted in an overhaul of our component library. Our newly WCAG-compliant components expedite and simplify our work building accessible products. They make it possible for us to deliver on our commitment to an accessible dashboard going forward.

Our Journey to an Accessible Cloudflare Dashboard

In 2021, we initiated an audit with third party experts to identify accessibility challenges in the Cloudflare dashboard. This audit came back with a daunting 213-page document—a very, very long list of compliance gaps.

We learned from the audit that there were many users we had unintentionally failed to design and build for in Cloudflare dashboard user interfaces. In particular, we had not done well accommodating keyboard users and screen reader users, who often rely upon these technologies because of an impairment. Those impairments include low vision or blindness, motor disabilities (examples include tremors and repetitive strain injury), or cognitive disabilities (examples include dyslexia and dyscalculia).

As a product and engineering organization, we had spent more than a decade in cycles of rapid growth and product development. While we’re proud of what we have built, the audit made clear to us that there was a great need to address the design and technical debt we had accrued along the way.

One year, four hundred Jira tickets, and over 25 new, accessible web components later, we’re ready to celebrate our progress with you. Major categories of work included:

  1. Forms: We re-wrote our internal form components with accessibility and developer experience top of mind. We improved form validation and error handling, labels, required field annotations, and made use of persistent input descriptions instead of placeholders. Then, we deployed those component upgrades across the dashboard.
  2. Data visualizations: After conducting a rigorous re-evaluation of their design, we re-engineered charts and graphs to be accessible to keyboard and screen reader users. See below for a brief case study.
  3. Heading tags: We corrected page structure throughout the dashboard by replacing all our heading tags (<h1>, <h2>, etc.) with a technique we borrowed from Heydon Pickering: an approach to heading level management that uses React Context and basic arithmetic (see the sketch after this list).
  4. SVGs: We reworked how we create SVGs (Scalable Vector Graphics), so that they are labeled properly and only exposed to assistive technology when useful.
  5. Node modules: We jumped several major versions of old, inaccessible node modules that our UI components depend upon (and we broke many things along the way).
  6. Color: We overhauled our use of color, and contributed a new volume of accessible sequential colors to our design system.
  7. Bugs: We squashed a lot of bugs that had made their way into the dashboard over the years. The most common type of bug we encountered related to incorrect or unsemantic use of HTML elements—for example, using a <div> where we should have used a <td> (table data) or <tr> (table row) element within a table.
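
To illustrate the technique from item 3, here is a minimal sketch of the React Context approach in the spirit of Heydon Pickering's pattern. The component names are our own for this example, and our production components differ:

import React, { createContext, useContext } from "react";

// Context holding the current heading depth; top-level content starts at 1.
const LevelContext = createContext(1);

// Section increments the depth for everything nested inside it.
export function Section({ children }: { children: React.ReactNode }) {
  const level = useContext(LevelContext);
  return (
    <LevelContext.Provider value={level + 1}>{children}</LevelContext.Provider>
  );
}

// Heading picks <h1> through <h6> from its nesting depth (the "basic
// arithmetic"), clamped at 6 since HTML has no deeper heading level.
export function Heading({ children }: { children: React.ReactNode }) {
  const level = useContext(LevelContext);
  return React.createElement(`h${Math.min(level, 6)}`, null, children);
}

With components like these, heading levels fall out of the document structure instead of being hard-coded, so a section moved to a new nesting depth can never produce a skipped heading level.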

Case Study: Accessibility Work On Cloudflare Dashboard Data & Analytics

The Cloudflare dashboard is replete with analytics and data visualizations designed to offer deep insight into users’ websites’ performance, traffic, security, and more. Making those data visualizations accessible proved to be among the most complex and interdisciplinary issues we faced in the remediation work.

An example of a problem we needed to solve related to WCAG success criterion 1.4.1, which pertains to the use of color. 1.4.1 specifies that color cannot be the only means by which to convey information, such as the differentiation between two items compared in a chart or graph.

Our charts were clearly nonconforming with this standard, using color alone to represent different data being compared. For example, a typical graph might have used the color blue to show the number of requests to a website that were 200 OK, and the color orange to show 403 Forbidden, but failed to offer users another way to discern between the two status codes.

Our UI team went to work on the problem, and chose to focus our effort first on the Cloudflare dashboard time series graphs.

Interestingly, we found that design patterns recommended even by accessibility experts created wholly unusable visualizations when placed into the context of real world data. Examples of such recommended patterns include using different line weights, patterns (dashed, dotted or other line styles), and terminal glyphs (symbols set at the beginning and end of the lines) to differentiate items being compared.

We tried, and failed, to apply a number of these patterns; you can see the evolution of this work on our time series graph component in the three different images below.

v.1

Here is an early attempt at using both terminal glyphs and patterns to differentiate data in a time series graph. You can see that the terminal glyphs pile up and become indistinguishable; the differences among the line patterns are very hard to discern. This code never made it into production.

v.2

In this version, we eliminated terminal glyphs but kept line patterns. Additionally, we faded the unfocused items in the graph to help bring highlighted data to the forefront. This latter technique made it into our final solution.

v.3

Here we eliminated patterns altogether, simplified the user interface to only use the fading technique on unfocused items, and put our new, sequentially accessible colors to use. Finally, a visual design solution approved by accessibility and data visualization experts, as well as our design and engineering teams.

After arriving at our design solution, we had some engineering work to do.

In order to meet WCAG success criterion 2.1.1, we rewrote our time series graphs to be fully keyboard accessible by adding focus handling to every data point, and enabling the traversal of data using arrow keys.
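
As a simplified sketch of what that focus handling can look like (not our production chart code), the pattern below uses a roving tabindex so that only one data point sits in the tab order at a time, and Left/Right arrow keys move focus between neighboring points:

import React, { useRef, useState } from "react";

type Point = { x: string; y: number };

export function KeyboardAccessiblePoints({ points }: { points: Point[] }) {
  const [active, setActive] = useState(0);
  const refs = useRef<Array<HTMLSpanElement | null>>([]);

  const onKeyDown = (e: React.KeyboardEvent) => {
    const delta = e.key === "ArrowRight" ? 1 : e.key === "ArrowLeft" ? -1 : 0;
    if (delta === 0) return;
    e.preventDefault();
    const next = Math.max(0, Math.min(points.length - 1, active + delta));
    setActive(next);
    refs.current[next]?.focus(); // move focus along with the roving tabindex
  };

  return (
    <div role="group" aria-label="Time series data points" onKeyDown={onKeyDown}>
      {points.map((p, i) => (
        <span
          key={p.x}
          ref={(el) => { refs.current[i] = el; }}
          tabIndex={i === active ? 0 : -1} // roving tabindex: one tab stop
          aria-label={`${p.x}: ${p.y}`}    // give each point a spoken label
        >
          ●
        </span>
      ))}
    </div>
  );
}

A real chart needs richer semantics than this sketch provides, but it shows the core idea: every data point is focusable, and arrow keys traverse the data.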

Navigating time series data points by keyboard on the Cloudflare dashboard.

We did some fine-tuning, specifically to support screen readers: we eliminated auditory “chartjunk” (unnecessary clutter or information in a chart or graph) and cleaned up decontextualized data (a scenario in which numbers are exposed to and read by a screen reader, but contextualizing information, like x- and y-axis labels, is not).

And lastly, to meet WCAG 1.1.1, we engineered new UI component wrappers to make chart and graph data downloadable in CSV format. We deployed this part of the solution across all charts and graphs, not just the time series charts like those shown above. No matter how you browse and interact with the web, we hope you’ll notice this functionality around the Cloudflare dashboard and find value in it.
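
A minimal sketch of that CSV escape hatch (downloadCsv is a hypothetical helper name, not our actual component API): serialize the chart's underlying rows to CSV and hand the result to the browser as a download.

function downloadCsv(filename: string, rows: Array<Record<string, string | number>>): void {
  const headers = Object.keys(rows[0] ?? {});
  // Quote every field and double any embedded quotes, per the usual CSV rules.
  const escape = (value: string | number) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = [
    headers.join(","),
    ...rows.map((row) => headers.map((h) => escape(row[h])).join(",")),
  ];
  const blob = new Blob([lines.join("\n")], { type: "text/csv" });
  const url = URL.createObjectURL(blob);
  const anchor = document.createElement("a");
  anchor.href = url;
  anchor.download = filename;
  anchor.click();
  URL.revokeObjectURL(url);
}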

Making all of this data available to low vision, keyboard, and assistive technology users was an interesting challenge for us, and a true team effort. It necessitated a separate data visualization report conducted by another, more specialized team of third party experts, deep collaboration between engineering and design, and many weeks of development.

Applying this thorough treatment to all data visualizations on the Cloudflare dashboard is our goal, but it is still a work in progress. Please stay tuned for more accessible updates to our chart and graph components.

Conclusion

There’s a lot of nuance to accessibility work, and we were novices at the beginning: researching and learning as we were doing. We also broke a lot of things in the process, which (as any engineering team knows!) can be stressful.

Overall, our team’s biggest challenge was figuring out how to complete a high volume of cross-functional work in the shortest time possible, while also setting a foundation for these improvements to persist over time.

As a frontend engineering and design team, we are very grateful for having had the opportunity to focus on this problem space and to learn from truly world-class accessibility experts along the way.

Accessibility matters to us, and we know it does to you. We’re proud of our progress, and there’s always more to do to make Cloudflare more usable for all of our customers. This is a critical piece of our foundation at Cloudflare, where we are building the most secure, performant and reliable solutions for the Internet. Stay tuned for what’s next!

Not using Cloudflare yet? Get started today and join us on our mission to build a better Internet.

1All references to “dashboard” in this post are specific to the primary user authenticated Cloudflare web platform. This does not include Cloudflare’s product-specific dashboards, marketing, support, educational materials, or third party integrations.

Route leaks and confirmation biases

Post Syndicated from Maximilian Wilhelm original https://blog.cloudflare.com/route-leaks-and-confirmation-biases/

This is not what I imagined my first blog article would look like, but here we go.

On February 1, 2022, a configuration error on one of our routers caused a route leak of up to 2,000 Internet prefixes to one of our Internet transit providers. The leak lasted for 32 seconds, and then recurred later for 7 seconds. We did not see any traffic spikes or drops in our network and did not see any customer impact because of this error, but it may have caused an impact to external parties, and we are sorry for the mistake.

Timeline

All timestamps are UTC.

As part of our efforts to build the best network, we regularly update our Internet transit and peering links throughout our network. On February 1, 2022, we had a “hot-cut” scheduled with one of our Internet transit providers to simultaneously update router configurations on Cloudflare and ISP routers to migrate one of our existing Internet transit links in Newark to a link with more capacity. Doing a “hot-cut” means that both parties will change cabling and configuration at the same time, usually while being on a conference call, to reduce downtime and impact on the network. The migration started off-peak at 10:45 (05:45 local time) with our network engineer entering the bridge call with our data center engineers and remote hands on site as well as operators from the ISP.

At 11:17, we connected the new fiber link and established the BGP sessions to the ISP successfully. We had BGP filters in place on our end to not accept and send any prefixes, so we could evaluate the connection and settings without any impact on our network and services.

As the connection between our router and the ISP — like most Internet connections — was realized over a fiber link, the first item to check is the "light level" of that link. This shows the strength of the optical signal received by our router from the ISP router and can indicate a bad connection when it's too low. Low light levels are likely caused by unclean fiber ends or connectors that are not fully seated, but may also indicate a defective optical transceiver connecting the fiber link to the router – all of which can degrade service quality.

The next item on the checklist is interface errors, which occur when a network device receives incorrect or malformed network packets. These would also indicate a bad connection and would likely lead to a degradation in service quality.

As light levels were good, and we observed no errors on the link, we deemed it ready for production and removed the BGP reject filters at 11:22.

This immediately triggered the maximum prefix-limit protection the ISP had configured on the BGP session and shut down the session, preventing further impact. The maximum prefix-limit is a safeguard in BGP to prevent the spread of route leaks and to protect the Internet. The limit is usually set just a little higher than the expected number of Internet prefixes from a peer to leave some headroom for growth but also catch configuration errors fast. The configured value was just 40 prefixes short of the number of prefixes we were advertising at that site, so this was considered the reason for the session to be shut down. After checking back internally, we asked the ISP to raise the prefix-limit, which they did.

The BGP session was reestablished at 12:08 and immediately shut down again. The problem was identified and fixed at 12:14.

10:45: Start of scheduled maintenance

11:17: New link was connected and BGP sessions went up (filters still in place)

11:22: Link was deemed ready for production and filters removed

11:23: BGP sessions were torn down by ISP router due to configured prefix-limit

12:08: ISP configures higher prefix-limits, BGP sessions briefly come up again and are shut down

12:14: Issue identified and configuration updated

What happened and what we’re doing about it

The outage occurred while migrating one of our Internet transits to a link with more capacity. Once the new link and a BGP session had been established, and the link deemed error-free, our network engineering team followed the peer-reviewed deployment plan. The team removed the filters that had been preventing the Cloudflare router from accepting and sending prefixes via BGP.

Due to an oversight in the deployment plan, which had been peer-reviewed without this issue being noticed, no BGP filters were in place to export only the prefixes belonging to Cloudflare and our customers. A peer review on the internal chat did not catch this either, so the network engineer performing the change went ahead.

ewr02# show | compare
[edit protocols bgp group 4-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;
[edit protocols bgp group 6-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;

The change resulted in our router sending all known prefixes to the ISP router, which shut down the session as the number of prefixes received exceeded the maximum prefix-limit configured.

As the configured values for the maximum prefix-limits turned out to be rather low for the number of prefixes on our network, this didn’t come as a surprise to our network engineering team and no investigation into why the BGP session went down was started. The prefix-limit being too low seemed to be a perfectly valid reason.

We asked the ISP to increase the prefix-limit, which they did after receiving approval on their side. Once the prefix-limit had been increased and the previously shut down BGP sessions had been reset, the sessions were reestablished but were shut down immediately as the maximum prefix-limit was triggered again. This is when our network engineer started questioning whether another issue was at fault, and found and corrected the configuration error previously overlooked.

We made the following change in response to this event: we introduced an implicit reject policy for BGP sessions which will take effect if no import/export policy is configured for a specific BGP neighbor or neighbor group. This change has been deployed.

BGP security & preventing route-leaks — what’s in the cards?

Route leaks aren’t new, and they keep happening. The industry has come up with many approaches to limit the impact or even prevent route-leaks. Policies and filters are used to control which prefixes should be exported to or imported from a given peer. RPKI can help to make sure only allowed prefixes are accepted from a peer and a maximum prefix-limit can act as a last line of defense when everything else fails.

BGP policies and filters are commonly used to ensure only explicitly allowed prefixes are sent out to BGP peers, usually only prefixes owned by the entity operating the network and its customers. They can also be used to tweak some knobs (BGP local-pref, MED, AS path prepend, etc.) to influence routing decisions and balance traffic across links. This is what the policies we have in place for our peers and transits do. As explained above, the maximum prefix-limit is intended to tear down BGP sessions if more prefixes are being sent or received than expected. We have talked about RPKI before: it is the required cryptographic upgrade to BGP routing, and we are still on our path to securing Internet routing.

To improve the overall stability of the Internet even more, a new Internet standard was proposed in 2017 that adds another layer of protection into the mix: RFC 8212 defines Default External BGP (EBGP) Route Propagation Behavior without Policies, which tackles exactly the issues we were facing.

This RFC updates the BGP-4 standard (RFC 4271), which defines how BGP works and what vendors are expected to implement. On the Juniper operating system, Junos OS, this behavior can be activated by setting defaults ebgp no-policy reject-always on the protocols bgp hierarchy level, starting with Junos OS Release 20.3R1.

If you are running an older version of Junos OS, a similar effect can be achieved by defining a REJECT-ALL policy and setting it as the import/export policy on the protocols bgp hierarchy level. Note that this will also affect iBGP sessions, which the solution above does not impact.

policy-statement REJECT-ALL {
  then reject;
}

protocols {
  bgp {
    import REJECT-ALL;
    export REJECT-ALL;
  }
}

Conclusion

We are sorry for leaking routes of prefixes which did not belong to Cloudflare or our customers and to network engineers who got paged as a result of this.

We have processes in place to make sure that changes to our infrastructure are reviewed before being executed, so potential issues can be spotted before they reach production. In this case, the review process failed to catch this configuration error. In response, we will increase our efforts to further our network automation, to fully derive the device configuration from an intended state.

While this configuration error was caused by human error, it could have been detected and mitigated significantly faster if confirmation bias had not kicked in, making the operator think the observed behavior was to be expected. This error underlines the importance of our existing efforts to train our people to be aware of the biases we all carry. It is also a great example of how confirmation bias can influence and impact our work, and why we should question our conclusions early.

It also shows how important protocols like RPKI are. Route leaks are something even experienced network operators can cause accidentally, and technical solutions are needed to reduce the impact of leaks whether they are intentional or the result of an error.

Heard in the halls of Web Summit 2021

Post Syndicated from João Tomé original https://blog.cloudflare.com/web-summit-2021-internet/

Opening night of Web Summit 2021, at the Altice Arena in Lisbon, Portugal. Photo by Sam Barnes/Web Summit

Global in-person events were back in a big way at the start of November (1-4) in Lisbon, Portugal, with Web Summit 2021 gathering more than 42,000 attendees from 128 countries. I was there to discover Internet trends and meet interesting people. What I saw was the contagious excitement of people from all corners of the world coming together for what seemed like a type of normality in a time when the Internet “is almost as important as having water”, according to Sonia Jorge from the World Wide Web Foundation.

Here’s some of what I heard in the halls.

With a lot happening on a screen, the lockdowns throughout the pandemic showed us a glimpse of what the metaverse could be, just without VR or AR headsets. Think about the way many were able to use virtual tools to work all day, learn, collaborate, order food, supplies, and communicate with friends and family — all from their homes.

While many had this experience, many others were unable to, with some talks at the event focusing on the digital divide and how “Internet access is a basic human right”, according to the grandson of Nelson Mandela — we interviewed him, and you can watch the conversation below.

The future already has some paths laid out, and many were discussed at the event.

The pandemic helped to accelerate most of them, especially by bringing more people (in some countries) to the digital world.

The CPO of Meta, Chris Cox, shared how the company previously known as Facebook has some ideas about the future of augmented reality, and how they want to see those ideas play out in the next five to 10 years. “We want to get the conversation going,” he said.

Also present at the event was Jon Vlassopulos, Global Head of Music, Roblox. He explained how virtual concerts on the video game platform could be the future of music performances, and even bring free tickets to fans of famous music stars like Adele. Stars like Zara Larsson, KSI and Ava Max have already performed on Roblox and “they’re making big money from selling digital merchandise”.

On the other hand, Paddy Cosgrave, CEO of Web Summit, says that there’s something magical about in-person big events that can’t be replicated in full online events. However, the real and virtual world can complement each other — it was announced that CES 2022 will use a combination of Web Summit online and offline software.

Web3 was another big part of the discussion, sometimes in clear sight, other times embedded in the many conversations about blockchain, NFTs and cryptocurrencies, and as a vision for a decentralized web (we’re actually working on that).

Speakers also focused on data privacy and security, ethics in AI, and data protection. User ownership and sovereignty were also discussed, topics emphasized by Sir Tim Berners-Lee on the last day of the event.

The workplace was also a popular topic, as well as the changes it underwent in the past couple of years. We heard about the importance of diversity in the workplace, as well as the future of work — is it going to be flexible, hybrid, full remote or something in between? Speakers also mentioned The Great Resignation and the reset of people’s and organizations’ mindsets.

Using AI to hire and motivate people was also in the air, as well as big topics like the digitalization of healthcare, mental health, behaviour changes in humans (young and adult) who are more and more on the Internet and even the decentralization of financial services.

And here are some examples of the different speakers at the event we talked to:

Vice-Admiral Gouveia e Melo: Vaccination, misinformation and leadership

Portuguese Navy officer and coordinator of the Task Force for the Portugal COVID-19 vaccination plan

Portugal has achieved an 86% vaccination rate on the vice-admiral’s watch. He brought a sense of mission to a task that involved organization, focus and the use of both digital and communication tools.

The country started the vaccination process late but now has one of the highest vaccination rates in the world. We talked with the vice-admiral about how the Internet helped, but also how it created problems related to disinformation and misinformation, and we asked about the dangers of controlling speech online. Finally, we asked for bits of leadership advice.

Sonia Jorge: The need for Internet — affordable, fast and for everyone

Executive Director World Wide Web Foundation (Alliance for Affordable Internet)

“The Internet is now an essential public good that everybody needs at this time just like we need to drink water or to have electricity and shelter. We should do more to bring everyone into the digital society.”

In some countries around the world, Internet access is very limited. In some places, people have to go to a particular plaza to get access to the Internet; five years ago, John Graham-Cumming saw something similar in Cuba. Sonia Jorge knows this very well. She is trying to bring affordable Internet to everyone, and that challenge is more difficult than it appears.

She explains that the world is far behind in the UN’s goals for Internet access — today only about half of the earth’s population has any Internet access at all. But many of those who have access to the World Wide Web have limited possibilities to be online: “some have access once a month, for example.” So the digital divide is real, and it “should worry everyone”.

The pandemic caused health and economic difficulties that didn’t help the mission of bringing good, fast and reliable Internet to everyone. Nevertheless, Sonia — who is Portuguese and moved to the US to study when she was 17 — saw that many African countries like Nigeria began to realize that the Internet is really important for knowledge and also for the possibilities it opens in terms of cultural, financial and societal growth.

Sonia also highlights that there is a big disparity in the world between men and women in terms of Internet access.

David Kiron: The future of work and how AI (and philosophy) can help

Editorial director of MIT Sloan Management Review

Technology will play a significant role in the future of work. In a way, that "future" is already here, but isn't evenly distributed — and researchers are just beginning to study it. David Kiron goes on to explain the challenge some people face in being "really seen by their leadership when you're not in the office."

The former senior researcher at Harvard Business School tells us how companies started valuing employees even more through the pandemic. There’s also an opportunity for different ways of work interaction through digital tools — “Zoom calls aren’t it.” He’s also worried that the pandemic caused a great reset that is driving many out of the workforce entirely: “There’s a trend of working moms opting out,” for example.

About the metaverse and a universe of universes: “If tech leaders spent more time reading philosophy they might have a better sense of where the world is going (…) more and more leaders of companies are taking on the philosopher’s role.”

And how can AI help? “Once you get AI going in a company we saw in our new study that there’s a big bump in morale, collaboration, learning and people’s sense on what they should be doing”. AI can also help better identify talent and match candidates to skills that are already represented in a company, but he also highlights that “humans play a role in all the stages of the hiring and working process.”

David Kiron explains that “if you’re not asking the right questions to your AI teams you’re going to be behind other companies that are doing better questions”. He adds that AI can help with performance, but it also helps “redefine what performance means in your organization by finding other metrics to look at.”

Ana Maiques: neuroscience & women in tech

Co-founder and CEO of neuroscience-based medical device company Neuroelectrics

We talked to Ana about the future of the Internet. She thinks that moving forward there will be more fluid interfaces — not only limited to computers and smartphones, but different devices that go beyond VR headsets and that will lead to new types of interactions. In the neuroscience field, she has big hopes for the technology that Neuroelectrics, her company, is developing in Barcelona, Spain. They work with devices that use non-invasive transcranial electrical stimulation to treat the brain in diseases like epilepsy, depression and Alzheimer's.

Neuroelectrics is also developing a process called digital copy (for better personalized treatments) that could be useful in the future if someone develops one of these problems. But she says humankind is still very far from the dangers of something like a mind-reading device or the possibility of reading and downloading thoughts and dreams: “it’s fun to think of science fiction possibilities, but we need to act now on things and problems that are affecting us today.”

She also talks about the difficulties of being a woman in the tech business and raising money. “But little by little I see more women and that’s why it’s important to get out there and explain to women that they can do it.”

Siyabulela Mandela: The Internet is a human right

Director for Africa Journalists for Human Rights

The grandson of Nelson Mandela is on a mission to help journalists in Africa to be free to publish human rights stories. He explains how the Internet is critical for this mission and “a human rights issue”. Not only does the Internet give communities access to trustworthy information, but it also helps them become aware of their rights, gives access to financial tools and allows them to grow in our era.

He also highlights how the Internet can be misused, for example when it becomes a vehicle for misinformation, or when governments shut down Internet access to control communities — in Sudan the Internet has been cut off since October 25, 2021 (you can track that information on Cloudflare Radar).

Carlos Moedas: The light (and innovation) in Lisbon

Newly elected Mayor of Lisbon; previous European Commissioner for Research, Science and Innovation

Why is Lisbon attracting so many tech companies and talent? Carlos Moedas welcomes Cloudflare to his city — we’re growing fast in the city, and we have more than 80 job openings in the country. He also talks about why Portugal’s capital is so special and should be considered by company leaders who want to grow innovative companies. Paddy Cosgrave, from the Web Summit, told us something similar four weeks ago.

The ambition? “Make Lisbon the capital of innovation of the world” or, at least, of Europe. The new mayor also has a project called Unicorn Factory to achieve just that.

Sudarsan Reddy: Why is Cloudflare Tunnel relevant?

Cloudflare engineer from the Tunnel Team

Also, at the event was our very own engineer Sudarsan Reddy (based in Lisbon). We asked him some questions about Cloudflare Tunnel, our tunneling software that lets you quickly secure and encrypt application traffic to any type of infrastructure, so you can hide your server IP addresses, block direct attacks, and get back to delivering great applications.

Sudarsan focuses on what Tunnel is, why it is relevant, how it works and examples of situations where it can make a difference.

Yusuf Sherwani: Addiction treated online

Co-founder & CEO, Quit Genius

Yusuf graduated as a doctor from Imperial College School of Medicine, in London, but joined two passions, healthcare and technology, when he co-founded Quit Genius. He explains how in just 18 months the pandemic accelerated the adoption of digital health by 10 years, and there’s no going back. “The Internet enables people to unlock improvements to their lives, and digital healthcare went from being convenient to a necessity”.

We dig into the benefits of digital healthcare, but also the scrutiny that is needed in technology, now that it is more powerful than ever and cemented in people’s lives. Yusuf also gives examples of how his digital clinic is helping people in treating tobacco, vaping, alcohol, and opioid addictions.

Yusuf has co-authored 12 peer-reviewed studies on behavioural health and substance addictions. He was featured on the Forbes 30 Under 30 List of 2018 and in Fast Company’s 100 Most Creative People in Business.

David Shrier: From sharing economy to blockchain

American futurist and Professor of Practice, AI & Innovation with Imperial College Business School in London

David sums up how the pandemic has affected people’s relationship with technology: “Everyone is tired of Zoom calls, but the convenience opened people’s minds”.

We also talk about the digital divide, about human-centered ways of working with AI, and we also address the potential in VR and AR and how nobody saw the sharing economy coming 20 years ago and, now, “it’s incredible to see how people embraced blockchain and the digitalization of financial services”.

Dame Til Wykes: The mental health discussion went viral

Professor of Clinical Psychology and Rehabilitation at King’s College London, Director of the NIHR Clinical Research Network: Mental Health

As someone with experience in the psychology field for more than 50 years, Dame Til Wykes still had to learn new ways of engaging with patients throughout the pandemic — and even learn which buttons to push on a computer to make Zoom calls. COVID-19 and the hardships of the pandemic made people more aware and ready to talk about their mental health issues, like anxiety or depression. But the pandemic wasn’t the same for everyone and Dame Til Wykes is worried about some of the effects, “most of them remain to be seen”.

Remote consultations were a big help, but she reminds us that in her field it is important to see the whole person and not just the face — for example, "if someone is tapping a foot nervously while giving us a smile, that tells us something that we cannot see in a Zoom call". She also mentions that the adoption of meditation apps, which brought a form of help to some, was another positive trend in this difficult period, as was the reset button the pandemic brought to some people's lives.

Multi-User IP Address Detection

Post Syndicated from Alex Chen original https://blog.cloudflare.com/multi-user-ip-address-detection/

Cloudflare provides our customers with security tools that help them protect their Internet applications against malicious or undesired traffic. Malicious traffic can include scraping content from a website, spamming form submissions, and a variety of other cyberattacks. To protect themselves from these types of threats while minimizing the blocking of legitimate site visitors, Cloudflare’s customers need to be able to identify traffic that might be malicious.

We know some of our customers rely on IP addresses to distinguish between traffic from legitimate users and potentially malicious users. However, in many cases the IP address of a request does not correspond to a particular user or even device. Furthermore, Cloudflare believes that in the long term, the IP address will be an even more unreliable signal for identifying the origin of a request. We envision a day where IP will be completely unassociated with identity. With that vision in mind, multi-user IP address detection represents our first step: pointing out situations where the IP address of a request cannot be assumed to represent a single user. This gives our customers the ability to make more judicious decisions when responding to traffic from an IP address, instead of indiscriminately treating that traffic as though it came from a single user.

Historically, companies commonly treated IP addresses like mobile phone numbers: each phone number in theory corresponds to a single person. If you get several spam calls within an hour from the same phone number, you might safely assume that phone number represents a single person and ignore future calls or even block that number. Similarly, many Internet security detection engines rely on IP addresses to discern which requests are legitimate and which are malicious.

However, this analogy is flawed and can present a problem for security. In practice, IP addresses are more like postal addresses because they can be shared by more than one person at a time (and because of NAT and CG-NAT the number of people sharing an IP can be very large!). Many existing Internet security tools accept IP addresses as a reliable way to distinguish between site visitors. However, if multiple visitors share the same IP address, security products cannot rely on the IP address as a unique identifying signal. Thousands of requests from thousands of different users need to be treated differently from thousands of requests from the same user. The former is likely normal traffic, while the latter is almost certainly automated, malicious traffic.

For example, if several people in the same apartment building accessed the same site, it’s possible all of their requests would be routed through a middlebox operated by their Internet service provider that has only one IP address. But this sudden series of requests from the same IP address could closely resemble the behavior of a bot. In this case, IP addresses can’t be used by our customers to distinguish this activity from a real threat, leading them to mistakenly block or challenge their legitimate site visitors.

By adding multi-user IP address detection to Cloudflare products, we’re improving the quality of our detection techniques and reducing false positives for our customers.

Examples of Multi-User IP Addresses

Multi-user IP addresses take on many forms. When your company uses an enterprise VPN, for example, employees may share the same IP address when accessing external websites. Other types of VPNs and proxies also place multiple users behind a single IP address.

Another type of multi-user IP address originated from the core communications protocol of the Internet. IPv4 was developed in the 1980s. The protocol uses a 32-bit address space, allowing for over four billion unique addresses. Today, however, there are many times more devices than IPv4 addresses, meaning that not every device can have a unique IP address. Though IPv6 (IPv4's successor protocol) solves the problem with 128-bit addresses (supporting 2^128 unique addresses), IPv4 still routes the majority of Internet traffic (76% of human-only traffic is IPv4, as shown on Cloudflare Radar).

To solve this issue, many devices in the same Local Area Network (LAN) can share a single Internet-addressable IP address to communicate with the public Internet, while using private Internet addresses to communicate within the LAN. Since private addresses are to be used only within a LAN, different LANs can number their hosts using the same private IP address space. The Internet gateway of the LAN does the Network Address Translation (NAT), namely takes messages which arrive on that single public IP and forwards them to the private IP of the appropriate device on their local network. In effect it’s similar to how everyone in an office building shares the same street address, and the front desk worker is responsible for sorting out what mail was meant for which person.
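
As a toy illustration of that translation step (a sketch, not a real NAT implementation), the gateway can be thought of as keeping a table from public ports back to private endpoints:

type Endpoint = { ip: string; port: number };

// Toy NAT gateway: rewrites outbound (privateIp, privatePort) pairs to
// (publicIp, publicPort), and uses the table to route replies back in.
class ToyNatGateway {
  private nextPort = 40000; // arbitrary starting port for this sketch
  private table = new Map<number, Endpoint>(); // public port -> private endpoint

  constructor(private publicIp: string) {}

  outbound(source: Endpoint): Endpoint {
    const publicPort = this.nextPort++;
    this.table.set(publicPort, source);
    return { ip: this.publicIp, port: publicPort };
  }

  inbound(publicPort: number): Endpoint | undefined {
    return this.table.get(publicPort); // which device gets this reply?
  }
}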

While NAT allows multiple devices behind the same Internet gateway to share the same public IP address, the explosive growth of the Internet population necessitated further reuse of the limited IPv4 address space. Internet Service Providers (ISPs) required users in different LANs to share the same IP address for their service to scale. Carrier-Grade Network Address Translation (CG-NAT) emerged as another solution for address space reuse. Network operators can use CG-NAT middleboxes to translate hundreds or thousands of private IPv4 addresses into a single (or pool of) public IPv4 address. However, this sharing is not without side-effects. CG-NAT results in IP addresses that cannot be tied to single devices, users, or broadband subscriptions, creating issues for security products that rely on the IP address as a way to distinguish between requests from different users.

What We Built

We built a tool to help our customers detect when a /24 IP prefix (set of IP addresses that have the same first 24 bits) is likely to contain multi-user IP addresses, so they can more finely tune the security rules that protect their websites. In order to identify multi-user IP prefixes, we leverage both internal data and public data sources. Within this data, we look at a few key parameters.

Each TCP connection between a source (client) and a destination (server) is identified by 4 identifiers (source IP, source port, destination IP, destination port)

When an Internet user visits a website, the underlying TCP stack opens a number of connections in order to send and receive data from remote servers. Each connection is identified by a 4-tuple (source IP, source port, destination IP, destination port). Repeating requests from the same web client will likely be mapped to the same source port, so the number of distinct source ports can serve as a good indication of the number of distinct client applications. By counting the number of open source ports for a given IP address, you can estimate whether this address is shared by multiple users.

User agents provide device-reported information such as browser and operating system versions. For multi-user IP detection, you can count the number of distinct user agents in requests from a given IP. To avoid overcounting web clients per device, we exclude requests identified as triggered by bots and only count requests from user agents used by web browsers. There are some tradeoffs to this approach: some users may use multiple web browsers, and some users may have exactly the same user agent. Nevertheless, past research has shown that counting unique web browser user agents is the best tradeoff for accurately determining CG-NAT usage.

Mozilla/5.0 (X11; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0

For our inferences, we group IP addresses into their corresponding /24 IP prefix. The figure below shows the distribution of browser user agents per /24 IP prefix, based on data accumulated over the period of a day. About 35% of the prefixes have more than 100 different browser clients behind them.

Distribution of browser user agents per /24 IP prefix
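
As a minimal sketch of this kind of aggregation (not our production pipeline; the log shape and the threshold of 100 are illustrative), the code below groups requests by /24 prefix and counts distinct browser user agents. The same shape works for counting distinct source ports:

type RequestLog = { clientIp: string; userAgent: string; isBot: boolean };

// Map an IPv4 address to its /24 prefix, e.g. "203.0.113.7" -> "203.0.113.0/24".
function prefix24(ip: string): string {
  return ip.split(".").slice(0, 3).join(".") + ".0/24";
}

function likelyMultiUserPrefixes(logs: RequestLog[], threshold = 100): Set<string> {
  const agentsPerPrefix = new Map<string, Set<string>>();
  for (const log of logs) {
    if (log.isBot) continue; // exclude bot-identified requests, as described above
    const prefix = prefix24(log.clientIp);
    const agents = agentsPerPrefix.get(prefix) ?? new Set<string>();
    agents.add(log.userAgent);
    agentsPerPrefix.set(prefix, agents);
  }
  const flagged = new Set<string>();
  for (const [prefix, agents] of agentsPerPrefix) {
    if (agents.size > threshold) flagged.add(prefix); // many distinct browsers
  }
  return flagged;
}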

Our service also uses other publicly available data sources to further refine the accuracy of our identification and to classify the type of multi-user IP address. For example, we collect data from PeeringDB, which is a database where network operators self-identify their network type, traffic levels, interconnection points, and peering policy. This data only covers a fraction of the Internet’s autonomous systems (ASes). To overcome this limitation, we use this data and our own data (number of requests per AS, number of websites in each AS) to infer AS type. We also use external data sources such as IRR to identify requests from VPNs and proxy servers.

These details (especially AS type) can provide more information on the type of multi-user IP address. For instance, CG-NAT systems are almost exclusively deployed by broadband providers, so by inferring the AS type (ISP, CDN, Enterprise, etc.), we can more confidently infer the type of each multi-user IP address. A scheduled job periodically executes code to pull data from these sources, process it, and write the list of multi-user IP addresses to a database. That IP info data is then ingested by another system that deploys it to Cloudflare’s edge, enabling our security products to detect potential threats with minimal latency.

To validate our inferences for which IP addresses are multi-user, we created a dataset relying on separate data and measurements which we believe are more reliable indicators. One method we used was running traceroute queries through RIPE Atlas, from each RIPE Atlas probe to the probe’s public IP address. By examining the traceroute hops, we can determine if an IP is behind a CG-NAT or another middlebox. For example, if an IP is not behind a CG-NAT, the traceroute should terminate immediately or just have one hop (likely a home NAT). On the other hand, if a traceroute path includes addresses within the RFC 6598 CGNAT prefix or other hops in the private or shared address space, it is likely the corresponding probe is behind CG-NAT.
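
A rough standalone sketch of that hop classification (not the actual RIPE Atlas tooling) checks each IPv4 hop against the RFC 6598 shared range 100.64.0.0/10 and the RFC 1918 private ranges:

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True if ip falls inside the CIDR range base/prefixLen.
function inRange(ip: string, base: string, prefixLen: number): boolean {
  const mask = prefixLen === 0 ? 0 : (~0 << (32 - prefixLen)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// RFC 6598 shared address space: a strong hint that a CG-NAT sits on the path.
function hopSuggestsCgNat(hop: string): boolean {
  return inRange(hop, "100.64.0.0", 10);
}

// RFC 1918 private space: typically just a home or office NAT.
function hopIsPrivate(hop: string): boolean {
  return (
    inRange(hop, "10.0.0.0", 8) ||
    inRange(hop, "172.16.0.0", 12) ||
    inRange(hop, "192.168.0.0", 16)
  );
}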

To further improve our validation datasets, we’re also reaching out to our ISP partners to confirm the known IP addresses of CG-NATs. As we refine our validation data, we can more accurately tune our multi-user IP address inference parameters and provide a better experience to ISP customers on sites protected by Cloudflare security products.

The multi-user IP detection service currently recognizes approximately 500,000 unique multi-user IP addresses and is being tuned to further improve detection accuracy. Be on the lookout for an upcoming technical blog post, where we will take a deeper look at the system we built and the metrics collected after running this service for a longer period of time.

How Will This Impact Bot Management and Rate Limiting Customers?

Our initial launch will integrate multi-user IP address detection into our Bot Management and Rate Limiting products.

The three modules that comprise the bot detection system. 

The Cloudflare Bot Management product has five detection mechanisms. The integration will improve three of the five: the machine learning (ML) detection mechanism, the heuristics engine, and the behavioral analysis models. Multi-user IP addresses and their types will serve as additional features to train our ML model. Furthermore, logic will be added to ensure multi-user IP addresses are treated differently in our other detection mechanisms. For instance, our behavioral analysis detection mechanism shouldn’t treat a series of requests from a multi-user IP the same as a series of requests from a single-user IP. There won’t be any new ways to see or interact with this feature, but you should expect to see a decrease in false positive bot detections involving multi-user IP addresses.

The integration with Rate Limiting will allow us to increase the set rate limiting threshold when receiving requests coming from multi-user IP addresses. The factor by which we increase the threshold will be conservative so as not to completely bypass the rate limit. However, the increased threshold should greatly reduce cases where legitimate users behind multi-user IP addresses are blocked or challenged.

Looking Forward

We plan to further integrate across all of Cloudflare’s products that rely upon IP addresses as a measure of uniqueness, including but not limited to DDoS Protection, Cloudflare One Intel, and Web Application Firewall.

We will also continue to make improvements to our multi-user IP address detection system to incorporate additional data sources and improve accuracy. One data source would give us the ratio of the estimated number of subscribers to the total number of IPs advertised (owned) by an AS. ASes that have more estimated subscribers than available IPs would have to rely on CG-NAT to provide service to all subscribers.

As mentioned above, with the help of our ISP partners we hope to improve the validation datasets we use to test and refine the accuracy of our inferences. Additionally, our integration with Bot Management will also unlock an opportunity to create a feedback loop that further validates our datasets. The challenge solve rate (CSR) is a metric generated by Bot Management that indicates the proportion of requests that were challenged and solved (and thus assumed to be human). Examining requests with both high and low CSRs will allow us to check if the multi-user IP addresses we have initially identified indeed represent mostly legitimate human traffic that our customers should not block.

The continued adoption of IPv6 might someday make CG-NATs and other IPv4 sharing technologies irrelevant, as the address space will no longer be limited. This could reduce the prevalence of multi-user IP addresses. However, with the development of new networking technologies that obfuscate IP addresses for user privacy (for example, IPv6 randomized address assignment), it seems unlikely it will become any easier to tie an IP address to a single user. Cloudflare firmly believes that eventually, IP will be completely unassociated with identity.

Yet in the short term, we recognize that IP addresses still play a pivotal role for the security of our customers. By integrating this multi-user IP address detection capability into our products, we aim to deliver a more free and fluid experience for everyone using the Internet.

Coalescing Connections to Improve Network Privacy and Performance

Post Syndicated from Talha Paracha original https://blog.cloudflare.com/connection-coalescing-experiments/

Web pages typically have a large number of embedded subresources (e.g., JavaScript, CSS, image files, ads, beacons) that are fetched by a browser on page loads. Requests for these subresources can prompt browsers to perform further DNS lookups, TCP connections, and TLS handshakes, which can have a significant impact on how long it takes for the user to see the content and interact with the page. Further, each additional request exposes metadata (such as plaintext DNS queries, or unencrypted SNI in TLS handshake) which can have potential privacy implications for the user. With these factors in mind, we carried out a measurement study to understand how we can leverage Connection Coalescing (aka Connection Reuse) to address such concerns, and study its feasibility.

Background

The web has come a long way and initially consisted of very simple protocols. One of them was HTTP/1.0, which required browsers to make a separate connection for every subresource on the page. This design was quickly recognized as having significant performance bottlenecks and was extended with HTTP pipelining and persistent connections in HTTP/1.1 revision, which allowed HTTP requests to reuse the same TCP connection. But, yet again, this was no silver bullet: while multiple requests could share the same connection, they still had to be serialized one after the other, so a client and server could only execute a single request/response exchange at any given time for each connection. As time passed, websites became more complex in structure and dynamic in nature, and HTTP/1.1 was identified as a major bottleneck. The only way to gain concurrency at the network layer was to use multiple TCP connections to the same origin in parallel, but this meant losing most benefits of persistent connections and ended up overloading the origin servers which were unable to meet the concurrency demand.

To address these performance limitations, the SPDY protocol was introduced over a decade later. SPDY supported stream multiplexing, where requests to and responses from the server used a single interleaved TCP connection, and allowed browsers to prioritize requests for critical subresources first — those that were blocking page rendering. A modified variant of SPDY was adopted by the IETF as the basis for HTTP/2 in 2012 and published as RFC 7540 in 2015.

HTTP/2 and onwards retained this new standard for connection reuse. More specifically, all subresources on the same domain were able to reuse the same TCP/TLS (or UDP/QUIC) connection without any head-of-line blocking (at least on the application layer). This resulted in a single connection for all the subresources — reducing extraneous requests on page loads — potentially speeding up some websites and applications.

Interestingly, the protocol has a lesser-known feature to also enable subresources at different hostnames to be fetched over the same connection. We studied the real-world feasibility and benefits of this technique as an effort to improve users’ experience for websites across our network.

Connection Coalescing allows reusing a TLS connection across different domains

Connection Coalescing

The technique is often referred to as Connection Coalescing and, to put it simply, is a way to access resources from different hostnames that are accessible from the same web server.

There are several reasons why a single server could handle requests for different hosts, ranging from low-cost virtual hosting to the usage of CDNs and cloud providers (including Cloudflare, which acts as a reverse proxy for approximately 25 million Internet properties). Before going into the technical conditions required to enable connection coalescing, we should take a look at some benefits such a strategy can provide.

  • Privacy. When resources at different hostnames are loaded via separate TLS connections, those connections expose metadata to ISPs and other observers via the Server Name Indication (SNI) field about the destinations that are being contacted (i.e., in the absence of encrypted SNI). This set of exposed SNIs can allow an on-path adversary to fingerprint traffic and possibly determine user interactions on the webpage. On the other hand, coalescing requests for more than one hostname onto a single connection exposes only one destination, and helps avoid such threats.
  • Performance. Additional TLS handshakes and TCP connections can incur significant costs in terms of CPU, memory, and other resources. Thus, coalescing requests to use the same connection can optimize resource utilization.
  • Resource Prioritization. Multiplexing requests on a single connection means that applications have better visibility and more direct control over how related resources are prioritized and scheduled. In the absence of coalescing, the network properties (for example, route congestion) can interfere with the intended order of delivery for resources. This reliability gained through connection coalescing opens up new optimization opportunities to improve web page load times, among other things.

However, along with all these potential benefits, connection coalescing also has some associated risk factors that need to be considered in practice. First, TCP incorporates “fair” congestion control mechanisms — if there are ten connections on the same route, each gets approximately 1/10th of the total bandwidth. So with a route congested and bandwidth restricted, a client relying on multiple connections might be better off (for example, if they have five of the ten connections, their total share of bandwidth would be half). Second, browsers will use different parallelization routines for scheduling requests on multiple connections versus the same connection — it is not immediately clear whether the former or latter would perform better. Third, multiple connections exhibit an inherent form of load balancing for TLS-termination processes. That’s because multiple requests on the same connection must be answered by the same TLS-termination process that holds the session keys (often on the same physical server). So, it is important to study connection coalescing carefully before rolling it out widely.

With this context in mind, we studied the feasibility of connection coalescing on real-world traffic. More specifically, we wanted to answer two questions: (a) can we empirically demonstrate and quantify the theoretical benefits of connection coalescing, and (b) could coalescing cause unintended side effects, such as performance degradation, due to the risks highlighted above?

In order to answer these questions, we first made the observation that a large number of Cloudflare customers request subresources from cdnjs — which is also powered by Cloudflare. For context, cdnjs has public JavaScript and CSS libraries (like jQuery), and is used by more than 12% of all websites on the Internet. One popular way these websites include resources from cdnjs is by using <script src="https://cdnjs.cloudflare.com/..." ></script> HTML tags. But there are other ways as well, such as the usage of XMLHttpRequest or Fetch APIs. Regardless of the way these resources are included, browsers will need to fetch them for completely loading a website.

We then identified a list of approximately four thousand websites using Cloudflare (on the Free plan) that likely used cdnjs. We divided this list of sites into evenly-sized and randomly-picked control and experiment groups. Our plan was to enable coalescing only for the experiment group, so that subresource requests generated from their web pages for cdnjs could reuse existing connections. In this way, we were able to compare results obtained on the experiment group, with the ones for the control group, and attribute any differences observed to connection coalescing.

In order to signal browsers that the requests can be coalesced, we served cdnjs and the sites from the same IP address in a few regions around the world. This meant the same DNS responses for all the zones that were part of the study — eventually load balanced by our Anycast network. These sites also had TLS certificates that included cdnjs.

The above two conditions (same IP and compatible certificate) are required to achieve coalescing as per the HTTP/2 spec. However, the QUIC spec allows coalescing even if only the second condition is met. Major web browsers are yet to adopt the QUIC coalescing mechanism, and currently use only the HTTP/2 coalescing logic for both protocols.
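
As a rough sketch of that HTTP/2 eligibility check (how the conditions can be verified from a vantage point, not how browsers implement coalescing; the hostnames are placeholders), the code below confirms that two hostnames share an IP address and that the certificate served for the first also covers the second:

import { promises as dns } from "node:dns";
import * as tls from "node:tls";

async function shareAnIp(hostA: string, hostB: string): Promise<boolean> {
  const [ipsA, ipsB] = await Promise.all([dns.resolve4(hostA), dns.resolve4(hostB)]);
  return ipsA.some((ip) => ipsB.includes(ip));
}

function certificateCovers(connectHost: string, otherHost: string): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host: connectHost, port: 443, servername: connectHost }, () => {
      // subjectaltname looks like "DNS:example.com, DNS:*.example.com"
      const sans = (socket.getPeerCertificate().subjectaltname ?? "")
        .split(", ")
        .filter((entry) => entry.startsWith("DNS:"))
        .map((entry) => entry.slice("DNS:".length));
      socket.end();
      resolve(
        sans.some((san) =>
          san.startsWith("*.")
            ? // simplified wildcard match: exactly one extra label allowed
              otherHost.endsWith(san.slice(1)) &&
              !otherHost.slice(0, -san.slice(1).length).includes(".")
            : san === otherHost
        )
      );
    });
    socket.on("error", reject);
  });
}

// HTTP/2 coalescing needs both conditions; QUIC, per spec, needs only the certificate one.
async function canCoalesce(hostA: string, hostB: string): Promise<boolean> {
  return (await shareAnIp(hostA, hostB)) && (await certificateCovers(hostA, hostB));
}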

Requests to Experiment Group Zones and cdnjs being coalesced on the same TLS connection

Results

We started noticing evidence of real-world coalescing from the day our experiment was launched. The following graph shows that approximately 50% of requests to cdnjs from our experiment group sites were coalesced (i.e., their TLS SNI did not name cdnjs), compared to 0% of requests from the control group sites.
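As a minimal sketch of this measurement (our own illustration, not Cloudflare's production code), a server can flag a request as coalesced when the Host header names cdnjs but the TLS SNI of the underlying connection names some other zone:

```ts
// If the HTTP request is for cdnjs but the TLS handshake on this connection
// presented a different SNI, the browser must have reused a connection that
// was originally opened for another zone.
function isCoalesced(tlsSni: string, hostHeader: string): boolean {
  return (
    hostHeader === "cdnjs.cloudflare.com" &&
    tlsSni !== "cdnjs.cloudflare.com"
  );
}

console.log(isCoalesced("www.example-zone.com", "cdnjs.cloudflare.com")); // true
console.log(isCoalesced("cdnjs.cloudflare.com", "cdnjs.cloudflare.com")); // false
```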

[Figure: Coalesced Requests to cdnjs from Control and Experimental Group Zones]

In addition, we conducted active measurements using our private WebPageTest instances against the landing pages of experiment and control sites, using two well-supported browsers: Google Chrome and Firefox. From our results, Chrome created about 78% fewer TLS connections to cdnjs for our experiment group sites compared to the control group. Surprisingly, Firefox created only roughly 22% fewer connections. Because TLS handshakes are computationally expensive, involving cryptographic signatures and key exchange, fewer handshakes mean fewer CPU cycles spent by both the client and the server.

Upon further analysis, we were able to make two observations from the data:

  • A fraction of sites that never coalesced connections with either browser appeared to load subresources with CORS enabled (i.e., <script src="https://cdnjs.cloudflare.com/..." integrity="sha512-894Y..." crossorigin="anonymous">). This is the default way cdnjs recommends including subresources, as CORS is needed for the integrity checks that provide substantial mitigation against script-manipulation attacks. We do not recommend removing this attribute. Our testing also revealed that using the XMLHttpRequest or Fetch APIs to load subresources disabled coalescing as well. It is unclear why browsers choose not to coalesce such connections, and we are in contact with the vendors to find out.
  • Although both Firefox and Chrome coalesced requests for cdnjs onto existing connections, the discrepancy in the number of TLS connections to cdnjs (approximately 78% vs. roughly 22%) arises because Firefox appears to open new connections even if it does not end up using them (a client-side sketch for spotting connection reuse follows this list).
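As referenced above, one client-side way to spot connection reuse is the standard Resource Timing API: a request served over a reused connection performs no new TCP/TLS handshake, so its connection timestamps collapse. A sketch follows; note that cross-origin resources only expose detailed timings when served with a Timing-Allow-Origin header, so absent that header this heuristic can misreport:

```ts
// Inspect resource timings for cdnjs requests on the current page.
const entries = performance.getEntriesByType(
  "resource"
) as PerformanceResourceTiming[];

for (const e of entries.filter((e) =>
  e.name.includes("cdnjs.cloudflare.com")
)) {
  // No handshake performed => connectStart and connectEnd are equal.
  const reused = e.connectEnd === e.connectStart;
  console.log(e.name, reused ? "reused connection" : "new connection");
}
```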

After evaluating the potential benefits of coalescing, we wanted to understand whether coalescing caused any unintended side effects. Hence, the final measurement we conducted was to check whether our experiments were detrimental to a website's performance. We tracked Page Load Times (PLT) and Largest Contentful Paint (LCP) across a variety of simulated network conditions using both Chrome and Firefox, and found no statistically significant difference between the experiment and control groups.
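For context, LCP can also be captured directly in the browser with the standard PerformanceObserver API. This minimal sketch is illustrative only; WebPageTest reported the metric for us:

```ts
// Log Largest Contentful Paint candidates as the browser reports them.
new PerformanceObserver((list) => {
  const last = list.getEntries().pop(); // latest (largest-so-far) candidate
  if (last) console.log("LCP candidate (ms):", last.startTime);
}).observe({ type: "largest-contentful-paint", buffered: true });
```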

[Figure: Page load times for control and experiment group sites. Each site was loaded once, and the "fullyLoaded" metric from WebPageTest is reported]

Conclusion

We consider our experimentation successful in determining the feasibility of connection coalescing and highlighting its potential benefits in terms of privacy and performance. More specifically, we observed the privacy benefits of coalescing in more than 50% of requests to cdnjs from real-world traffic. In addition, our active testing demonstrated that browsers create fewer TLS connections with coalescing enabled. Interestingly, our results also revealed that the benefits do not always materialize (e.g., CORS-enabled requests, or Firefox creating additional TLS connections despite coalescing). Finally, we found no evidence that coalescing harms real-world users' experience on the Internet.

Some future directions we would like to explore include:

  • More aggressive connection reuse with multiple hostnames, while identifying conditions most suitable for coalescing.
  • Understanding how different connection reuse methods compare, e.g., IP-based coalescing vs. the use of ORIGIN frames, and what effects they have on user experience over the Internet.
  • Evaluating coalescing support among different browser vendors, and encouraging adoption of QUIC-based coalescing for HTTP/3.
  • Reaping the full benefits of connection coalescing by experimenting with custom priority schemes for requests within the same connection.

Please send questions and feedback to [email protected]. We’re excited to continue this line of work in our effort to help build a better Internet! For those interested in joining our team please visit our Careers Page.

Project Myriagon: Cloudflare Passes 10,000 Connected Networks

Post Syndicated from Ticiane Takami original https://blog.cloudflare.com/10000-networks-and-beyond/

During Speed Week, we’ve talked a lot about the products we’ve improved and the places we’ve expanded to. Today, we have a final exciting announcement: Cloudflare now connects with more than 10,000 other networks. Put another way, over 10,000 networks have direct on-ramps to the Cloudflare network.

This is the culmination of a special project we’ve been working on for the last few months dubbed Project Myriagon, a reference to the 10,000-sided polygon of the same name. In going about this project, we have learned a lot about the performance impact of adding more direct connections to our network — in one recent case, we saw a 90% reduction in median round-trip end-user latency.

But to really explain why this is such a big milestone, we first need to explain a bit about how the Internet works.

More roads leading to Rome

The Internet that we all know and rely on is, at a basic level, an interconnected series of independently run local networks. Each network is defined as its own "autonomous system," delineated numerically with an Autonomous System Number, or ASN. An ASN is like the Internet's version of a zip code: a short number directly mapping to a distinct region of IP space using a clearly defined methodology. Network interconnection is all about bringing together different ASNs to exponentially multiply the number of possible paths between source and destination.

Most of us have home networks behind a modem and router, connecting our individual miniature networks to an ISP. The ISP then connects with other networks to fetch the web pages or other Internet traffic we request. Those networks in turn connect to still other networks, and so on, until the data reaches its destination. Generally, the fewer networks a request has to traverse, the lower the end-to-end latency and the lower the odds that something gets lost along the way.

The average number of hops between any two networks on the Internet is around 5.7 for IPv4 and 4.7 for IPv6.

[Figure: Average AS path length in BGP. Source: https://blog.apnic.net/2020/01/14/bgp-in-2019-the-bgp-table/]

How do ASNs work?

ASNs are a key part of BGP, the routing protocol that directs traffic across the Internet. The Internet Assigned Numbers Authority (IANA), the global coordinator of the DNS Root, IP addressing, and other Internet protocol resources such as AS numbers, delegates ASN assignment authority to the Regional Internet Registries (RIRs), which in turn assign individual ASNs to network operators in line with their regional policies. The five RIRs are AFRINIC, APNIC, ARIN, LACNIC, and RIPE, each assigning ASNs in its respective region.

Cloudflare’s ASN is 13335, one of approximately 70,000 ASNs advertised on the Internet. While we’d like to connect (and plan on connecting) to every one of these ASNs eventually, our team prioritizes those with the greatest impact on our overall breadth, improving our proximity to as many people on Earth as possible.

As enabling optimal routes is key to our core business and services, we continuously track how many ASNs we connect to (technically referred to as “adjacent networks”). With Project Myriagon, we aimed to speed up our rate of interconnection and pass 10,000 adjacent networks by the end of the year. By September 2021, we reached that milestone, bringing us from 8,300 at the start of 2020 to over 10,000 today.

As shown in the table below, that milestone is part of a continuous effort towards gradually hitting more of the total advertised ASNs on the Internet.

[Figure: The Regional Internet Registries and their Regions]

[Table 1: Cloudflare’s peer ASNs and their respective RIR]

Given that there are 70,000+ ASNs out there, you might be wondering: why is 10,000 a big deal? To understand this, we need to look deeply at BGP, the protocol that glues the Internet together. There are three different classes of ASNs:

  • Transit Only ASNs: these networks only provide connectivity to other networks. They don’t have any IP addresses inside their networks. These networks are quite rare, as it’s very unusual to not have any IP addresses inside your network. Instead, these networks are often used primarily for distinct management purposes within a single organization.
  • Origin Only ASNs: these are networks that do not provide connectivity to other networks. They are a stub network, and often, like your home network, only connected to a single ISP.
  • Mixed ASNs: these networks both have IP addresses inside their network, and provide connectivity to other networks.

Origin Only ASNs | Mixed ASNs | Transit Only ASNs
61,127 | 11,128 | 443

Source: https://bgp.potaroo.net/as6447/
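To illustrate the classification, here is our own simplified sketch (not how the published counts above are computed) that buckets ASNs by where they appear in a set of BGP AS paths, reading each path with the origin AS at the end:

```ts
// Classify ASNs from BGP AS paths: an ASN that only ever appears as the
// final hop (the origin) is "origin only"; one that only appears mid-path
// is "transit only"; one that does both is "mixed".
function classifyAsns(asPaths: number[][]): Map<number, string> {
  const origins = new Set<number>();
  const transits = new Set<number>();
  for (const path of asPaths) {
    path.forEach((asn, i) => {
      if (i === path.length - 1) origins.add(asn);
      else transits.add(asn);
    });
  }
  const out = new Map<number, string>();
  for (const asn of new Set([...origins, ...transits])) {
    if (origins.has(asn) && transits.has(asn)) out.set(asn, "mixed");
    else if (origins.has(asn)) out.set(asn, "origin only");
    else out.set(asn, "transit only");
  }
  return out;
}

// Example: AS 64500 transits for origin-only AS 64501 and also originates.
console.log(classifyAsns([[64500, 64501], [64500]]));
```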

One interesting fact: of the 61,127 origin-only ASNs, nearly 43,000 are connected only to their ISP. Our direct connections to over 10,000 networks therefore indicate that, among the networks that connect to more than one other network, a very good percentage are already connected to Cloudflare.

Cutting out the middle man

Directly connecting to a network — and eliminating the hops in between — can greatly improve performance in two ways. First, connecting with a network directly allows for Internet traffic to be exchanged locally rather than detouring through remote cities; and secondly, direct connections help avoid the congestion caused by bottlenecks that sometimes happen between networks.

To take a recent real-world example: turning up a direct peering session with a European network improved median end-user latency by 90%, from an average of 76 ms to an average of 7 ms.

[Figure: Immediate 90% improvement in median end-user latency after peering with a new network]

By using our own on-ramps to other networks, we both ensure superior performance for our users and avoid adding load and causing congestion on the Internet at large.

And AS13335 is just getting started

Cloudflare is an anycast network, meaning that the better connected we are, the faster and better-protected we are, obviating legacy concepts like scrubbing centers and slow origins. Hitting five digits of connected networks is something we’re really proud of as a step toward our goal of helping build a better Internet. As we’ve mentioned throughout the week, we’re all about high speed without having to pay a security or reliability cost.

There’s still work to do! While Project Myriagon has, we believe, made us one of the top 5 most connected networks in the world, we estimate Google is connected to 12,000-15,000 networks. And so, today, we are kicking off Project CatchG. We won’t rest until we’re #1.

Interested in peering with us to help build a better Internet? Reach out to [email protected] with your request. More details on the locations we are present at can be found at http://as13335.peeringdb.com/.

Cloudflare Backbone: A Fast Lane on the Busy Internet Highway

Post Syndicated from Tanner Ryan original https://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane/

The Internet is an amazing place. It’s a communication superhighway, allowing people and machines to exchange exabytes of information every day. But it’s not without its share of issues: whether it’s DDoS attacks, route leaks, cable cuts, or packet loss, the components of the Internet do not always work as intended.

The reason Cloudflare exists is to help solve these problems. As we continue to expand our global network, now in more than 250 cities and directly connected with more than 9,800 networks, it’s important that it keeps bringing improved performance and resiliency to the Internet. To accomplish this, we built our own backbone. Beyond the added redundancy, the immediate advantage to you as a Cloudflare user? It can reduce your website loading times by up to 45%, and you don’t have to do a thing.

The Cloudflare Backbone

We began building out our global backbone in 2018. It comprises a network of long-distance fiber optic cables connecting various Cloudflare data centers across North America, South America, Europe, and Asia. This also includes Cloudflare’s metro fiber network, directly connecting data centers within a metropolitan area.

Our backbone is a dedicated network, providing guaranteed network capacity and consistent latency between various locations. It gives us the ability to securely, reliably, and quickly route packets between our data centers, without having to rely on other networks.

This dedicated network can be thought of as a fast lane on a busy highway. When traffic in the normal lanes of the highway encounters slowdowns from congestion and accidents, vehicles can use the fast lane to bypass the traffic and reach their destination on time.

Our software-defined network is like a smart GPS device, as we’re always calculating the performance of routes between various networks. If a route on the public Internet becomes congested or unavailable, our network automatically adjusts routing preferences in real-time to make use of all routes we have available, including our dedicated backbone, helping to deliver your network packets to the destination as fast as we can.
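As a toy model of that route choice (our illustration, not Cloudflare's routing code), the selection step can be thought of as: continuously measure each candidate path, then pick the fastest healthy one:

```ts
// Candidate paths to a destination: public transit, peering, or backbone.
interface Route {
  name: string;
  healthy: boolean;
  measuredRttMs: number; // continuously refreshed by active measurement
}

// Pick the healthy route with the lowest measured round-trip time.
function pickRoute(routes: Route[]): Route | undefined {
  return routes
    .filter((r) => r.healthy)
    .reduce<Route | undefined>(
      (best, r) => (!best || r.measuredRttMs < best.measuredRttMs ? r : best),
      undefined
    );
}

const choice = pickRoute([
  { name: "public Internet", healthy: true, measuredRttMs: 84 },
  { name: "backbone", healthy: true, measuredRttMs: 75 },
]);
// => backbone here; with the numbers reversed, the public path would win,
// which is exactly what the Seattle and San Jose rows below show.
```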

Measuring backbone improvements

As we grow our global infrastructure, it’s important that we analyze our network to quantify the impact we’re having on performance.

Here’s a simple, real-world test we’ve used to validate that our backbone helps speed up our global network. We deployed a simple API service hosted on a public cloud provider, located in Chicago, Illinois. Once placed behind Cloudflare, we performed benchmarks from various geographic locations with the backbone disabled and enabled to measure the change in performance.

Rather than simply comparing the difference in latency our backbone creates, we wanted our experiment to capture the real-world performance gain an API service or website would experience. To do this, our primary metric was the average request time when accessing the API service from Miami, Seattle, San Jose, São Paulo, and Tokyo. To measure the network itself, we disabled caching in the Cloudflare dashboard and sent 100 requests from each testing location, both while forcing traffic through our backbone and while sending it over the public Internet.
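A sketch of this style of benchmark, with a hypothetical endpoint URL standing in for our test API service:

```ts
// Average total request time over `runs` uncached requests to one URL.
async function averageRequestTime(url: string, runs = 100): Promise<number> {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fetch(url, { cache: "no-store" }); // bypass any local cache
    total += performance.now() - start;
  }
  return total / runs;
}

// "api.example.com" is a placeholder; run once per vantage point, with the
// backbone toggled off and then on, and compare the two averages.
console.log(await averageRequestTime("https://api.example.com/ping"));
```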

Now, before we claim our backbone solves all Internet problems, you can probably notice that for some tests (Seattle, WA and San Jose, CA), there was actually an increase in response time when we forced traffic through the backbone. Since latency is directly proportional to the distance of fiber optic cables, and since we have over 9,800 direct connections with other Internet networks, there is a possibility that an uncongested path on the public Internet might be geographically shorter, causing this speedup compared to our backbone.

Luckily for us, we have technologies like Argo Smart Routing, Argo Tiered Caching, WARP+, and most recently announced Orpheus, which dynamically calculates the performance of each route at our data centers, choosing the fastest healthy route at that time. What might be the fastest path during this test may not be the fastest at the time you are reading this.

With that disclaimer out of the way, now onto the test.

With the backbone disabled, a visitor from São Paulo performing a request to our service would be routed to our São Paulo data center via BGP Anycast. With caching disabled, our São Paulo data center forwarded the request over the public Internet to the origin server in Chicago. On average, the entire process of fetching data from the origin server and returning the response to the requesting user took 335.8 milliseconds.

Once the backbone was enabled, our software tested the available paths to determine the fastest healthy route to the origin, whether over the public Internet or through our private backbone. For this test the backbone was faster, resulting in an average total request time of 230.2 milliseconds. Just by routing the request through our private backbone, we improved the average response time by 31%.

We saw even better improvement when testing from Tokyo. When routing the request over the public Internet, the request took an average of 424 milliseconds. By enabling our backbone which created a faster path, the request took an average of 234 milliseconds, creating an average response time improvement of 44%.

Visitor Location | Distance to Chicago | Avg. response time via public Internet (ms) | Avg. response time via backbone (ms) | Change in response time
Miami, FL, US | 1,917 km | 84 | 75 | 10.7% decrease
Seattle, WA, US | 2,785 km | 118 | 124 | 5.1% increase
San Jose, CA, US | 2,856 km | 122 | 132 | 8.2% increase
São Paulo, BR | 8,403 km | 336 | 230 | 31.5% decrease
Tokyo, JP | 10,129 km | 424 | 234 | 44.8% decrease

We also observed a smaller deviation in the response time of packets routed through our backbone over larger distances.

Our next generation network

Cloudflare is built on top of lossy, unreliable networks that we do not control. It’s our software that turns these traditional tubes of the Internet into the smart, high-performing, and reliable network Cloudflare customers use today. Coupled with our new but rapidly expanding backbone, it is this software that produces significant performance gains over traditional Internet networks.

Whether you visit a website powered by Cloudflare’s Argo Smart Routing, Argo Tiered Caching, Orpheus, or use our 1.1.1.1 service with WARP+ to access the Internet, you get direct access to the Internet fast lane we call the Cloudflare backbone.

For Cloudflare, a better Internet means improving Internet security, reliability, and performance. The backbone gives us the ability to build out our network in areas that have typically lacked infrastructure investment from other networks. Even with issues on the public Internet, these initiatives allow us to be located within 50 milliseconds of 95% of the Internet-connected population.

In addition to our growing global infrastructure providing 1.1.1.1, WARP, Roughtime, NTP, IPFS Gateway, Drand, and F-Root to the greater Internet, it’s important that we extend our services to those who are most vulnerable. This is why we extend all our infrastructure benefits directly to the community, through projects like Galileo, Athenian, Fair Shot, and Pangea.

And while these thousands of fiber optic connections are already fixing today’s Internet issues, we truly are just getting started.

Want to help build the future Internet? Networks that are faster, safer, and more reliable than they are today? The Cloudflare Infrastructure team is currently hiring!

If you operate an ISP or transit network and would like to bring your users faster and more reliable access to websites and services powered by Cloudflare’s rapidly expanding network, please reach out to our Edge Partnerships team at [email protected].

Introducing Project Fair Shot: Ensuring COVID-19 Vaccine Registration Sites Can Keep Up With Demand

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/project-fair-shot/

Around the world government and medical organizations are struggling with one of the most difficult logistics challenges in history: equitably and efficiently distributing the COVID-19 vaccine. There are challenges around communicating who is eligible to be vaccinated, registering those who are eligible for appointments, ensuring they show up for their appointments, transporting the vaccine under the required handling conditions, ensuring that there are trained personnel to administer the vaccine, and then doing it all over again as most of the vaccines require two doses.

Cloudflare can’t help with most of that problem, but there is one key part that we realized we could help facilitate: ensuring that registration websites don’t crash under load when they first begin scheduling vaccine appointments. Project Fair Shot provides Cloudflare’s new Waiting Room service for free for any government, municipality, hospital, pharmacy, or other organization responsible for distributing COVID-19 vaccines. It is open to eligible organizations around the world and will remain free until at least July 1, 2021 or longer if there is still more demand for appointments for the vaccine than there is supply.

Crashing Registration Websites

The problem of vaccine scheduling registration websites crashing under load isn’t theoretical: it is happening over and over as organizations attempt to schedule the administration of the vaccine. This hit home at Cloudflare last weekend. The wife of one of our senior team members was trying to register her parents to receive the vaccine. They met all the criteria and the municipality where they lived was scheduled to open appointments at noon.

When the time came for the site to open, it immediately crashed. The cause wasn’t hackers or malicious activity. It was merely that so many people were trying to access the site at once. “Why doesn’t Cloudflare build a service that organizes a queue into an orderly fashion so these sites don’t get overwhelmed?” she asked her husband.

A Virtual Waiting Room

Turns out, we were already working on such a feature, but not for this use case. The problem of fairly distributing something for which demand exceeds supply comes up with several of our clients. Whether it’s tickets to a hot concert, the latest sneaker release, or spots on popular national park hikes, it is a difficult challenge to ensure that everyone eligible has a fair chance.

The solution is to open registration for the scarce item ahead of the actual sale. Anyone who visits the site ahead of time can be placed in a queue. The moment before the sale opens, the order of the queue is randomly (and fairly) shuffled. People are then let in according to their new, random position in the queue, admitting at any time only as many as the backend of the site can handle (a sketch of these two steps follows below).
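A minimal sketch of those two steps: a one-time fair shuffle (Fisher-Yates) followed by batch admission, where the batch size stands in for whatever the backend can handle:

```ts
// Fisher-Yates shuffle: every ordering of the queue is equally likely.
function shuffle<T>(queue: T[]): T[] {
  const q = [...queue];
  for (let i = q.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [q[i], q[j]] = [q[j], q[i]];
  }
  return q;
}

// Remove and return the next batch of users to admit to the site.
function admitBatch<T>(queue: T[], batchSize: number): T[] {
  return queue.splice(0, batchSize);
}

const fairOrder = shuffle(["ana", "ben", "cai", "dee", "eva"]);
const admitted = admitBatch(fairOrder, 2); // first batch the backend can take
```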

At Cloudflare, we were building this functionality for our customers as a feature called Waiting Room. (You can learn more about the technical details of Waiting Room in this post by Brian Batraski, who helped build it.) The technology is powerful because it can sit in front of any existing web registration site without code changes or hardware installation. Deploy Cloudflare with a simple DNS change, then configure Waiting Room to ensure any transactional site, no matter how meagerly resourced, can keep up with demand.

Recognizing a Critical Need; Moving Up the Launch

We planned to release it in February. Then, when we saw vaccine sites crashing under load and frustration of people eligible for the vaccine building, we realized we needed to move the launch up and offer the service for free to organizations struggling to fairly distribute the vaccine. With that, Project Fair Shot was born.

Government, municipal, hospital, pharmacy, clinic, and any other organizations charged with scheduling appointments to distribute the vaccine can apply to participate in Project Fair Shot by visiting: projectfairshot.org

Giving Front Line Organizations the Technical Resources They Need

The service will be free for qualified organizations at least until July 1, 2021 or longer if there is still more demand for appointments for the vaccine than there is supply. We are not experts in medical cold storage and I get squeamish at the sight of needles, so we can’t help with many of the logistical challenges of distributing the vaccine. But, seeing how we could support this aspect, our team knew we needed to do all we could to help.

The superheroes of this crisis are the medical professionals who are taking care of the sick and the scientists who so quickly invented these miraculous vaccines. We’re proud of the supporting role Cloudflare has played helping ensure the Internet has continued to function well when the world needed it most. Project Fair Shot is one more way we are living up to our mission of helping build a better Internet.

Cloudflare Waiting Room

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/cloudflare-waiting-room/

Today, we are excited to announce Cloudflare Waiting Room! It will first be available to select customers through a new program called Project Fair Shot, which aims to help with the overwhelming demand for COVID-19 vaccinations that is causing appointment registration websites to fail. General availability in our Business and Enterprise plans will follow in the near future.

Wait, you’re excited about a… Waiting Room?

Most of us are familiar with the concept of a waiting room, and rarely are we excited about the idea of being in one. Usually our first experience of one is at a doctor’s office — yes, you have an appointment, but sometimes the doctor is running late (or one of the patients was). Given the doctor can only see one person at a time… the waiting room was born, as a mechanism to queue up patients.

While servers can handle more concurrent requests than a doctor can, they too can be overwhelmed. If, in a pre-COVID world, you’ve ever tried buying tickets to a popular concert or event, you’ve probably encountered a waiting room online. It limits requests inbound to an application, and places these requests into a virtual queue. Once the number of users in the application has reduced, new users are let in within the defined thresholds the application can handle. This protects the origin servers supporting the application from being inundated with too many requests, while also ensuring equity from a user perspective — users who try to access a resource when the system is overloaded are not unfairly dropped and forced to reconnect, hoping to join their chance in the queue.

Why Now?

Given not many of us are going to live concerts any time soon, why is Cloudflare doing this now?

Well, perhaps we aren’t going to concerts, but the second order effects of COVID-19 have created a huge need for waiting rooms. First of all, given social distancing and the closing of many places of business and government, customers and citizens have shifted to online channels, putting substantially more strain on business and government infrastructure.

Second, the pandemic and the flow-on consequences of it have meant many folks around the world have come to rely on resources that they didn’t need twelve months earlier. To be specific, these are often health or government-related resources — for example, unemployment insurance websites. The online infrastructure was set up to handle a peak load that didn’t foresee the impact of COVID-19. We’re seeing a similar pattern emerge with websites that are related to vaccines.

Historically, the number of organizations that needed waiting rooms was quite small. The nature of most businesses online usually involves a more consistent user load, rather than huge crushes of people all at once. The organizations that did need them were able to build custom waiting rooms integrated deeply into their applications (for example, ticket sales). With Cloudflare’s Waiting Room, no code changes to the application are necessary, and a Waiting Room can be set up in a matter of minutes for any website without writing a single line of code.

Whether you are an engineering architect or a business operations analyst, setting up a Waiting Room is simple. We make it quick and easy to ensure your applications are reliable and protected from unexpected spikes in traffic. Two other features we felt were important are automatic enablement and dynamic outflow: a waiting room should turn on automatically when thresholds are exceeded, and, as users finish their tasks in the application, it should let out appropriately sized batches of users and take in new ones already waiting in the queue (a toy model of this logic follows below). It should just work. Lastly, we’ve seen the major impact COVID-19 has had on users and businesses alike, especially, but not limited to, the health and government sectors. We wanted to provide another way to ensure these applications remain available and functional so all users can receive the care they need, rather than errors in their browser.
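As referenced above, here is a toy model of automatic enablement and dynamic outflow; the field names are our own invention, mirroring the configuration fields described later in this post:

```ts
// Configured limits for a waiting room.
interface RoomConfig {
  totalActiveUsers: number; // max concurrent users in the application
  newUsersPerMinute: number; // max admission rate
}

// How many queued users can be admitted right now. A result of 0 means the
// waiting room stays on and new arrivals keep queueing; as sessions end,
// freeSlots grows and waiting users flow out of the queue.
function usersToAdmit(
  cfg: RoomConfig,
  activeNow: number,
  admittedThisMinute: number
): number {
  const freeSlots = Math.max(0, cfg.totalActiveUsers - activeNow);
  const rateBudget = Math.max(0, cfg.newUsersPerMinute - admittedThisMinute);
  return Math.min(freeSlots, rateBudget);
}

console.log(usersToAdmit({ totalActiveUsers: 500, newUsersPerMinute: 200 }, 480, 150));
// => 20: capacity, not the rate limit, is the binding constraint here.
```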

How does Cloudflare’s Waiting Room work?

We built Waiting Room on top of our edge network and our Workers product. By leveraging Workers and our new Durable Objects offering, we were able to remove the need for any customer coding and provide a seamless, out-of-the-box product that will ‘just work’. On top of this, we get the scale and performance of Workers, letting us keep latency overhead extremely low, keep the estimated wait times presented to end users as accurate as possible, and avoid keeping any user in the queue longer than needed. But building a centralized system on a decentralized network is no easy task. When requests come into an application from around the world, we need a broad, accurate view of the inbound and outbound load on that application.
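As a hypothetical sketch of the shared-state piece (not Waiting Room's actual source), a single Durable Object could hold an authoritative active-user counter that every data center consults; the DurableObjectState type comes from @cloudflare/workers-types:

```ts
// One Durable Object instance per waiting room: all reads and writes of the
// counter funnel through this object, giving a globally consistent count.
export class RoomCounter {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    let active = (await this.state.storage.get<number>("active")) ?? 0;
    const url = new URL(request.url);
    if (url.pathname === "/enter") active++; // a user was admitted
    if (url.pathname === "/leave") active = Math.max(0, active - 1);
    await this.state.storage.put("active", active);
    return new Response(String(active));
  }
}
```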

[Figure: Request going through Cloudflare without a Waiting Room]

These requests, as fast as they are, still take time to travel across the planet. And so, a unique edge case was presented. What if a website is getting reasonable traffic from North America and Europe, but then a sudden major spike of traffic takes place from South America – how do we know when to keep letting users into the application and when to kick in the Waiting Room to protect the origin servers from being overloaded?

Thanks to some clever engineering and our Workers product, we were able to create a system that keeps itself synced with global demand for an application almost immediately, giving us the necessary insight into when we should and should not queue users into the Waiting Room. By leveraging our global Anycast network and 200+ data centers, we remove any single point of failure to protect our customers’ infrastructure, while also providing a great experience to end users who have to wait a small amount of time to enter the application under high load.

[Figure: Request going through Cloudflare with a Waiting Room]

How to setup a Waiting Room

Setting up a Waiting Room is incredibly easy and very fast! At the simplest end of the scale, a user needs to fill out only five fields: 1) the name of the Waiting Room, 2) a hostname (pre-populated with the zone it’s being configured on), 3) the total active users that can be in the application at any given time, 4) the new users per minute allowed into the application, and 5) the session duration for any given user. No coding or application changes are necessary.
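For those who prefer automation over the dashboard, a hedged sketch of creating a room with those five fields via the Cloudflare API follows; treat the endpoint and property names as illustrative rather than authoritative, and note that the zone ID and token placeholders are yours to fill in:

```ts
const ZONE_ID = "<your zone id>"; // placeholder
const API_TOKEN = "<api token with Waiting Room permissions>"; // placeholder

// Create a waiting room with the same five fields as the dashboard form.
const resp = await fetch(
  `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/waiting_rooms`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "vaccine-registration",
      host: "appointments.example.com", // hypothetical hostname
      total_active_users: 500,
      new_users_per_minute: 200,
      session_duration: 5, // minutes
    }),
  }
);
console.log(resp.status, await resp.json());
```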

We provide the option of using our default Waiting Room template for customers who don’t want to add additional branding. This simplifies the process of getting a Waiting Room up and running.

That’s it! Press save and the Waiting Room is ready to go!

For customers with more time and technical ability, the same process is followed, except we give full customization capabilities to our users so they can brand the Waiting Room, ensuring it matches the look and feel of their overall product.

Lastly, managing different Waiting Rooms is incredibly easy. With our Manage Waiting Room table, at a glance you are able to get a full snapshot of which rooms are actively queueing, not queueing, and/or disabled.

We are very excited to put the power of our Waiting Room into the hands of our customers to ensure they continue to focus on their businesses and customers. Keep an eye out for another blog post coming soon with major updates to our Waiting Room product for Enterprise!