Tag Archives: Network Performance Update

Network performance update: Security Week 2024

2024-03-08 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-security-week-2024

We constantly measure our own network’s performance against other networks, look for ways to improve our performance compared to them, and share the results of our efforts. Since June 2021, we’ve been sharing benchmarking results we’ve run against other networks to see how we compare.

In this post we are going to share the most recent updates since our last post in September, and talk about how we are getting as fast as we are.

How we stack up

Since June 2021, we’ve been taking a close look at the most reported eyeball-facing ISPs and taking actions for the specific networks where we have some room for improvement. Cloudflare was already the fastest provider for TCP Connection time at the 95th percentile for 44% of the networks around the world (we define a network as country and AS number pair). We chose this metric to show how our network helps make your websites faster by getting you to where your customers are. Taking a look at the numbers, in July 2022, Cloudflare was ranked #1 in 33% of the networks and was within 2 ms (95th percentile TCP Connection Time) or 5% of the #1 provider for 8% of the networks that we measured. For reference, our closest competitor was the fastest for 20% of networks.

As of August 30, 2023, Cloudflare was the fastest provider for 44% of networks — and was within 2 ms (95th percentile TCP Connection Time) or 5% of the fastest provider for 10% of the networks that we measured—whereas our closest competitor (Amazon Cloudfront) was the fastest for 19% of networks. As of February 15, 2024, we are still #1 in 44% of networks for 95th percentile TCP Connection Time. Let’s dig into the data.

Lightning fast

Looking at 95th percentile TCP connect times from November 18, 2023, to February 15, 2024, Cloudflare is the #1 provider in 44% of the top 1000 networks:

Our P95 TCP Connection time has been trending down since November, and we are consistently 50ms faster at P95 than our closest competitor (Amazon CloudFront):

Connect time comparisons between providers at 50th and 95th percentile
	P50 Connect (ms)	P95 Connect (ms)
Cloudflare	130	579
Amazon	145	637
Google	190	772
Akamai	195	774
Fastly	189	734

These graphs show that day over day, Cloudflare was consistently the fastest provider. They also show the gaps between Cloudflare and the other competitors. When you look at the 95th percentile, Cloudflare is almost 200ms faster than Akamai across the world for connect times. This shows that our network reaches more places and allows users to get their content faster than Akamai on a consistent basis.

When we aggregate this data over the whole time period, Cloudflare is the fastest in the most networks. For that whole time span of November 18, 2023, to February 15, 2024, Cloudflare was number 1 in 73% of networks for mean TCP connection time:

Looking at a map plotting by 95th percentile TCP connect time, Cloudflare is the fastest in the most countries, and you can see this by the fact that most of the map is orange:

For comparison, here’s what the map looked like in September 2023:

These numbers show that we’re reducing the overall TCP connection time around the world while simultaneously staying ahead of the competition. Let’s talk about how we get these numbers and what we’re doing to make you even faster.

Measuring What Matters

As a quick reminder, here’s how we get the data for our measurements: when users receive a Cloudflare-branded error page, we use Real User Measurements (RUM) and fetch a small file from Cloudflare, Akamai, Amazon CloudFront, Fastly, and Google Cloud CDN. Browsers around the world report the performance of those providers from the perspective of the end-user network they are on. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post.

Using the RUM data, we measure various performance metrics, such as TCP Connection Time, Time to First Byte (TTFB), and Time to Last Byte (TTLB), for ourselves and other providers.

If we only collect data from a browser when we return an error page, you could see how variable the data can get: if one network or website is having a problem in a certain country, that country could overreport, meaning those networks would be more highly weighted in the calculations because more users reported from that network during a given time period.

For example, if a lot of users connecting over a small Brazilian network were generating reports because their websites were throwing errors more frequently, that could make this small network look a lot bigger to us. This small network in Brazil could have as many reports as Claro, a major network in the region, despite them being totally different when you look at the number of subscribers. If we only look at the networks that report to us the most, it could cause smaller networks with fewer subscribers to be treated as more important because of point-in-time error conditions.

This phenomenon could cause the networks we look at to change week over week. Going back to the Brazil example, if the website that was throwing a bunch of errors fixed their problem, and we no longer saw measurements from that network, they may not show up as a “most reported network” depending on when we look at the data. This means that the networks we look at to consider where we are fastest are dependent on which networks are sending us the most reports at any given time, which is not optimal if we’re trying to get faster in these networks. We need to be able to get a consistent signal on these networks to understand where we’re faster and where we’re not.

We’ve addressed this issue by creating a fixed list of the networks we want to look at. We did this by looking at public stats on user population by network and then comparing that with our sample sizes by network until we identified the 1000 networks we want to examine. This ensures that day over day, the networks we look at are the same.

Now let’s talk about what makes us faster in more places than other networks: HTTP/3.

Blazing fast speeds with HTTP/3

One reason why Cloudflare is the fastest in the most networks is because we’ve been leading the charge with adoption and usage of HTTP/3 on our platform. HTTP/3 allows for faster connectivity behavior which means we can get connections established faster and get data flowing. HTTP/3 is currently used by around 31% of Internet traffic:

To show that HTTP/3 improves connection times, we looked at two different Cloudflare endpoints that these tests ran against: one with HTTP/3 enabled and one with HTTP/3 disabled. The performance difference between the two is night and day. Here’s a table showing the performance difference for 95th percentile connect time between Cloudflare zones when one zone has HTTP/3 enabled:

	P50 connect (ms)	P95 connect (ms)
Cloudflare HTTP/3	130	579
Cloudflare non-HTTP/3	174	695

At P95, Cloudflare is 116 ms faster for connection times when HTTP/3 is enabled. This performance gain helps us be the fastest in the most networks.

But why does HTTP/3 help make us faster? HTTP/3 allows for faster connection setup times, which lets us take greater advantage of our global network footprint to be the fastest in the most networks. HTTP/3 is built on top of the QUIC protocol, which multiplexes UDP packets to allow for parallel streams to be sent at the same time. This means that TLS encryption can happen in parallel with connection establishment, shortening the amount of time that is needed to set up a secure connection. Paired with Cloudflare’s network that is incredibly close to end-users, this makes for significant latency reductions on user Connect times. All major browsers have HTTP/3 enabled by default, so you too can realize these latency improvements by enabling HTTP/3 on your website today.

What’s next

We’re sharing our updates on our journey to become #1 everywhere so that you can see what goes into running the fastest network in the world. From here, our plan is the same as always: identify where we’re slower, fix it, and then tell you how we’ve gotten faster.

Network performance update: Birthday Week 2023

2023-09-29 David Tuber

Post Syndicated from David Tuber original http://blog.cloudflare.com/network-performance-update-birthday-week-2023/

Network performance update: Birthday Week 2023

In this post we are going to share the most recent updates since our last post in June, and tell you about our tools and processes that we use to monitor and improve our network performance.

How we stack up

Since June 2021, we’ve been taking a close look at every single network and taking actions for the specific networks where we have some room for improvement. Cloudflare was already the fastest provider for most of the networks around the world (we define a network as country and AS number pair). Taking a closer look at the numbers; in July 2022, Cloudflare was ranked #1 in 33% of the networks and was within 2 ms (95th percentile TCP Connection Time) or 5% of the #1 provider for 8% of the networks that we measured. For reference, our closest competitor on that front was the fastest for 20% of networks.

As of August 30, 2023, Cloudflare is the fastest provider for 44% of networks—and was within 2 ms (95th percentile TCP Connection Time) or 5% of the fastest provider for 10% of the networks that we measured—whereas our closest competitor is now the fastest for 19% of networks.

Below is the change in percentage of networks in which each provider is the fastest plotted over time.

Cloudflare is maintaining our steady growth in the percentage of networks where we’re the fastest. Despite the slight tick down the past couple of months, the trendline is still positive and with a higher rate of increase than other networks.

Now that we’ve reviewed how we stack up compared to other networks, let’s dig a little more into the other metrics we use to make us the fastest.

Our tooling

To provide insight into network performance, we use Real User Measurements (RUM) and fetch a small file from Cloudflare, Akamai, Amazon CloudFront, Fastly and Google Cloud CDN. Browsers around the world report the performance of those providers from the perspective of the end-user network they are on. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

Using the RUM data, we are able to measure various performance metrics, such as TCP Connection Time, Time to First Byte (TTFB), Time to Last Byte (TTLB), for ourselves and other networks.

Let’s take a look at some of the metrics we monitor and what’s changed since our last blog in June.

The first metric we closely monitor is the percent of networks that we are ranked #1 in terms of TCP Connection Time. That's a key performance indicator that we evaluate ourselves against. This first line of the table shows that Cloudflare was ranked #1 in 45% of networks in June 2023 and 44% in August 2023. Here’s the full picture of how we looked in June versus how we look today.

Cloudflare’s rank by TCP connection time	% of networks in June 2023	% of networks in August 2023
1	45	44
2	26	24
3	16	16
4	9	10
5	4	6

Overall, these metrics align with what we saw above: Cloudflare is still the fastest provider in the most last mile networks, and while there has been slight changes in the month-to-month fluctuations, the overall trend shows us as being the fastest.

The second metric we monitor is our overall performance in each country. This gives us visibility into the countries or regions that we need to pay closer attention to and take action towards improving our performance. Those actions will be listed later. Orange indicates the countries that Cloudflare is the fastest provider based on the TCP Connection Time. Here’s how we look as of September 2023:

For comparison, this is what that map looks like from June 2023:

We’ve become faster in Iran and Paraguay, and in the few cases where we are no longer number 1, we are within 2ms of the fastest provider. In Brazil and Norway for example, we trail Fastly by only 1ms. In various countries in Africa, Amazon CloudFront pulled ahead but only by 2ms. We aim to fix that in the coming weeks and months and return to the #1 spot there also.

The third set of metrics we use are TCP Connection Time and TTLB. The number of networks where we are #1 in terms of 95th percentile TCP Connection Time is one of our key performance indicators. We actively monitor and work on improving that metric so that we are #1 in the most metrics for 95th percentile TCP Connection Time. For September 2023, we are still #1 in the most networks for TCP Connection Time, more than double the next best provider.

Provider	# of networks where the provider is fastest for 95th percentile TCP connection time
Cloudflare	826
Google	392
Fastly	348
Cloudfront	337
Akamai	52

The way we achieve these great results is by having our engineering teams constantly investigate the underlying reasons for degraded performance if there are any, and we track open work items until they are resolved.

What’s next

Network performance update: Speed Week 2023

2023-06-22 Onur Karaagaoglu

Post Syndicated from Onur Karaagaoglu original http://blog.cloudflare.com/speed-week-2023-network-performance-update/

Network performance update: Speed Week 2023

We constantly measure our own and other networks' performance, and look for ways to improve our performance; and share our results.

In this post we are going to share the most recent updates, and tell you about our tools and processes that we use to monitor and improve our network performance.

First, the results

In July, 2022, we started taking a more granular look down to every single network and taking actions for the specific networks where we have some room for improvement. Cloudflare was already the fastest provider for most of the networks around the world (we define a network as country and AS number pair). Taking a closer look at the numbers, Cloudflare was ranked #1 in 33% of the networks and was within 2 ms or 5% of the #1 provider for 8% of the networks that we measured in terms of the 95th percentile TCP Connection Time. For reference, our closest competitor on that front was the fastest for 20% of networks.

As of May 31, 2023 those numbers have improved significantly. Today, Cloudflare is the fastest provider for 46% of networks—and was within 2 ms (95th percentile TCP Connection Time) or 5% of the fastest provider for 10% of the networks that we measured—whereas our closest competitor is now the fastest for 18% of networks.

Below is the change in percentage of networks that each provider is the fastest over time for Cloudflare and other services.

Our tooling and process

We use Real User Measurements (RUM) and fetch a small file from Cloudflare, Akamai, CloudFront, Fastly and Google Cloud CDN. Browsers around the world report the performance of those providers from the perspective of the end-user network they are on. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

Using the RUM data, we are able to measure various performance metrics, such as TCP Connection Time, Time to First Byte (TTFB), Time to Last Byte (TTLB), for ourselves and other networks.

One of the most important tools that we use for measuring and improving our performance is what we call Performance Benchmarks Dashboard. That's the dashboard where we can analyze the data that we collect in different dimensions.

Here are the metrics that we monitor based on some of the dimensions.

The first metric we closely monitor is the percent of networks that we are ranked #1 in terms of TCP Connection Time. That's a key performance indicator that we evaluate ourselves against.

Using all the metrics listed above, the Performance Benchmarks Dashboard helps us to find networks where we can improve our performance. Our engineering teams monitor that dashboard and investigate the underlying reasons for degraded performance if there are any and the action items are displayed on the dashboard until they are resolved.

Once we identify a particular network to improve, we investigate the root cause and document the action items to improve our performance. Those actions generally fall under three categories.

The first category is establishing peering with that network in a specific location so that users can take the optimum path. That’s a critical component of a better Internet! Here is our more detailed blog post about that from earlier this week.

The second category is expanding our compute capacity in a specific data center so that we can serve the users at the closest data center.

And finally, we apply traffic engineering actions to make sure that the network is served in the optimum way. Traffic engineering actions are generally manual configurations that we apply, in case the path that’s chosen by the routing protocols is not the most performant path.

What’s next

The data we collect gives us a granular view of every network that connects to Cloudflare and we constantly optimize our infrastructure to improve Cloudflare’s performance. We won’t rest until we’re #1 everywhere.

Spotlight on Zero Trust: We’re fastest and here’s the proof

2023-06-21 David Tuber

Post Syndicated from David Tuber original http://blog.cloudflare.com/spotlight-on-zero-trust/

Spotlight on Zero Trust: We're fastest and here's the proof

In January and in March we posted blogs outlining how Cloudflare performed against others in Zero Trust. The conclusion in both cases was that Cloudflare was faster than Zscaler and Netskope in a variety of Zero Trust scenarios. For Speed Week, we’re bringing back these tests and upping the ante: we’re testing more providers against more public Internet endpoints in more regions than we have in the past.

For these tests, we tested three Zero Trust scenarios: Secure Web Gateway (SWG), Zero Trust Network Access (ZTNA), and Remote Browser Isolation (RBI). We tested against three competitors: Zscaler, Netskope, and Palo Alto Networks. We tested these scenarios from 12 regions around the world, up from the four we’d previously tested with. The results are that Cloudflare is the fastest Secure Web Gateway in 42% of testing scenarios, the most of any provider. Cloudflare is 46% faster than Zscaler, 56% faster than Netskope, and 10% faster than Palo Alto for ZTNA, and 64% faster than Zscaler for RBI scenarios.

In this blog, we’ll provide a refresher on why performance matters, do a deep dive on how we’re faster for each scenario, and we’ll talk about how we measured performance for each product.

Performance is a threat vector

Performance in Zero Trust matters; when Zero Trust performs poorly, users disable it, opening organizations to risk. Zero Trust services should be unobtrusive when the services become noticeable they prevent users from getting their job done.

Zero Trust services may have lots of bells and whistles that help protect customers, but none of that matters if employees can’t use the services to do their job quickly and efficiently. Fast performance helps drive adoption and makes security feel transparent to the end users. At Cloudflare, we prioritize making our products fast and frictionless, and the results speak for themselves. So now let’s turn it over to the results, starting with our secure web gateway.

Cloudflare Gateway: security at the Internet

A secure web gateway needs to be fast because it acts as a funnel for all of an organization’s Internet-bound traffic. If a secure web gateway is slow, then any traffic from users out to the Internet will be slow. If traffic out to the Internet is slow, users may see web pages load slowly, video calls experience jitter or loss, or generally unable to do their jobs. Users may decide to turn off the gateway, putting the organization at risk of attack.

In addition to being close to users, a performant web gateway needs to also be well-peered with the rest of the Internet to avoid slow paths out to websites users want to access. Many websites use CDNs to accelerate their content and provide a better experience. These CDNs are often well-peered and embedded in last mile networks. But traffic through a secure web gateway follows a forward proxy path: users connect to the proxy, and the proxy connects to the websites users are trying to access. If that proxy isn’t as well-peered as the destination websites are, the user traffic could travel farther to get to the proxy than it would have needed to if it was just going to the website itself, creating a hairpin, as seen in the diagram below:

A well-connected proxy ensures that the user traffic travels less distance making it as fast as possible.

To compare secure web gateway products, we pitted the Cloudflare Gateway and WARP client against Zscaler, Netskope, and Palo Alto which all have products that perform the same functions. Cloudflare users benefit from Gateway and Cloudflare’s network being embedded deep into last mile networks close to users, being peered with over 12,000 networks. That heightened connectivity shows because Cloudflare Gateway is the fastest network in 42% of tested scenarios:

Number of testing scenarios where each provider is fastest for 95th percentile HTTP Response time (higher is better)
Provider	Scenarios where this provider is fastest
Cloudflare	48
Zscaler	14
Netskope	10
Palo Alto Networks	42

This data shows that we are faster to more websites from more places than any of our competitors. To measure this, we look at the 95th percentile HTTP response time: how long it takes for a user to go through the proxy, have the proxy make a request to a website on the Internet, and finally return the response. This measurement is important because it’s an accurate representation of what users see. When we look at the 95th percentile across all tests, we see that Cloudflare is 2.5% faster than Palo Alto Networks, 13% faster than Zscaler, and 6.5% faster than Netskope.

95th percentile HTTP response across all tests
Provider	95th percentile response (ms)
Cloudflare	515
Zscaler	595
Netskope	550
Palo Alto Networks	529

Cloudflare wins out here because Cloudflare’s exceptional peering allows us to succeed in places where others were not able to succeed. We are able to get locally peered in hard-to-reach places on the globe, giving us an edge. For example, take a look at how Cloudflare performs against the others in Australia, where we are 30% faster than the next fastest provider:

Cloudflare establishes great peering relationships in countries around the world: in Australia we are locally peered with all of the major Australian Internet providers, and as such we are able to provide a fast experience to many users around the world. Globally, we are peered with over 12,000 networks, getting as close to end users as we can to shorten the time requests spend on the public Internet. This work has previously allowed us to deliver content quickly to users, but in a Zero Trust world, it shortens the path users take to get to their SWG, meaning they can quickly get to the services they need.

Previously when we performed these tests, we only tested from a single Azure region to five websites. Existing testing frameworks like Catchpoint are unsuitable for this task because performance testing requires that you run the SWG client on the testing endpoint. We also needed to make sure that all of the tests are running on similar machines in the same places to measure performance as well as possible. This allows us to measure the end-to-end responses coming from the same location where both test environments are running.

In our testing configuration for this round of evaluations, we put four VMs in 12 cloud regions side by side: one running Cloudflare WARP connecting to our gateway, one running ZIA, one running Netskope, and one running Palo Alto Networks. These VMs made requests every five minutes to the 11 different websites mentioned below and logged the HTTP browser timings for how long each request took. Based on this, we are able to get a user-facing view of performance that is meaningful. Here is a full matrix of locations that we tested from, what websites we tested against, and which provider was faster:

	Endpoints
SWG Regions	Shopify	Walmart	Zendesk	ServiceNow	Azure Site	Slack	Zoom	Box	M365	GitHub	Bitbucket
East US	Cloudflare	Cloudflare	Palo Alto Networks	Cloudflare	Palo Alto Networks	Cloudflare	Palo Alto Networks	Cloudflare
West US	Palo Alto Networks	Palo Alto Networks	Cloudflare	Cloudflare	Palo Alto Networks	Cloudflare	Palo Alto Networks	Cloudflare
South Central US	Cloudflare	Cloudflare	Palo Alto Networks	Cloudflare	Palo Alto Networks	Cloudflare	Palo Alto Networks	Cloudflare
Brazil South	Cloudflare	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Zscaler	Zscaler	Zscaler	Palo Alto Networks	Cloudflare	Palo Alto Networks	Palo Alto Networks
UK South	Cloudflare	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Cloudflare	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks
Central India	Cloudflare	Cloudflare	Cloudflare	Palo Alto Networks	Palo Alto Networks	Cloudflare	Cloudflare	Cloudflare
Southeast Asia	Cloudflare	Cloudflare	Cloudflare	Cloudflare	Palo Alto Networks	Cloudflare	Cloudflare	Cloudflare
Canada Central	Cloudflare	Cloudflare	Palo Alto Networks	Cloudflare	Cloudflare	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Zscaler	Cloudflare	Zscaler
Switzerland North	netskope	Zscaler	Zscaler	Cloudflare	netskope	netskope	netskope	netskope	Cloudflare	Cloudflare	netskope
Australia East	Cloudflare	Cloudflare	netskope	Cloudflare	Cloudflare	Cloudflare	Cloudflare	Cloudflare
UAE Dubai	Zscaler	Zscaler	Cloudflare	Cloudflare	Zscaler	netskope	Palo Alto Networks	Zscaler	Zscaler	netskope	netskope
South Africa North	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Zscaler	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Palo Alto Networks	Zscaler	Palo Alto Networks	Palo Alto Networks

Blank cells indicate that tests to that particular website did not report accurate results or experienced failures for over 50% of the testing period. Based on this data, Cloudflare is generally faster, but we’re not as fast as we’d like to be. There are still some areas where we need to improve, specifically in South Africa, UAE, and Brazil. By Birthday Week in September, we want to be the fastest to all of these websites in each of these regions, which will bring our number up from fastest in 54% of tests to fastest in 79% of tests.

To summarize, Cloudflare’s Gateway is still the fastest SWG on the Internet. But Zero Trust isn’t all about SWG. Let’s talk about how Cloudflare performs in Zero Trust Network Access scenarios.

Instant (Zero Trust) access

Access control needs to be seamless and transparent to the user: the best compliment for a Zero Trust solution is for employees to barely notice it’s there. Services like Cloudflare Access protect applications over the public Internet, allowing for role-based authentication access instead of relying on things like a VPN to restrict and secure applications. This form of access management is more secure, but with a performant ZTNA service, it can even be faster.

Cloudflare outperforms our competitors in this space, being 46% faster than Zscaler, 56% faster than Netskope, and 10% faster than Palo Alto Networks:

Zero Trust Network Access P95 HTTP Response times
Provider	P95 HTTP response (ms)
Cloudflare	1252
Zscaler	2388
Netskope	2974
Palo Alto Networks	1471

For this test, we created applications hosted in three different clouds in 12 different locations: AWS, GCP, and Azure. However, it should be noted that Palo Alto Networks was the exception, as we were only able to measure them using applications hosted in one cloud from two regions due to logistical challenges with setting up testing: US East and Singapore.

For each of these applications, we created tests from Catchpoint that accessed the application from 400 locations around the world. Each of these Catchpoint nodes attempted two actions:

New Session: log into an application and receive an authentication token
Existing Session: refresh the page and log in passing the previously obtained credentials

We like to measure these scenarios separately, because when we look at 95th percentile values, we would almost always be looking at new sessions if we combined new and existing sessions together. For the sake of completeness though, we will also show the 95th percentile latency of both new and existing sessions combined.

Cloudflare was faster in both US East and Singapore, but let’s spotlight a couple of regions to delve into. Let’s take a look at a region where resources are heavily interconnected equally across competitors: US East, specifically Ashburn, Virginia.

In Ashburn, Virginia, Cloudflare handily beats Zscaler and Netskope for ZTNA 95th percentile HTTP Response:

95th percentile HTTP Response times (ms) for applications hosted in Ashburn, VA
AWS East US	Total (ms)	New Sessions (ms)	Existing Sessions (ms)
Cloudflare	2849	1749	1353
Zscaler	5340	2953	2491
Netskope	6513	3748	2897
Palo Alto Networks
Azure East US
Cloudflare	1692	989	1169
Zscaler	5403	2951	2412
Netskope	6601	3805	2964
Palo Alto Networks
GCP East US
Cloudflare	2811	1615	1320
Zscaler
Netskope	6694	3819	3023
Palo Alto Networks	2258	894	1464

You might notice that Palo Alto Networks looks to come out ahead of Cloudflare for existing sessions (and therefore for overall 95th percentile). But these numbers are misleading because Palo Alto Networks’ ZTNA behavior is slightly different than ours, Zscaler’s, or Netskope’s. When they perform a new session, it does a full connection intercept and returns a response from its processors instead of directing users to the login page of the application they are trying to access.

This means that Palo Alto Networks' new session response times don’t actually measure the end-to-end latency we’re looking for. Because of this, their numbers for new session latency and total session latency are misleading, meaning we can only meaningfully compare ourselves to them for existing session latency. When we look at existing sessions, when Palo Alto Networks acts as a pass-through, Cloudflare still comes out ahead by 10%.

This is true in Singapore as well, where Cloudflare is 50% faster than Zscaler and Netskope, and also 10% faster than Palo Alto Networks for Existing Sessions:

95th percentile HTTP Response times (ms) for applications hosted in Singapore
AWS Singapore	Total (ms)	New Sessions (ms)	Existing Sessions (ms)
Cloudflare	2748	1568	1310
Zscaler	5349	3033	2500
Netskope	6402	3598	2990
Palo Alto Networks
Azure Singapore
Cloudflare	1831	1022	1181
Zscaler	5699	3037	2577
Netskope	6722	3834	3040
Palo Alto Networks
GCP Singapore
Cloudflare	2820	1641	1355
Zscaler	5499	3037	2412
Netskope	6525	3713	2992
Palo Alto Networks	2293	922	1476

One critique of this data could be that we’re aggregating the times of all Catchpoint nodes together at P95, and we’re not looking at the 95th percentile of Catchpoint nodes in the same region as the application. We looked at that, too, and Cloudflare’s ZTNA performance is still better. Looking at only North America-based Catchpoint nodes, Cloudflare performs 50% better than Netskope, 40% better than Zscaler, and 10% better than Palo Alto Networks at P95 for warm connections:

Zero Trust Network Access 95th percentile HTTP Response times for warm connections with testing locations in North America
Provider	P95 HTTP response (ms)
Cloudflare	810
Zscaler	1290
Netskope	1351
Palo Alto Networks	871

Finally, one thing we wanted to show about our ZTNA performance was how well Cloudflare performed per cloud per region. This below chart shows the matrix of cloud providers and tested regions:

Fastest ZTNA provider in each cloud provider and region by 95th percentile HTTP Response
	AWS	Azure	GCP
Australia East	Cloudflare	Cloudflare	Cloudflare
Brazil South	Cloudflare	Cloudflare	N/A
Canada Central	Cloudflare	Cloudflare	Cloudflare
Central India	Cloudflare	Cloudflare	Cloudflare
East US	Cloudflare	Cloudflare	Cloudflare
South Africa North	Cloudflare	Cloudflare	N/A
South Central US	N/A	Cloudflare	Zscaler
Southeast Asia	Cloudflare	Cloudflare	Cloudflare
Switzerland North	N/A	N/A	Cloudflare
UAE Dubai	Cloudflare	Cloudflare	Cloudflare
UK South	Cloudflare	Cloudflare	netskope
West US	Cloudflare	Cloudflare	N/A

There were some VMs in some clouds that malfunctioned and didn’t report accurate data. But out of 30 available cloud regions where we had accurate data, Cloudflare was the fastest ZT provider in 28 of them, meaning we were faster in 93% of tested cloud regions.

To summarize, Cloudflare also provides the best experience when evaluating Zero Trust Network Access. But what about another piece of the puzzle: Remote Browser Isolation (RBI)?

Remote Browser Isolation: a secure browser hosted in the cloud

Remote browser isolation products have a very strong dependency on the public Internet: if your connection to your browser isolation product isn’t good, then your browser experience will feel weird and slow. Remote browser isolation is extraordinarily dependent on performance to feel smooth and seamless to the users: if everything is fast as it should be, then users shouldn’t even notice that they’re using browser isolation.

For this test, we’re again pitting Cloudflare against Zscaler. While Netskope does have an RBI product, we were unable to test it due to it requiring a SWG client, meaning we would be unable to get full fidelity of testing locations like we would when testing Cloudflare and Zscaler. Our tests showed that Cloudflare is 64% faster than Zscaler for remote browsing scenarios: Here’s a matrix of fastest provider per cloud per region for our RBI tests:

Fastest RBI provider in each cloud provider and region by 95th percentile HTTP Response
	AWS	Azure	GCP
Australia East	Cloudflare	Cloudflare	Cloudflare
Brazil South	Cloudflare	Cloudflare	Cloudflare
Canada Central	Cloudflare	Cloudflare	Cloudflare
Central India	Cloudflare	Cloudflare	Cloudflare
East US	Cloudflare	Cloudflare	Cloudflare
South Africa North	Cloudflare	Cloudflare
South Central US		Cloudflare	Cloudflare
Southeast Asia	Cloudflare	Cloudflare	Cloudflare
Switzerland North	Cloudflare	Cloudflare	Cloudflare
UAE Dubai	Cloudflare	Cloudflare	Cloudflare
UK South	Cloudflare	Cloudflare	Cloudflare
West US	Cloudflare	Cloudflare	Cloudflare

This chart shows the results of all of the tests run against Cloudflare and Zscaler to applications hosted on three different clouds in 12 different locations from the same 400 Catchpoint nodes as the ZTNA tests. In every scenario, Cloudflare was faster. In fact, no test against a Cloudflare-protected endpoint had a 95th percentile HTTP Response of above 2105 ms, while no Zscaler-protected endpoint had a 95th percentile HTTP response of below 5000 ms.

To get this data, we leveraged the same VMs to host applications accessed through RBI services. Each Catchpoint node would attempt to log into the application through RBI, receive authentication credentials, and then try to access the page by passing the credentials. We look at the same new and existing sessions that we do for ZTNA, and Cloudflare is faster in both new sessions and existing session scenarios as well.

Gotta go fast(er)

Our Zero Trust customers want us to be fast not because they want the fastest Internet access, but because they want to know that employee productivity won’t be impacted by switching to Cloudflare. That doesn’t necessarily mean that the most important thing for us is being faster than our competitors, although we are. The most important thing for us is improving our experience so that our users feel comfortable knowing we take their experience seriously. When we put out new numbers for Birthday Week in September and we’re faster than we were before, it won’t mean that we just made the numbers go up: it means that we are constantly evaluating and improving our service to provide the best experience for our customers. We care more that our customers in UAE have an improved experience with Office365 as opposed to beating a competitor in a test. We show these numbers so that we can show you that we take performance seriously, and we’re committed to providing the best experience for you, wherever you are.

Developer Week Performance Update: Spotlight on R2

2023-05-19 David Tuber

Post Syndicated from David Tuber original http://blog.cloudflare.com/r2-is-faster-than-s3/

Developer Week Performance Update: Spotlight on R2

For developers, performance is everything. If your app is slow, it will get outclassed and no one will use it. In order for your application to be fast, every underlying component and system needs to be as performant as possible. In the past, we’ve shown how our network helps make your apps faster, even in remote places. We’ve focused on how Workers provides the fastest compute, even in regions that are really far away from traditional cloud datacenters.

For Developer Week 2023, we’re going to be looking at one of the newest Cloudflare developer offerings and how it compares to an alternative when retrieving assets from buckets: R2 versus Amazon Simple Storage Service (S3). Spoiler alert: we’re faster than S3 when serving media content via public access. Our test showed that on average, Cloudflare R2 was 20-40% faster than Amazon S3. For this test, we used 95th percentile Response tests, which measures the time it takes for a user to make a request to the bucket, and get the entirety of the response. This test was designed with the goal of measuring end-user performance when accessing content in public buckets.

In this blog we’re going to talk about why your object store needs to be fast, how much faster R2 is, why that is, and how we measured it.

Storage performance is user-facing

Storage performance is critical to a snappy user experience. Storage services are used for many scenarios that directly impact the end-user experience, particularly in the case where the data stored doesn’t end up in a cache (uncacheable or dynamic content). Compute and database services can rely on storage services, so if they’re not fast, the services using them won’t be either. Even the basic content fetching scenarios that use a CDN require storage services to be fast if the asset is either uncacheable or was not cached on the request: if the storage service is slow or far away, users will be impacted by that performance. And as every developer knows, nobody remembers the nine fast API calls if the last call was slow. Users don’t care about API calls, they care about overall experience. One slow API call, one slow image, one slow anything can muck up the works and provide users with a bad experience.

Because there are lots of different ways to use storage services, we’re going to focus on a relatively simple scenario: fetching static images from these services. Let’s talk about R2 and how it compares to one of the alternatives in this scenario: Amazon S3.

Benchmarking storage performance

When looking at uncached raw file delivery for users in North America retrieving content from a bucket in Ashburn, Virginia (US-East) and examining 95th percentile Response, R2 is 38% faster than S3:

Storage Performance: Response in North America (US-East)
	95th percentile (ms)
Cloudflare R2	1,262
Amazon S3	2,055

For content hosted in US-East (Ashburn, VA) and only looking at North America-based eyeballs, R2 beats S3 by 30% in response time. When we look at why this is the case, the answer lies in our closeness to users and highly optimized HTTP stacks. Let’s take a look at the TCP connect and SSL times for these tests, which are the times it takes to reach the storage bucket (TCP connect) and the time to complete TLS handshake (SSL time):

Storage Performance: Connect and SSL in North America (US-East)
	95th percentile connect (ms)	95th percentile SSL (ms)
Cloudflare R2	32	59
Amazon S3	78	180

Cloudflare’s cumulative connect + SSL time is almost 1/2 of Amazon’s SSL time alone. Being able to be fast on connection establishment gives us an edge right off the bat, especially in North America where cloud and storage providers tend to optimize for performance, and connect times tend to be low because ISPs have good peering with cloud and storage providers. But this isn’t just true in North America. Let’s take a look at Europe (EMEA) and Asia (APAC), where Cloudflare also beats out AWS in 95th percentile response time when we look at eyeballs in region for both regions:

Storage Performance: Response in EMEA (EU-East)
	95th percentile (ms)
Cloudflare R2	1,303
Amazon S3	1,729

Cloudflare beats Amazon by 20% in EMEA. And when you look at the SSL times, you’ll see the same trends that were present in North America: faster Connect and SSL times:

Storage Performance: Connect and SSL in EMEA (EU-East)
	95th percentile connect (ms)	95th percentile SSL (ms)
Cloudflare R2	57	94
Amazon S3	80	178

Again, the separator is how optimized Cloudflare is at setting up connections to deliver content. This is also true in APAC, where objects stored in Tokyo are served about 1.5 times faster on Cloudflare than for AWS:

Storage Performance: Response in APAC (Tokyo)
	95th percentile (ms)
Cloudflare R2	4,057
Amazon S3	6,850

Focus on cross-region

Up until this point, we’ve been looking at scenarios where users are accessing data that is stored in the same region as they are. But what if that isn’t the case? What if a user in Germany is accessing content in Ashburn? In those cases, Cloudflare also pulls ahead. This is a chart showing 95th percentile response times for cases where users outside the United States are accessing content hosted in Ashburn, VA, or US-East:

Storage Performance: Response for users outside of US connecting to US-East
	95th percentile (ms)
Cloudflare R2	3,224
Amazon S3	6,387

Cloudflare wins again, at almost 2x faster than S3 at P95. This data shows that not only do our in-region calls win out, but we win across the world. Even if you don’t have the money to buy storage everywhere in the world, R2 can still give you world-class performance because not only is R2 faster cross-region, R2’s default no-region setup ensures your data will be close to your users as often as possible.

Testing methodology

To measure these tests, we set up over 400 Catchpoint backbone nodes embedded in last mile ISPs around the world to retrieve a 1 MB file from R2 and S3 in specific locations: Ashburn, Tokyo, and Frankfurt. We recognize that many users will store larger files than the one we tested with, and we plan to test with larger files next showing that we’re faster delivering larger files as well.

We had these 400 nodes retrieve the file uncached from each storage service every 30 minutes for four days. We configured R2 to disable caching. This allows us to make sure that we aren’t reaping any benefits from our CDN pipeline and are only retrieving uncached files from the storage services.

Finally, we had to fix where the public buckets were stored in R2 to get an equivalent test compared to S3. You may notice that when configuring R2, you aren’t able to select specific datacenter locations like you can in AWS. Instead, you can provide a location hint to a general region. Cloudflare will store data anywhere in that region.

This feature is designed to make it easier for developers to deploy storage that benefits larger ranges of users as opposed to needing to know where specific datacenters are. However, that makes performance comparisons difficult, so for this test we configured R2 to store data in those specific locations (consistent with the S3 placement) on the backend as opposed to in any location in that region to ensure we would get better apples-to-apples results.

Putting the pieces together

Storage services like R2 are only part of the equation. Developers will often use storage services in conjunction with other compute services for a complete end-to-end application experience. Previously, we performed comparisons of Workers and other compute products such as Fastly’s Compute@Edge and AWS’s Lambda@Edge. We’ve rerun the numbers, and for compute times, Workers is still the fastest compute around, beating AWS Lambda@Edge and Fastly’s Compute@Edge for end-to-end performance for Rust hard tests:

Cloudflare is faster than Fastly for both JavaScript and Rust tests, while also being faster than AWS at JavaScript, which is the only test Lambda@Edge supports

To run these tests, we perform two tests against each provider: a complex JavaScript function, and a complex Rust function. These tests run as part of our network benchmarking tests that run from real user browsers around the world. For a more in-depth look at how we collect this data for Workers scenarios, check our previous Developer Week posts.

Here are the functions for both complex functions in JavaScript and Rust:

JavaScript complex function:

function testHardBusyLoop() {
  let value = 0;
  let offset = Date.now();

  for (let n = 0; n < 15000; n++) {
    value += Math.floor(Math.abs(Math.sin(offset + n)) * 10);
  }

  return value;
}

Rust complex function:

fn test_hard_busy_loop() -> i32 {
  let mut value = 0;
  let offset = Date::now().as_millis();

  for n in 0..15000 {
    value += (((offset + n) as f64).sin().abs() * 10.0) as i32;
  }

  value
}

By combining Workers and R2, you get a much simpler developer experience and a much faster user experience than you would get with any of the competition.

Storage, sped up and simplified

R2 is a unique storage service that doesn’t require the knowledge of specific locations, has a more global footprint, and integrates easily with existing Cloudflare Developer Platform products for a simple, performant experience for both users and developers. However, because it’s built on top of Cloudflare, it comes with performance baked in, and that’s evidenced by R2 being faster than its primary alternatives.

At Cloudflare, we believe that developers shouldn’t have to think about performance, you have so many other things to think about. By choosing Cloudflare, you should be able to rest easy knowing that your application will be faster because it’s built on Cloudflare, not because you’re manipulating Cloudflare to be faster for you. And by using R2 and the rest of our developer platform, we’re happy to say that we’re delivering on our vision to make performance easy for you.

Cloudflare Access is the fastest Zero Trust proxy

2023-03-17 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-security-week-2023/

Cloudflare Access is the fastest Zero Trust proxy

During every Innovation Week, Cloudflare looks at our network’s performance versus our competitors. In past weeks, we’ve focused on how much faster we are compared to reverse proxies like Akamai, or platforms that sell serverless compute that compares to our Supercloud, like Fastly and AWS. This week, we’d like to provide an update on how we compare to other reverse proxies as well as an update to our application services security product comparison against Zscaler and Netskope. This product is part of our Zero Trust platform, which helps secure applications and Internet experiences out to the public Internet, as opposed to our reverse proxy which protects your websites from outside users.

In addition to our previous post showing how our Zero Trust platform compared against Zscaler, we also have previously shared extensive network benchmarking results for reverse proxies from 3,000 last mile networks around the world. It’s been a while since we’ve shown you our progress towards being #1 in every last mile network. We want to show that data as well as revisiting our series of tests comparing Cloudflare Access to Zscaler Private Access and Netskope Private Access. For our overall network tests, Cloudflare is #1 in 47% of the top 3,000 most reported networks. For our application security tests, Cloudflare is 50% faster than Zscaler and 75% faster than Netskope.

In this blog we’re going to talk about why performance matters for our products, do a deep dive on what we’re measuring to show that we’re faster, and we’ll talk about how we measured performance for each product.

Why does performance matter?

We talked about it in our last blog, but performance matters because it impacts your employees’ experience and their ability to get their job done. Whether it’s accessing services through access control products, connecting out to the public Internet through a Secure Web Gateway, or securing risky external sites through Remote Browser Isolation, all of these experiences need to be frictionless.

A quick summary: say Bob at Acme Corporation is connecting from Johannesburg out to Slack or Zoom to get some work done. If Acme’s Secure Web Gateway is located far away from Bob in London, then Bob’s traffic may go out of Johannesburg to London, and then back into Johannesburg to reach his email. If Bob tries to do something like a voice call on Slack or Zoom, his performance may be painfully slow as he waits for his emails to send and receive. Zoom and Slack both recommend low latency for optimal performance. That extra hop Bob has to take through his gateway could decrease throughput and increase his latency, giving Bob a bad experience.

As we’ve discussed before, if these products or experiences are slow, then something worse might happen than your users complaining: they may find ways to turn off the products or bypass them, which puts your company at risk. A Zero Trust product suite is completely ineffective if no one is using it because it’s slow. Ensuring Zero Trust is fast is critical to the effectiveness of a Zero Trust solution: employees won’t want to turn it off and put themselves at risk if they barely know it’s there at all.

Much like Zscaler, Netskope may outperform many older, antiquated solutions, but their network still fails to measure up to a highly performant, optimized network like Cloudflare’s. We’ve tested all of our Zero Trust products against Netskope equivalents, and we’re even bringing back Zscaler to show you how Zscaler compares against them as well. So let’s dig into the data and show you how and why we’re faster in a critical Zero Trust scenario, comparing Cloudflare Access to Zscaler Private Access and Netskope Private Access.

Cloudflare Access: the fastest Zero Trust proxy

Access control needs to be seamless and transparent to the user: the best compliment for a Zero Trust solution is employees barely notice it’s there. These services allow users to cache authentication information on the provider network, ensuring applications can be accessed securely and quickly to give users that seamless experience they want. So having a network that minimizes the number of logins required while also reducing the latency of your application requests will help keep your Internet experience snappy and reactive.

Cloudflare Access does all that 75% faster than Netskope and 50% faster than Zscaler, ensuring that no matter where you are in the world, you’ll get a fast, secure application experience:

Cloudflare measured application access across ourselves, Zscaler and Netskope from 300 different locations around the world connecting to 6 distinct application servers in Hong Kong, Toronto, Johannesburg, São Paulo, Phoenix, and Switzerland. In each of these locations, Cloudflare’s P95 response time was faster than Zscaler and Netskope. Let’s take a look at the data when the application is hosted in Toronto, an area where Zscaler and Netskope should do well as it’s in a heavily interconnected region: North America.

ZT Access – Response time (95th Percentile) – Toronto
	95th Percentile Response (ms)
Cloudflare	2,182
Zscaler	4,071
Netskope	6,072

Cloudflare really stands out in regions with more diverse connectivity options like South America or Asia Pacific, where Zscaler compares better to Netskope than it does Cloudflare:

When we look at application servers hosted locally in South America, Cloudflare stands out:

ZT Access – Response time (95th Percentile) – South America
	95th Percentile Response (ms)
Cloudflare	2,961
Zscaler	9,271
Netskope	8,223

Cloudflare’s network shines here, allowing us to ingress connections close to the users. You can see this by looking at the Connect times in South America:

ZT Access – Connect time (95th Percentile) – South America
	95th Percentile Connect (ms)
Cloudflare	369
Zscaler	1,753
Netskope	1,160

Cloudflare’s network sets us apart here because we’re able to get users onto our network faster and find the optimal routes around the world back to the application host. We’re twice as fast as Zscaler and three times faster than Netskope because of this superpower. Across all the different tests, Cloudflare’s Connect times is consistently faster across all 300 testing nodes.

In our last blog, we looked at two distinct scenarios that need to be measured individually when we compared Cloudflare and Zscaler. The first scenario is when a user logs into their application and has to authenticate. In this case, the Zero Trust Access service will direct the user to a login page, the user will authenticate, and then be redirected to their application.

This is called a new session, because no authentication information is cached or exists on the Access network. The second scenario is called an existing session, when a user has already been authenticated and that authentication information can be cached. This scenario is usually much faster, because it doesn’t require an extra call to an identity provider to complete.

We like to measure these scenarios separately, because when we look at 95th percentile values, we would almost always be looking at new sessions if we combined new and existing sessions together. But across both scenarios, Cloudflare is consistently faster in every region. Let’s go back and look at an application hosted in Toronto, where users connecting to us connect faster than Zscaler and Netskope for both new and existing sessions.

ZT Access – Response Time (95th Percentile) – Toronto
	New Sessions (ms)	Existing Sessions (ms)
Cloudflare	1,276	1,022
Zscaler	2,415	1,797
Netskope	5,741	1,822

You can see that new sessions are generally slower as expected, but Cloudflare’s network and optimized software stack provides a consistently fast user experience. In scenarios where end-to-end connectivity can be more challenging, Cloudflare stands out even more. Let’s take a look at users in Asia connecting through to an application in Hong Kong.

ZT Access – Response Time (95th Percentile) – Hong Kong
	New Sessions (ms)	Existing Sessions (ms)
Cloudflare	2,582	2,075
Zscaler	4,956	3,617
Netskope	5,139	3,902

One interesting thing that stands out here is that while Cloudflare’s network is hyper-optimized for performance, Zscaler more closely compares to Netskope on performance than they do to Cloudflare. Netskope also performs poorly on new sessions, which indicates that their service does not react well when users are establishing new sessions.

We like to separate these new and existing sessions because it’s important to look at similar request paths to do a proper comparison. For example, if we’re comparing a request via Zscaler on an existing session and a request via Cloudflare on a new session, we could see that Cloudflare was much slower than Zscaler because of the need to authenticate. So when we contracted a third party to design these tests, we made sure that they took that into account.

For these tests, Cloudflare configured five application instances hosted in Toronto, Los Angeles, Sao Paulo, and Hong Kong. Cloudflare then used 300 different Catchpoint nodes from around the world to mimic a browser login as follows:

User connects to the application from a browser mimicked by a Catchpoint instance – new session
User authenticates against their identity provider
User accesses resource
User refreshes the browser page and tries to access the same resource but with credentials already present – existing session

This allows us to look at Cloudflare versus all the other products for application performance for both new and existing sessions, and we’ve shown that we’re faster. As we’ve mentioned, a lot of that is due to our network and how we get close to our users. So now we’re going to talk about how we compare to other large networks and how we get close to you.

Network effects make the user experience better

Getting closer to users improves the last mile Round Trip Time (RTT). As we discussed in the Access comparison, having a low RTT improves customer performance because new and existing sessions don’t have to travel very far to get to Cloudflare’s Zero Trust network. Embedding ourselves in these last mile networks helps us get closer to our users, which doesn’t just help Zero Trust performance, it helps web performance and developer performance, as we’ve discussed in prior blogs.

To quantify network performance, we have to get enough data from around the world, across all manner of different networks, comparing ourselves with other providers. We used Real User Measurements (RUM) to fetch a 100kb file from several different providers. Users around the world report the performance of different providers. The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week 2021 blog post here.

We are constantly going through the process of figuring out why we were slow — and then improving. The challenges we faced were unique to each network and highlighted a variety of different issues that are prevalent on the Internet. We’re going to provide an overview of some of the efforts we use to improve our performance for our users.

But before we do, here are the results of our efforts since Developer Week 2022, the last time we showed off these numbers. Out of the top 3,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of the number of networks where each provider is number one in p95 TCP Connection Time, which represents the time it takes for a user on a given network to connect to the provider:

Here’s what those numbers look like as of this week, Security Week 2023:

As you can see, Cloudflare has extended its lead in being faster in more networks, while other networks that previously were faster like Akamai and Fastly lost their lead. This translates to the effects we see on the World Map. Here’s what that world map looked like in Developer Week 2022:

Here’s how that world map looks today during Security Week 2023:

As you can see, Cloudflare has gotten faster in Brazil, many countries in Africa including South Africa, Ethiopia, and Nigeria, as well as Indonesia in Asia, and Norway, Sweden, and the UK in Europe.

A lot of these countries benefited from the Edge Partner Program that we discussed in the Impact Week blog. A quick refresher: the Edge Partner Program encourages last mile ISPs to partner with Cloudflare to deploy Cloudflare locations that are embedded in the last mile ISP. This improves the last mile RTT and improves performance for things like Access. Since we last showed you this map, Cloudflare has deployed more partner locations in places like Nigeria, and Saudi Arabia, which have improved performance for users in all scenarios. Efforts like the Edge Partner Program help improve not just the Zero Trust scenarios like we described above, but also the general web browsing experience for end users who use websites protected by Cloudflare.

Next-generation performance in a Zero Trust world

In a non-Zero Trust world, you and your IT teams were the network operator — which gave you the ability to control performance. While this control was comforting, it was also a huge burden on your IT teams who had to manage middle mile connections between offices and resources. But in a Zero Trust world, your network is now… well, it’s the public Internet. This means less work for your teams — but a lot more responsibility on your Zero Trust provider, which has to manage performance for every single one of your users. The better your Zero Trust provider is at improving end-to-end performance, the better an experience your users will have and the less risk you expose yourself to. For real-time applications like authentication and secure web gateways, having a snappy user experience is critical.

A Zero Trust provider needs to not only secure your users on the public Internet, but it also needs to optimize the public Internet to make sure that your users continuously stay protected. Moving to Zero Trust doesn’t just reduce the need for corporate networks, it also allows user traffic to flow to resources more naturally. However, given your Zero Trust provider is going to be the gatekeeper for all your users and all your applications, performance is a critical aspect to evaluate to reduce friction for your users and reduce the likelihood that users will complain, be less productive, or turn the solutions off. Cloudflare is constantly improving our network to ensure that users always have the best experience, through programs like the Edge Partner Program and constantly improving our peering and interconnectivity. It’s this tireless effort that makes us the fastest Zero Trust provider.

Network Performance Update: Developer Week 2022

2022-11-18 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-developer-week/

Network Performance Update: Developer Week 2022

Cloudflare is building the fastest network in the world. But we don’t want you to just take our word for it. To demonstrate it, we are continuously testing ourselves versus everyone else to make sure we’re the fastest. Since it’s Developer Week, we wanted to provide an update on how our Workers products perform against the competition, as well as our overall network performance.

Earlier this year, we compared ourselves to Fastly’s Compute@Edge and overall we were faster. This time, not only did we repeat the tests, but we also added AWS Lambda@Edge to help show how we stack up against more and more competitors. The summary: we offer the fastest developer platform on the market. Let’s talk about how we build our network to help make you faster, and then we’ll get into how that translates to our developer platform.

Latest update on network performance

We have two updates on data: a general network performance update, and then data on how Workers compares with Compute@Edge and Lambda@Edge.

To quantify global network performance, we have to get enough data from around the world, across all manner of different networks, comparing ourselves with other providers. We used Real User Measurements (RUM) to fetch a 100kB file from different providers. Users around the world report the performance of different providers. The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

During Cloudflare One Week (June 2022), we shared that we were faster in more of the most reported networks than our competitors. Out of the top 3,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of the number of networks where each provider is number one in p95 TCP Connection Time, which represents the time it takes for a user on a given network to connect to the provider. This data is from Cloudflare One Week (June 2022):

Here is what the distribution looks like for the top 3,000 networks for Developer Week (November 2022):

In addition to being the fastest across popular networks, Cloudflare is also committed to being the fastest provider in every country.

Using data on the top 3,000 networks from Cloudflare One Week (June 2022), here’s what the world map looks like (Cloudflare is in orange):

And here’s what the world looks like while looking at the top 3,000 networks for Developer Week (November 2022):

Cloudflare became #1 in more countries in Europe and Asia, specifically Russia, Ukraine, Kazakhstan, India, and China, further delivering on our mission to be the fastest network in the world. So let’s talk about how that network helps power the Supercloud to be the fastest developer platform around.

How we’re comparing developer platforms

It’s been six months since we published our initial tests, but here’s a quick refresher. We make comparisons by measuring time to connect to the network, time spent completing requests, and overall time to respond. We call these numbers connect, wait, and response. We’ve chosen these numbers because they are critical components of a request that need to be as fast as possible in order for users to see a good experience. We can reduce the connect times by peering as close as possible to the users. We can reduce the wait times by optimizing code execution to be as fast as possible. If we optimize those two processes, we’ve optimized the response, which represents the end-to-end latency of a request.

Test methodology

To measure connect, wait, and response, we perform three tests against each provider: a simple no-op JavaScript function, a complex JavaScript function, and a complex Rust function. We don’t do a simple Rust function because we expect it to take almost no time at all, and we already have a baseline for end-to-end functionality in the no-op JavaScript function since many providers will often compile both down to WebAssembly.

Here are the functions for each of them:

JavaScript no-op:

async function getErrorResponse(event, message, status) {
  return new Response(message, {status: status, headers: {'Content-Type': 'text/plain'}});
}

JavaScript hard function:

function testHardBusyLoop() {
  let value = 0;
  let offset = Date.now();

  for (let n = 0; n < 15000; n++) {
    value += Math.floor(Math.abs(Math.sin(offset + n)) * 10);
  }

  return value;
}

Rust hard function:

fn test_hard_busy_loop() -> i32 {
  let mut value = 0;
  let offset = Date::now().as_millis();

  for n in 0..15000 {
    value += (((offset + n) as f64).sin().abs() * 10.0) as i32;
  }

  value
}

We’re trying to test how good each platform is at optimizing compute in addition to evaluating how close each platform is to end-users. However, for this test, we did not run a Rust test on Lambda@Edge because it did not natively support our Rust function without uploading a WASM binary that you compile yourself. Since Lambda@Edge does not have a true first-class developer platform and tooling to run Rust, we decided to exclude the Rust scenarios for Lambda@Edge. So when we compare numbers for Lambda@Edge, it will only be for the JavaScript simple and JavaScript hard tests.

Measuring Workers performance from real users

To collect data, we use two different methods: one from a third party service called Catchpoint, and a second from our own network performance benchmarking tests. First, we used Catchpoint to gather a set of data from synthetic probes. Catchpoint is an industry standard “synthetic” testing tool, and measurements collected from real users distributed around the world. Catchpoint is a monitoring platform that has around 2,000 total endpoints distributed around the world that can be configured to fetch specific resources and time for each test. Catchpoint is useful for network providers like us as it provides a consistent, repeatable way to measure end-to-end performance of a workload, and delivers a best-effort approximation for what a user sees.

Catchpoint has backbone nodes that are embedded in ISPs around the world. That means that these nodes are plugged into ISP routers just like you are, and the traffic goes through the ISP network to each endpoint they are monitoring. These can approximate a real user, but they will never truly replicate a real user. For example, the bandwidth for these nodes is 100% dedicated for platform monitoring, as opposed to your home Internet connection, where your Internet experience will be a mixed bag of different use cases, some of which won’t talk to Workers applications at all.

For this new test, we chose 300 backbone nodes that are embedded in last mile ISPs around the world. We filtered out nodes in cloud providers, or in metro areas with multiple transit options, trying to remove duplicate paths as much as possible.

We cross-checked these tests with our own data set, which is collected from users connecting to free websites when they are served 1xxx error pages, just like how we collect data for global network performance. When a user sees this error page, that page that will execute these tests as a part of rendering and upload performance metrics on these calls to Cloudflare.

We also changed our test methodology to use paid accounts for Fastly, Cloudflare, and AWS.

Workers vs Compute@Edge vs Lambda@Edge

This time, let’s start off with the response times to show how we’re doing end-to-end:

Test	95th percentile response (ms)
Cloudflare JavaScript no-op	479
Fastly JavaScript no-op	634
AWS JavaScript no-op	1,400

Cloudflare JavaScript hard	471
Fastly JavaScript hard	683
AWS JavaScript hard	1,411

Cloudflare Rust hard	472
Fastly Rust hard	638

We’re fastest in all cases. Now let’s look at connect times, which show us how fast users connect to the compute platform before doing any actual compute:

Test	95th percentile connect (ms)
Cloudflare JavaScript no-op	82
Fastly JavaScript no-op	94
AWS JavaScript no-op	295

Cloudflare JavaScript hard	82
Fastly JavaScript hard	94
AWS JavaScript hard	297

Cloudflare Rust hard	79
Fastly Rust hard	94

Note that we don’t expect these times to differ based on the code being run, but we extract them from the same set of tests, so we’ve broken them out here.

But what about wait times? Remember, wait times represent time spent computing the request, so who has optimized their platform best? Again, it’s Cloudflare, although Fastly still has a slight edge on the hard Rust test (which we plan to beat by further optimization):

Test	95th percentile wait (ms)
Cloudflare JavaScript no-op	110
Fastly JavaScript no-op	122
AWS JavaScript no-op	362

Cloudflare JavaScript hard	115
Fastly JavaScript hard	178
AWS JavaScript hard	367

Cloudflare Rust hard	125
Fastly Rust hard	122

To verify these results, we compared the Catchpoint results to our own data set. Here is the p95 TTFB for the JavaScript and Rust hard loops for Fastly, AWS, and Cloudflare from our data:

Cloudflare is faster on JavaScript and Rust calls. These numbers also back up the slight compute advantage for Fastly on Rust calls.

The big takeaway from this is that in addition to Cloudflare being faster for the time spent processing requests in nearly every test, Cloudflare’s network and performance optimizations as a whole set us apart and make our Workers platform even faster for everything. And, of course, we plan to keep it that way.

Your application, but faster

Latency is an important component of the user experience, and for developers, being able to ensure their users can do things as fast as possible is critical for the success of an application. Whether you’re building applications in Workers, D1, and R2, hosting your documentation in Pages, or even leveraging Workers as part of your SaaS platform, having your code run in the SuperCloud that is our global network will ensure that your users see the best experience they possibly can.

Our network is hyper-optimized to make your code as fast as possible. By using Cloudflare’s network to run your applications, you can focus on making the best possible application possible and rest easy knowing that Cloudflare is providing you the best user experience possible. This is because Cloudflare’s developer platform is built on top of the world’s fastest network. So go out and build your dreams, and know that we’ll make your dreams as fast as they can possibly be.

Network performance update: Cloudflare One Week June 2022

2022-06-27 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-cloudflare-one-week-june-2022/

Network performance update: Cloudflare One Week June 2022

Since then, we’ve expanded our testing to cover not just 1,000 but 3,000 networks, and we’ve worked to continuously improve performance, with the ultimate goal of being the fastest everywhere and an intermediate goal to grow the number of networks where we’re the fastest by at least 10% every Innovation Week. We met that goal Platform Week May 2022), and we’re carrying the work over to Cloudflare One Week (June 2022).

We’re excited to share that Cloudflare has the fastest provider in 1,290 of the top 3,000 most reported networks, up from 1,280 even one month ago during Platform Week.

Measuring what matters

The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

Latest data on network performance

Here’s how the breakdown of the fastest networks looked during Platform Week (May 2022):

Here’s how that graph looks now during Cloudflare One Week (June 2022):

In addition to being the fastest across popular networks, Cloudflare is also committed to being the fastest provider in every country.

Here’s how the map of the fastest provider by country looked during Platform Week (May 2022):

And here’s how that map looks during Cloudflare One Week (June 2022):

Cloudflare became faster in more Eastern European countries during this time specifically.

Network performance in a Zero Trust world

A zero trust provider needs to not only secure your users on the public Internet, but it also needs to optimize the public Internet. Moving to Zero Trust doesn’t just reduce the need for corporate networks, it also allows user traffic to flow to resources more naturally.

However, given your Zero Trust provider is going to be the gatekeeper for all your users and all your applications, performance is a critical aspect to evaluate Cloudflare is constantly improving our network to ensure that users always have the best experience, and this comes not just from routing fixes, but also through expanding peering arrangements and adding new locations.

This tireless effort helps make us faster in more networks than anyone else, and allows us to deliver all of our services with high performance that customers expect. We know many organizations are just starting their Zero Trust journey, and that a priority of that project is to improve user experience, and we’re excited to keep obsessing over the performance in our network to make sure your teams have a seamless experience in any location.

Interested in learning more about how our Zero Trust products benefit from these improvements? Check out the full roster of our announcements from Cloudflare One Week.

Network performance update: Platform Week

2022-05-16 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-platform-week/

Network performance update: Platform Week

In September 2021, we shared extensive benchmarking results of 1,000 networks all around the world. The results showed that on a range of tests (TCP connection time, time to first byte, time to last byte), and on different measures (p95, mean), Cloudflare was the fastest provider in 49% of networks around the world. Since then, we’ve worked to continuously improve performance, with the ultimate goal of being the fastest everywhere and an intermediate goal to grow the number of networks where we’re the fastest by at least 10% every Innovation Week. We met that goal during Security Week (March 2022), and we’re carrying the work over to Platform Week (May 2022).

We’re excited to update you on the latest results, but before we do: after running with this benchmark for nine months, we’ve also been looking for ways to improve the benchmark itself — to make it even more representative of speeds in the real world. To that end, we’re expanding our measured networks from 1,000 to 3,000, to give an even more accurate sense of real world performance across the globe.

In terms of results: using the old benchmark of 1,000 networks, we’re the fastest in 69% of them. In this new expanded view of 3,000 networks, we’re the fastest in 42% of them. We’ve demonstrated a consistent ability to improve our performance against what we measure, and we’re excited to optimize our performance and lift our ranking in these smaller networks all around the world.

In addition to sharing a general update on where our network performance stands, we’re also sharing updated performance metrics on our Workers platform (given that it’s Platform Week!). We’ve done an extensive benchmark of Cloudflare Workers vs Fastly’s Compute@Edge.

We’ve got the results below, but before we get to that, we want to spend a bit of time on our revamped measurements.

Revamped measurements

A few months ago, we discussed the performance of Cloudflare Workers, as compared to other similar offerings out there. We compared our performance to Fastly’s Compute@Edge, showing that Workers was significantly faster.

After we published our results, there were questions and suggestions on how to improve our testing methodology (including sharing more detail about where and how we ran the tests). As we re-ran tests for this iteration, we made some small changes, and also worked to address the suggestions from the community. Let’s talk about what’s changed and why.

Measuring what matters

In the process of quantifying network performance, it became clear where we needed to expand our scope in measuring performance. After Security Week, we were fastest in 71% of networks, and we decided that we wanted to expand the pool of networks from the top 1,000 most reported networks to the top 3,000 most reported networks.

We’ve shown the graph detailing the number of networks where Cloudflare is #1 in network performance, but from here on out, we will only be showing this graph in percentages of networks where we are number one out of 100%, since the denominator will be changing going forward as we set ourselves harder and harder challenges.

Benchmarking for everyone

At the time we published our earlier set of benchmarks, we had an unnecessary “no benchmarking” clause in our Terms of Service. It has since been removed. It’s been a while since we’ve worried about such things, and the clause lived past its intended life in our ToS.

We’ve done the work to show where we are the fastest provider, and it’s important that everyone else be able to validate that independently. We’re also confident that well run benchmarks will only help further improve performance for Workers, other Cloudflare products, and the Internet as a whole. Game on.

Measuring Workers performance from real users

To run our tests, we used Catchpoint, an industry standard “synthetic” testing tool, and measurements collected from real users distributed around the world. Catchpoint is a monitoring platform that has around 2,000 total endpoints distributed around the world that can be configured to fetch specific resources and time each test. Catchpoint is useful for network providers like us as it provides a consistent, repeatable way to measure end-to-end performance of a workload, and delivers a best-effort approximation for what a user sees.

Catchpoint has a series of backbone nodes that are embedded in ISPs around the world. That means that these nodes are plugged into ISP routers just like you are, and the traffic goes through the ISP network to each endpoint they are monitoring. These can approximate a real user, but they will never truly approximate a real user. For example, the bandwidth for these nodes is 100% dedicated for platform monitoring, as opposed to your home Internet connection, where your Internet experience will be a mixed bag of different use cases, some of which won’t talk to Workers applications at all.

We cross-checked these tests with our own data set, which is collected from users connecting to free websites when they are served 1xxx error pages, similar to how we collect data for global network performance. When a user sees this error page, that page will contain a call that will execute these tests and upload performance metrics on these calls to Cloudflare. Users would run these calls independently of the error page, ensuring that Cloudflare did not get a head start in these tests.

We also changed our test methodology to use paid accounts for both Fastly and Cloudflare.

Changing the numbers we look at

For this new test, we decided to look at Wait time in addition to Time to First Byte. Time to First Byte is the time it takes for a web server to establish a connection, fetch the content, and deliver the first byte of a response to a user, and Wait time is the time the client spends waiting for the server to send back that first byte after the connection is established. We are using this particular measurement to look at the actual time it takes for the server to compute the request on the machine. Wait time is a subcomponent of TTFB, and contains the machine processing time, the time it takes the server to give the response to the socket to be sent back to the user, and the time it takes the response to reach the user. There could be latency in passing the request to the socket on the machine, but we measured the actual time it took the server to send the request for all tests and found it to be zero:

So we would calculate the wait time as the amount of time spent on the box plus the time it took the server to send the packet over the wire to the client.

However, others have noted that using Time to First Byte to measure performance of serverless computing solutions could potentially be misleading because user performance measurements can be influenced by more than just the time spent computing functions on the machine. For example, things like the time to connect to the server, DNS resolution, and cache times can impact Time to First Byte. Time to connect to the server (connect) can be impacted by how well peered a network is (or how poorly). DNS resolution can be impacted by client behavior, local DNS behavior, or by the performance of the provider’s DNS. Cache times can be driven by the performance of the cache on the server itself.

We agree that it is difficult to tease out computing time from Time to First Byte. To look at that particular aspect of serverless computing, we look at Wait, which is a value that contains the least amount of variables: we aren’t caching anything during our tests, DNS isn’t part of the wait time, and the only thing impacting Wait aside from the time spent on the machine is the Connect time.

However, since we’re trying to measure the end user experience and not just the amount of time spent on a server, we want to use both the Wait and TTFB so that we can show the time spent both on the server and how that impacts end-to-end performance.

What’s in the test?

For our test this time around, we decided to measure three things: a simple JavaScript function like last time, a complex JavaScript function, and a complex Rust function. Here are the functions for each of them:

async function getErrorResponse(event, message, status) {
  return new Response(message, {status: status, headers: {'Content-Type': 'text/plain'}});
}

function testHardBusyLoop() {
  let value = 0;
  let offset = Date.now();

  for (let n = 0; n < 15000; n++) {
    value += Math.floor(Math.abs(Math.sin(offset + n)) * 10);
  }

  return value;
}

fn test_hard_busy_loop() -> i32 {
  let mut value = 0;
  let offset = Date::now().as_millis();

  for n in 0..15000 {
    value += (((offset + n) as f64).sin().abs() * 10.0) as i32;
  }

  value
}

The goals of each of these tests is simple: test the ability of Workers and Compute@Edge to perform compute actions.

Latest update on network performance

We have two updates on data: a general network performance update, and then data on how Workers compares with Compute@Edge.

At Security Week (March 2022), we shared that we were faster in more of the most reported networks than our competitors. Out of the top 1,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of how many providers are number one in p95 TCP Connection Time, which represents the time it takes for a user to connect to the provider. This data is from Security Week (March 2022):

Recognizing that we are now looking at different numbers of networks, here is what the distribution looks like for the top 3,000 networks for Platform Week (May 2022):

In addition to being the fastest across popular networks, Cloudflare is also committed to being the fastest provider in every country.

Using data on the top 1,000 networks from Full Stack Week (March 2022), here’s what the world map looks like:

And here’s what the world looks like while looking at the top 3,000 networks:

Cloudflare became #1 in more countries in Africa and Europe, some of which previously did not have enough samples to officially be counted to one provider or another.

Workers vs Compute@Edge

Moving on from general network results, what does performance look like when comparing serverless compute products across providers — in this case, Cloudflare Workers performance to Compute@Edge? Looking at the Catchpoint data, the first thing we noticed was Cloudflare is faster than Fastly in all tests at the 95th percentile for Time to First Byte:

Test	95th percentile TTFB (ms)
Cloudflare JS no-op	469
Fastly JS no-op	596
Cloudflare JS hard	481
Fastly JS hard	631
Cloudflare Rust hard	493
Fastly Rust hard	577

Cloudflare is significantly faster at all tests compared to Fastly. But let’s dig into why that is. Looking at p95 Wait, we can see that Cloudflare does have an edge in most tests related to compute on-box:

Test	95th Percentile Wait (ms)
Cloudflare JS no-op	123
Fastly JS no-op	123
Cloudflare JS hard	136
Fastly JS hard	170
Cloudflare Rust hard	160
Fastly Rust hard	121

Looking at the Wait times, you can see that Cloudflare does have a significant edge in on-box performance, except in Rust, where Fastly claims their workloads are the most optimized. This data backs up that claim. But why is Fastly so much slower on Time to First Byte? The answer lies in the rest of the request. While latency spent in compute matters, it only matters in conjunction with the rest of the network performance. And Cloudflare has an advantage over Fastly in Connect times and SSL establishment times:

Test	95th Percentile Connect (ms)	95th Percentile SSL (ms)
Cloudflare JS no-op	81	289
Fastly JS no-op	88	293
Cloudflare JS hard	81	286
Fastly JS hard	88	291
Cloudflare Rust hard	81	288
Fastly Rust hard	88	295

Cloudflare’s hyper-optimized web stack allows us to process all requests faster, meaning that Workers code gets started faster. Having that extra head start in Connect and SSL allows us to further increase the distance between us and Compute@Edge.

To verify these results, we compared the Catchpoint results to our own data set. Here is the p95 TTFB for the JavaScript and Rust hard loops for both Fastly and Cloudflare from our data:

As you can see, Cloudflare is faster on JavaScript and Rust calls. And when you look at wait time, you will see that outside of Catchpoint results, Cloudflare even beats Fastly in the Rust hard tests:

The big takeaway from this is that in addition to Cloudflare being faster for the time spent processing requests, Cloudflare’s network and performance optimizations as a whole set us apart and make our Workers platform even faster.

A fast network makes for a faster developer platform

Cloudflare’s commitment to building the fastest network allows us to deliver unparalleled performance for all applications, including our developer tools. Whether it’s by accelerating Cloudflare Pages performance by hosting on every single Cloudflare server, or by deploying Workers in 275 cities without developers needing to configure a thing, Cloudflare’s developer tools are all built on top of Cloudflare’s global network. We’re committed to making our network faster so that our developer products are as performant as they could possibly be.

And it’s not just the developer platform. Cloudflare runs an integrated and optimized stack that includes DDoS protection, WAF, rate limiting, bot management, caching, SSL, smart routing, and more. By having a single software stack we are able to offer the widest range of features while ensuring that performance (no matter which of our products you use) remains excellent. We don’t want you to have to compromise performance to get security or vice versa.

Network performance update: Security Week

2022-03-21 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-security-week/

Network performance update: Security Week

Almost a year ago, we shared extensive benchmarking results of last mile networks all around the world. The results showed that on a range of tests (TCP connection time, time to first byte, time to last byte), and on different measures (p95, mean), Cloudflare was the fastest provider in 49% of networks around the world. Since then, we’ve worked to continuously improve performance towards the ultimate goal of being the fastest everywhere. We set a goal to grow the number of networks where we’re the fastest by 10% every Innovation Week. We met that goal last year, and we’re carrying the work over to 2022.

Today, we’re proud to report we are the fastest provider in 71% of the top 1,000 most reported networks around the world. Of course, we’re not done yet, but we wanted to share the latest results and explain how we did it.

Measuring what matters

In the process of quantifying network performance, it became clear where we were not the fastest everywhere. After Full Stack Week, we found 596 country/network pairs where we were more than 100ms behind the leading provider (where a country/network pair is defined as the performance of a network within a particular country).

We are constantly going through the process of figuring out why we were slow — and then improving. The challenges we faced were unique to each network and highlighted a variety of different issues that are prevalent on the Internet. We’re going to deep dive into a couple of networks, and show how we diagnosed and then improved performance.

But before we do, here are the results of our efforts since Full Stack Week.

*Performance is defined by p95 TCP connection time across top 1,000 networks in the world by number of IPv4 addresses advertised

Curing congestion in Canada

In the spirit of Security Week, we want to highlight how a Magic Transit (Cloudflare’s network layer DDoS security) customer’s network problems provided line of sight into their internal congestion issues, and how our network was able to mitigate the problem in the short term.

One Magic Transit customer saw congestion in Canada due to insufficient peering with the Internet at key interconnection points. Congestion for customers means bad performance for users: for games, it can lead to lag and jittery gameplay, for video streaming, it can lead to buffering and poor resolution, and for video/VoIP applications, it can lead to calls dropping, garbled video/voice, and sections of calls missing entirely. Fixing congestion in this case means improving the way this customer connects to the rest of the Internet to make the user experience better for both the customer and users.

When customers connect to the Internet, they can do so in several ways: through an ISP that connects to other networks, through an Internet Exchange which houses many different providers interconnecting at a singular point, or point-to-point connections with other providers.

In the case of this customer, they had direct connections to other providers and to Internet exchanges. They ran out of bandwidth with their point-to-point connections, meaning that they had too much traffic for the size of the links they had bought, which meant that the excess traffic had to go over Internet Exchanges that were out of the way, creating suboptimal network paths which increased latency.

We were able to use our network to help solve this problem. In the short term, we spread the traffic away from the congestion points. This removed hairpins to immediately improve the user experience. This restored service for this customer and all of their users.

Then, we went into action by accelerating previously planned upgrades of all of our Internet Exchange (IX) ports across Canada to ensure that we had enough capacity to handle them, even though the congestion wasn’t happening on our network. Finally, we reached out to the customer’s provider and quickly set up direct peering with them in Canada to provide direct interconnection close to the customer, so that we could provide them with a much better Internet experience. These actions made us the fastest provider on networks in Canada as well.

Keeping traffic in Australia

Next, we turn to a network that had poor performance in Australia. Users for that network were going all the way out of the country before going to Cloudflare. This created what is called a network hairpin. A network hairpin is caused by suboptimal connectivity in certain locations, which can cause users to traverse a network path that takes longer than it should. This hairpin effect made Cloudflare one of the slower providers for this network in Australia.

To fix this, Cloudflare set up peering with this network in Sydney, and this allowed traffic from this network to go to Cloudflare within the country the network was based in. This reduced our connection time from 65ms to 45ms, catapulting us to be the #1 provider for this network in the region.

Update on Full Stack Week

All of this work and more has helped us optimize our network even further. At Full Stack Week, we announced that we were faster in more of the most reported networks than our competitors. Out of the top 1,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of how many providers are number 1 in p95 TCP Connection Time, which represents the time it takes for a user to connect to the provider. This data is from Full Stack Week (November 2021):

As of Security Week, we improved our position to be faster in 19 new networks:

Cloudflare is also committed to being the fastest provider in every country. This is a world map using the data that was to show the countries with the fastest network provider during Full Stack Week (November 2021):

Here’s how the map of the world looks during Security Week (March 2022):

We moved to number 1 in all of South America, more countries in Africa, the UK, Sweden, Iceland, and also more countries in the Asia Pacific region.

A fast network means fast security products

Cloudflare’s commitment to building the fastest network allows us to deliver unparalleled performance for all applications, including our security applications. There’s an adage in the industry that you have to sacrifice performance for security, but Cloudflare believes that you should be able to have your security and performance without having to sacrifice either. We’ve unveiled a ton of awesome new products and features for Security Week and all of them are built on top of our lightning-fast network. That means that all of these products will continue to get faster as we relentlessly pursue our goal of being the fastest network everywhere.

Network Performance Update: Full Stack Week

2021-11-20 David Tuber

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-full-stack-week/

Network Performance Update: Full Stack Week

This blog was published on November 20, 2021. As we continue to optimize our network we’re publishing regular updates, which are available here.

A little over two months ago, we shared extensive benchmarking results of last mile networks all around the world. The results showed that on a range of tests (TCP connection time, time to first byte, time to last byte), and on a range of measurements (p95, mean), that Cloudflare was the fastest provider in 49% of networks around the world. Since then, we’ve worked to continuously improve performance until we’re the fastest everywhere. We set a goal to grow the number of networks where we’re the fastest by 10% every Innovation Week. We met that goal during Birthday Week (September 2021).

Today, we’re proud to report we blew the goal away for Full Stack Week (November 2021). Cloudflare measured our performance against the top 1,000 networks in the world (by number of IPv4 addresses advertised). Out of those, Cloudflare has become the fastest provider in 79 new networks, an increase of 14% of these 1,000 networks. Of course, we’re not done yet, but we wanted to share the latest results and explain how we did it.

However, before we go into more detail on our network performance, we wanted to share new performance metrics on our Workers platform (given it’s Full Stack Week!). We’ve crunched the numbers of Cloudflare Workers vs Fastly’s Compute@Edge, and the results are in: Workers is 196% faster.

Faster Network Means Faster Stack

A few months ago, we also discussed the performance of Cloudflare Workers, as compared to other similar offerings out there. We compared our performance to Lambda and Lambda@Edge, where Cloudflare Workers outperformed at 210% and 298% respectively.

At the time, we wanted to see how we measured up against all comparable offerings, but not all offerings were generally available. As a result, we weren’t able to report on how Workers compared to another solution: Fastly’s Compute@Edge.

Today, we’re excited to report that Cloudflare Workers is 196% faster than Fastly’s Compute@Edge based on the time to first byte from the tests we ran on 50 nodes using Catchpoint’s data from across the world.

As we have done in the past, we executed a function that simply returns the current time and measured wait time to first byte (the length of time between a client making an HTTP request to when the client receives the first byte of the request’s response, after DNS, connection, and TLS handshake). The tests were performed on November 8, 2021, using a free tier account for both Cloudflare Workers and Fastly’s Compute@Edge.

The code we ran on both providers was exactly identical — a small function that returns all request headers:

addEventListener('fetch', event => event.respondWith(handleRequest(event)));


async function handleRequest(event) {
  let requestHeaders = Object.fromEntries(event.request.headers)

  return new Response(JSON.stringify(requestHeaders), {status: 200})
};

Blue: Cloudflare Workers
Green: Compute@Edge

If you want to explore the results on your own, here is a link to the data.

By building on our global network that we’re constantly accelerating, leveraging isolates, and driving cold starts to zero, we’re able to offer our customers ludicrously fast speeds across the board.

Now, let’s move on to an update on how Cloudflare’s broader network performance has continued to improve!

Measuring What Matters

To quantify network performance, we have to get enough data from around the world across all manner of different networks comparing ourselves with other providers. We used Real User Measurements (RUM) to fetch a 100kb file from several different providers. Users around the world report the performance of different providers. The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

In the process of quantifying network performance, it became clear where we were not the fastest everywhere. After Birthday Week, we found 601 country/network pairs where we were more than 100ms behind the leading provider (where a country/network pair is defined as the performance of a network within a particular country).

We are constantly going through the process of figuring out why we were slow — and then improve. The challenges we faced were unique to each network and highlighted a variety of different issues that are prevalent on the Internet. We’re going to deep dive into a couple of networks, and show how we diagnosed and then improved performance.

But before we do, here are the results of our efforts in the past two weeks.

Cloudflare has become number one in TCP Connection Time in 79 new networks. This graph shows the number of networks where we ranked number 1 for TCP Connection Time during Full Stack Week compared to Birthday Week:

We’ve become faster in 79 more networks thanks to our efforts, which have represented a growth of 14% in networks where we were the fastest. Here’s a chart showing our ranking distribution comparing Birthday Week and Full Stack Week:

Now that we’ve talked about how we’ve improved, let’s share our stories about chasing peak performance across the world — each with a different set of challenges.

Placing Traffic Properly in Peru

Our first stop for improving network performance was in Peru. We observed that a lot of users in Lima were actually getting sent to Chile to be served. Cloudflare has multiple locations in Peru, so this shouldn’t have happened. Sending traffic to Chile caused us to be ranked fourth on that particular network in the country. Our engineers knew that the best way to get to number one was to ensure that all the Lima traffic stayed within the country, so we decided to look at why so much of our traffic was getting routed outside the country.

The reason that so much traffic was being routed outside the country was due to the network provider distributing traffic to Cloudflare unevenly, and too many users were sent to one specific location. Our network has a series of checks and fail-safes that allow us to ensure that even if this happens, our users will continue to see a good experience. The checks were being engaged here because of the uneven distribution of traffic to our locations in Lima; however, the traffic was being sent out of the country.

To fix the situation in the short term, we decided to do a bit of manual load balancing across our locations in Lima while building automation to remove the need for manual actions in the future. We took one of the locations that was seeing the most traffic and stopped advertising some prefixes from that location. The hypothesis we had was that the traffic would simply flow to the other Lima locations instead of Chile, and everything would balance out, improving the TCP connect time for everyone while keeping the traffic in the country. We started to make the change on a small portion of our free traffic, and our hypothesis proved correct. At that point, we deployed the change in a larger scope, and the P90 Client TCP RTT dropped from 240ms to 60ms.

As a result, Cloudflare is now number one in network performance in Peru.

Slimming Down Latencies in Sri Lanka

Our next example takes us halfway around the world to Sri Lanka, where we found a network provider who was routing requests from their users to Newark.

1 * * *
2 100.85.0.1 3.061ms 2.522ms 2.728ms
3 198.51.100.146 AS29766 3.651ms 1.855ms 2.715ms
4 198.51.100.145 AS29766 3.438ms 3.225ms 2.805ms
5 222.165.177.150 AS9329 2.233ms 2.272ms 2.843ms
6 222.165.177.145 AS9329 2.703ms 2.862ms 2.291ms
7 103.87.125.253 AS45489 3.658ms 3.708ms 3.613ms
8 103.87.124.245 AS45489 120.027ms 120.665ms 120.471ms
9 103.87.124.146 AS45489 115.597ms 115.863ms 115.178ms
10 50.208.235.157 be-107-2008-pe01.60hudson.ny.ibone.comcast.net AS7922 249.884ms 249.475ms 250.063ms -> going from Sri Lanka to New York
11 96.110.41.145 be-4101-cs01.newyork.ny.ibone.comcast.net AS7922 267.839ms 267.979ms 268.719ms
12 96.110.34.34 be-3112-pe12.111eighthave.ny.ibone.comcast.net AS7922 262.647ms 261.272ms 262.272ms
13 66.208.233.106 AS7922 262.378ms 258.948ms 258.057ms
14 172.70.108.4 AS13335 268.974ms 280.475ms 268.158ms
15 172.67.182.209 AS13335 267.329ms 266.466ms 266.593ms

This understandably caused significant latency problems, and Cloudflare was ranked fourth in Sri Lanka on this network as a result. Even though Colombo is a relatively small location, we moved as much traffic as possible and advertised through the location to improve the user experience and reduce the potential amount of traffic sent to Newark.

Once this was done, we noticed that the P90 Client TCP RTT dropped from 150ms to 50ms.

However, even though we were advertising all of our ranges through Colombo and our performance in aggregate improved, this provider was still sending traffic for some Cloudflare prefixes to Newark. We reached out to the provider and let them know about this user-impacting change they made.

After doing all of these things, Cloudflare moved from fourth in Sri Lanka to number one.

Update on Birthday Week

All of these network changes and more have allowed Cloudflare to become number one in network performance in more networks than before. During Birthday Week, we announced that we were faster in more networks than our competitors. Out of the top 1,000 networks in the world (by number of IPv4 addresses advertised), here’s how Cloudflare performed during Birthday Week (September 2021):

As of Full Stack Week (November 2021), we further improved our position to be faster in 79 new networks:

But we haven’t just increased our performance on the last mile, we’ve even gotten better on Time to Last Byte as well. Here’s how the landscape looked leading up to Birthday Week (September 2021):

And here’s the network landscape now (November 2021):

Cloudflare is also committed to being the fastest provider in every country. Network performance by country is a moving target, and is largely driven by users who are accessing at any given day. Also, looking at network performance in aggregate across countries for long time frames can leave a lot of data out. That being said, this is a world map using the data that was to show the countries with the fastest network provider during Birthday Week (Sept 2021):

Here’s how it looks two months later during Full Stack Week (November 2021):

Long Tail Latency

The running theme of these performance updates has always been the long tail of issues to solve. Ironing out these kinks on our network is critical to ensure that we provide premiere performance as we grow.

Our team has put in a lot of effort and yielded some great results, but we’re constantly trying to be faster. We’ve automated the discovery of performance issues like these, and we’re looking to build automation that will detect and remediate different classes of these issues to stay on top of network performance in the future.

Tracking performance like this doesn’t just make one number faster; it helps improve the performance of your entire stack, making everything lightning-fast.

We have one more innovation week coming in 2021, and we’ll be back to report on further progress on optimizing our performance globally.

How we stack up

Lightning fast

Measuring What Matters

Blazing fast speeds with HTTP/3

What’s next

How we stack up

Our tooling

What’s next

First, the results

Our tooling and process

What’s next

Performance is a threat vector

Cloudflare Gateway: security at the Internet

Instant (Zero Trust) access

Remote Browser Isolation: a secure browser hosted in the cloud

Gotta go fast(er)

Storage performance is user-facing

Benchmarking storage performance

Focus on cross-region

Testing methodology

Putting the pieces together

Storage, sped up and simplified

Why does performance matter?

Cloudflare Access: the fastest Zero Trust proxy

Network effects make the user experience better

Next-generation performance in a Zero Trust world

Latest update on network performance

How we’re comparing developer platforms

Test methodology

Measuring Workers performance from real users

Workers vs Compute@Edge vs Lambda@Edge

Your application, but faster

Measuring what matters

Latest data on network performance

Network performance in a Zero Trust world

Revamped measurements

Measuring what matters

Benchmarking for everyone

Measuring Workers performance from real users

Changing the numbers we look at

What’s in the test?

Latest update on network performance

Workers vs Compute@Edge

A fast network makes for a faster developer platform

Measuring what matters

Curing congestion in Canada

Keeping traffic in Australia

Update on Full Stack Week

A fast network means fast security products

Faster Network Means Faster Stack

Measuring What Matters

Placing Traffic Properly in Peru

Slimming Down Latencies in Sri Lanka

Update on Birthday Week

Long Tail Latency

The collective thoughts of the interwebz