Tag Archives: Argo Smart Routing

Argo Smart Routing for UDP: speeding up gaming, real-time communications and more

Post Syndicated from Achiel van der Mandele original http://blog.cloudflare.com/turbo-charge-gaming-and-streaming-with-argo-for-udp/

Argo Smart Routing for UDP: speeding up gaming, real-time communications and more

Argo Smart Routing for UDP: speeding up gaming, real-time communications and more

Today, Cloudflare is super excited to announce that we’re bringing traffic acceleration to customer’s UDP traffic. Now, you can improve the latency of UDP-based applications like video games, voice calls, and video meetings by up to 17%. Combining the power of Argo Smart Routing (our traffic acceleration product) with UDP gives you the ability to supercharge your UDP-based traffic.

When applications use TCP vs. UDP

Typically when people talk about the Internet, they think of websites they visit in their browsers, or apps that allow them to order food. This type of traffic is sent across the Internet via HTTP which is built on top of the Transmission Control Protocol (TCP). However, there’s a lot more to the Internet than just browsing websites and using apps. Gaming, live video, or tunneling traffic to different networks via a VPN are all common applications that don’t use HTTP or TCP. These popular applications leverage the User Datagram Protocol (or UDP for short). To understand why these applications use UDP instead of TCP, we’ll need to dig into how these different applications work.

When you load a web page, you generally want to see the entire web page; the website would be confusing if parts of it are missing. For this reason, HTTP uses TCP as a method of transferring website data. TCP ensures that if a packet ever gets lost as it crosses the Internet, that packet will be resent. Having a reliable protocol like TCP is generally a good idea when 100% of the information sent needs to be loaded. It’s worth noting that later HTTP versions like HTTP/3 actually deviated from TCP as a transmission protocol, but they still ensure packet delivery by handling packet retransmission using the QUIC protocol.

There are other applications that prioritize quickly sending real time data and are less concerned about perfectly delivering 100% of the data. Let’s explore Real-Time Communications (RTC) like video meetings as an example. If two people are streaming video live, all they care about is what is happening now. If a few packets are lost during the initial transmission, retransmission is usually too slow to render the lost packet data in the current video frame. TCP doesn’t really make sense in this scenario.

Instead, RTC protocols are built on top of UDP. TCP is like a formal back and forth conversation where every sentence matters. UDP is more like listening to your friend's stream of consciousness: you don’t care about every single bit as long as you get the gist of it. UDP transfers packet data with speed and efficiency without guaranteeing the delivery of those packets. This is perfect for applications like RTC where reducing latency is more important than occasionally losing a packet here or there. The same applies to gaming traffic; you generally want the most up-to-date information, and you don’t really care about retransmitting lost packets.

Gaming and RTC applications really care about latency. Latency is the length of time it takes a packet to be sent to a server plus the length of time to receive a response from the server (called round-trip time or RTT). In the case of video games, the higher the latency, the longer it will take for you to see other players move and the less time you’ll have to react to the game. With enough latency, games become unplayable: if the players on your screen are constantly blipping around it’s near impossible to interact with them. In RTC applications like video meetings, you’ll experience a delay between yourself and your counterpart. You may find yourselves accidentally talking over each other which isn’t a great experience.

Companies that host gaming or RTC infrastructure often try to reduce latency by spinning up servers that are geographically closer to their users. However, it’s common to have two users that are trying to have a video call between distant locations like Amsterdam and Los Angeles. No matter where you install your servers, that's still a long distance for that traffic to travel. The longer the path, the higher the chances are that you're going to run into congestion along the way. Congestion is just like a traffic jam on a highway, but for networks. Sometimes certain paths get overloaded with traffic. This causes delays and packets to get dropped. This is where Argo Smart Routing comes in.

Argo Smart Routing

Cloudflare customers that want the best cross-Internet application performance rely on Argo Smart Routing’s traffic acceleration to reduce latency. Argo Smart Routing is like the GPS of the Internet. It uses real time global network performance measurements to accelerate traffic, actively route around Internet congestion, and increase your traffic’s stability by reducing packet loss and jitter.

Argo Smart Routing was launched in May 2017, and its first iteration focused on reducing website traffic latency. Since then, we’ve improved Argo Smart Routing and also launched Argo Smart Routing for Spectrum TCP traffic which reduces latency in any TCP-based protocols. Today, we’re excited to bring the same Argo Smart Routing technology to customer’s UDP traffic which will reduce latency, packet loss, and jitter in gaming, and live audio/video applications.

Argo Smart Routing accelerates Internet traffic by sending millions of synthetic probes from every Cloudflare data center to the origin of every Cloudflare customer. These probes measure the latency of all possible routes between Cloudflare’s data centers and a customer’s origin. We then combine that with probes running between Cloudflare’s data centers to calculate possible routes. When an Internet user makes a request to an origin, Cloudflare calculates the results of our real time global latency measurements, examines Internet congestion data, and calculates the optimal route for customer’s traffic. To enable Argo Smart Routing for UDP traffic, Cloudflare extended the route computations typically used for HTTP and TCP traffic and applied them to UDP traffic.

We knew that Argo Smart Routing offered impressive benefits for HTTP traffic, reducing time to first byte by up to 30% on average for customers. But UDP can be treated differently by networks, so we were curious to see if we would see a similar reduction in round-trip-time for UDP. To validate, we ran a set of tests. We set up an origin in Iowa, USA and had a client connect to it from Tokyo, Japan. Compared to a regular Spectrum setup, we saw a decrease in round-trip-time of up to 17.3% on average. For the standard setup, Spectrum was able to proxy packets to Iowa in 173.3 milliseconds on average. Comparatively, turning on Argo Smart Routing reduced the average round-trip-time down to 143.3 milliseconds. The distance between those two cities is 6,074 miles (9,776 kilometers), meaning we've effectively moved the two closer to each other by over a thousand miles (or 1,609 km) just by turning on this feature.

We're incredibly excited about Argo Smart Routing for UDP and what our customers will use it for. If you're in gaming or real-time-communications, or even have a different use-case that you think would benefit from speeding up UDP traffic, please contact your account team today. We are currently in closed beta but are excited about accepting applications.

Argo for Packets is Generally Available

Post Syndicated from David Tuber original https://blog.cloudflare.com/argo-for-packets-generally-available/

Argo for Packets is Generally Available

Argo for Packets is Generally Available

What would you say if we told you your IP network can be faster by 10%, and all you have to do is reach out to your account team to make it happen?

Today, we’re announcing the general availability of Argo for Packets, which provides IP layer network optimizations to supercharge your Cloudflare network services products like Magic Transit (our Layer 3 DDoS protection service), Magic WAN (which lets you build your own SD-WAN on top of Cloudflare), and Cloudflare for Offices (our initiative to provide secure, performant connectivity into thousands of office buildings around the world).

If you’re not familiar with Argo, it’s a Cloudflare product that makes your traffic faster. Argo finds the fastest, most available path for your traffic on the Internet. Every day, Cloudflare carries trillions of requests, connections, and packets across our network and the Internet. Because our network, our customers, and their end users are well distributed globally, all of these requests flowing across our infrastructure paint a great picture of how different parts of the Internet are performing at any given time. Cloudflare leverages this picture to ensure that your traffic takes the fastest path through our infrastructure.

Previously, Argo optimized traffic at the Layer 7 application layer and at the Layer 4 protocol layer. With the GA of Argo for Packets, we’re now optimizing the IP layer for your private network. During Speed Week we announced the early access for Argo for Packets, and how it can offer a 10% latency reduction. Today, to celebrate Argo for Packets reaching GA, we’re going to dive deeper into the latency reductions, show you examples, explain how you can see even greater optimizations, and talk about how Argo’s secure data plane gives you additional encryption even at Layer 3.

And if you’re interested in enabling Argo for Packets today, please reach out to your account team to get the process started!

Better than BGP

As we said during Speed Week, Argo for Packets provides an average 10% latency improvement across the world in our internal testing:

Argo for Packets is Generally Available

As we moved towards GA, we found that our real world numbers match our internal testing, and we still see that 10% improvement. But it’s important to note that the 10% latency reduction numbers are an average across all paths across the world. Different customers can see different latency gains depending on their setup.

Argo for Packets achieves these latency gains by dynamically choosing the best possible path throughout our network. Let’s talk a bit about what that means.

Normal packets on the network find their way to their destination using something called the Border Gateway Protocol (BGP), which allows packets to traverse the “shortest” path to its destination. However, the shortest path in BGP terms isn’t strongly correlated with latency, but with network hops. For example, path A in a network has two possible paths: 12345 – 54321 – 13335, and 12345 13335. Both networks start from the network 12345 and end at Cloudflare, which is AS 13335. BGP logic dictates that traffic will always go through the second path. But if the first path has a lower network latency or lower packet loss, customers could potentially see better performance and not know it!

There are two ways to remedy this. The first way is to invest in building out more pipes with network 12345 while expanding the network to be right next to every network. Customers can also build out their own networks or purchase expensive vendor MPLS networks. Either solution will cost a lot of money and time to reach the levels of performance customers want.

Cloudflare improves customer performance by leveraging our existing global network and backbone, plus the networking data from traffic that’s already being sent over to optimize routes back to you. This allows us to improve which paths are taken as traffic changes and congestion on the Internet happens. Argo looks at every path back from every Cloudflare datacenter back to your origin, down to the individual network path. Argo compares existing Layer 4 traffic and network analytics across all of these unique paths to determine the fastest, most available path.

To make Argo personalized to your private network, Cloudflare makes use of a data source that we already built for Magic Transit. That data source: health check probes. Cloudflare leverages existing health check probes from every single Cloudflare data center back to each customer’s origin. These probes are used to determine the health of paths from Cloudflare back to a customer for Magic Transit, so that Cloudflare knows which paths back to origin are healthy. These probes contain a variety of information that can also be used to improve performance such as packet loss and latency data. By examining health check probes and adding them to existing Layer 4 data, Cloudflare can get a better understanding of one-way latencies and can construct a map that allows us to see all the interconnected data centers and how fast they are to each other. Cloudflare then finds the best path at layer 3 back to the customer datacenter by picking an entry location where the packet entered our network, and an exit location that is directly connected back to the customer via a Cloudflare Network Interconnect.

Argo for Packets is Generally Available

Using this map, Cloudflare constructs dynamic routes for each customer based on where their traffic enters Cloudflare’s network and where they need to go.

Let’s dive into some examples of how your latency reductions manifest depending on your setup.

Cloudflare’s Network is Your Network

In our Speed Week blog outlining how Magic products make your network faster, we outlined several different network topology examples and showed the improvements Magic Transit and Magic WAN had on their networks. Let’s supercharge those numbers by adding Argo for Packets on top of that to see how we can improve performance even further.

The example from the blog outlined a company with locations in South Carolina, Oregon, and Los Angeles. In that blog, we showed the latency improvements that Magic Transit by itself provided for one leg of the trip. That network looks like this:

Argo for Packets is Generally Available
Argo for Packets is Generally Available

Let’s break that out to show the latencies between all paths on that network. Let’s assume that South Carolina connects to Atlanta, and Oregon connects to Seattle, which is the most likely scenario:

Source Location Destination Location Magic WAN one-way latency Argo for Packets One-way latency Argo improvement (in ms) Latency percent improvement
Los Angeles Atlanta 49.1 45 4.11 8.36
Los Angeles Seattle 32.4 27.2 5.18 16
Atlanta Los Angeles 49 44.9 4.09 8.35
Atlanta Seattle 78.1 56.9 21.2 27.1
Seattle Los Angeles 32.2 27 5.22 16.2
Seattle Atlanta 77.7 56.7 20.9 26.9

For this sample customer network, Argo for Packets improves latencies on every possible path. As you can see the average percent improvement is much higher for this particular network than the global average of 10%.

Let’s take another example of a customer with locations in Asia: South Korea, Philippines, Singapore, Osaka, and Hong Kong. For a network with those locations, Argo for Packets is able to create a 17% latency reduction by finding the optimal paths between locations that were typically trickiest to navigate, like between South Korea, Osaka, and the Philippines. A customer with many locations will see huge benefits from Argo for Packets, because it optimizes the trickiest paths on the Internet and makes them just as fast as the other paths. It removes the latency incurred by your worst network paths and makes not only your average numbers look good, but especially your 90th percentile latency numbers.

Reducing these long-tail latencies is critical especially as customers move back to better and start returning to offices all around the world.

Next Stop: Your Office

Argo for Packets pairs brilliantly with Magic WAN and Cloudflare for Offices to create a hyper-optimized, ultra-secure, private network that adapts to whatever you throw at it. If this is your first time hearing about Cloudflare for Offices, it’s our new initiative to provide private, secure, performant connectivity to thousands of new locations around the world. And that private connectivity provides a great foundation for Argo for Packets to speed up your network.

Taking the above example from the United States, if this company adds two new locations in Boston and Dallas, those locations also see significant latency reduction through Argo for Packets. Now, their network looks like this:

Argo for Packets is Generally Available

Argo for Packets also ensures that those freshly added new offices will immediately see great performance on the private network:

Source Location Destination Location Argo improvement (in ms) Latency percent improvement
Los Angeles Dallas 9.89 23.3
Los Angeles Atlanta 0.774 1.58
Los Angeles Seattle 0.478 1.51
Los Angeles Boston 13.3 16.8
Dallas Los Angeles 9.66 23
Dallas Atlanta 0 0
Dallas Seattle 2.96 5.2
Dallas Boston 0.43 0.955
Atlanta Los Angeles 0.687 1.4
Atlanta Dallas 0 0
Atlanta Seattle 9.7 12.4
Atlanta Boston 4.39 15.2
Seattle Los Angeles 0.322 1.02
Seattle Dallas 3.11 5.43
Seattle Atlanta 9.81 12.6
Seattle Boston 34.7 30.3
Boston Los Angeles 13.3 16.8
Boston Dallas 0.386 0.85
Boston Atlanta 4.37 15
Boston Seattle 33.7 29.6

Cloudflare for Offices makes it so easy to get those offices set up because customers don’t have to bring their perimeter firewalls, WAN devices, or anything else — they can just plug into Cloudflare at their building, and the power of Cloudflare One allows them to get all their network security services over a private connection to Cloudflare, optimized by Argo for Packets.

Your Network, but Faster

Argo for Packets is the perfect complement to any of our Cloudflare One solutions: providing faster bits through your network, built on Cloudflare. Now, your SD-WAN and Magic Transit solutions can be optimized to not just be secure, but performant as well.

If you’re interested in turning on Argo for Packets or onboarding your offices to a private and secure connectivity solution, reach out to your account team to get the process started.

Announcing Argo for Spectrum

Post Syndicated from Achiel van der Mandele original https://blog.cloudflare.com/argo-spectrum/

Announcing Argo for Spectrum

Announcing Argo for Spectrum

Today we’re excited to announce the general availability of Argo for Spectrum, a way to turbo-charge any TCP based application. With Argo for Spectrum, you can reduce latency, packet loss and improve connectivity for any TCP application, including common protocols like Minecraft, Remote Desktop Protocol and SFTP.

The Internet — more than just a browser

When people think of the Internet, many of us think about using a browser to view websites. Of course, it’s so much more! We often use other ways to connect to each other and to the resources we need for work. For example, you may interact with servers for work using SSH File Transfer Protocol (SFTP), git or Remote Desktop software. At home, you might play a video game on the Internet with friends.

To help people that protect these services against DDoS attacks, Spectrum launched in 2018 and extends Cloudflare’s DDoS protection to any TCP or UDP based protocol. Customers use it for a wide variety of use cases, including to protect video streaming (RTMP), gaming and internal IT systems. Spectrum also supports common VoIP protocols such as SIP and RTP, which have recently seen an increase in DDoS ransomware attacks. A lot of these applications are also highly sensitive to performance issues. No one likes waiting for a file to upload or dealing with a lagging video game.

Latency and throughput are the two metrics people generally discuss when talking about network performance. Latency refers to the amount of time a piece of data (a packet) takes to traverse between two systems. Throughput refers to the amount of bits you can actually send per second. This blog will discuss how these two interplay and how we improve them with Argo for Spectrum.

Argo to the rescue

There are a number of factors that cause poor performance between two points on the Internet, including network congestion, the distance between the two points, and packet loss. This is a problem many of our customers have, even on web applications. To help, we launched Argo Smart Routing in 2017, a way to reduce latency (or time to first byte, to be precise) for any HTTP request that goes to an origin.

That’s great for folks who run websites, but what if you’re working on an application that doesn’t speak HTTP? Up until now people had limited options for improving performance for these applications. That changes today with the general availability of Argo for Spectrum. Argo for Spectrum offers the same benefits as Argo Smart Routing for any TCP-based protocol.

Argo for Spectrum takes the same smarts from our network traffic and applies it to Spectrum. At time of writing, Cloudflare sits in front of approximately 20% of the Alexa top 10 million websites. That means that we see, in near real-time, which networks are congested, which are slow and which are dropping packets. We use that data and take action by provisioning faster routes, which sends packets through the Internet faster than normal routing. Argo for Spectrum works the exact same way, using the same intelligence and routing plane but extending it to any TCP based application.

Performance

But what does this mean for real application performance? To find out, we ran a set of benchmarks on Catchpoint. Catchpoint is a service that allows you to set up performance monitoring from all over the world. Tests are repeated at intervals and aggregate results are reported. We wanted to use a third party such as Catchpoint to get objective results (as opposed to running themselves).

For our test case, we used a file server in the Netherlands as our origin. We provisioned various tests on Catchpoint to measure file transfer performance from various places in the world: Rabat, Tokyo, Los Angeles and Lima.

Announcing Argo for Spectrum
Throughput of a 10MB file. Higher is better.

Depending on location, transfers saw increases of up to 108% (for locations such as Tokyo) and 85% on average. Why is it so much faster? The answer is bandwidth delay product. In layman’s terms, bandwidth delay product means that the higher the latency, the lower the throughput. This is because with transmission protocols such as TCP, we need to wait for the other party to acknowledge that they received data before we can send more.

As an analogy, let’s assume we’re operating a water cleaning facility. We send unprocessed water through a pipe to a cleaning facility, but we’re not sure how much capacity the facility has! To test, we send an amount of water through the pipe. Once the water has arrived, the facility will call us up and say, “we can easily handle this amount of water at a time, please send more.” If the pipe is short, the feedback loop is quick: the water will arrive, and we’ll immediately be able to send more without having to wait. If we have a very, very long pipe, we have to stop sending water for a while before we get confirmation that the water has arrived and there’s enough capacity.

The same happens with TCP: we send an amount of data to the wire and wait to get confirmation that it arrived. If the latency is high it reduces the throughput because we’re constantly waiting for confirmation. If latency is low we can throttle throughput at a high rate. With Spectrum and Argo, we help in two ways: the first is that Spectrum terminates the TCP connection close to the user, meaning that latency for that link is low. The second is that Argo reduces the latency between our edge and the origin. In concert, they create a set of low-latency connections, resulting in a low overall bandwidth delay product between users in origin. The result is a much higher throughput than you would otherwise get.

Argo for Spectrum supports any TCP based protocol. This includes commonly used protocols like SFTP, git (over SSH), RDP and SMTP, but also media streaming and gaming protocols such as RTMP and Minecraft. Setting up Argo for Spectrum is easy. When creating a Spectrum application, just hit the “Argo Smart Routing” toggle. Any traffic will automatically be smart routed.

Announcing Argo for Spectrum

Argo for Spectrum covers much more than just these applications: we support any TCP-based protocol. If you’re interested, reach out to your account team today to see what we can do for you.

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

Post Syndicated from David Tuber original https://blog.cloudflare.com/orpheus/

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

Cloudflare’s mission is to help build a better Internet for everyone. Building a better Internet means helping build more reliable and efficient services that everyone can use. To help realize this vision, we’re announcing the free distribution of two products, one old and one new:

  • Tiered Caching is now available to all customers for free. Tiered Caching reduces origin data transfer and improves performance, making web properties cheaper and faster to operate. Tiered Cache was previously a paid addition to Free, Pro, and Business plans as part of Argo.
  • Orpheus is now available to all customers for free. Orpheus routes around problems on the Internet to ensure that customer origin servers are reachable from everywhere, reducing the number of errors your visitors see.

Tiered Caching: improving website performance and economics for everyone

Tiered Cache uses the size of our network to reduce requests to customer origins by dramatically increasing cache hit ratios. With data centers around the world, Cloudflare caches content very close to end users, but if a piece of content is not in cache, the Cloudflare edge data centers must contact the origin server to receive the cacheable content. This can be slow and places load on an origin server compared to serving directly from cache.

Tiered Cache works by dividing Cloudflare’s data centers into a hierarchy of lower-tiers and upper-tiers. If content is not cached in lower-tier data centers (generally the ones closest to a visitor), the lower-tier must ask an upper-tier to see if it has the content. If the upper-tier does not have it, only the upper-tier can ask the origin for content. This practice improves bandwidth efficiency by limiting the number of data centers that can ask the origin for content, reduces origin load, and makes websites more cost-effective to operate.

Dividing data centers like this results in improved performance for visitors because distances and links traversed between Cloudflare data centers are generally shorter and faster than the links between data centers and origins. It also reduces load on origins, making web properties more economical to operate. Customers enabling Tiered Cache can achieve a 60% or greater reduction in their cache miss rate as compared to Cloudflare’s traditional CDN service.

Additionally, Tiered Cache concentrates connections to origin servers so they come from a small number of data centers rather than the full set of network locations. This results in fewer open connections using server resources.

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

Tiered Cache is simple to enable:

  • Log into your Cloudflare account.
  • Navigate to the Caching in the dashboard.
  • Under Caching, select Tiered Cache.
  • Enable Tiered Cache.

From there, customers will automatically be enrolled in Smart Tiered Cache Topology without needing to make any additional changes. Enterprise Customers can select from different prefab topologies or have a custom topology created for their unique needs.

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

Smart Tiered Cache dynamically selects the single best upper tier for each of your website’s origins with no configuration required. We will dynamically find the single best upper tier for an origin by using Cloudflare’s performance and routing data. Cloudflare collects latency data for each request to an origin. Using this latency data, we can determine how well any upper-tier data center is connected with an origin and can empirically select the best data center with the lowest latency to be the upper-tier for an origin.

Today, Smart Tiered Cache is being offered to ALL Cloudflare customers for free, in contrast to other CDNs who may charge exorbitant fees for similar or worse functionality. Current Argo customers will get additional benefits described here. We think that this is a foundational improvement to the performance and economics of running a website.

But what happens if an upper-tier can’t reach an origin?

Orpheus: solving origin reachability problems for everyone

Cloudflare is a reverse proxy that receives traffic from end users and proxies requests back to customer servers or origins. To be successful, Cloudflare needs to be reachable by end users while simultaneously being able to reach origins. With end users around the world, Cloudflare needs to be able to reach origins from multiple points around the world at the same time. This is easier said than done! The Internet is not homogenous, and diverse Cloudflare network locations do not necessarily take the same paths to a given customer origin at any given time. A customer origin may be reachable from some networks but not from others.

Cloudflare developed Argo to be the Waze of the Internet, allowing our network to react to changes in Internet traffic conditions and route around congestion and breakages in real-time, ensuring end users always have a good experience. Argo Smart Routing provides amazing performance and reliability improvements to our customers.

Enter Orpheus. Orpheus provides reachability benefits for customers by finding unreachable paths on the Internet in real time, and guiding traffic away from those paths, ensuring that Cloudflare will always be able to reach an origin no matter what is happening on the Internet.  

Today, we’re excited to announce that Orpheus is available to and being used by all our customers.

Fewer 522s

You may have seen this error before at one time or another.

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

This error indicates that a user was unable to reach content because Cloudflare couldn’t reach the origin. Because of the unpredictability of the Internet described above, users may see this error even when an origin is up and able to receive traffic.

So why do you see this error? The 522 error occurs when network instability causes traffic sent by Cloudflare to fail either before it reaches the origin, or on the way back from the origin to Cloudflare. This is the equivalent of either Cloudflare or your origin sending a request and never getting a response. Both sides think that they’re fine, but the network path between them is not reachable at all. This causes customer pain.

Orpheus solves that pain, ensuring that no matter where users are or where the origin is, an Internet application will always be reachable from Cloudflare.

How it works

Orpheus builds and provisions routes from Cloudflare to origins by analyzing data from users on every path from Cloudflare and ordering them on a per-data center level with the goal of eliminating connection errors and minimizing packet loss. If Orpheus detects errors on the current path from Cloudflare back to a customer origin, Orpheus will steer subsequent traffic from the impacted network path to the healthiest path available.

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

This is similar to how Argo works but with some key differences: Argo is always steering traffic down the fastest path, whereas Orpheus is reactionary and steers traffic down healthy (and not necessarily the fastest) paths when needed.

Improving origin reachability for customers

Let’s look at an example.

Barry has an origin hosted in WordPress in Chicago for his daughter’s band. This zone primarily sees traffic from three locations: the location closest to his daughter in Seattle, the location closest to him in Boston, and the location closest to his parents in Tampa, who check in on their granddaughter’s site daily for updates.

One day, a link between Tampa and the Chicago origin gets cut by a wandering backhoe. This means that Tampa loses some connectivity back to the Chicago origin. As a result, Barry’s parents start to see failures when connecting back to origin when connecting to the site. This reflects in origin reachability decreasing. Orpheus helps here by finding alternate paths for Barry’s parents, whether it’s through Boston, Seattle, or any location in between that isn’t impacted by the fiber cut seen in Tampa.

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

So even though there is packet loss between one of Cloudflare’s data centers and Barry’s origin, because there is a path through a different Cloudflare data center that doesn’t have loss, the traffic will still succeed because the traffic will go down the non-lossy path.

How much does Orpheus help my origin reachability?

In our rollout of Orpheus for customers, we observed that Orpheus improved Origin reachability by 23%, from 99.87% to 99.90%. Here is a chart showing the improvement Orpheus provides (lower is better):

Improving Origin Performance for Everyone with Orpheus and Tiered Cache

We measure this reachability improvement by measuring 522 rates for every data center-origin pair and then comparing traffic that traversed Orpheus routes with traffic that went directly back to origin. Orpheus was especially helpful at improving reachability for slightly lossy paths that could present small amounts of failure over a long period of time, whereas direct to origin would see those failures.

Note that we’ll never get this number to 0% because, with or without Orpheus, some origins really are unreachable because they are down!

Orpheus makes Cloudflare products better

Orpheus pairs well with some of our products that are already designed to provide highly available services on an uncertain Internet. Let’s go over the interactions between Orpheus and three of our products: Load Balancing, Cloudflare Network Interconnect, and Tiered Cache.

Load Balancing

Orpheus and Load Balancing go together to provide high reachability for every origin endpoint. Load balancing allows for automatic selection of endpoints based on health probes, ensuring that if an origin isn’t working, customers will still be available and operational. Orpheus finds reachable paths from Cloudflare to every origin. These two products in tandem provide a highly available and reachable experience for customers.

Cloudflare Network Interconnect

Orpheus and Cloudflare Network Interconnect (CNI) combine to always provide a highly reachable path, no matter where in the world you are. Consider Acme, a company who is connected to the Internet by only one provider that has a lot of outages. Orpheus will do its best to steer traffic around the lossy paths, but if there’s only one path back to the customer, Orpheus won’t be able to find a less-lossy path. Cloudflare Network Interconnect solves this problem by providing a path that is separate from the transit provider that any Cloudflare data center can access. CNI provides a viable path back to Acme’s origin that will allow Orpheus to engage from any data center in the world if loss occurs.

Shields for All

Orpheus and Tiered Cache can combine to build an adaptive shield around an origin that caches as much as possible while improving traffic back to origin. Tiered Cache topologies allow for customers to deflect much of their static traffic away from their origin to reduce load, and Orpheus helps ensure that any traffic that has to go back to the origin traverses over highly available links.

Improving origin performance for everyone

The Internet is a growing, ever-changing ecosystem. With the release of Orpheus and Tiered Cache for everyone, we’ve given you the ability to navigate whatever the Internet has in store to provide the best possible experience to your customers.

Argo 2.0: Smart Routing Learns New Tricks

Post Syndicated from David Tuber original https://blog.cloudflare.com/argo-v2/

Argo 2.0: Smart Routing Learns New Tricks

Argo 2.0: Smart Routing Learns New Tricks

We launched Argo in 2017 to improve performance on the Internet. Argo uses real-time global network information to route around brownouts, cable cuts, packet loss, and other problems on the Internet. Argo makes the network that Cloudflare relies on—the Internet—faster, more reliable, and more secure on every hop around the world.

Without any complicated configuration, Argo is able to use real-time traffic data to pick the fastest path across the Internet, improving performance and delivering more satisfying experiences to your customers and users.

Today, Cloudflare is announcing several upgrades to Argo’s intelligent routing:

  • When it launched, Argo was entirely focused on the “middle mile,” speeding up connections from Cloudflare to our customers’ servers. Argo now delivers optimal routes from clients and users to Cloudflare, further reducing end-to-end latency while still providing the impressive edge to origin performance that Argo is known for. These last-mile improvements reduce end user round trip times by up to 40%.
  • We’re also adding support for accelerating pure IP workloads, allowing Magic Transit and Magic WAN customers to build IP networks to enjoy the performance benefits of Argo.

Starting today, all Free, Pro, and Business plan Argo customers will see improved performance with no additional configuration or charge. Enterprise customers have already enjoyed the last mile performance improvements described here for some time. Magic Transit and WAN customers can contact their account team to request Early Access to Argo Smart Routing for Packets.

What’s Argo?

Argo finds the best and fastest possible path for your traffic on the Internet. Every day, Cloudflare carries hundreds of billions of requests across our network and the Internet. Because our network, our customers, and their end users are well distributed globally, all of these requests flowing across our infrastructure paint a great picture of how different parts of the Internet are performing at any given time.

Just like Waze examines real data from real drivers to give you accurate, uncongested — and sometimes unorthodox — routes across town, Argo Smart Routing uses the timing data Cloudflare collects from each request to pick faster, more efficient routes across the Internet.

In practical terms, Cloudflare’s network is expansive in its reach. Some Internet links in a given region may be congested and cause poor performance (a literal traffic jam). By understanding this is happening and using alternative network locations and providers, Argo can put traffic on a less direct, but faster, route from its origin to its destination.

These benefits are not theoretical: enabling Argo Smart Routing shaves an average of 33% off HTTP time to first byte (TTFB).

One other thing we’re proud of: we’ve stayed super focused on making it easy to use. One click in the dashboard enables better, smarter routing, bringing the full weight of Cloudflare’s network, data, and engineering expertise to bear on making your traffic faster. Advanced analytics allow you to understand exactly how Argo is performing for you around the world.

You can read a lot more about how Argo works in our original launch blog post.

Even More Blazing Fast

We’ve continuously improved Argo since the day it was launched, making it faster, quicker to respond to changes on the Internet, and allowing more types of traffic to flow over smart routes.

Argo’s new performance optimizations improve last mile latencies and reduce time to first byte even further. Argo’s last mile optimizations can save up to 40% on last mile round trip time (RTT) with commensurate improvements to end-to-end latency.

Running benchmarks against an origin server in the central United States, with visitors coming from around the world, Argo delivered the following results:

Argo 2.0: Smart Routing Learns New Tricks

The Argo improvements on the last mile reduced overall time to first byte by 39%, and reduced end-to-end latencies by 5% overall:

Argo 2.0: Smart Routing Learns New Tricks

Faster, better caching

Argo customers don’t just see benefits to their dynamic traffic. Argo’s new found skills provide benefits for static traffic as well. Because Argo now finds the best path to Cloudflare, client TTFB for cache hits sees the same last mile benefit as dynamic traffic.

Getting access to faster Argo

The best part about all these improvements? They’re already deployed and enabled for all Argo customers! These optimizations have been live for Enterprise customers for some time and were enabled for Free, Pro, and Business plans this week.

Moving Down the Stack: Argo Smart Routing for Packets

Customers use Magic Transit and Magic WAN to create their own IP networks on top of Cloudflare’s network, with access to a full suite of network functions (firewalls, DDoS mitigation, and more) delivered as a service. This allows customers to build secure, private, global networks without the need to purchase specialized hardware. Now, Argo Smart Routing for Packets allows these customers to create these IP networks with the performance benefits of Argo.

Consider a fictional gaming company, Golden Fleece Games. Golden Fleece deployed Magic Transit to mitigate attacks by malicious actors on the Internet. They want to be able to provide a quality game to their users while staying up. However, they also need their service to be as fast as possible. If their game sees additional latency, then users won’t play it, and even if their service is technically up, the increased latency will show a decrease in users. For Golden Fleece, being slow is just as bad as being down.

Finance customers also have similar requirements for low latency, high security scenarios. Consider Jason Financial, a fictional Magic Transit customer using Packet Smart Routing. Jason Financial employees connect to Cloudflare in New York, and their requests are routed to their data center which is connected to Cloudflare through a Cloudflare Network Interconnect attached to Cloudflare in Singapore. For Jason Financial, reducing latency is extraordinarily important: if their network is slow, then the latency penalties they incur can literally cost them millions of dollars due to how fast the stock market moves. Jason wants Magic Transit and other Cloudflare One products to secure their network and prevent attacks, but improving performance is important for them as well.

Argo’s Smart Routing for Packets provides these customers with the security they need at speeds faster than before. Now, customers can get the best of both worlds: security and performance. Now, let’s talk a bit about how it works.

A bird’s eye view of the Internet

Argo Smart Routing for Packets picks the fastest possible path between two points. But how does Argo know that the chosen route is the fastest? As with all Argo products, the answer comes by analyzing a wealth of network data already available on the Cloudflare edge. In Argo for HTTP or Argo for TCP, Cloudflare is able to use existing timing data from traffic that’s already being sent over our edge to optimize routes. This allows us to improve which paths are taken as traffic changes and congestion on the Internet happens. However, to build Smart Routing for Packets, the game changed, and we needed to develop a new approach to collect latency data at the IP layer.

Let’s go back to the Jason Financial case. Before, Argo would understand that the number of paths that are available from Cloudflare’s data centers back to Jason’s data center is proportional to the number of data centers Cloudflare has multiplied by the number of distinct interconnections between each data center. By looking at the traffic to Singapore, Cloudflare can use existing Layer 4 traffic and network analytics to determine the best path. But Layer 4 is not Layer 3, and when you move down the stack, you lose some insight into things like round trip time (RTT), and other metrics that compose time to first byte because that data is only produced at higher levels of the application stack. It can become harder to figure out what the best path actually is.

Optimizing performance at the IP layer can be more difficult than at higher layers. This is because protocol and application layers have additional headers and stateful protocols that allow for further optimization. For example, connection reuse is a performance improvement that can only be realized at higher layers of the stack because HTTP requests can reuse existing TCP connections. IP layers don’t have the concept of connections or requests at all: it’s just packets flowing over the wire.

To help bridge the gap, Cloudflare makes use of an existing data source that already exists for every Magic Transit customer today: health check probes. Every Magic Transit customer leverages existing health check probes from every single Cloudflare data center back to the customer origin. These probes are used to determine tunnel health for Magic Transit, so that Cloudflare knows which paths back to origin are healthy. These probes contain a variety of information that can also be used to improve performance as well. By examining health check probes and adding them to existing Layer 4 data, Cloudflare can get a better understanding of one-way latencies and can construct a map that allows us to see all the interconnected data centers and how fast they are to each other. Once this customer gets a Cloudflare Network Interconnect, Argo can use the data center-to-data center probes to create an alternate path for the customer that’s different from the public Internet.

Argo 2.0: Smart Routing Learns New Tricks

Using this map, Cloudflare can construct dynamic routes for each customer based on where their traffic enters Cloudflare’s network and where they need to go. This allows us to find the optimal route for Jason Financial and allows us to always pick the fastest path.

Packet-Level Latency Reductions

We’ve kind of buried the lede here! We’ve talked about how hard it is to optimize performance for IP traffic. The important bit: despite all these difficulties, Argo Smart Routing for Packets is able to provide a 10% average latency improvement worldwide in our internal testing!

Argo 2.0: Smart Routing Learns New Tricks

Depending on your network topology, you may see latency reductions that are even higher!

How do I get Argo Smart Routing for Packets?

Argo Smart Routing for Packets is in closed beta and is available only for Magic Transit customers who have a Cloudflare Network Interconnect provisioned. If you are a Magic Transit customer interested in seeing the improved performance of Argo Smart Routing for Packets for yourself, reach out to your account team today! If you don’t have Magic Transit but want to take advantage of bigger performance gains while acquiring uncompromised levels of network security, begin your Magic Transit onboarding process today!

What’s next for Argo

Argo’s roadmap is simple: get ever faster, for any type of traffic.

Argo’s recent optimizations will help customers move data across the Internet at as close to the speed of light as possible. Internally, “how fast are we compared to the speed of light” is one of our engineering team’s key success metrics. We’re not done until we’re even.