Tag Archives: Network

Cloudflare partners to simplify China connectivity for corporate networks

Post Syndicated from Kyle Krum original https://blog.cloudflare.com/cloudflare-one-china/

Cloudflare partners to simplify China connectivity for corporate networks

Cloudflare partners to simplify China connectivity for corporate networks

IT teams have historically faced challenges with performance, security, and reliability for employees and network resources in mainland China. Today, along with our strategic partners, we’re excited to announce expansion of our Cloudflare One product suite to tackle these problems, with the goal of creating the best SASE experience for users and organizations in China.

Cloudflare One, our comprehensive SASE platform, allows organizations to connect any source or destination and apply single-pass security policies from one unified control plane. Cloudflare One is built on our global network, which spans 275 cities across the globe and is within 50ms of 95% of the world’s Internet-connected population. Our ability to serve users extremely close to wherever they’re working—whether that’s in a corporate office, their home, or a coffee shop—has been a key reason customers choose our platform since day one.

In 2015, we extended our Application Services portfolio to cities in mainland China; in 2020, we expanded these capabilities to offer better performance and security through our strategic partnership with JD Cloud. Today, we’re unveiling our latest steps in this journey: extending the capabilities of Cloudflare One to users and organizations in mainland China, through additional strategic partnerships. Let’s break down a few ways you can achieve better connectivity, security, and performance for your China network and users with Cloudflare One.

Accelerating traffic from China networks to private or public resources outside of China through China partner networks

Performance and reliability for traffic flows across the mainland China border have been a consistent challenge for IT teams within multinational organizations. Packets crossing the China border often experience reachability, congestion, loss, and latency challenges on their way to an origin server outside of China (and vice versa on the return path). Security and IT teams can also struggle to enforce consistent policies across this traffic, since many aspects of China networking are often treated separately from the rest of an organization’s global network because of their unique challenges.

Cloudflare is excited to address these challenges with our strategic China partners, combining our network infrastructure to deliver a better end-to-end experience to customers. Here’s an example architecture demonstrating the optimized packet flow with our partners and Cloudflare together:

Cloudflare partners to simplify China connectivity for corporate networks

Acme Corp, a multinational organization, has offices in Shanghai and Beijing. Users in those offices need to reach resources hosted in Acme’s data centers in Ashburn and London, as well as SaaS applications like Jira and Workday. Acme procures last mile connectivity at each office in mainland China from Cloudflare’s China partners.

Cloudflare’s partners route local traffic to its destination within China, and global traffic across a secure link to the closest available Cloudflare data center on the other side of the Chinese border.

At that data center, Cloudflare enforces a full stack of security functions across the traffic including network firewall-as-a-service and Secure Web Gateway policies. The traffic is then routed to its destination, whether that’s another connected location on Acme’s private network (via Anycast GRE or IPsec tunnel or direct connection) or a resource on the public Internet, across an optimized middle-mile path. Acme can choose whether Internet-bound traffic egresses from a shared or dedicated Cloudflare-owned IP pool.

Return traffic back to Acme’s connected network location in China takes the opposite path: source → Cloudflare’s network (where, again, security policies are applied) → Partner network → Acme local network.

Cloudflare and our partners are excited to help customers solve challenges with cross-border performance and security. This solution is easy to deploy and available now – reach out to your account team to get started today.

Enforcing uniform security policy across remote China user traffic

The same challenges that impact connectivity from China-based networks reaching out to global resources also impact remote users working in China. Expanding on the network connectivity solution we just described, we’re looking forward to improving user connectivity to cross-border resources by adapting our device client (WARP). This solution will also allow security teams to enforce consistent policy across devices connecting to corporate resources, rather than managing separate security stacks for users inside and outside of China.

Cloudflare partners to simplify China connectivity for corporate networks

Acme Corp has users that are either based in or traveling to China for business and need to access corporate resources that are hosted beyond China, without necessarily being physically in an Acme office in order to enable this access. Acme uses an MDM provider to install the WARP client on company-managed devices and enroll them in Acme’s Cloudflare Zero Trust organization. Within China, the WARP client utilizes Cloudflare’s China partner networks to establish the same Wireguard tunnel to the nearest Cloudflare point of presence outside of mainland China. Cloudflare’s partners act as the carrier of our customers’ IP traffic through their acceleration service and the content remains secure inside WARP.

Just as with traffic routed via our partners to Cloudflare at the network layer, WARP client traffic arriving at its first stop outside of China is filtered through Gateway and Access policies. Acme’s IT administrators can choose to enforce the same, or additional policies for device traffic from China vs other global locations. This setup makes life easier for Acme’s IT and security teams – they only need to worry about installing and managing a single device client in order to grant access and control security regardless of where employees are in the world.

Cloudflare and our partners are actively testing this solution in private beta. If you’re interested in getting access as soon as it’s available to the broader public, please contact your account team.

Extending SASE filtering to local China data centers (future)

The last two use cases have focused primarily on granting network and user access from within China to resources on the other side of the border – but what about improving connectivity and security for local traffic?

We’ve heard from both China-based and multinational organizations that are excited to have the full suite of Cloudflare One functions available across China to achieve a full SASE architecture just a few milliseconds from everywhere their users and applications are in the world. We’re actively working toward this objective with our strategic partners, expanding upon the current availability of our application services platform across 45 data centers in 38 unique cities in mainland China.

Cloudflare partners to simplify China connectivity for corporate networks

Talk to your account team today to get on the waitlist for the full suite of Cloudflare One functions delivered across our China Network and be notified as soon as beta access is available!

Get started today

We’re so excited to help organizations improve connectivity, performance and security for China networks and users. Contact your account team today to learn more about how Cloudflare One can help you transform your network and achieve a SASE architecture inside and outside of mainland China.

If you’d like to learn more, join us for a live webinar on Dec 6, 2022 10:00 AM PST through this link where we can answer all your questions about connectivity in China.

Cloudflare servers don’t own IPs anymore – so how do they connect to the Internet?

Post Syndicated from Marek Majkowski original https://blog.cloudflare.com/cloudflare-servers-dont-own-ips-anymore/

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

A lot of Cloudflare’s technology is well documented. For example, how we handle traffic between the eyeballs (clients) and our servers has been discussed many times on this blog: “A brief primer on anycast (2011)”, “Load Balancing without Load Balancers (2013)“, “Path MTU discovery in practice (2015)“,  “Cloudflare’s edge load balancer (2020)“, “How we fixed the BSD socket API (2022)“.

However, we have rarely talked about the second part of our networking setup — how our servers fetch the content from the Internet. In this blog we’re going to cover this gap. We’ll discuss how we manage Cloudflare IP addresses used to retrieve the data from the Internet, how our egress network design has evolved and how we optimized it for best use of available IP space.

Brace yourself. We have a lot to cover.

Terminology first!

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Each Cloudflare server deals with many kinds of networking traffic, but two rough categories stand out:

  • Internet sourced traffic – Inbound connections initiated by eyeball to our servers. In the context of this blog post we’ll call these “ingress connections”.
  • Cloudflare sourced traffic – Outgoing connections initiated by our servers to other hosts on the Internet. For brevity, we’ll call these “egress connections”.

The egress part, while rarely discussed on this blog, is critical for our operation. Our servers must initiate outgoing connections to get their jobs done! Like:

  • In our CDN product, before the content is cached, it’s fetched from the origin servers. See “Pingora, the proxy that connects Cloudflare to the Internet (2022)“, Argo and Tiered Cache.
  • For the Spectrum product, each ingress TCP connection results in one egress connection.
  • Workers often run multiple subrequests to construct an HTTP response. Some of them might be querying servers to the Internet.
  • We also operate client-facing forward proxy products – like WARP and Teams. These proxies deal with eyeball connections destined to the Internet. Our servers need to establish connections to the Internet on behalf of our users.

And so on.

Anycast on ingress, unicast on egress

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Our ingress network architecture is very different from the egress one. On ingress, the connections sourced from the Internet are handled exclusively by our anycast IP ranges. Anycast is a technology where each of our data centers “announces” and can handle the same IP ranges. With many destinations possible, how does the Internet know where to route the packets? Well, the eyeball packets are routed towards the closest data center based on Internet BGP metrics, often it’s also geographically the closest one. Usually, the BGP routes don’t change much, and each eyeball IP can be expected to be routed to a single data center.

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

However, while anycast works well in the ingress direction, it can’t operate on egress. Establishing an outgoing connection from an anycast IP won’t work. Consider the response packet. It’s likely to be routed back to a wrong place – a data center geographically closest to the sender, not necessarily the source data center!

For this reason, until recently, we established outgoing connections in a straightforward and conventional way: each server was given its own unicast IP address. “Unicast IP” means there is only one server using that address in the world. Return packets will work just fine and get back exactly to the right server identified by the unicast IP.

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Segmenting traffic based on egress IP

Originally connections sourced by Cloudflare were mostly HTTP fetches going to origin servers on the Internet. As our product line grew, so did the variety of traffic. The most notable example is our WARP app. For WARP, our servers operate a forward proxy, and handle the traffic sourced by end-user devices. It’s done without the same degree of intermediation as in our CDN product. This creates a problem. Third party servers on the Internet — like the origin servers — must be able to distinguish between connections coming from Cloudflare services and our WARP users. Such traffic segmentation is traditionally done by using different IP ranges for different traffic types (although recently we introduced more robust techniques like Authenticated Origin Pulls).

To work around the trusted vs untrusted traffic pool differentiation problem, we added an untrusted WARP IP address to each of our servers:

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Country tagged egress IP addresses

It quickly became apparent that trusted vs untrusted weren’t the only tags needed. For WARP service we also need country tags. For example, United Kingdom based WARP users expect the bbc.com website to just work. However, the BBC restricts many of its services to people just in the UK.

It does this by geofencing — using a database mapping public IP addresses to countries, and allowing only the UK ones. Geofencing is widespread on today’s Internet. To avoid geofencing issues, we need to choose specific egress addresses tagged with an appropriate country, depending on WARP user location. Like many other parties on the Internet, we tag our egress IP space with country codes and publish it as a geofeed (like this one). Notice, the published geofeed is just data. The fact that an IP is tagged as say UK does not mean it is served from the UK, it just means the operator wants it to be geolocated to the UK. Like many things on the Internet, it is based on trust.

Notice, at this point we have three independent geographical tags:

  • the country tag of the WARP user – the eyeball connecting IP
  • the location of the data center the eyeball connected to
  • the country tag of the egressing IP

For best service, we want to choose the egressing IP so that its country tag matches the country from the eyeball IP. But egressing from a specific country tagged IP is challenging: our data centers serve users from all over the world, potentially from many countries! Remember: due to anycast we don’t directly control the ingress routing. Internet geography doesn’t always match physical geography. For example our London data center receives traffic not only from users in the United Kingdom, but also from Ireland, and Saudi Arabia. As a result, our servers in London need many WARP egress addresses associated with many countries:

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Can you see where this is going? The problem space just explodes! Instead of having one or two egress IP addresses for each server, now we require dozens, and IPv4 addresses aren’t cheap. With this design, we need many addresses per server, and we operate thousands of servers. This architecture becomes very expensive.

Is anycast a problem?

Let me recap: with anycast ingress we don’t control which data center the user is routed to. Therefore, each of our data centers must be able to egress from an address with any conceivable tag. Inside the data center we also don’t control which server the connection is routed to. There are potentially many tags, many data centers, and many servers inside a data center.

Maybe the problem is the ingress architecture? Perhaps it’s better to use a traditional networking design where a specific eyeball is routed with DNS to a specific data center, or even a server?

That’s one way of thinking, but we decided against it. We like our anycast on ingress. It brings us many advantages:

  • Performance: with anycast, by definition, the eyeball is routed to the closest (by BGP metrics) data center. This is usually the fastest data center for a given user.
  • Automatic failover: if one of our data centers becomes unavailable, the traffic will be instantly, automatically re-routed to the next best place.
  • DDoS resilience: during a denial of service attack or a traffic spike, the load is automatically balanced across many data centers, significantly reducing the impact.
  • Uniform software: The functionality of every data center and of every server inside a data center is identical. We use the same software stack on all the servers around the world. Each machine can perform any action, for any product. This enables easy debugging and good scalability.

For these reasons we’d like to keep the anycast on ingress. We decided to solve the issue of egress address cardinality in some other way.

Solving a million dollar problem

Out of the thousands of servers we operate, every single one should be able to use an egress IP with any of the possible tags. It’s easiest to explain our solution by first showing two extreme designs.

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

Each server owns all the needed IPs: each server has all the specialized egress IPs with the needed tags.

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

One server owns the needed IP: a specialized egress IP with a specific tag lives in one place, other servers forward traffic to it.

Both options have pros and cons:

Specialized IP on every server Specialized IP on one server
Super expensive $$$, every server needs many IP addresses. Cheap $, only one specialized IP needed for a tag.
Egress always local – fast Egress almost always forwarded – slow
Excellent reliability – every server is independent Poor reliability – introduced chokepoints

There’s a third way

We’ve been thinking hard about this problem. Frankly, the first extreme option of having every needed IP available locally on every Cloudflare server is not totally unworkable. This is, roughly, what we were able to pull off for IPv6. With IPv6, access to the large needed IP space is not a problem.

However, in IPv4 neither option is acceptable. The first offers fast and reliable egress, but requires great cost — the IPv4 addresses needed are expensive. The second option uses the smallest possible IP space, so it’s cheap, but compromises on performance and reliability.

The solution we devised is a compromise between the extremes. The rough idea is to change the assignment unit. Instead of assigning one /32 IPv4 address for each server, we devised a method of assigning a /32 IP per data center, and then sharing it among physical servers.

Specialized IP on every server Specialized IP per data center Specialized IP on one server
Super expensive $$$ Reasonably priced $$ Cheap $
Egress always local – fast Egress always local – fast Egress almost always forwarded – slow
Excellent reliability – every server is independent Excellent reliability – every server is independent Poor reliability – many choke points

Sharing an IP inside data center

The idea of sharing an IP among servers is not new. Traditionally this can be achieved by Source-NAT on a router. Sadly, the sheer number of egress IP’s we need and the size of our operation, prevents us from relying on stateful firewall / NAT at the router level. We also dislike shared state, so we’re not fans of distributed NAT installations.

What we chose instead, is splitting an egress IP across servers by a port range. For a given egress IP, each server owns a small portion of available source ports – a port slice.

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

When return packets arrive from the Internet, we have to route them back to the correct machine. For this task we’ve customized “Unimog” – our L4 XDP-based load balancer – (“Unimog, Cloudflare’s load balancer (2020)“) and it’s working flawlessly.

With a port slice of say 2,048 ports, we can share one IP among 31 servers. However, there is always a possibility of running out of ports. To address this, we’ve worked hard to be able to reuse the egress ports efficiently. See the “How to stop running out of ports (2022)“, “How to share IPv4 addresses (2022)” and our Cloudflare.TV segment.

This is pretty much it. Each server is aware which IP addresses and port slices it owns. For inbound routing Unimog inspects the ports and dispatches the packets to appropriate machines.

Sharing a subnet between data centers

This is not the end of the story though, we haven’t discussed how we can route a single /32 address into a datacenter. Traditionally, in the public Internet, it’s only possible to route subnets with granularity of /24 or 256 IP addresses. In our case this would lead to great waste of IP space.

To solve this problem and improve the utilization of our IP space, we deployed our egress ranges as… anycast! With that in place, we customized Unimog and taught it to forward the packets over our backbone network to the right data center. Unimog maintains a database like this: - forward to LHR - forward to CDG - forward to MAN

With this design, it doesn’t matter to which data center return packets are delivered. Unimog can always fix it and forward the data to the right place. Basically, while at the BGP layer we are using anycast, due to our design, semantically an IP identifies a datacenter and an IP and port range identify a specific machine. It behaves almost like a unicast.

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

We call this technology stack “soft-unicast” and it feels magical. It’s like we did unicast in software over anycast in the BGP layer.

Soft-unicast is indistinguishable from magic

With this setup we can achieve significant benefits:

  • We are able to share a /32 egress IP amongst many servers.
  • We can spread a single subnet across many data centers, and change it easily on the fly. This allows us to fully use our egress IPv4 ranges.
  • We can group similar IP addresses together. For example, all the IP addresses tagged with the “UK” tag might form a single continuous range. This reduces the size of the published geofeed.
  • It’s easy for us to onboard new egress IP ranges, like customer IP’s. This is useful for some of our products, like Cloudflare Zero Trust.

All this is done at sensible cost, at no loss to performance and reliability:

  • Typically, the user is able to egress directly from the closest datacenter, providing the best possible performance.
  • Depending on the actual needs we can allocate or release the IP addresses. This gives us flexibility with the IP cost management, we don’t need to overspend upfront.
  • Since we operate multiple egress IP addresses in different locations, the reliability is not compromised.

The true location of our IP addresses is: “the cloud”

While soft-unicast allows us to gain great efficiency, we’ve hit some issues. Sometimes we get a question “Where does this IP physically exist?”. But it doesn’t have an answer! Our egress IPs don’t exist physically anywhere. From a BGP standpoint our egress ranges are anycast, so they live everywhere. Logically each address is used in one data center at a time, but we can move it around on demand.

Content Delivery Networks misdirect users

Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?

As another example of problems, here’s one issue we’ve hit with third party CDNs. As we mentioned before, there are three country tags in our pipeline:

  • The country tag of the IP eyeball is connecting from.
  • The location of our data center.
  • The country tag of the IP addresses we chose for the egress connections.

The fact that our egress address is tagged as “UK” doesn’t always mean it actually is being used in the UK. We’ve had cases when a UK-tagged WARP user, due to the maintenance of our LHR data center, was routed to Paris. A popular CDN performed a reverse-lookup of our egress IP, found it tagged as “UK”, and directed the user to a London CDN server. This is generally OK… but we actually egressed from Paris at the time. This user ended up routing packets from their home in the UK, to Paris, and back to the UK. This is bad for performance.

We address this issue by performing DNS requests in the egressing data center. For DNS we use IP addresses tagged with the location of the data center, not the intended geolocation for the user. This generally fixes the problem, but sadly, there are still some exceptions.

The future is here

Our 2021 experiments with Addressing Agility proved we have plenty of opportunity to innovate with the addressing of the ingress. Soft-unicast shows us we can achieve great flexibility and density on the egress side.

With each new product, the number of tags we need on the egress grows – from traffic trustworthiness, product category to geolocation. As the pool of usable IPv4 addresses shrinks, we can be sure there will be more innovation in the space. Soft-unicast is our solution, but for sure it’s not our last development.

For now though, it seems like we’re moving away from traditional unicast. Our egress IP’s really don’t exist in a fixed place anymore, and some of our servers don’t even own a true unicast IP nowadays.

Why BGP communities are better than AS-path prepends

Post Syndicated from Tom Strickx original https://blog.cloudflare.com/prepends-considered-harmful/

Why BGP communities are better than AS-path prepends

Why BGP communities are better than AS-path prepends

The Internet, in its purest form, is a loosely connected graph of independent networks (also called Autonomous Systems (AS for short)). These networks use a signaling protocol called BGP (Border Gateway Protocol) to inform their neighbors (also known as peers) about the reachability of IP prefixes (a group of IP addresses) in and through their network. Part of this exchange contains useful metadata about the IP prefix that are used to inform network routing decisions. One example of the metadata is the full AS-path, which consists of the different autonomous systems an IP packet needs to pass through to reach its destination.

As we all want our packets to get to their destination as fast as possible, selecting the shortest AS-path for a given prefix is a good idea. This is where something called prepending comes into play.

Routing on the Internet, a primer

Let’s briefly talk about how the Internet works at its most fundamental level, before we dive into some nitty-gritty details.

The Internet is, at its core, a massively interconnected network of thousands of networks. Each network owns two things that are critical:

1. An Autonomous System Number (ASN): a 32-bit integer that uniquely identifies a network. For example, one of the Cloudflare ASNs (we have multiple) is 13335.

2. IP prefixes: An IP prefix is a range of IP addresses, bundled together in powers of two: In the IPv4 space, two addresses form a /31 prefix, four form a /30, and so on, all the way up to /0, which is shorthand for “all IPv4 prefixes”. The same applies for IPv6  but instead of aggregating 32 bits at most, you can aggregate up to 128 bits. The figure below shows this relationship between IP prefixes, in reverse — a /24 contains two /25s that contains two /26s, and so on.

Why BGP communities are better than AS-path prepends

To communicate on the Internet, you must be able to reach your destination, and that’s where routing protocols come into play. They enable each node on the Internet to know where to send your message (and for the receiver to send a message back).

Why BGP communities are better than AS-path prepends

As mentioned earlier, these destinations are identified by IP addresses, and contiguous ranges of IP addresses are expressed as IP prefixes. We use IP prefixes for routing as an efficiency optimization: Keeping track of where to go for four billion (232)  IP addresses in IPv4 would be incredibly complex, and require a lot of resources. Sticking to prefixes reduces that number down to about one million instead.

Now recall that Autonomous Systems are independently operated and controlled. In the Internet’s network of networks, how do I tell Source A in some other network that there is an available path to get to Destination B in (or through) my network? In comes BGP! BGP is the Border Gateway Protocol, and it is used to signal reachability information. Signal messages generated by the source ASN are referred to as ‘announcements’ because they declare to the Internet that IP addresses in the prefix are online and reachable.

Why BGP communities are better than AS-path prepends

Have a look at the figure above. Source A should now know how to get to Destination B through 2 different networks!

This is what an actual BGP message would look like:

BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP:
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):

As you can see, BGP messages contain more than just the IP prefix (the NLRI bit) and the path, but also a bunch of other metadata that provides additional information about the path. Other fields include communities (more on that later), as well as MED, or origin code. MED is a suggestion to other directly connected networks on which path should be taken if multiple options are available, and the lowest value wins. The origin code can be one of three values: IGP, EGP or Incomplete. IGP will be set if you originate the prefix through BGP, EGP is no longer used (it’s an ancient routing protocol), and Incomplete is set when you distribute a prefix into BGP from another routing protocol (like IS-IS or OSPF).

Now that source A knows how to get to Destination B through two different networks, let’s talk about traffic engineering!

Traffic engineering

Traffic engineering is a critical part of the day to day management of any network. Just like in the physical world, detours can be put in place by operators to optimize the traffic flows into (inbound) and out of (outbound) their network. Outbound traffic engineering is significantly easier than inbound traffic engineering because operators can choose from neighboring networks, even prioritize some traffic over others. In contrast, inbound traffic engineering requires influencing a network that is operated by someone else entirely. The autonomy and self-governance of a network is paramount, so operators use available tools to inform or shape inbound packet flows from other networks. The understanding and use of those tools is complex, and can be a challenge.

The available set of traffic engineering tools, both in- and outbound, rely on manipulating attributes (metadata) of a given route. As we’re talking about traffic engineering between independent networks, we’ll be manipulating the attributes of an EBGP-learned route. BGP can be split into two categories:

  1. EBGP: BGP communication between two different ASNs
  2. IBGP: BGP communication within the same ASN.

While the protocol is the same, certain attributes can be exchanged on an IBGP session that aren’t exchanged on an EBGP session. One of those is local-preference. More on that in a moment.

BGP best path selection

When a network is connected to multiple other networks and service providers, it can receive path information to the same IP prefix from many of those networks, each with slightly different attributes. It is then up to the receiving network of that information to use a BGP best path selection algorithm to pick the “best” prefix (and route), and use this to forward IP traffic. I’ve put “best” in quotation marks, as best is a subjective requirement. “Best” is frequently the shortest, but what can be best for my network might not be the best outcome for another network.

BGP will consider multiple prefix attributes when filtering through the received options. However, rather than combine all those attributes into a single selection criteria, BGP best path selection uses the attributes in tiers — at any tier, if the available attributes are sufficient to choose the best path, then the algorithm terminates with that choice.

The BGP best path selection algorithm is extensive, containing 15 discrete steps to select the best available path for a given prefix. Given the numerous steps, it’s in the interest of the network to decide the best path as early as possible. The first four steps are most used and influential, and are depicted in the figure below as sieves.

Why BGP communities are better than AS-path prepends

Picking the shortest path possible is usually a good idea, which is why “AS-path length” is a step executed early on in the algorithm. However, looking at the figure above, “AS-path length” appears second, despite being the attribute to find the shortest path. So let’s talk about the first step: local preference.

Local preference
Local preference is an operator favorite because it allows them to handpick a route+path combination of their choice. It’s the first attribute in the algorithm because it is unique for any given route+neighbor+AS-path combination.

A network sets the local preference on import of a route (having learned about the route from a neighbor network). Being a non-transitive property, meaning that it’s an attribute that is never sent in an EBGP message to other networks. This intrinsically means, for example, that the operator of AS 64496 can’t set the local preference of routes to their own (or transiting) IP prefixes inside neighboring AS 64511. The inability to do so is partially why inbound traffic engineering through EBGP is so difficult.

Prepending artificially increases AS-path length
Since no network is able to directly set the local preference for a prefix inside another network, the first opportunity to influence other networks’ choices is modifying the AS-path. If the next hops are valid, and the local preference for all the different paths for a given route are the same, modifying the AS-path is an obvious option to change the path traffic will take towards your network. In a BGP message, prepending looks like this:


BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP:
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):


BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496 64496
        Path Attribute - NEXT_HOP:
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):

Specifically, operators can do AS-path prepending. When doing AS-path prepending, an operator adds additional autonomous systems to the path (usually the operator uses their own AS, but that’s not enforced in the protocol). This way, an AS-path can go from a length of 1 to a length of 255. As the length has now increased dramatically, that specific path for the route will not be chosen. By changing the AS-path advertised to different peers, an operator can control the traffic flows coming into their network.

Unfortunately, prepending has a catch: To be the deciding factor, all the other attributes need to be equal. This is rarely true, especially in large networks that are able to choose from many possible routes to a destination.

Business Policy Engine

BGP is colloquially also referred to as a Business Policy Engine: it does not select the best path from a performance point of view; instead, and more often than not, it will select the best path from a business point of view. The business criteria could be anything from investment (port) efficiency to increased revenue, and more. This may sound strange but, believe it or not, this is what BGP is designed to do! The power (and complexity) of BGP is that it enables a network operator to make choices according to the operator’s needs, contracts, and policies, many of which cannot be reflected by conventional notions of engineering performance.

Different local preferences

A lot of networks (including Cloudflare) assign a local preference depending on the type of connection used to send us the routes. A higher value is a higher preference. For example, routes learned from transit network connections will get a lower local preference of 100 because they are the most costly to use; backbone-learned routes will be 150, Internet exchange (IX) routes get 200, and lastly private interconnect (PNI) routes get 250. This means that for egress (outbound) traffic, the Cloudflare network, by default, will prefer a PNI-learned route, even if a shorter AS-path is available through an IX or transit neighbor.

Part of the reason a PNI is preferred over an IX is reliability, because there is no third-party switching platform involved that is out of our control, which is important because we operate on the assumption that all hardware can and will eventually break. Another part of the reason is for port efficiency reasons. Here, efficiency is defined by cost per megabit transferred on each port. Roughly speaking, the cost is calculated by:

((cost_of_switch / port_count) + transceiver_cost)

which is combined with the cross-connect cost (might be monthly recurring (MRC), or a one-time fee). PNI is preferable because it helps to optimize value by reducing the overall cost per megabit transferred, because the unit price decreases with higher utilization of the port.

This reasoning is similar for a lot of other networks, and is very prevalent in transit networks. BGP is at least as much about cost and business policy, as it is about performance.

Transit local preference

For simplicity, when referring to transits, I mean the traditional tier-1 transit networks. Due to the nature of these networks, they have two distinct sets of network peers:

1. Customers (like Cloudflare)
2. Settlement-free peers (like other tier-1 networks)

In normal circumstances, transit customers will get a higher local preference assigned than the local preference used for their settlement-free peers. This means that, no matter how much you prepend a prefix, if traffic enters that transit network, traffic will always land on your interconnection with that transit network, it will not be offloaded to another peer.

A prepend can still be used if you want to switch/offload traffic from a single link with one transit if you have multiple distinguished links with them, or if the source of traffic is multihomed behind multiple transits (and they don’t have their own local preference playbook preferring one transit over another). But inbound traffic engineering traffic away from one transit port to another through AS-path prepending has significant diminishing returns: once you’re past three prepends, it’s unlikely to change much, if anything, at that point.


Why BGP communities are better than AS-path prepends

In the above scenario, no matter the adjustment Cloudflare makes in its AS-path towards AS 64496, the traffic will keep flowing through the Transit B <> Cloudflare interconnection, even though the path Origin A → Transit B → Transit A → Cloudflare is shorter from an AS-path point of view.

Why BGP communities are better than AS-path prepends

In this scenario, not a lot has changed, but Origin A is now multi-homed behind the two transit providers. In this case, the AS-path prepending was effective, as the paths seen on the Origin A side are both the prepended and non-prepended path. As long as Origin A is not doing any egress traffic engineering, and is treating both transit networks equally, then the path chosen will be Origin A → Transit A → Cloudflare.

Community-based traffic engineering

So we have now identified a pretty critical problem within the Internet ecosystem for operators: with the tools mentioned above, it’s not always (some might even say outright impossible) possible to accurately dictate paths traffic can ingress your own network, reducing the control an autonomous system has over its own network. Fortunately, there is a solution for this problem: community-based local preference.

Some transit providers allow their customers to influence the local preference in the transit network through the use of BGP communities. BGP communities are an optional transitive attribute for a route advertisement. The communities can be informative (“I learned this prefix in Rome”), but they can also be used to trigger actions on the receiving side. For example, Cogent publishes the following action communities:

Community Local preference
174:10 10
174:70 70
174:120 120
174:125 125
174:135 135
174:140 140

When you know that Cogent uses the following default local preferences in their network:

Peers → Local preference 100
Customers → Local preference 130

It’s easy to see how we could use the communities provided to change the route used. It’s important to note though that, as we can’t set the local preference of a route to 100 (or 130), AS-path prepending remains largely irrelevant, as the local preference won’t ever be the same.

Take for example the following configuration:

    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    then {
        as-path-prepend "13335 13335";

Why BGP communities are better than AS-path prepends

We’re prepending the Cloudflare ASN two times, resulting in a total AS-path of three, yet we were still seeing a lot (too much) traffic coming in on our Cogent link. At that point, an engineer could add another prepend, but for a well-connected network as Cloudflare, if two prepends didn’t do much, or three, then four or five isn’t going to do much either. Instead, we can leverage the Cogent communities documented above to change the routing within Cogent:

    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    then {
        community add COGENT_LPREF70;

The above configuration changes the traffic flow to this:

Why BGP communities are better than AS-path prepends

Which is exactly what we wanted!


AS-path prepending is still useful, and has its use as part of the toolchain for operators to do traffic engineering, but should be used sparingly. Excessive prepending opens a network up to wider spread route hijacks, which should be avoided at all costs. As such, using community-based ingress traffic engineering is highly preferred (and recommended). In cases where communities aren’t available (or not available to steer customer traffic), prepends can be applied, but I encourage operators to actively monitor their effects, and roll them back if ineffective.

As a side-note, P Marcos et al. have published an interesting paper on AS-path prepending, and go into some trends seen in relation to prepending, I highly recommend giving it a read: https://www.caida.org/catalog/papers/2020_aspath_prepending/aspath_prepending.pdf

Network Performance Update: Developer Week 2022

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-developer-week/

Network Performance Update: Developer Week 2022

Network Performance Update: Developer Week 2022

Cloudflare is building the fastest network in the world. But we don’t want you to just take our word for it. To demonstrate it, we are continuously testing ourselves versus everyone else to make sure we’re the fastest. Since it’s Developer Week, we wanted to provide an update on how our Workers products perform against the competition, as well as our overall network performance.

Earlier this year, we compared ourselves to Fastly’s [email protected] and overall we were faster. This time, not only did we repeat the tests, but we also added AWS [email protected] to help show how we stack up against more and more competitors. The summary: we offer the fastest developer platform on the market. Let’s talk about how we build our network to help make you faster, and then we’ll get into how that translates to our developer platform.

Latest update on network performance

We have two updates on data: a general network performance update, and then data on how Workers compares with [email protected] and [email protected]

To quantify global network performance, we have to get enough data from around the world, across all manner of different networks, comparing ourselves with other providers. We used Real User Measurements (RUM) to fetch a 100kB file from different providers. Users around the world report the performance of different providers. The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

During Cloudflare One Week (June 2022), we shared that we were faster in more of the most reported networks than our competitors. Out of the top 3,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of the number of networks where each provider is number one in p95 TCP Connection Time, which represents the time it takes for a user on a given network to connect to the provider. This data is from Cloudflare One Week (June 2022):

Network Performance Update: Developer Week 2022

Here is what the distribution looks like for the top 3,000 networks for Developer Week (November 2022):

Network Performance Update: Developer Week 2022

In addition to being the fastest across popular networks, Cloudflare is also committed to being the fastest provider in every country.

Using data on the top 3,000 networks from Cloudflare One Week (June 2022), here’s what the world map looks like (Cloudflare is in orange):

Network Performance Update: Developer Week 2022

And here’s what the world looks like while looking at the top 3,000 networks for Developer Week (November 2022):

Network Performance Update: Developer Week 2022

Cloudflare became #1 in more countries in Europe and Asia, specifically Russia, Ukraine, Kazakhstan, India, and China, further delivering on our mission to be the fastest network in the world. So let’s talk about how that network helps power the Supercloud to be the fastest developer platform around.

How we’re comparing developer platforms

It’s been six months since we published our initial tests, but here’s a quick refresher. We make comparisons by measuring time to connect to the network, time spent completing requests, and overall time to respond. We call these numbers connect, wait, and response. We’ve chosen these numbers because they are critical components of a request that need to be as fast as possible in order for users to see a good experience. We can reduce the connect times by peering as close as possible to the users. We can reduce the wait times by optimizing code execution to be as fast as possible. If we optimize those two processes, we’ve optimized the response, which represents the end-to-end latency of a request.

Test methodology

To measure connect, wait, and response, we perform three tests against each provider: a simple no-op JavaScript function, a complex JavaScript function, and a complex Rust function.  We don’t do a simple Rust function because we expect it to take almost no time at all, and we already have a baseline for end-to-end functionality in the no-op JavaScript function since many providers will often compile both down to WebAssembly.

Here are the functions for each of them:

JavaScript no-op:

async function getErrorResponse(event, message, status) {
  return new Response(message, {status: status, headers: {'Content-Type': 'text/plain'}});

JavaScript hard function:

function testHardBusyLoop() {
  let value = 0;
  let offset = Date.now();

  for (let n = 0; n < 15000; n++) {
    value += Math.floor(Math.abs(Math.sin(offset + n)) * 10);

  return value;

Rust hard function:

fn test_hard_busy_loop() -> i32 {
  let mut value = 0;
  let offset = Date::now().as_millis();

  for n in 0..15000 {
    value += (((offset + n) as f64).sin().abs() * 10.0) as i32;


We’re trying to test how good each platform is at optimizing compute in addition to evaluating how close each platform is to end-users. However, for this test, we did not run a Rust test on [email protected] because it did not natively support our Rust function without uploading a WASM binary that you compile yourself. Since [email protected] does not have a true first-class developer platform and tooling to run Rust, we decided to exclude the Rust scenarios for [email protected]ge. So when we compare numbers for [email protected], it will only be for the JavaScript simple and JavaScript hard tests.

Measuring Workers performance from real users

To collect data, we use two different methods: one from a third party service called Catchpoint, and a second from our own network performance benchmarking tests. First, we used Catchpoint to gather a set of data from synthetic probes. Catchpoint is an industry standard “synthetic” testing tool, and measurements collected from real users distributed around the world. Catchpoint is a monitoring platform that has around 2,000 total endpoints distributed around the world that can be configured to fetch specific resources and time for each test. Catchpoint is useful for network providers like us as it provides a consistent, repeatable way to measure end-to-end performance of a workload, and delivers a best-effort approximation for what a user sees.

Catchpoint has backbone nodes that are embedded in ISPs around the world. That means that these nodes are plugged into ISP routers just like you are, and the traffic goes through the ISP network to each endpoint they are monitoring. These can approximate a real user, but they will never truly replicate a real user. For example, the bandwidth for these nodes is 100% dedicated for platform monitoring, as opposed to your home Internet connection, where your Internet experience will be a mixed bag of different use cases, some of which won’t talk to Workers applications at all.

For this new test, we chose 300 backbone nodes that are embedded in last mile ISPs around the world. We filtered out nodes in cloud providers, or in metro areas with multiple transit options, trying to remove duplicate paths as much as possible.

We cross-checked these tests with our own data set, which is collected from users connecting to free websites when they are served 1xxx error pages, just like how we collect data for global network performance. When a user sees this error page, that page that will execute these tests as a part of rendering and upload performance metrics on these calls to Cloudflare.

We also changed our test methodology to use paid accounts for Fastly, Cloudflare, and AWS.

Workers vs [email protected] vs [email protected]

This time, let’s start off with the response times to show how we’re doing end-to-end:

Network Performance Update: Developer Week 2022

Test 95th percentile response (ms)
Cloudflare JavaScript no-op 479
Fastly JavaScript no-op 634
AWS JavaScript no-op 1,400
Cloudflare JavaScript hard 471
Fastly JavaScript hard 683
AWS JavaScript hard 1,411
Cloudflare Rust hard 472
Fastly Rust hard 638

We’re fastest in all cases. Now let’s look at connect times, which show us how fast users connect to the compute platform before doing any actual compute:

Network Performance Update: Developer Week 2022

Test 95th percentile connect (ms)
Cloudflare JavaScript no-op 82
Fastly JavaScript no-op 94
AWS JavaScript no-op 295
Cloudflare JavaScript hard 82
Fastly JavaScript hard 94
AWS JavaScript hard 297
Cloudflare Rust hard 79
Fastly Rust hard 94

Note that we don’t expect these times to differ based on the code being run, but we extract them from the same set of tests, so we’ve broken them out here.

But what about wait times? Remember, wait times represent time spent computing the request, so who has optimized their platform best? Again, it’s Cloudflare, although Fastly still has a slight edge on the hard Rust test (which we plan to beat by further optimization):

Network Performance Update: Developer Week 2022

Test 95th percentile wait (ms)
Cloudflare JavaScript no-op 110
Fastly JavaScript no-op 122
AWS JavaScript no-op 362
Cloudflare JavaScript hard 115
Fastly JavaScript hard 178
AWS JavaScript hard 367
Cloudflare Rust hard 125
Fastly Rust hard 122

To verify these results, we compared the Catchpoint results to our own data set. Here is the p95 TTFB for the JavaScript and Rust hard loops for Fastly, AWS, and Cloudflare from our data:

Network Performance Update: Developer Week 2022

Cloudflare is faster on JavaScript and Rust calls. These numbers also back up the slight compute advantage for Fastly on Rust calls.

The big takeaway from this is that in addition to Cloudflare being faster for the time spent processing requests in nearly every test, Cloudflare’s network and performance optimizations as a whole set us apart and make our Workers platform even faster for everything. And, of course, we plan to keep it that way.

Your application, but faster

Latency is an important component of the user experience, and for developers, being able to ensure their users can do things as fast as possible is critical for the success of an application. Whether you’re building applications in Workers, D1, and R2, hosting your documentation in Pages, or even leveraging Workers as part of your SaaS platform, having your code run in the SuperCloud that is our global network will ensure that your users see the best experience they possibly can.

Our network is hyper-optimized to make your code as fast as possible. By using Cloudflare’s network to run your applications, you can focus on making the best possible application possible and rest easy knowing that Cloudflare is providing you the best user experience possible. This is because Cloudflare’s developer platform is built on top of the world’s fastest network. So go out and build your dreams, and know that we’ll make your dreams as fast as they can possibly be.

Making peering easy with the new Cloudflare Peering Portal

Post Syndicated from David Tuber original https://blog.cloudflare.com/making-peering-easy-with-the-new-cloudflare-peering-portal/

Making peering easy with the new Cloudflare Peering Portal

Making peering easy with the new Cloudflare Peering Portal

In 2018, we launched the Cloudflare Peering Portal, which allows network operators to see where your traffic is coming from and to identify the best possible places to interconnect with Cloudflare. We’re excited to announce that we’ve made it even easier to interconnect with Cloudflare through this portal by removing Cloudflare-specific logins and allowing users to request sessions in the portal itself!

We’re going to walk through the changes we’ve made to make peering easier, but before we do that, let’s talk a little about peering: what it is, why it’s important, and how Cloudflare is making peering easier.

What is peering and why is it important?

Put succinctly, peering is the act of connecting two networks together. If networks are like towns, peering is the bridges, highways, and streets that connect the networks together. There are lots of different ways to connect networks together, but when networks connect, traffic between them flows to their destination faster. The reason for this is that peering reduces the number of Border Gateway Protocol (BGP) hops between networks.

What is BGP?

For a quick refresher, Border Gateway Protocol (or BGP for short) is a protocol that propagates instructions on how networks should forward packets so that traffic can get from its origin to its destination. BGP provides packets instructions on how to get from one network to another by indicating which networks the packets need to go through to get to the destination, prioritizing the paths with the smallest number of hops between origin and destination. BGP sees networks as Autonomous Systems (AS), and each AS has its own number. For example, Cloudflare’s ASN is 13335.

In the example below, AS 1 is trying to send packets to AS 3, but there are two possible paths the packets can go:

Making peering easy with the new Cloudflare Peering Portal

The BGP decision algorithm will select the path with the least number of hops, meaning that the path the packets will take is AS 1 → AS 2 → AS 3.

When two networks peer with each other, the number of networks needed to connect AS 1 and AS 3 is reduced to one, because AS 1 and AS 3 are directly connected with each other. But “connecting with” another network can be kind of vague, so let’s be more specific. In general, there are three ways that networks can connect with other networks: directly through Private Network Interconnects (PNI), at Internet Exchanges (IX), and through transit networks that connect with many networks.

Private Network Interconnect

A private network interconnect (PNI) sounds complicated, but it’s really simple at its core: it’s a cable connecting two networks to each other. If networks are in the same datacenter facility, it’s often easy for these two networks to connect and by doing so over a private connection, they can get dedicated bandwidth to each other as well as reliable uptime. Cloudflare has a product called Cloudflare Network Interconnect (CNI) that allows other networks to directly connect their networks to Cloudflare in this way.

Internet Exchanges

An Internet exchange (IX) is a building that specifically houses many networks in the same place. Each network gets one or more ports, and plugs into what is essentially a shared switch so that every network has the potential to interconnect. These switches are massive and house hundreds, even thousands of networks. This is similar to a PNI, but instead of two networks directly connecting to each other, thousands of networks connect to the same device and establish BGP sessions through that device.

Making peering easy with the new Cloudflare Peering Portal
Fabienne Serriere: An optical patch panel of the AMS-IX Internet exchange point, in Amsterdam, Netherlands. CC BY-SA 3.0

At Internet Exchanges, traffic is generally exchanged between networks free of charge, and it’s a great way to interconnect a network with other networks and save money on bandwidth between networks.

Transit networks

Transit networks are networks that are specifically designed to carry packets between other networks. These networks peer at Internet Exchanges and directly with many other networks to provide connectivity for your network without having to get PNIs or IX presence with networks. This service comes at a price, and may impact network performance as the transit network is an intermediary hop between your network and the place your traffic is trying to reach. Transit networks aren’t peering, but they do peering on your behalf.

No matter how you may decide to connect your network to Cloudflare, we have an open peering policy, and strongly encourage you to connect your networks directly to Cloudflare. If you’re interested, you can get started by going through the Cloudflare Peering Portal, which has now been made even easier. But let’s take a second to talk about why peering is so important.

Why is peering important?

Peering is important on the Internet for three reasons: it distributes traffic across many networks reducing single points of failure of the Internet, it often reduces bandwidth prices on the Internet making overall costs lower, and it improves performance by removing network hops. Let’s talk about each of those benefits.

Peering improves the overall uptime of the Internet by distributing traffic across multiple networks, meaning that if one network goes down, traffic from your network will still be able to reach your users. Compare that to connecting to a transit network: if the transit network has an issue, your network will be unreachable because that network was the only thing connecting your network to the rest of the Internet (unless you decide to pay multiple transit providers). With peering, any individual network failure will not completely impact the ability for your users to reach your network.

Peering helps reduce your network bandwidth costs because it distributes the cost you pay an Internet Exchange for a port across all the networks you interconnect with at the IX. If you’re paying \$1000/month for a port at an IX, and you’re peered with 100 networks there, you’re effectively paying \$10/network, as opposed to paying \$1000/month to connect to one transit network. Furthermore, many networks including Cloudflare have open peering policies and settlement free peering, which means we don’t charge you to send traffic to us or the other way round, making peering even more economical.

Peering also improves performance for Internet traffic by bringing networks closer together, reducing the time it takes for a packet to go from one network to another. The more two networks peer with each other, the more physical places on the planet they can exchange traffic directly, meaning that users everywhere see better performance.

Here’s an example. Janine is trying to order food from Acme Food Services, a site protected by Cloudflare. She lives in Boston and connects to Cloudflare via her ISP. Acme Food Services has their origin in Boston as well, so for Janine to see the fastest performance, her ISP should connect to Cloudflare in Boston and then Cloudflare should route her traffic directly to the Acme origin in Boston. Unfortunately for Janine, her ISP doesn’t peer with Cloudflare in Boston, but instead peers with Cloudflare in New York: meaning that when Janine connects to Acme, her traffic is going through her ISP to New York before it reaches Cloudflare, and then all the way back to Boston to the Acme origins!

Making peering easy with the new Cloudflare Peering Portal

But with proper peering, we can ensure that traffic is routed over the fastest possible path to ensure Janine connects to Cloudflare in Boston and everything stays local:

Making peering easy with the new Cloudflare Peering Portal

Fortunately for Janine, Cloudflare peers with over 10,000 networks in the world in over 275 locations, so high latency on the network is rare. And every time a new network peers with us, we help make user traffic even faster. So now let’s talk about how we’ve made peering even easier.

Cloudflare Peering Portal supports PeeringDB login

Cloudflare, along with many other networks, rely on PeeringDB as a source of truth for which networks are present on the Internet. PeeringDB is a community-maintained database of all the networks that are present on the Internet and what datacenter facilities and IXs they are present at, as well as what IPs are used for peering at each public location. Many networks, including Cloudflare, require you to have an account on PeeringDB before you can initiate a peering session with their network.

You can now use that same PeeringDB account to log into the Cloudflare Peering Portal directly, saving you the need to make a specific Cloudflare Peering Portal account.

When you log into the Cloudflare Peering Portal, simply click on the PeeringDB login button and enter your PeeringDB credentials. Cloudflare will then use this login information to determine what networks you are responsible for and automatically load data for those networks.

Making peering easy with the new Cloudflare Peering Portal

From here you can see all the places your network exchanges traffic with Cloudflare. You can see all the places you currently have direct peering with us, as well as locations for potential peering: places you could peer with us but currently don’t. Wouldn’t it be great if you could just click a button and configure a peering session with Cloudflare directly from that view? Well now you can!

Requesting sessions in the Peering Portal

Starting today, you can now request peering sessions with Cloudflare at Internet Exchanges right from the peering portal, making it even easier to get connected with Cloudflare. When you’re looking at potential peering sessions in the portal, you’ll now see a button that will allow you to verify your peering information is correct, and if it is to proceed with a peering request:

Making peering easy with the new Cloudflare Peering Portal

Once you click that button, a ticket will go immediately to our network team to configure a peering session with you using the details already provided in PeeringDB. Our network team looks at whether we already have existing connections with your network at that location, and also what the impact to your Internet traffic will be if we peer with you there. Once we’ve evaluated these variables, we’ll proceed to establish a BGP session with you at the location and inform you via email that you’ve already provided via PeeringDB. Then all you have to do is accept the BGP sessions, and you’ll be exchanging traffic with Cloudflare!

Peer with Cloudflare today!

It has never been easier to peer with Cloudflare, and our simplified peering portal will make it even easier to get connected. Visit our peering portal today and get started on the path of faster, cheaper connectivity to Cloudflare!

Monitor your own network with free network flow analytics from Cloudflare

Post Syndicated from Chris Draper original https://blog.cloudflare.com/free-magic-network-monitoring/

Monitor your own network with free network flow analytics from Cloudflare

Monitor your own network with free network flow analytics from Cloudflare

As a network engineer or manager, answering questions about the traffic flowing across your infrastructure is a key part of your job. Cloudflare built Magic Network Monitoring (previously called Flow Based Monitoring) to give you better visibility into your network and to answer questions like, “What is my network’s peak traffic volume? What are the sources of that traffic? When does my network see that traffic?” Today, Cloudflare is excited to announce early access to a free version of Magic Network Monitoring that will be available to everyone. You can request early access by filling out this form.

Magic Network Monitoring now features a powerful analytics dashboard, self-serve configuration, and a step-by-step onboarding wizard. You’ll have access to a tool that helps you visualize your traffic and filter by packet characteristics including protocols, source IPs, destination IPs, ports, TCP flags, and router IP. Magic Network Monitoring also includes network traffic volume alerts for specific IP addresses or IP prefixes on your network.

Making Network Monitoring easy

Magic Networking Monitoring allows customers to collect network analytics without installing a physical device like a network TAP (Test Access Point) or setting up overly complex remote monitoring systems. Our product works with any hardware that exports network flow data, and customers can quickly configure any router to send flow data to Cloudflare’s network. From there, our network flow analyzer will aggregate your traffic data and display it in Magic Network Monitoring analytics.

Analytics dashboard

In Magic Network Monitoring analytics, customers can take a deep dive into their network traffic data. You can filter traffic data by protocol, source IP, destination IP, TCP flags, and router IP. Customers can combine these filters together to answer questions like, “How much ICMP data was requested from my speed test server over the past 24 hours?” Visibility into traffic analytics is a key part of understanding your network’s operations and proactively improving your security. Let’s walk through some cases where Magic Network Monitoring analytics can answer your network visibility and security questions.

Monitor your own network with free network flow analytics from Cloudflare

Create network volume alert thresholds per IP address or IP prefix

Magic Network Monitoring is incredibly flexible, and it can be customized to meet the needs of any network hobbyist or business. You can monitor your traffic volume trends over time via the analytics dashboard and build an understanding of your network’s traffic profile. After gathering historical network data, you can set custom volumetric threshold alerts for one IP prefix or a group of IP prefixes. As your network traffic changes over time, or their network expands, they can easily update their Magic Network Monitoring configuration to receive data from new routers or destinations within their network.

Monitoring a speed test server in a home lab

Let’s run through an example where you’re running a network home lab. You decide to use Magic Network Monitoring to track the volume of requests a speed test server you’re hosting receives and check for potential bad actors. Your goal is to identify when your speed test server experiences peak traffic, and the volume of that traffic. You set up Magic Network Monitoring and create a rule that analyzes all traffic destined for your speed test server’s IP address. After collecting data for seven days, the analytics dashboard shows that peak traffic occurs on weekdays in the morning, and that during this time, your traffic volume ranges from 450 – 550 Mbps.

As you’re checking over the analytics data, you also notice strange traffic spikes of 300 – 350 Mbps in the middle of the night that occur at the same time. As you investigate further, the analytics dashboard shows the source of this traffic spike is from the same IP prefix. You research some source IPs, and find they’re associated with malicious activity. As a result, you update your firewall to block traffic from this problematic source.

Identifying a network layer DDoS attack

Magic Network Monitoring can also be leveraged to identify a variety of L3, L4, and L7 DDoS attacks. Let’s run through an example of how ACME Corp, a small business using Magic Network Monitoring, can identify a Ping (ICMP) Flood attack on their network. Ping Flood attacks aim to overwhelm the targeted network’s ability to respond to a high number of requests or overload the network connection with bogus traffic.

At the start of a Ping Flood attack, your server’s traffic volume will begin to ramp up. Magic Network Monitoring will analyze traffic across your network, and send an email, webhook, or PagerDuty alert once an unusual volume of traffic is identified. Your network and security team can respond to the volumetric alert by checking the data in Magic Network Monitoring analytics and identifying the attack type. In this case, they’ll notice the following traffic characteristics:

  1. Network traffic volume above your historical traffic averages
  2. An unusually large amount of ICMP traffic
  3. ICMP traffic coming from a specific set of source IPs

Now, your network security team has confirmed the traffic is malicious by identifying the attack type, and can begin taking steps to mitigate the attack.

Magic Network Monitoring and Magic Transit

If your business is impacted by DDoS attacks, Magic Network Monitoring will identify attacks, and Magic Transit can be used to mitigate those DDoS attacks. Magic Transit protects customers’ entire network from DDoS attacks by placing our network in front of theirs. You can use Magic Transit Always On to reduce latency and mitigate attacks all the time, or Magic Transit On Demand to protect your network during active attacks. With Magic Transit, you get DDoS protection, traffic acceleration, and other network functions delivered as a service from every Cloudflare data center. Magic Transit works by allowing Cloudflare to advertise customers’ IP prefixes to the Internet with BGP to route the customer’s traffic through our network for DDoS protection. If you’re interested in protecting your network with Magic Transit, you can visit the Magic Transit product page and request a demo today.

Monitor your own network with free network flow analytics from Cloudflare

Sign up for early access and what’s next

The free version of Magic Network Monitoring (MNM) will be released in the next few weeks. You can request early access by filling out this form.

This is just the beginning for Magic Network Monitoring. In the future, you can look forward to features like advanced DDoS attack identification, network incident history and trends, and volumetric alert threshold recommendations.

Bringing Zero Trust to mobile network operators

Post Syndicated from Mike Conlow original https://blog.cloudflare.com/zero-trust-for-mobile-operators/

Bringing Zero Trust to mobile network operators

Bringing Zero Trust to mobile network operators

At Cloudflare, we’re excited about the quickly-approaching 5G future. Increasingly, we’ll have access to high throughput and low-latency wireless networks wherever we are. It will make the Internet feel instantaneous, and we’ll find new uses for this connectivity such as sensors that will help us be more productive and energy-efficient. However, this type of connectivity doesn’t have to come at the expense of security, a concern raised in this recent Wired article. Today we’re announcing the creation of a new partnership program for mobile networks—Zero Trust for Mobile Operators—to jointly solve the biggest security and performance challenges.

SASE for Mobile Networks

Every network is different, and the key to managing the complicated security environment of an enterprise network is having lots of tools in the toolbox. Most of these functions fall under the industry buzzword SASE, which stands for Secure Access Service Edge. Cloudflare’s SASE product is Cloudflare One, and it’s a comprehensive platform for network operators.  It includes:

  • Magic WAN, which offers secure Network-as-a-Service (NaaS) connectivity for your data centers, branch offices and cloud VPCs and integrates with your legacy MPLS networks
  • Cloudflare Access, which is a Zero Trust Network Access (ZTNA) service requiring strict verification for every user and every device before authorizing them to access internal resources.
  • Gateway, our Secure Web Gateway, which operates between a corporate network and the Internet to enforce security policies and protect company data.
  • A Cloud Access Security Broker, which monitors the network and external cloud services for security threats.
  • Cloudflare Area 1, an email threat detection tool to scan email for phishing, malware, and other threats.
Bringing Zero Trust to mobile network operators

We’re excited to partner with mobile network operators for these services because our networks and services are tremendously complementary. Let’s first think about SD-WAN (Software-Defined Wide Area Network) connectivity, which is the foundation on which much of the SASE framework rests. As an example, imagine a developer working from home developing a solution with a Mobile Network Operator’s (MNO) Internet of Things APIs. Maybe they’re developing tracking software for the number of drinks left in a soda machine, or want to track the routes for delivery trucks.

The developer at home and their fleet of devices should be on the same wide area network, securely, and at reasonable cost. What Cloudflare provides is the programmable software layer that enables this secure connectivity. The developer and the developer’s employer still need to have connectivity to the Internet at home, and for the fleet of devices. The ability to make a secure connection to your fleet of devices doesn’t do any good without enterprise connectivity, and the enterprise connectivity is only more valuable with the secure connection running on top of it. They’re the perfect match.

Once the connectivity is established, we can layer on a Zero Trust platform to ensure every user can only access a resource to which they’ve been explicitly granted permission. Any time a user wants to access a protected resource – via ssh, to a cloud service, etc. – they’re challenged to authenticate with their single-sign-on credentials before being allowed access. The networks we use are growing and becoming more distributed. A Zero Trust architecture enables that growth while protecting against known risks.

Bringing Zero Trust to mobile network operators

Edge Computing

Given the potential of low-latency 5G networks, consumers and operators are both waiting for a “killer 5G app”. Maybe it will be autonomous vehicles and virtual reality, but our bet is on a quieter revolution: moving compute – the “work” that a server needs to do to respond to a request – from big regional data centers to small city-level data centers, embedding the compute capacity inside wireless networks, and eventually even to the base of cell towers.

Cloudflare’s edge compute platform is called Workers, and it does exactly this – execute code at the edge. It’s designed to be simple. When a developer is building an API to support their product or service, they don’t want to worry about regions and availability zones. With Workers, a developer writes code they want executed at the edge, deploys it, and within seconds it’s running at every Cloudflare data center globally.

Some workloads we already see, and expect to see more of, include:

  • IoT (Internet of Things) companies implementing complex device logic and security features directly at the edge, letting them add cutting-edge capabilities without adding cost or latency to their devices.
  • eCommerce platforms storing and caching customized assets close to their visitors for improved customer experience and great conversion rates.
  • Financial data platforms, including new Web3 players, providing near real-time information and transactions to their users.
  • A/B testing and experimentation run at the edge without adding latency or introducing dependencies on the client-side.
  • Fitness-type devices tracking a user’s movement and health statistics can offload compute-heavy workloads while maintaining great speed/latency.
  • Retail applications providing fast service and a customized experience for each customer without an expensive on-prem solution.

The Cloudflare Case Studies section has additional examples from NCR, Edgemesh, BlockFi, and others on how they’re using the Workers platform. While these examples are exciting, we’re most excited about providing the platform for new innovation.

You may have seen last week we announced Workers for Platforms is now in General Availability. Workers for Platforms is an umbrella-like structure that allows a parent organization to enable Workers for their own customers. As an MNO, your focus is on providing the means for devices to send communication to clients. For IoT use cases, sending data is the first step, but the exciting potential of this connectivity is the applications it enables. With Workers for Platforms, MNOs can expose an embedded product that allows customers to access compute power at the edge.

Network Infrastructure

The complementary networks between mobile networks and Cloudflare is another area of opportunity. When a user is interacting with the Internet, one of the most important factors for the speed of their connection is the physical distance from their handset to the content and services they’re trying to access. If the data request from a user in Denver needs to wind its way to one of the major Internet hubs in Dallas, San Jose, or Chicago (and then all the way back!), that is going to be slow. But if the MNO can link to the service locally in Denver, the connection will be much faster.

One of the exciting developments with new 5G networks is the ability of MNOs to do more “local breakout”. Many MNOs are moving towards cloud-native and distributed radio access networks (RANs) which provides more flexibility to move and multiply packet cores. These packet cores are the heart of a mobile network and all of a subscriber’s data flows through one.

For Cloudflare – with a data center presence in 275+ cities globally – a user never has to wait long for our services. We can also take it a step further. In some cases, our services are embedded within the MNO or ISP’s own network. The traffic which connects a user to a device, authorizes the connection, and securely transmits data is all within the network boundary of the MNO – it never needs to touch the public Internet, incur added latency, or otherwise compromise the performance for your subscribers.

We’re excited to partner with mobile networks because our security services work best when our customers have excellent enterprise connectivity underneath. Likewise, we think mobile networks can offer more value to their customers with our security software added on top. If you’d like to talk about how to integrate Cloudflare One into your offerings, please email us at [email protected], and we’ll be in touch!

Identifying publicly accessible resources with Amazon VPC Network Access Analyzer

Post Syndicated from Patrick Duffy original https://aws.amazon.com/blogs/security/identifying-publicly-accessible-resources-with-amazon-vpc-network-access-analyzer/

Network and security teams often need to evaluate the internet accessibility of all their resources on AWS and block any non-essential internet access. Validating who has access to what can be complicated—there are several different controls that can prevent or authorize access to resources in your Amazon Virtual Private Cloud (Amazon VPC). The recently launched Amazon VPC Network Access Analyzer helps you understand potential network paths to and from your resources without having to build automation or manually review security groups, network access control lists (network ACLs), route tables, and Elastic Load Balancing (ELB) configurations. You can use this information to add security layers, such as moving instances to a private subnet behind a NAT gateway or moving APIs behind AWS PrivateLink, rather than use public internet connectivity. In this blog post, we show you how to use Network Access Analyzer to identify publicly accessible resources.

What is Network Access Analyzer?

Network Access Analyzer allows you to evaluate your network against your design requirements and network security policy. You can specify your network security policy for resources on AWS through a Network Access Scope. Network Access Analyzer evaluates the configuration of your Amazon VPC resources and controls, such as security groups, elastic network interfaces, Amazon Elastic Compute Cloud (Amazon EC2) instances, load balancers, VPC endpoint services, transit gateways, NAT gateways, internet gateways, VPN gateways, VPC peering connections, and network firewalls.

Network Access Analyzer uses automated reasoning to produce findings of potential network paths that don’t meet your network security policy. Network Access Analyzer reasons about all of your Amazon VPC configurations together rather than in isolation. For example, it produces findings for paths from an EC2 instance to an internet gateway only when the following conditions are met: the security group allows outbound traffic, the network ACL allows outbound traffic, and the instance’s route table has a route to an internet gateway (possibly through a NAT gateway, network firewall, transit gateway, or peering connection). Network Access Analyzer produces actionable findings with more context such as the entire network path from the source to the destination, as compared to the isolated rule-based checks of individual controls, such as security groups or route tables.

Sample environment

Let’s walk through a real-world example of using Network Access Analyzer to detect publicly accessible resources in your environment. Figure 1 shows an environment for this evaluation, which includes the following resources:

  • An EC2 instance in a public subnet allowing inbound public connections on port 80/443 (HTTP/HTTPS).
  • An EC2 instance in a private subnet allowing connections from an Application Load Balancer on port 80/443.
  • An Application Load Balancer in a public subnet with a Target Group connected to the private web server, allowing public connections on port 80/443.
  • An Amazon Aurora database in a public subnet allowing public connections on port 3306 (MySQL).
  • An Aurora database in a private subnet.
  • An EC2 instance in a public subnet allowing public connections on port 9200 (OpenSearch/Elasticsearch).
  • An Amazon EMR cluster allowing public connections on port 8080.
  • A Windows EC2 instance in a public subnet allowing public connections on port 3389 (Remote Desktop Protocol).
Figure 1: Example environment of web servers hosted on EC2 instances, remote desktop servers hosted on EC2, Relational Database Service (RDS) databases, Amazon EMR cluster, and OpenSearch cluster on EC2

Figure 1: Example environment of web servers hosted on EC2 instances, remote desktop servers hosted on EC2, Relational Database Service (RDS) databases, Amazon EMR cluster, and OpenSearch cluster on EC2

Let us assume that your organization’s security policy requires that your databases and analytics clusters not be directly accessible from the internet, whereas certain workload such as instances for web services can have internet access only through an Application Load Balancer over ports 80 and 443. Network Access Analyzer allows you to evaluate network access to resources in your VPCs, including database resources such as Amazon RDS and Amazon Aurora clusters, and analytics resources such as Amazon OpenSearch Service clusters and Amazon EMR clusters. This allows you to govern network access to your resources on AWS, by identifying network access that does not meet your security policies, and creating exclusions for paths that do have the appropriate network controls in place.

Configure Network Access Analyzer

In this section, you will learn how to create network scopes, analyze the environment, and review the findings produced. You can create network access scopes by using the AWS Command Line Interface (AWS CLI) or AWS Management Console. When creating network access scopes using the AWS CLI, you can supply the scope by using a JSON document. This blog post provides several network access scopes as JSON documents that you can deploy to your AWS accounts.

To create a network scope (AWS CLI)

  1. Verify that you have the AWS CLI installed and configured.
  2. Download the network-scopes.zip file, which contains JSON documents that detect the following publicly accessible resources:
    • OpenSearch/Elasticsearch clusters
    • Databases (MySQL, PostgreSQL, MSSQL)
    • EMR clusters
    • Windows Remote Desktop
    • Web servers that can be accessed without going through a load balancer

    Make note of the folder where you save the JSON scopes because you will need it for the next step.

  3. Open a systems shell, such as Bash, Zsh, or cmd.
  4. Navigate to the folder where you saved the preceding JSON scopes.
  5. Run the following commands in the shell window:
    aws ec2 create-network-insights-access-scope 
    --cli-input-json file://detect-public-databases.json 
    --tag-specifications 'ResourceType="network-insights-access-scope",
    		   Value="Detects publicly accessible databases."}]' 
    --region us-east-1
    aws ec2 create-network-insights-access-scope 
    --cli-input-json file://detect-public-elastic.json 
    --tag-specifications 'ResourceType="network-insights-access-scope",
    		   Value="Detects publicly accessible OpenSearch/Elasticsearch endpoints."}]' 
    --region us-east-1
    aws ec2 create-network-insights-access-scope 
    --cli-input-json file://detect-public-emr.json 
    --tag-specifications 'ResourceType="network-insights-access-scope",
    		   Value="Detects publicly accessible Amazon EMR endpoints."}]'
    --region us-east-1
    aws ec2 create-network-insights-access-scope 
    --cli-input-json file://detect-public-remotedesktop.json 
    --tag-specifications 'ResourceType="network-insights-access-scope",
    		   Value="Detects publicly accessible Microsoft Remote Desktop servers."}]' 
    --region us-east-1
    aws ec2 create-network-insights-access-scope 
    --cli-input-json file://detect-public-webserver-noloadbalancer.json 
    --tag-specifications 'ResourceType="network-insights-access-scope",
    		   Value="Detects publicly accessible web servers that can be accessed without using a load balancer."}]' 
    --region us-east-1

Now that you’ve created the scopes, you will analyze them to find resources that match your match conditions.

To analyze your scopes (console)

  1. Open the Amazon VPC console.
  2. In the navigation pane, under Network Analysis, choose Network Access Analyzer.
  3. Under Network Access Scopes, select the checkboxes next to the scopes that you want to analyze, and then choose Analyze, as shown in Figure 2.
    Figure 2: Custom network scopes created for Network Access Analyzer

    Figure 2: Custom network scopes created for Network Access Analyzer

If Network Access Analyzer detects findings, the console indicates the status Findings detected for each scope, as shown in Figure 3.

Figure 3: Network Access Analyzer scope status

Figure 3: Network Access Analyzer scope status

To review findings for a scope (console)

  1. On the Network Access Scopes page, under Network Access Scope ID, select the link for the scope that has the findings that you want to review. This opens the latest analysis, with the option to review past analyses, as shown in Figure 4.
    Figure 4: Finding summary identifying Amazon Aurora instance with public access to port 3306

    Figure 4: Finding summary identifying Amazon Aurora instance with public access to port 3306

  2. To review the path for a specific finding, under Findings, select the radio button to the left of the finding, as shown in Figure 4. Figure 5 shows an example of a path for a finding.
    Figure 5: Finding details showing access to the Amazon Aurora instance from the internet gateway to the elastic network interface, allowed by a network ACL and security group.

    Figure 5: Finding details showing access to the Amazon Aurora instance from the internet gateway to the elastic network interface, allowed by a network ACL and security group.

  3. Choose any resource in the path for detailed information, as shown in Figure 6.
    Figure 6: Resource detail within a finding outlining a specific security group allowing access on port 3306

    Figure 6: Resource detail within a finding outlining a specific security group allowing access on port 3306

How to remediate findings

After deploying network scopes and reviewing findings for publicly accessible resources, you should next limit access to those resources and remove public access. Use cases vary, but the scopes outlined in this post identify resources that you should share publicly in a more secure manner or remove public access entirely. The following techniques will help you align to the Protecting Networks portion of the AWS Well-Architected Framework Security Pillar.

If you have a need to share a database with external entities, consider using AWS PrivateLink, VPC peering, or use AWS Site-to-Site VPN to share access. You can remove public access by modifying the security group attached to the RDS instance or EC2 instance serving the database, but you should migrate the RDS database to a private subnet as well.

When creating web servers in EC2, you should not place web servers directly in a public subnet with security groups allowing HTTP and HTTPS ports from all internet addresses. Instead, you should place your EC2 instances in private subnets and use Application Load Balancers in a public subnet. From there, you can attach a security group that allows HTTP/HTTPS access from public internet addresses to your Application Load Balancer, and attach a security group that allows HTTP/HTTPS from your Load Balancer security group to your web server EC2 instances. You can also associate AWS WAF web ACLs to the load balancer to protect your web applications or APIs against common web exploits and bots that may affect availability, compromise security, or consume excessive resources.

Similarly, if you have OpenSearch/Elasticsearch running on EC2 or Amazon OpenSearch Service, or are using Amazon EMR, you can share these resources using PrivateLink. Use the Amazon EMR block public access configuration to verify that your EMR clusters are not shared publicly.

To connect to Remote Desktop on EC2 instances, you should use AWS Systems Manager to connect using Fleet Manager. Connecting with Fleet Manager only requires your Windows EC2 instances to be a managed node. When connecting using Fleet Manager, the security group requires no inbound ports, and the instance can be in a private subnet. For more information, see the Systems Manager prerequisites.


This blog post demonstrates how you can identify and remediate publicly accessible resources. Amazon VPC Network Access Analyzer helps you identify available network paths by using automated reasoning technology and user-defined access scopes. By using these scopes, you can define non-permitted network paths, identify resources that have those paths, and then take action to increase your security posture. To learn more about building continuous verification of network compliance at scale, see the blog post Continuous verification of network compliance using Amazon VPC Network Access Analyzer and AWS Security Hub. Take action today by deploying the Network Access Analyzer scopes in this post to evaluate your environment and add layers of security to best fit your needs.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.


Patrick Duffy

Patrick is a Solutions Architect in the Small Medium Business (SMB) segment at AWS. He is passionate about raising awareness and increasing security of AWS workloads. Outside work, he loves to travel and try new cuisines and enjoys a match in Magic Arena or Overwatch.

Peter Ticali

Peter Ticali

Peter is a Solutions Architect focused on helping Media & Entertainment customers transform and innovate. With over three decades of professional experience, he’s had the opportunity to contribute to architecture that stream live video to millions, including two Super Bowls, PPVs, and even a Royal Wedding. Previously he held Director, and CTO roles in the EdTech, advertising & public relations space. Additionally, he is a published photo journalist.

John Backes

John Backes

John is a Senior Applied Scientist in AWS Networking. He is passionate about applying Automated Reasoning to network verification and synthesis problems.

Vaibhav Katkade

Vaibhav Katkade

Vaibhav is a Senior Product Manager in the Amazon VPC team. He is interested in areas of network security and cloud networking operations. Outside of work, he enjoys cooking and the outdoors.

Integrating Network Analytics Logs with your SIEM dashboard

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/network-analytics-logs/

Integrating Network Analytics Logs with your SIEM dashboard

Integrating Network Analytics Logs with your SIEM dashboard

We’re excited to announce the availability of Network Analytics Logs. Magic Transit, Magic Firewall, Magic WAN, and Spectrum customers on the Enterprise plan can feed packet samples directly into storage services, network monitoring tools such as Kentik, or their Security Information Event Management (SIEM) systems such as Splunk to gain near real-time visibility into network traffic and DDoS attacks.

What’s included in the logs

By creating a Network Analytics Logs job, Cloudflare will continuously push logs of packet samples directly to the HTTP endpoint of your choice, including Websockets. The logs arrive in JSON format which makes them easy to parse, transform, and aggregate. The logs include packet samples of traffic dropped and passed by the following systems:

  1. Network-layer DDoS Protection Ruleset
  2. Advanced TCP Protection
  3. Magic Firewall

Note that not all mitigation systems are applicable to all Cloudflare services. Below is a table describing which mitigation service is applicable to which Cloudflare service:

Mitigation System Cloudflare Service
Magic Transit Magic WAN Spectrum
Network-layer DDoS Protection Ruleset
Advanced TCP Protection
Magic Firewall

Packets are processed by the mitigation systems in the order outlined above. Therefore, a packet that passed all three systems may produce three packet samples, one from each system. This can be very insightful when troubleshooting and wanting to understand where in the stack a packet was dropped. To avoid overcounting the total passed traffic, Magic Transit users should only take into consideration the passed packets from the last mitigation system, Magic Firewall.

An example of a packet sample log:


All the available log fields are documented here: https://developers.cloudflare.com/logs/reference/log-fields/account/network_analytics_logs/

Setting up the logs

In this walkthrough, we will demonstrate how to feed the Network Analytics Logs into Splunk via Postman. At this time, it is only possible to set up Network Analytics Logs via API. Setting up the logs requires three main steps:

  1. Create a Cloudflare API token.
  2. Create a Splunk Cloud HTTP Event Collector (HEC) token.
  3. Create and enable a Cloudflare Logpush job.

Let’s get started!

1) Create a Cloudflare API token

  1. Log in to your Cloudflare account and navigate to My Profile.
  2. On the left-hand side, in the collapsing navigation menu, click API Tokens.
  3. Click Create Token and then, under Custom token, click Get started.
  4. Give your custom token a name, and select an Account scoped permission to edit Logs. You can also scope it to a specific/subset/all of your accounts.
  5. At the bottom, click Continue to summary, and then Create Token.
  6. Copy and save your token. You can also test your token with the provided snippet in Terminal.

When you’re using an API token, you don’t need to provide your email address as part of the API credentials.

Integrating Network Analytics Logs with your SIEM dashboard

Read more about creating an API token on the Cloudflare Developers website: https://developers.cloudflare.com/api/tokens/create/

2) Create a Splunk token for an HTTP Event Collector

In this walkthrough, we’re using a Splunk Cloud free trial, but you can use almost any service that can accept logs over HTTPS. In some cases, if you’re using an on-premise SIEM solution, you may need to allowlist Cloudflare IP address in your firewall to be able to receive the logs.

  1. Create a Splunk Cloud account. I created a trial account for the purpose of this blog.
  2. In the Splunk Cloud dashboard, go to Settings > Data Input.
  3. Next to HTTP Event Collector, click Add new.
  4. Follow the steps to create a token.
  5. Copy your token and your allocated Splunk hostname and save both for later.
Integrating Network Analytics Logs with your SIEM dashboard

Read more about using Splunk with Cloudflare Logpush on the Cloudflare Developers website: https://developers.cloudflare.com/logs/get-started/enable-destinations/splunk/

Read more about creating an HTTP Event Collector token on Splunk’s website: https://docs.splunk.com/Documentation/Splunk/8.2.6/Data/UsetheHTTPEventCollector

3) Create a Cloudflare Logpush job

Creating and enabling a job is very straightforward. It requires only one API call to Cloudflare to create and enable a job.

To send the API calls I used Postman, which is a user-friendly API client that was recommended to me by a colleague. It allows you to save and customize API calls. You can also use Terminal/CMD or any other API client/script of your choice.

One thing to notice is Network Analytics Logs are account-scoped. The API endpoint is therefore a tad different from what you would normally use for zone-scoped datasets such as HTTP request logs and DNS logs.

This is the endpoint for creating an account-scoped Logpush job:


Your account identifier number is a unique identifier of your account. It is a string of 32 numbers and characters. If you’re not sure what your account identifier is, log in to Cloudflare, select the appropriate account, and copy the string at the end of the URL.


Then, set up a new request in Postman (or any other API client/CLI tool).

To successfully create a Logpush job, you’ll need the HTTP method, URL, Authorization token, and request body (data). The request body must include a destination configuration (destination_conf), the specified dataset (network_analytics_logs, in our case), and the token (your Splunk token).

Integrating Network Analytics Logs with your SIEM dashboard





Authorization: Define a Bearer authorization in the Authorization tab, or add it to the header, and add your Cloudflare API token.

Body: Select a Raw > JSON

"destination_conf": "{your-unique-splunk-configuration}",
"dataset": "network_analytics_logs",
"token": "{your-splunk-hec-tag}",
"enabled": "true"

If you’re using Splunk Cloud, then your unique configuration has the following format:


Definition of the variables:

{your-splunk-hostname}= Your allocated Splunk Cloud hostname.

{channel-id} = A unique ID that you choose to assign for.`{your-splunk–hec-token}` = The token that you generated for your Splunk HEC.

An important note is that customers should have a valid SSL/TLS certificate on their Splunk instance to support an encrypted connection.

After you’ve done that, you can create a GET request to the same URL (no request body needed) to verify that the job was created and is enabled.

The response should be similar to the following:

    "errors": [],
    "messages": [],
    "result": {
        "id": {job-id},
        "dataset": "network_analytics_logs",
        "frequency": "high",
        "kind": "",
        "enabled": true,
        "name": null,
        "logpull_options": null,
        "destination_conf": "{your-unique-splunk-configuration}",
        "last_complete": null,
        "last_error": null,
        "error_message": null
    "success": true

Shortly after, you should start receiving logs to your Splunk HEC.

Integrating Network Analytics Logs with your SIEM dashboard

Read more about enabling Logpush on the Cloudflare Developers website: https://developers.cloudflare.com/logs/reference/logpush-api-configuration/examples/example-logpush-curl/

Reduce costs with R2 storage

Depending on the amount of logs that you read and write, the cost of third party cloud storage can skyrocket — forcing you to decide between managing a tight budget and being able to properly investigate networking and security issues. However, we believe that you shouldn’t have to make those trade-offs. With R2’s low costs, we’re making this decision easier for our customers. Instead of feeding logs to a third party, you can reap the cost benefits of storing them in R2.

To learn more about the R2 features and pricing, check out the full blog post. To enable R2, contact your account team.

Cloudflare logs for maximum visibility

Cloudflare Enterprise customers have access to detailed logs of the metadata generated by our products. These logs are helpful for troubleshooting, identifying network and configuration adjustments, and generating reports, especially when combined with logs from other sources, such as your servers, firewalls, routers, and other appliances.

Network Analytics Logs joins Cloudflare’s family of products on Logpush: DNS logs, Firewall events, HTTP requests, NEL reports, Spectrum events, Audit logs, Gateway DNS, Gateway HTTP, and Gateway Network.

Not using Cloudflare yet? Start now with our Free and Pro plans to protect your websites against DDoS attacks, or contact us for comprehensive DDoS protection and firewall-as-a-service for your entire network.

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

Post Syndicated from Abhi Das original https://blog.cloudflare.com/cloudflare-partners-with-microsoft-to-protect-joint-customers-with-global-zero-trust-network/

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

As a company, we are constantly asking ourselves what we can do to provide more value to our customers, including integrated solutions with our partners. Joint customers benefit from our integrations below with Azure Active Directory by:

First, centralized identity and access management via Azure Active Directory which provides single sign-on, multifactor authentication, and access via conditional authentication.

Second, policy oriented access to specific applications using Cloudflare Access—a VPN replacement service.

Third, an additional layer of security for internal applications by connecting them to Cloudflare global network and not having to open them up to the whole Internet.

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

Let’s step back a bit.

Why Zero Trust?

Companies of all sizes are faced with an accelerating digital transformation of their IT stack and an increasingly distributed workforce, changing the definition of the security perimeter. We are moving away from the castle and moat model to the whole Internet, requiring security checks for every user accessing every resource. As a result, all companies, especially those whose use of Azure’s broad cloud portfolio is increasing, are adopting Zero Trust architectures as an essential part of their cloud and SaaS journey.

Cloudflare Access provides secure access to Azure hosted applications and on-premise applications. Also, it acts as an on-ramp to the world’s fastest network to Azure and the rest of the Internet. Users connect from their devices or offices via Cloudflare’s network in over 250 cities around the world. You can use Cloudflare Zero Trust on that global network to ensure that every request to every resource is evaluated for security, including user identity. We are excited to bring this secure global network on-ramp to Azure hosted applications and on-premise applications.

Also, performance is one of our key advantages in addition to security. Cloudflare serves over 32 million HTTP requests per second, on average, for the millions of customers who secure their applications on our network. When those applications do not run on our network, we can rely on our own global private backbone and our connectivity with over 10,000 networks globally to connect the user.

We are excited to bring this global security and performance perimeter network as our Cloudflare Zero Trust product for your Azure hosted applications and on-premises applications.

Cloudflare Access: a modern Zero Trust approach

Cloudflare’s Zero Trust solution Cloudflare Access provides a modern approach to authentication for internally managed applications. When corporate applications on Azure or on-premise are protected with Cloudflare Access, they feel like SaaS applications, and employees can log in to them with a simple and consistent flow. Cloudflare Access acts as a unified reverse proxy to enforce access control by making sure every request is authenticated, authorized, and encrypted.

Identity: Cloudflare Access integrates out of the box with all the major identity providers, including Azure Active Directory, allowing use of the policies and users you already created to provide conditional access to your web applications. For example, you can use Cloudflare Access to ensure that only company employees and no contractors can get to your internal kanban board, or you can lock down the SAP finance application hosted on Azure or on-premise.

Devices: You can use TLS with Client Authentication and limit connections only to devices with a unique client certificate. Cloudflare will ensure the connecting device has a valid client certificate signed by the corporate CA, and authenticate user credentials to grant access to an internal application.

Additional security: Want to use Cloudflare Access in front of an internal application but don’t want to open up that application to the whole Internet? For additional security, you can combine Access with Cloudflare Tunnel. Cloudflare Tunnel will connect from your Azure environment directly to Cloudflare’s network, so there is no publicly accessible IP.

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

Secure both your legacy applications and Azure hosted applications via Azure AD and Cloudflare Access jointly

For on-premise legacy applications, we are excited to announce that Cloudflare is an Azure Active Directory secure hybrid access partner. Azure AD secure hybrid access enables customers to centrally manage access to their legacy on-premise applications using SSO authentication without incremental development. Starting today, joint customers can easily use Cloudflare Access solution as an additional layer of security with built-in performance, in front of their legacy applications.

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

Traditionally for on-premise applications, customers have to change their existing code or add additional layers of code to integrate Azure AD or Cloudflare Access–like capabilities. With the help of Azure Active Directory secure hybrid access, customers can integrate these capabilities seamlessly without much code changes. Once integrated, customers can take advantage of the below Azure AD features and more:

  1. Multi-factor authentication (MFA)
  2. Single sign-on (SSO)
  3. Passwordless authentication
  4. Unified user access management
  5. Azure AD Conditional Access and device trust

Very similarly, the Azure AD and Cloudflare Access combo can also be used to secure your Azure hosted applications. Cloudflare Access enables secure on-ramp to Azure hosted applications or on-premise applications via the below two integrations:

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

1. Cloudflare Access integration with Azure AD:

Cloudflare Access is a Zero Trust Network Access (ZTNA) solution that allows you to configure precise access policies across their applications. You can integrate Microsoft Azure Active Directory with Cloudflare Zero Trust and build rules based on user identity, group membership and Azure AD Conditional Access policies. Users will authenticate with their Azure AD credentials and connect to Cloudflare Access. Additional policy controls include Device Posture, Network, Location and more. Setup typically takes less than a few hours!

2. Cloudflare Tunnel integration with Azure:

Cloudflare Tunnel can expose applications running on the Microsoft Azure platform. See guide to install and configure Cloudflare Tunnel. Also, a prebuilt Cloudflare Linux image exists on the Azure Marketplace. To simplify the process of connecting Azure applications to Cloudflare’s network, deploy the prebuilt image to an Azure resource group. Cloudflare Tunnel is now available on Microsoft’s Azure marketplace.

“The hybrid work environment has accelerated the cloud transition and increased the need for CIOs everywhere to provide secure and performant access to applications for their employees. This is especially true for self-hosted applications. Cloudflare’s global network security perimeter via Cloudflare Access provides this additional layer of security together with Azure Active Directory to enable employees to get work done from anywhere securely and performantly.” said David Gregory, Principal Program Manager
– Microsoft Azure Active Directory


Over the last ten years, Cloudflare has built one of the fastest, most reliable, most secure networks in the world. You now have the ability to use that network as a global security and performance perimeter to your Azure hosted applications via the above integrations, and it is easy.

What’s next?

In the coming months, we will be further strengthening our integrations with Microsoft Azure allowing customers to better implement our SASE perimeter.

If you’re using Cloudflare Zero Trust products today and are interested in using this integration with Azure, please visit our above developer documentation to learn about how you can enable it. If you want to learn more or have additional questions, please fill out the form or get in touch with your Cloudflare CSM or AE, and we’ll be happy to help you.

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Post Syndicated from Matt Lewis original https://blog.cloudflare.com/interconnect-anywhere/

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Customers choose Cloudflare for our network performance, privacy and security.  Cloudflare Network Interconnect is the best on-ramp for our customers to utilize our diverse product suite. In the past, we’ve talked about Cloudflare’s physical footprint in over 200+ data centers, and how Cloudflare Network Interconnect enabled companies in those data centers to connect securely to Cloudflare’s network. Today, Cloudflare is excited to announce expanded partnerships that allows customers to connect to Cloudflare from their own Layer 2 service fabric. There are now over 1,600 locations where enterprise security and network professionals have the option to connect to Cloudflare securely and privately from their existing fabric.

Interconnect Anywhere is a journey

Since we launched Cloudflare Network Interconnect (CNI) in August 2020, we’ve been focused on extending the availability of Cloudflare’s network to as many places as possible. The initial launch opened up 150 physical locations alongside 25 global partner locations. During Security Week this year, we grew that availability by adding data center partners to our CNI Partner Program. Today, we are adding even more connectivity options by expanding Cloudflare availability to all of our partners’ locations, as well as welcoming CoreSite Open Cloud Exchange (OCX) and Infiny by Epsilon into our CNI Partner Program. This totals 1,638 locations where our customers can now connect securely to the Cloudflare network. As we continue to expand, customers are able to connect the fabric of their choice to Cloudflare from a growing list of data centers.

Fabric Partner Enabled Locations
PacketFabric 180+
Megaport 700+
Equinix Fabric 46+
Console Connect 440+
CoreSite Open Cloud Exchange 22+
Infiny by Epsilon 250+

“We are excited to expand our partnership with Cloudflare to ensure that our mutual customers can benefit from our carrier-class Software Defined Network (SDN) and Cloudflare’s network security in all Packetfabric locations. Now customers can easily connect from wherever they are located to access best of breed security services alongside Packetfabric’s Cloud Connectivity options.”
Alex Henthorn-Iwane, PacketFabric’s Chief Marketing Officer

“With the significant rise in DDoS attacks over the past year, it’s becoming ever more crucial for IT and Operations teams to prevent and mitigate network security threats. We’re thrilled to enable Cloudflare Interconnect everywhere on Megaport’s global Software Defined Network, which is available in over 700 enabled locations in 24 countries worldwide. Our partnership will give organizations the ability to reduce their exposure to network attacks, improve customer experiences, and simplify their connectivity across our private on-demand network in a matter of a few mouse clicks.”
Misha Cetrone, Megaport VP of Strategic Partnerships

“Expanding the connectivity options to Cloudflare ensures that customers can provision hybrid architectures faster and more easily, leveraging enterprise-class network services and automation on the Open Cloud Exchange. Simplifying the process of establishing modern networks helps achieve critical business objectives, including reducing total cost of ownership, and improving business agility as well as interoperability.”
Brian Eichman, CoreSite Senior Director of Product Development

“Partner accessibility is key in our cloud enablement and interconnection strategy. We are continuously evolving to offer our customers and partners simple and secure connectivity no matter where their network resides in the world. Infiny enables access to Cloudflare from across a global footprint while delivering high-quality cloud connectivity solutions at scale. Customers and partners gain an innovative Network-as-a-Service SDN platform that supports them with programmable and automated connectivity for their cloud and networking needs.”
Mark Daley, Epsilon Director of Digital Strategy

Uncompromising security and increased reliability from your choice of network fabric

Now, companies can connect to Cloudflare’s suite of network and security products without traversing shared public networks by taking advantage of software-defined networking providers. No matter where a customer is connected to one of our fabric partners, Cloudflare’s 200+ data centers ensure that world-class network security is close by and readily available via a secure, low latency, and cost-effective connection. An increased number of locations further allows customers to have multiple secure connections to the Cloudflare network, increasing redundancy and reliability. As we further expand our global network and increase the number of data centers where Cloudflare and our partners are connected, latency becomes shorter and customers will reap the benefits.

Let’s talk about how a customer can use Cloudflare Network Interconnect to improve their security posture through a fabric provider.

Plug and Play Fabric connectivity

Acme Corp is an example company that wants to deliver highly performant digital services to their customers and ensure employees can securely connect to their business apps from anywhere. They’ve purchased Magic Transit and Cloudflare Access and are evaluating Magic WAN to secure their network while getting the performance Cloudflare provides. They want to avoid potential network traffic congestion and latency delays, so they have designed a network architecture with their software-defined network fabric and Cloudflare using Cloudflare Network Interconnect.

With Cloudflare Network Interconnect, provisioning this connection is simple. Acme goes to their partner portal, and requests a virtual layer 2 connection to Cloudflare with the bandwidth of their choice. Cloudflare accepts the connection, provides the BGP session establishment information and organizes a turn up call if required. Easy!

Let’s talk about how Cloudflare and our partners have worked together to simplify the interconnectivity experience for the customer.

With Cloudflare Network interconnect, availability only gets better

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations
Connection Established with Cloudflare Network Interconnect

When a customer uses CNI to establish a connection between a fabric partner and Cloudflare, that connection runs over layer 2, configured via the partner user interface. Our new partnership model allows customers to connect privately and securely to Cloudflare’s network even when the customer is not located in the same data center as Cloudflare.

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations
Shorter Customer Path with New Partner Locations‌‌

The diagram above shows a shorter customer path thanks to an incremental partner-enabled location. Every time Cloudflare brings a new data center online and connects with a partner fabric, all customers in that region immediately benefit from the closer proximity and reduced latency.

Network fabrics in action

For those who want to self-serve, we’ve published documentation that details the steps in provisioning these connections. You can find the steps for each partner below:

As we expand our network, it’s critical we provide more ways to allow our customers to connect easily. We will continue to shorten the time it takes to set up a new interconnection, drive down costs, strengthen security and improve customer experience with all of our Network On-Ramp partners.

If you are using one of our software-defined networking partners and would like to connect to Cloudflare via their fabric, contact your fabric partner account team or reach out to us using the Cloudflare Network Interconnect page. If you are not using a fabric today, but would like to take advantage of software-defined networking to connect to Cloudflare, reach out to your account team.

Conntrack turns a blind eye to dropped SYNs

Post Syndicated from Jakub Sitnicki original https://blog.cloudflare.com/conntrack-turns-a-blind-eye-to-dropped-syns/


Conntrack turns a blind eye to dropped SYNs

We have been working with conntrack, the connection tracking layer in the Linux kernel, for years. And yet, despite the collected know-how, questions about its inner workings occasionally come up. When they do, it is hard to resist the temptation to go digging for answers.

One such question popped up while writing the previous blog post on conntrack:

“Why are there no entries in the conntrack table for SYN packets dropped by the firewall?”

Ready for a deep dive into the network stack? Let’s find out.

Conntrack turns a blind eye to dropped SYNs
Image by chulmin park from Pixabay

We already know from last time that conntrack is in charge of tracking incoming and outgoing network traffic. By running conntrack -L we can inspect existing network flows, or as conntrack calls them, connections.

So if we spin up a toy VM, connect to it over SSH, and inspect the contents of the conntrack table, we will see…

$ vagrant init fedora/33-cloud-base
$ vagrant up
$ vagrant ssh
Last login: Sun Jan 31 15:08:02 2021 from
[[email protected] ~]$ sudo conntrack -L
conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.

… nothing!

Even though the conntrack kernel module is loaded:

[[email protected] ~]$ lsmod | grep '^nf_conntrack\b'
nf_conntrack          163840  1 nf_conntrack_netlink

Hold on a minute. Why is the SSH connection to the VM not listed in conntrack entries? SSH is working. With each keystroke we are sending packets to the VM. But conntrack doesn’t register it.

Isn’t conntrack an integral part of the network stack that sees every packet passing through it? 🤔

Conntrack turns a blind eye to dropped SYNs
Based on an image by Jan Engelhardt CC BY-SA 3.0

Clearly everything we learned about conntrack last time is not the whole story.

Calling into conntrack

Our little experiment with SSH’ing into a VM begs the question — how does conntrack actually get notified about network packets passing through the stack?

We can walk the receive path step by step and we won’t find any direct calls into the conntrack code in either the IPv4 or IPv6 stack. Conntrack does not interface with the network stack directly.

Instead, it relies on the Netfilter framework, and its set of hooks baked into in the stack:

int ip_rcv(struct sk_buff *skb, struct net_device *dev, …)
               net, NULL, skb, dev, NULL,

Netfilter users, like conntrack, can register callbacks with it. Netfilter will then run all registered callbacks when its hook processes a network packet.

For the INET family, that is IPv4 and IPv6, there are five Netfilter hooks  to choose from:

Conntrack turns a blind eye to dropped SYNs
Based on Nftables – Packet flow and Netfilter hooks in detail, thermalcircle.de, CC BY-SA 4.0

Which ones does conntrack use? We will get to that in a moment.

First, let’s focus on the trigger. What makes conntrack register its callbacks with Netfilter?

The SSH connection doesn’t show up in the conntrack table just because the module is loaded. We already saw that. This means that conntrack doesn’t register its callbacks with Netfilter at module load time.

Or at least, it doesn’t do it by default. Since Linux v5.1 (May 2019) the conntrack module has the enable_hooks parameter, which causes conntrack to register its callbacks on load:

[[email protected] ~]$ modinfo nf_conntrack
parm:           enable_hooks:Always enable conntrack hooks (bool)

Going back to our toy VM, let’s try to reload the conntrack module with enable_hooks set:

[[email protected] ~]$ sudo rmmod nf_conntrack_netlink nf_conntrack
[[email protected] ~]$ sudo modprobe nf_conntrack enable_hooks=1
[[email protected] ~]$ sudo conntrack -L
tcp      6 431999 ESTABLISHED src= dst= sport=22 dport=34858 src= dst= sport=34858 dport=22 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.
[[email protected] ~]$

Nice! The conntrack table now contains an entry for our SSH session.

The Netfilter hook notified conntrack about SSH session packets passing through the stack.

Now that we know how conntrack gets called, we can go back to our question — can we observe a TCP SYN packet dropped by the firewall with conntrack?

Listing Netfilter hooks

That is easy to check:

  1. Add a rule to drop anything coming to port tcp/25702

[[email protected] ~]$ sudo iptables -t filter -A INPUT -p tcp --dport 2570 -j DROP

2) Connect to the VM on port tcp/2570 from the outside

host $ nc -w 1 -z 2570

3) List conntrack table entries

[[email protected] ~]$ sudo conntrack -L
tcp      6 431999 ESTABLISHED src= dst= sport=22 dport=34858 src= dst= sport=34858 dport=22 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.

No new entries. Conntrack didn’t record a new flow for the dropped SYN.

But did it process the SYN packet? To answer that we have to find out which callbacks conntrack registered with Netfilter.

Netfilter keeps track of callbacks registered for each hook in instances of struct nf_hook_entries. We can reach these objects through the Netfilter state (struct netns_nf), which lives inside network namespace (struct net).

struct netns_nf {
    struct nf_hook_entries __rcu *hooks_ipv4[NF_INET_NUMHOOKS];
    struct nf_hook_entries __rcu *hooks_ipv6[NF_INET_NUMHOOKS];

struct nf_hook_entries, if you look at its definition, is a bit of an exotic construct. A glance at how the object size is calculated during its allocation gives a hint about its memory layout:

    struct nf_hook_entries *e;
    size_t alloc = sizeof(*e) +
               sizeof(struct nf_hook_entry) * num +
               sizeof(struct nf_hook_ops *) * num +
               sizeof(struct nf_hook_entries_rcu_head);

It’s an element count, followed by two arrays glued together, and some RCU-related state which we’re going to ignore. The two arrays have the same size, but hold different kinds of values.

We can walk the second array, holding pointers to struct nf_hook_ops, to discover the registered callbacks and their priority. Priority determines the invocation order.

Conntrack turns a blind eye to dropped SYNs

With drgn, a programmable C debugger tailored for the Linux kernel, we can locate the Netfilter state in kernel memory, and walk its contents relatively easily. Given we know what we are looking for.

[[email protected] ~]$ sudo drgn
drgn 0.0.8 (using Python 3.9.1, without libkdumpfile)
>>> pre_routing_hook = prog['init_net'].nf.hooks_ipv4[0]
>>> for i in range(0, pre_routing_hook.num_hook_entries):
...     pre_routing_hook.hooks[i].hook
(nf_hookfn *)ipv4_conntrack_defrag+0x0 = 0xffffffffc092c000
(nf_hookfn *)ipv4_conntrack_in+0x0 = 0xffffffffc093f290

Neat! We have a way to access Netfilter state.

Let’s take it to the next level and list all registered callbacks for each Netfilter hook (using less than 100 lines of Python):

[[email protected] ~]$ sudo /vagrant/tools/list-nf-hooks
       -400 → ipv4_conntrack_defrag     ☜ conntrack callback
       -300 → iptable_raw_hook
       -200 → ipv4_conntrack_in         ☜ conntrack callback
       -150 → iptable_mangle_hook
       -100 → nf_nat_ipv4_in

🪝 ipv4 LOCAL_IN
       -150 → iptable_mangle_hook
          0 → iptable_filter_hook
         50 → iptable_security_hook
        100 → nf_nat_ipv4_fn
 2147483647 → ipv4_confirm

The output from our script shows that conntrack has two callbacks registered with the PRE_ROUTING hook – ipv4_conntrack_defrag and ipv4_conntrack_in. But are they being called?

Conntrack turns a blind eye to dropped SYNs
Based on Netfilter PRE_ROUTING hook, thermalcircle.de, CC BY-SA 4.0

Tracing conntrack callbacks

We expect that when the Netfilter PRE_ROUTING hook processes a TCP SYN packet, it will invoke ipv4_conntrack_defrag and then ipv4_conntrack_in callbacks.

To confirm it we will put to use the tracing powers of BPF 🐝. BPF programs can run on entry to functions. These kinds of programs are known as BPF kprobes. In our case we will attach BPF kprobes to conntrack callbacks.

Usually, when working with BPF, we would write the BPF program in C and use clang -target bpf to compile it. However, for tracing it will be much easier to use bpftrace. With bpftrace we can write our BPF kprobe program in a high-level language inspired by AWK:

    $skb = (struct sk_buff *)arg1;
    $iph = (struct iphdr *)($skb->head + $skb->network_header);
    $th = (struct tcphdr *)($skb->head + $skb->transport_header);

    if ($iph->protocol == 6 /* IPPROTO_TCP */ &&
        $th->dest == 2570 /* htons(2570) */ &&
        $th->syn == 1) {
        time("%H:%M:%S ");
        printf("%s:%u > %s:%u tcp syn %s\n",
               (uint16)($th->source << 8) | ($th->source >> 8),
               (uint16)($th->dest << 8) | ($th->dest >> 8),

What does this program do? It is roughly an equivalent of a tcpdump filter:

dst port 2570 and tcp[tcpflags] & tcp-syn != 0

But only for packets passing through conntrack PRE_ROUTING callbacks.

(If you haven’t used bpftrace, it comes with an excellent reference guide and gives you the ability to explore kernel data types on the fly with bpftrace -lv 'struct iphdr'.)

Let’s run the tracing program while we connect to the VM from the outside (nc -z 2570):

[[email protected] ~]$ sudo bpftrace /vagrant/tools/trace-conntrack-prerouting.bt
Attaching 3 probes...
Tracing conntrack prerouting callbacks... Hit Ctrl-C to quit
13:22:56 > tcp syn ipv4_conntrack_defrag
13:22:56 > tcp syn ipv4_conntrack_in

[[email protected] ~]$

Conntrack callbacks have processed the TCP SYN packet destined to tcp/2570.

But if conntrack saw the packet, why is there no corresponding flow entry in the conntrack table?

Going down the rabbit hole

What actually happens inside the conntrack PRE_ROUTING callbacks?

To find out, we can trace the call chain that starts on entry to the conntrack callback. The function_graph tracer built into the Ftrace framework is perfect for this task.

But because all incoming traffic goes through the PRE_ROUTING hook, including our SSH connection, our trace will be polluted with events from SSH traffic. To avoid that, let’s switch from SSH access to a serial console.

When using libvirt as the Vagrant provider, you can connect to the serial console with virsh:

host $ virsh -c qemu:///session list
 Id   Name                State
 1    conntrack_default   running

host $ virsh -c qemu:///session console conntrack_default
Once connected to the console and logged into the VM, we can record the call chain using the trace-cmd wrapper for Ftrace:
[[email protected] ~]$ sudo trace-cmd start -p function_graph -g ipv4_conntrack_defrag -g ipv4_conntrack_in
  plugin 'function_graph'
[[email protected] ~]$ # … connect from the host with `nc -z 2570` …
[[email protected] ~]$ sudo trace-cmd stop
[[email protected] ~]$ sudo cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
# |     |   |                     |   |   |   |
 1)   1.219 us    |  finish_task_switch();
 1)   3.532 us    |  ipv4_conntrack_defrag [nf_defrag_ipv4]();
 1)               |  ipv4_conntrack_in [nf_conntrack]() {
 1)               |    nf_conntrack_in [nf_conntrack]() {
 1)   0.573 us    |      get_l4proto [nf_conntrack]();
 1)               |      nf_ct_get_tuple [nf_conntrack]() {
 1)   0.487 us    |        nf_ct_get_tuple_ports [nf_conntrack]();
 1)   1.564 us    |      }
 1)   0.820 us    |      hash_conntrack_raw [nf_conntrack]();
 1)   1.255 us    |      __nf_conntrack_find_get [nf_conntrack]();
 1)               |      init_conntrack.constprop.0 [nf_conntrack]() {  ❷
 1)   0.427 us    |        nf_ct_invert_tuple [nf_conntrack]();
 1)               |        __nf_conntrack_alloc [nf_conntrack]() {      ❶
 1)   3.680 us    |        }
 1) + 15.847 us   |      }
 1) + 34.595 us   |    }
 1) + 35.742 us   |  }
[[email protected] ~]$

What catches our attention here is the allocation, __nf_conntrack_alloc() (❶), inside init_conntrack() (❷). __nf_conntrack_alloc() creates a struct nf_conn object which represents a tracked connection.

This object is not created in vain. A glance at init_conntrack() source shows that it is pushed onto a list of unconfirmed connections3.

Conntrack turns a blind eye to dropped SYNs

What does it mean that a connection is unconfirmed? As conntrack(8) man page explains:

       This table shows new entries, that are not yet inserted into the
       conntrack table. These entries are attached to packets that  are
       traversing  the  stack, but did not reach the confirmation point
       at the postrouting hook.

Perhaps we have been looking for our flow in the wrong table? Does the unconfirmed table have a record for our dropped TCP SYN?

Pulling the rabbit out of the hat

I have bad news…

[[email protected] ~]$ sudo conntrack -L unconfirmed
conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.
[[email protected] ~]$

The flow is not present in the unconfirmed table. We have to dig deeper.

Let’s for a moment assume that a struct nf_conn object was added to the unconfirmed list. If the list is now empty, then the object must have been removed from the list before we inspected its contents.

Has an entry been removed from the unconfirmed table? What function removes entries from the unconfirmed table?

It turns out that nf_ct_add_to_unconfirmed_list() which init_conntrack() invokes, has its opposite defined just right beneath it – nf_ct_del_from_dying_or_unconfirmed_list().

It is worth a shot to check if this function is being called, and if so, from where. For that we can again use a BPF tracing program, attached to function entry. However, this time our program will record a kernel stack trace:

kprobe:nf_ct_del_from_dying_or_unconfirmed_list { @[kstack()] = count(); exit(); }

With bpftrace running our one-liner, we connect to the VM from the host with nc as before:

[[email protected] ~]$ sudo bpftrace -e 'kprobe:nf_ct_del_from_dying_or_unconfirmed_list { @[kstack()] = count(); exit(); }'
Attaching 1 probe...

    nf_ct_del_from_dying_or_unconfirmed_list+1 ❹
    kfree_skb+50 ❸
    nf_hook_slow+143 ❷
    ip_local_deliver+152 ❶
]: 1

[[email protected] ~]$

Bingo. The conntrack delete function was called, and the captured stack trace shows that on local delivery path (❶), where LOCAL_IN Netfilter hook runs (❷), the packet is destroyed (❸). Conntrack must be getting called when sk_buff (the packet and its metadata) is destroyed. This causes conntrack to remove the unconfirmed flow entry (❹).

It makes sense. After all we have a DROP rule in the filter/INPUT chain. And that iptables -j DROP rule has a significant side effect. It cleans up an entry in the conntrack unconfirmed table!

This explains why we can’t observe the flow in the unconfirmed table. It lives for only a very short period of time.

Not convinced? You don’t have to take my word for it. I will prove it with a dirty trick!

Making the rabbit disappear, or actually appear

If you recall the output from list-nf-hooks that we’ve seen earlier, there is another conntrack callback there – ipv4_confirm, which I have ignored:

[[email protected] ~]$ sudo /vagrant/tools/list-nf-hooks
🪝 ipv4 LOCAL_IN
       -150 → iptable_mangle_hook
          0 → iptable_filter_hook
         50 → iptable_security_hook
        100 → nf_nat_ipv4_fn
 2147483647 → ipv4_confirm              ☜ another conntrack callback

ipv4_confirm is “the confirmation point” mentioned in the conntrack(8) man page. When a flow gets confirmed, it is moved from the unconfirmed table to the main conntrack table.

The callback is registered with a “weird” priority – 2,147,483,647. It’s the maximum positive value of a 32-bit signed integer can hold, and at the same time, the lowest possible priority a callback can have.

This ensures that the ipv4_confirm callback runs last. We want the flows to graduate from the unconfirmed table to the main conntrack table only once we know the corresponding packet has made it through the firewall.

Luckily for us, it is possible to have more than one callback registered with the same priority. In such cases, the order of registration matters. We can put that to use. Just for educational purposes.

Good old iptables won’t be of much help here. Its Netfilter callbacks have hard-coded priorities which we can’t change. But nftables, the iptables successor, is much more flexible in this regard. With nftables we can create a rule chain with arbitrary priority.

So this time, let’s use nftables to install a filter rule to drop traffic to port tcp/2570. The trick, though, is to register our chain before conntrack registers itself. This way our filter will run last.

First, delete the tcp/2570 drop rule in iptables and unregister conntrack.

vm # iptables -t filter -F
vm # rmmod nf_conntrack_netlink nf_conntrack

Then add tcp/2570 drop rule in nftables, with lowest possible priority.

vm # nft add table ip my_table
vm # nft add chain ip my_table my_input { type filter hook input priority 2147483647 \; }
vm # nft add rule ip my_table my_input tcp dport 2570 counter drop
vm # nft -a list ruleset
table ip my_table { # handle 1
        chain my_input { # handle 1
                type filter hook input priority 2147483647; policy accept;
                tcp dport 2570 counter packets 0 bytes 0 drop # handle 4

Finally, re-register conntrack hooks.

vm # modprobe nf_conntrack enable_hooks=1

The registered callbacks for the LOCAL_IN hook now look like this:

vm # /vagrant/tools/list-nf-hooks
🪝 ipv4 LOCAL_IN
       -150 → iptable_mangle_hook
          0 → iptable_filter_hook
         50 → iptable_security_hook
        100 → nf_nat_ipv4_fn
 2147483647 → ipv4_confirm, nft_do_chain_ipv4

What happens if we connect to port tcp/2570 now?

vm # conntrack -L
tcp      6 115 SYN_SENT src= dst= sport=54868 dport=2570 [UNREPLIED] src= dst= sport=2570 dport=54868 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.

We have fooled conntrack 💥

Conntrack promoted the flow from the unconfirmed to the main conntrack table despite the fact that the firewall dropped the packet. We can observe it.


Conntrack processes every received packet4 and creates a flow for it. A flow entry is always created even if the packet is dropped shortly after. The flow might never be promoted to the main conntrack table and can be short lived.

However, this blog post is not really about conntrack. Its internals have been covered by magazines, papers, books, and on other blogs long before. We probably could have learned elsewhere all that has been shown here.

For us, conntrack was really just an excuse to demonstrate various ways to discover the inner workings of the Linux network stack. As good as any other.

Today we have powerful introspection tools like drgn, bpftrace, or Ftrace, and a cross referencer to plow through the source code, at our fingertips. They help us look under the hood of a live operating system and gradually deepen our understanding of its workings.

I have to warn you, though. Once you start digging into the kernel, it is hard to stop…

1Actually since Linux v5.10 (Dec 2020) there is an additional Netfilter hook for the INET family named NF_INET_INGRESS. The new hook type allows users to attach nftables chains to the Traffic Control ingress hook.
2Why did I pick this port number? Because 2570 = 0x0a0a. As we will see later, this saves us the trouble of converting between the network byte order and the host byte order.
3To be precise, there are multiple lists of unconfirmed connections. One per each CPU. This is a common pattern in the kernel. Whenever we want to prevent CPUs from contending for access to a shared state, we give each CPU a private instance of the state.
4Unless we explicitly exclude it from being tracked with iptables -j NOTRACK.

Cloudflare recognized as a ‘Leader’ in The Forrester Wave for DDoS Mitigation Solutions

Post Syndicated from Vivek Ganti original https://blog.cloudflare.com/cloudflare-is-named-a-leader-in-the-forrester-wave-for-ddos-mitigation-solutions/

Cloudflare recognized as a 'Leader' in The Forrester Wave for DDoS Mitigation Solutions

Cloudflare recognized as a 'Leader' in The Forrester Wave for DDoS Mitigation Solutions

We’re thrilled to announce that Cloudflare has been named a leader in The Forrester WaveTM: DDoS Mitigation Solutions, Q1 2021. You can download a complimentary copy of the report here.

According to the report, written by, Forrester Senior Analyst for Security and Risk, David Holmes, “Cloudflare protects against DDoS from the edge, and fast… customer references view Cloudflare’s edge network as a compelling way to protect and deliver applications.”

Unmetered and unlimited DDoS protection for all

Cloudflare was founded with the mission to help build a better Internet — one where the impact of DDoS attacks is a thing of the past. Over the last 10 years, we have been unwavering in our efforts to protect our customers’ Internet properties from DDoS attacks of any size or kind. In 2017, we announced unmetered DDoS protection for free — as part of every Cloudflare service and plan including the Free plan — to make sure every organization can stay protected and available.

Thanks to our home-grown automated DDoS protection systems, we’re able to provide unmetered and unlimited DDoS protection for free. Our automated systems constantly analyze traffic samples asynchronously as to avoid impact to performance. They scan for DDoS attacks across layers 3-7 of the OSI model. They look for patterns in IP packets, HTTP requests and HTTP responses. When an attack is identified, a real-time signature is generated in the form of an ephemeral mitigation rule. The rule is propagated to the most optimal location in our edge for the most cost-efficient mitigation: either in the Linux kernel’s eXpress Data Path (XDP), Linux userspace iptables or in the HTTP reverse-proxy. A cost-efficient mitigation strategy means that we can mitigate the most volumetric, distributed attacks without impacting performance.

Cloudflare recognized as a 'Leader' in The Forrester Wave for DDoS Mitigation Solutions

Read more about how Cloudflare’s DDoS protection systems work here.

DDoS attacks increasing

We’d like to say DDoS attacks are a thing of the past. But unfortunately, they are not.

On the contrary, we continue to see the frequency, sophistication, and geographical distribution of DDoS attacks rise every quarter – in quantity or size. See our reports from last year (Q1 ‘20, Q2 ‘20, Q3 ‘20, and Q4 ‘20) and view overall Internet traffic trends here on Cloudflare Radar.

Over the past year, Cloudflare has seen and automatically mitigated some of the largest and arguably the most creative cyber attacks. As attackers are getting bolder and smarter in their ways, organizations are looking for ways to battle these kinds of attacks with no disruption to the services they provide.

Cloudflare recognized as a 'Leader' in The Forrester Wave for DDoS Mitigation Solutions
DDoS attacks in 2020

Organizations are being extorted under threat of DDoS

In January this year, we shared the story of how we helped a Fortune Global 500 company stay online and protected whilst they were targeted by a ransom DDoS attack. They weren’t the only one. In fact, in the fourth quarter of 2020, 17% of surveyed Cloudflare customers reported receiving a ransom or a threat of DDoS attack. In Q1 2021, this increased to 26% — roughly 1 out of every 4 respondents reported a ransom threat and a subsequent DDoS attack on their network infrastructure.

Cloudflare recognized as a 'Leader' in The Forrester Wave for DDoS Mitigation Solutions

Whether organizations are targeted with ransom attacks or amateur ‘cyber vandalism’, it’s important for organizations to utilize an always-on, automated DDoS protection service that doesn’t require manual human intervention in the hour of need. We take great pride in being able to provide this level of protection to our customers.

Continuous improvement

As attacks have continued to evolve, and the number of customers using our services has increased, Cloudflare has continually invested in our technology to stay several steps ahead of attackers. We’ve made significant investments in bolstering our mitigation capacity, honing our detection algorithms, and providing better analytics capabilities to our customers. Our aim is to make impact from DDoS attacks a thing of the past, for all customers, just like spam in the 90s.

In 2019, we rolled out our autonomous DDoS detection and mitigation system, dosd. This component of our mitigation stack is fully software-defined, leverages Linux’s eXpress Data Path (XDP), and allows us to quickly and automatically deploy eBPF rules that run on each packet received for inspection — mitigating the most sophisticated attacks within less than 3 seconds on average at the edge and other common attacks instantly. It works by detecting patterns in the attack traffic and then quickly deploying rules autonomously to drop the offenders at wire speed. Additionally, because dosd operates independently within each data center, with no reliance on a centralized data center, it greatly increases the resilience of our network.

While dosd is great at mitigating attacks by detecting patterns in the traffic, what about patternless attacks? That’s where flowtrackd comes in, our novel TCP state classification engine, built in 2020, to defend against disruptive L3/L4 attacks targeting our Magic Transit customers. It’s able to detect and mitigate the most randomized, sophisticated attacks. Additionally, at L7, we also learn our customer’s traffic baselines and identify when their origin is in distress. When an origin server shows signs of deterioration, our systems begin soft mitigation in order to reduce the impact on the server and allow it to recuperate.

Building advanced DDoS protection systems is not only about the detection, but also about cost efficient mitigation. We aim to mitigate attacks without impacting performance that can be caused due to excessive computational consumption. This requirement is why we introduced IP Jails to the world: IP Jails is a gatebot capability that mitigates the most volumetric and distributed attacks without impacting performance. Gatebot activates IP Jails when attacks become significantly volumetric, and then instead of blocking at L7, IP Jails temporarily drops the connection of the offending IP address that generated the request which matched the attack signature that gatebot created. IP Jails leverages the Linux iptables mechanism to drop packets at wirespeed. Dropping L7 attacks at L4, is significantly more cost-efficient, and benefits both our customers and our Site Reliability Engineering team.

Lastly, to provide our customers better visibility and insight into the increasingly sophisticated attacks we’re seeing and mitigating, we released the Firewall Analytics dashboard in 2019. This dashboard provides insights into both HTTP application security and DDoS activity at L7, allowing customers to configure rules directly from within analytics dashboards thus tightening the feedback loop for responding to events. Later in 2020, we released an equivalent dashboard for L3/4 activity for our enterprise Magic Transit and Spectrum customers, in the form of the Network Analytics dashboard. Network Analytics provides insight into packet-level traffic and DDoS attack activity, along with periodical Insights and Trends. To complement the dashboards and provide our users the right information as they need it, we rolled out real-time DDoS alerts and also periodical DDoS reports — right into your inboxes. Or if you prefer, directly into your SIEM dashboards.

Cloudflare received the top score in the strategy category

This year, due to our advanced DDoS protection capabilities, Cloudflare received the top score in the strategy category and among the top three in the current offering category. Additionally, we were given the highest possible scores in 15 criteria in the report, including:

  • Threat detection
  • Burst attacks
  • Response automation
  • Speed of implementation
  • Product vision
  • Performance
  • Security operation center (SOC) service

We believe that our standing stems from the sustained investments we’ve made over the last few years in our global Anycast network — which serves as the foundation of all services we provide to our customers.

Our network is architected for scale — every service runs on every server in every Cloudflare data center that spans over 200 cities globally. And as opposed to some of the other vendors in the report, every Cloudflare service is delivered from every one of our edge data centers.

Integrated security and performance

A leading application performance monitoring company that uses Cloudflare’s services for serverless compute and content delivery recently told us that they wanted to consolidate their performance and security services under one provider. They got rid of their incumbent L3 services provider and onboarded Cloudflare for their application and network services (with Magic Transit) for easier management and better support.

We see this more and more. The benefits of using a single cloud provider for bundled security and performance services are plentiful:

  • Easier management — users can manage all of Cloudflare’s services such as DDoS protection, WAF, CDN, bot management and serverless compute from a single dashboard and a single API endpoint.
  • Deep service integration – all of our services are deeply integrated which allows our users to truly leverage the power of Cloudflare. As an example, Bot Management rules are implemented with our Application Firewall.
  • Easier troubleshooting — instead of having to reach out to multiple providers, our customers have a single point of contact when troubleshooting. Additionally, we provide immediate human response in our under attack hotline.
  • Lower latency — because every one of our services are delivered from all of our data centers, there are no performance penalties. As an example, there are no additional routing hops between the DDoS service to Bot Management service to CDN service.

However, not all cloud services are built the same, i.e. most vendors today do not have a comprehensive and robust solution to offer. Cloudflare’s unique architecture enables it to offer an integrated solution that comprises an all-star cast featuring the following to name a few:

  • CDN: Customer’s Choice LEADER in 2020 Gartner Peer Insights ‘Voice of the Customer’: Global CDN1
  • DDoS: Received the highest number of high scores in the 2020 Gartner report for Solution Comparison for DDoS Cloud Scrubbing Centers2
  • WAF: Cloudflare is a CHALLENGER in the 2020 Gartner Magic Quadrant for Web Application Firewall (receiving the highest placement in the ‘Ability to Execute’)3
  • Zero Trust: Cloudflare is a LEADER in the Omdia Market Radar: Zero-Trust Access Report, 20204
  • Bot Management: Leader in the 2020 SPARK Matrix of Bot Management Market5
  • Integrated solution: Innovation leader in the Global Holistic Web Protection Market for 2020 by Frost & Sullivan6

We are pleased to be named a LEADER in The Forrester Wave™: for DDoS Mitigation Solutions, Q1 2021 report, and will continue to work tirelessly to remain, as the report puts it, a “compelling way to protect and deliver applications” for our customers.

For more information about Cloudflare’s DDoS protection, reach out to us here or hands-on evaluation of Cloudflare, sign up here.

3Gartner, “Magic Quadrant for Web Application Firewalls”, Analyst(s): Jeremy D’Hoinne, Adam Hils, John Watts, Rajpreet Kaur, October 19, 2020. https://www.gartner.com/doc/reprints?id=1-249JQ6L1&ct=200929&st=sb

An introduction to three-phase power and PDUs

Post Syndicated from Rob Dinh original https://blog.cloudflare.com/an-introduction-to-three-phase-power-and-pdus/

An introduction to three-phase power and PDUs

Our fleet of over 200 locations comprises various generations of servers and routers. And with the ever changing landscape of services and computing demands, it’s imperative that we manage power in our data centers right. This blog is a brief Electrical Engineering 101 session going over specifically how power distribution units (PDU) work, along with some good practices on how we use them. It appears to me that we could all use a bit more knowledge on this topic, and more love and appreciation of something that’s critical but usually taken for granted, like hot showers and opposable thumbs.

A PDU is a device used in data centers to distribute power to multiple rack-mounted machines. It’s an industrial grade power strip typically designed to power an average consumption of about seven US households. Advanced models have monitoring features and can be accessed via SSH or webGUI to turn on and off power outlets. How we choose a PDU depends on what country the data center is and what it provides in terms of voltage, phase, and plug type.

An introduction to three-phase power and PDUs

For each of our racks, all of our dual power-supply (PSU) servers are cabled to one of the two vertically mounted PDUs. As shown in the picture above, one PDU feeds a server’s PSU via a red cable, and the other PDU feeds that server’s other PSU via a blue cable. This is to ensure we have power redundancy maximizing our service uptime; in case one of the PDUs or server PSUs fail, the redundant power feed will be available keeping the server alive.

Faraday’s Law and Ohm’s Law

Like most high-voltage applications, PDUs and servers are designed to use AC power. Meaning voltage and current aren’t constant — they’re sine waves with magnitudes that can alternate between positive and negative at a certain frequency. For example, a voltage feed of 100V is not constantly at 100V, but it bounces between 100V and -100V like a sine wave. One complete sine wave cycle is one phase of 360 degrees, and running at 50Hz means there are 50 cycles per second.

The sine wave can be explained by Faraday’s Law and by looking at how an AC power generator works. Faraday’s Law tells us that a current is induced to flow due to a changing magnetic field. Below illustrates a simple generator with a permanent magnet rotating at constant speed and a coil coming in contact with the magnet’s magnetic field. Magnetic force is strongest at the North and South ends of the magnet. So as it rotates on itself near the coil, current flow fluctuates in the coil. One complete rotation of the magnet represents one phase. As the North end approaches the coil, current increases from zero. Once the North end leaves, current decreases to zero. The South end in turn approaches, now the current “increases” in the opposite direction. And finishing the phase, the South end leaves returning the current back to zero. Current alternates its direction at every half cycle, hence the naming of Alternating Current.

An introduction to three-phase power and PDUs

Current and voltage in AC power fluctuate in-phase, or “in tandem”, with each other. So by Ohm’s Law of Power = Voltage x Current, power will always be positive. Notice on the graph below that AC power (Watts) has two peaks per cycle. But for practical purposes, we’d like to use a constant power value. We do that by interpreting AC power into “DC” power using root-mean-square (RMS) averaging, which takes the max value and divides it by √2. For example, in the US, our conditions are 208V 24A at 60Hz. When we look at spec sheets, all of these values can be assumed as RMS’d into their constant DC equivalent values. When we say we’re fully utilizing a PDU’s max capacity of 5kW, it actually means that the power consumption of our machines bounces between 0 and 7.1kW (5kW x √2).

An introduction to three-phase power and PDUs
An introduction to three-phase power and PDUs

It’s also critical to figure out the sum of power our servers will need in a rack so that it falls under the PDU’s design max power capacity. For our US example, a PDU is typically 5kW (208 volts x 24 amps); therefore, we’re budgeting 5kW and fit as many machines as we can under that. If we need more machines and the total sum power goes above 5kW, we’d need to provision another power source. That would lead to possibly another set of PDUs and racks that we may not fully use depending on demand; e.g. more underutilized costs. All we can do is abide by P = V x I.

However there is a way we can increase the max power capacity economically — 3-phase PDU. Compared to single phase, its max capacity is √3 or 1.7 times higher. A 3-phase PDU of the same US specs above has a capacity of 8.6kW (5kW x √3), allowing us to power more machines under the same source. Using a 3-phase setup might mean it has thicker cables and bigger plug; but despite being more expensive than a 1-phase, its value is higher compared to two 1-phase rack setups for these reasons:

  • It’s more cost-effective, because there are fewer hardware resources to buy
  • Say the computing demand adds up to 215kW of hardware, we would need 25 3-phase racks compared to 43 1-phase racks.
  • Each rack needs two PDUs for power redundancy. Using the example above, we would need 50 3-phase PDUs compared to 86 1-phase PDUs to power 215kW worth of hardware.
  • That also means a smaller rack footprint and fewer power sources provided and charged by the data center, saving us up to √3 or 1.7 times in opex.
  • It’s more resilient, because there are more circuit breakers in a 3-phase PDU — one more than in a 1-phase. For example, a 48-outlet PDU that is 1-phase would be split into two circuits of 24 outlets. While a 3-phase one would be split into 3 circuits of 16 outlets. If a breaker tripped, we’d lose 16 outlets using a 3-phase PDU instead of 24.
An introduction to three-phase power and PDUs

The PDU shown above is a 3-phase model of 48 outlets. We can see three pairs of circuit breakers for the three branches that are intertwined with each other white, grey, and black. Industry demands today pressure engineers to maximize compute performance and minimize physical footprint, making the 3-phase PDU a widely-used part of operating a data center.

What is 3-phase?

A 3-phase AC generator has three coils instead of one where the coils are 120 degrees apart inside the cylindrical core, as shown in the figure below. Just like the 1-phase generator, current flow is induced by the rotation of the magnet thus creating power from each coil sequentially at every one-third of the magnet’s rotation cycle. In other words, we’re generating three 1-phase power offset by 120 degrees.

An introduction to three-phase power and PDUs

A 3-phase feed is set up by joining any of its three coils into line pairs. L1, L2, and L3 coils are live wires with each on their own phase carrying their own phase voltage and phase current. Two phases joining together form one line carrying a common line voltage and line current. L1 and L2 phase voltages create the L1/L2 line voltage. L2 and L3 phase voltages create the L2/L3 line voltage. L1 and L3 phase voltages create the L1/L3 line voltage.

An introduction to three-phase power and PDUs

Let’s take a moment to clarify the terminology. Some other sources may refer to line voltage (or current) as line-to-line or phase-to-phase voltage (or current). It can get confusing, because line voltage is the same as phase voltage in 1-phase circuits, as there’s only one phase. Also, the magnitude of the line voltage is equal to the magnitude of the phase voltage in 3-phase Delta circuits, while the magnitude of the line current is equal to the magnitude of the phase current in 3-phase Wye circuits.

Conversely, the line current equals to phase current times √3 in Delta circuits. In Wye circuits, the line voltage equals to phase voltage times √3.

In Delta circuits:
Vline = Vphase
Iline = √3 x Iphase

In Wye circuits:
Vline = √3 x Vphase
Iline = Iphase

Delta and Wye circuits are the two methods that three wires can join together. This happens both at the power source with three coils and at the PDU end with three branches of outlets. Note that the generator and the PDU don’t need to match each other’s circuit types.

An introduction to three-phase power and PDUs

On PDUs, these phases join when we plug servers into the outlets. So we conceptually use the wirings of coils above and replace them with resistors to represent servers. Below is a simplified wiring diagram of a 3-phase Delta PDU showing the three line pairs as three modular branches. Each branch carries two phase currents and its own one common voltage drop.

An introduction to three-phase power and PDUs

And this one below is of a 3-phase Wye PDU. Note that Wye circuits have an additional line known as the neutral line where all three phases meet at one point. Here each branch carries one phase and a neutral line, therefore one common current. The neutral line isn’t considered as one of the phases.

An introduction to three-phase power and PDUs

Thanks to a neutral line, a Wye PDU can offer a second voltage source that is √3 times lower for smaller devices, like laptops or monitors. Common voltages for Wye PDUs are 230V/400V or 120V/208V, particularly in North America.

Where does the √3 come from?

Why are we multiplying by √3? As the name implies, we are adding phasors. Phasors are complex numbers representing sine wave functions. Adding phasors is like adding vectors. Say your GPS tells you to walk 1 mile East (vector a), then walk a 1 mile North (vector b). You walked 2 miles, but you only moved by 1.4 miles NE from the original location (vector a+b). That 1.4 miles of “work” is what we want.

Let’s take in our application L1 and L2 in a Delta circuit. we add phases L1 and L2, we get a L1/L2 line. We assume the 2 coils are identical. Let’s say α represents the voltage magnitude for each phase. The 2 phases are 120 degrees offset as designed in the 3-phase power generator:

|L1| = |L2| = α
L1 = |L1|∠0° = α∠0°
L2 = |L2|∠-120° = α∠-120°

Using vector addition to solve for L1/L2:

L1/L2 = L1 + L2

An introduction to three-phase power and PDUs
An introduction to three-phase power and PDUs

Convert L1/L2 into polar form:

An introduction to three-phase power and PDUs

Since voltage is a scalar, we’re only interested in the “work”:

|L1/L2| = √3α

Given that α also applies for L3. This means for any of the three line pairs, we multiply the phase voltage by √3 to calculate the line voltage.

Vline = √3 x Vphase

Now with the three line powers being equal, we can add them all to get the overall effective power. The derivation below works for both Delta and Wye circuits.

Poverall = 3 x Pline
Poverall = 3 x (Vline x Iline)
Poverall = (3/√3) x (Vphase x Iphase)
Poverall = √3 x Vphase x Iphase

Using the US example, Vphase is 208V and Iphase is 24A. This leads to the overall 3-phase power to be 8646W (√3 x 208V x 24A) or 8.6kW. There lies the biggest advantage for using 3-phase systems. Adding 2 sets of coils and wires (ignoring the neutral wire), we’ve turned a generator that can produce √3 or 1.7 times more power!

Dealing with 3-phase

The derivation in the section above assumes that the magnitude at all three phases is equal, but we know in practice that’s not always the case. In fact, it’s barely ever. We rarely have servers and switches evenly distributed across all three branches on a PDU. Each machine may have different loads and different specs, so power could be wildly different, potentially causing a dramatic phase imbalance. Having a heavily imbalanced setup could potentially hinder the PDU’s available capacity.

A perfectly balanced and fully utilized PDU at 8.6kW means that each of its three branches has 2.88kW of power consumed by machines. Laid out simply, it’s spread 2.88 + 2.88 + 2.88. This is the best case scenario. If we were to take 1kW worth of machines out of one branch, spreading power to 2.88 + 1.88 + 2.88. Imbalance is introduced, the PDU is underutilized, but we’re fine. However, if we were to put back that 1kW into another branch — like 3.88 + 1.88 + 2.88 — the PDU is over capacity, even though the sum is still 8.6kW. In fact, it would be over capacity even if you just added 500W instead of 1kW on the wrong branch, thus reaching 3.18 + 1.88 + 2.88 (8.1kW).

That’s because a 8.6kW PDU is spec’d to have a maximum of 24A for each phase current. Overloading one of the branches can force phase currents to go over 24A. Theoretically, we can reach the PDU’s capacity by loading one branch until its current reaches 24A and leave the other two branches unused. That’ll render it into a 1-phase PDU, losing the benefit of the √3 multiplier. In reality, the branch would have fuses rated less than 24A (usually 20A) to ensure we won’t reach that high and cause overcurrent issues. Therefore the same 8.6kW PDU would have one of its branches tripped at 4.2kW (208V x 20A).

Loading up one branch is the easiest way to overload the PDU. Being heavily imbalanced significantly lowers PDU capacity and increases risk of failure. To help minimize that, we must:

  • Ensure that total power consumption of all machines is under the PDU’s max power capacity
  • Try to be as phase-balanced as possible by spreading cabling evenly across the three branches
  • Ensure that the sum of phase currents from powered machines at each branch is under the fuse rating at the circuit breaker.

This spreadsheet from Raritan is very useful when designing racks.

For the sake of simplicity, let’s ignore other machines like switches. Our latest 2U4N servers are rated at 1800W. That means we can only fit a maximum of four of these 2U4N chassis (8600W / 1800W = 4.7 chassis). Rounding them up to 5 would reach a total rack level power consumption of 9kW, so that’s a no-no.

Splitting 4 chassis into 3 branches evenly is impossible, and will force us to have one of the branches to have 2 chassis. That would lead to a non-ideal phase balancing:

An introduction to three-phase power and PDUs

Keeping phase currents under 24A, there’s only 1.1A (24A – 22.9A) to add on L1 or L2 before the PDU gets overloaded. Say we want to add as many machines as we can under the PDU’s power capacity. One solution is we can add up to 242W on the L1/L2 branch until both L1 and L2 currents reach their 24A limit.

An introduction to three-phase power and PDUs

Alternatively, we can add up to 298W on the L2/L3 branch until L2 current reaches 24A. Note we can also add another 298W on the L3/L1 branch until L1 current reaches 24A.

An introduction to three-phase power and PDUs

In the examples above, we can see that various solutions are possible. Adding two 298W machines each at L2/L3 and L3/L1 is the most phase balanced solution, given the parameters. Nonetheless, PDU capacity isn’t optimized at 7.8kW.

Dealing with a 1800W server is not ideal, because whichever branch we choose to power one would significantly swing the phase balance unfavorably. Thankfully, our Gen X servers take up less space and are more power efficient. Smaller footprint allows us to have more flexibility and fine-grained control over our racks in many of our diverse data centers. Assuming each 1U server is 450W, as if we physically split the 1800W 2U4N into fours each with their own power supplies, we’re now able to fit 18 nodes. That’s 2 more nodes than the four 2U4N setup:

An introduction to three-phase power and PDUs

Adding two more servers here means we’ve increased our value by 12.5%. While there are more factors not considered here to calculate the Total Cost of Ownership, this is still a great way to show we can be smarter with asset costs.

Cloudflare provides the back-end services so that our customers can enjoy the performance, reliability, security, and global scale of our edge network. Meanwhile, we manage all of our hardware in over 100 countries with various power standards and compliances, and ensure that our physical infrastructure is as reliable as it can be.

There’s no Cloudflare without hardware, and there’s no Cloudflare without power. Want to know more? Watch this Cloudflare TV segment about power: https://cloudflare.tv/event/7E359EDpCZ6mHahMYjEgQl.

ASICs at the Edge

Post Syndicated from Tom Strickx original https://blog.cloudflare.com/asics-at-the-edge/

ASICs at the Edge

At Cloudflare we pride ourselves in our global network that spans more than 200 cities in over 100 countries. To handle all the traffic passing through our network, there are multiple technologies at play. So let’s have a look at one of the cornerstones that makes all of this work… ASICs. No, not the running shoes.

What’s an ASIC?

ASIC stands for Application Specific Integrated Circuit. The name already says it, it’s a chip with a very narrow use case, geared towards a single application. This is in stark contrast to a CPU (Central Processing Unit), or even a GPU (Graphics Processing Unit). A CPU is designed and built for general purpose computation, and does a lot of things reasonably well. A GPU is more geared towards graphics (it’s in the name), but in the last 15 years, there’s been a drastic shift towards GPGPU (General Purpose GPU), in which technologies such as CUDA or OpenCL allow you to use the highly parallel nature of the GPU to do general purpose computing. A good example of GPU use is video encoding, or more recently, computer vision, used in applications such as self-driving cars.

Unlike CPUs or GPUs, ASICs are built with a single function in mind. Great examples are the Google Tensor Processing Units (TPU), used to accelerate machine learning functions[1], or for orbital maneuvering[2], in which specific orbital maneuvers are encoded, like the Hohmann Transfer, used to move rockets (and their payloads) to a new orbit at a different altitude. And they are also heavily used in the networking industry. Technically, the use case in the network industry should be called an ASSP (Application Specific Standard Product), but network engineers are simple people, so we prefer to call it an ASIC.

Why an ASIC

ASICs have the major benefit of being hyper-efficient. The more complex hardware is, the more it will need cooling and power. As ASICs only contain the hardware components needed for their function, their overall size can be reduced, and so are their power requirements. This has a positive impact on the overall physical size of the network (devices don’t need to be as bulky to provide sufficient cooling), and helps reduce the power consumption of a data center.

Reducing hardware complexity also reduces the failure rate of the manufacturing process, and allows for easier production.

The downside is that you need to embed a lot of your features in hardware, and once a new technology or specification comes around, any chips made without that technology baked in, won’t be able to support it (VXLAN for example).

For network equipment, this works perfectly. Overall, the networking industry is slow-moving, and considerable time is taken before new technologies make it to the market (as can be seen with IPv6, MPLS implementations, xDSL availability, …). This means the chips don’t need to evolve on a yearly basis, and can instead be created on a much slower cycle, with bigger leaps in technology. For example, it took Broadcom two years to go from Tomahawk 3 to Tomahawk 4, but in that process they doubled the throughput. The benefits listed earlier are super helpful for network equipment, as they allow for considerable throughput in a small form factor.

Building an ASIC

As with chips of any kind, building an ASIC is a long-term process. Just like with CPUs, if there’s a defect in the hardware design, you have to start from scratch, and scrap the entire build line. As such, the development lifecycle is incredibly long. It starts with prototyping in an FPGA (Field Programmable Gate Array), in which chip designers can program their required functionality and confirm compatibility. All of this is done in a HDL (Hardware Description Language), such as Verilog.

Once the prototyping stage is over, they move to baking the new packet processing pipeline into the chip at a foundry. After that, no more changes can be made to the chip, as it’s literally baked into the hardware (unlike an FPGA, which can be reprogrammed). Further difficulty is added by the fact that there are a very small number of hardware companies that will buy ASICs in bulk to build equipment with; as such the unit cost can increase drastically.

All of this means that the iteration cycle of an ASIC tends to be on the slower side of things (compared to the yearly refreshes in the Intel Process-Architecture-Optimization model for example), and will usually be smaller incremental updates: For example, increases in port-speeds are incremental (1G → 10G → 25G → 40G → 100G → 400G → 800G → …), and are tied into upgrades to the SerDes (Serialiser/Deserialiser) part of the chip.

New protocol support is a lot harder, and might require multiple development cycles before it shows up in a chip.

What ASICs do

The ASICs in our network equipment are responsible for the switching and routing of packets, as well as being the first layer of defense (in the form of a stateless firewall). Due to the sheer nature of how fast packets get switched, fast memory access is a primary concern. Most ASICs will use a special sort of memory, called TCAM (Ternary Content-Addressable Memory). This memory will be used to store all sorts of lookup tables. These may be forwarding tables (where does this packet go), ACL (Access Control List) tables (is this packet allowed), or CoS (Class of Service) tables (which priority should be given to this packet)

CAM, and its more advanced sibling, TCAM, are fascinating kinds of memory, as they operate fundamentally different than traditional Random Access Memory (RAM). While you have to use a memory address to access data in RAM, with CAM and TCAM you can directly refer to the content you are looking for. It is a physical implementation of a key-value store.

In CAM you use the exact binary representation of a word, in a network application, that word is likely going to be an IP address, so 11001011.00000000.01110001.00000000 for example ( While this is definitely useful, networks operate a big collection of IP addresses, and storing each individually would require significant memory. To remedy this memory requirement, TCAM can store three states, instead of the binary two. This third state, sometimes called ‘ignore’ state, allows for the storage of multiple sequential data words as a single entry.

In networking, these sequential data words are IP prefixes. So for the previous example, if we wanted to store the collection of that IP address, and the 254 IPs following it, in TCAM it would as follows: 11001011.00000000.01110001.XXXXXXXX ( This storage method means we can ask questions of the ASIC such as “where should I send packets with the destination IP address of”, to which the ASIC can have a reply ready in a single clock cycle, as it does not need to run through all memory, but instead can directly reference the key. This reply will usually be a reference to a memory address in traditional RAM, where more data can be stored, such as output port, or firewall requirements for the packet.
ASICs at the Edge

To dig a bit deeper into what ASICs do in network equipment, let’s briefly go over some fundamentals.

Networking can be split into two primary components: routing and switching. Switching allows you to directly interconnect multiple devices, so they can talk with each other across the network. It’s what allows your phone to connect to your TV to play a new family video. Routing is the next level up. It’s the mechanism that interconnects all these switched networks into a network of networks, and eventually, the Internet.

So routers are the devices responsible for steering traffic through this complex maze of networks, so it gets to its destination safely, and hopefully, as fast as possible. On the Internet, routers will usually use a routing protocol called BGP (Border Gateway Protocol) to exchange reachability information for a prefix (a collection of IP addresses), also called NLRI (Network Layer Reachability Information).

As with navigating the roads, there are multiple ways to get from point A to point B on the Internet. To make sure the router makes the right decision, it will store all of the reachability information in the RIB (Routing Information Base). That way, if anything changes with one route, the router still has other options immediately available.

With this information, a BGP daemon can calculate the ideal path to take for any given destination from its own point-of-view. This Cisco documentation explains the decision process the daemon goes through to calculate that ideal path.

Once we have this ideal path for a given destination, we should store this information, as it would be very inefficient to calculate this every single time we need to go there. The storage database is called the FIB (Forwarding Information Base). The FIB will be a subset of the RIB, as it will only ever contain the best path for a destination at any given time, while the RIB keeps all the available paths, even the non-ideal ones.

With these individual components, routers can make packets go from point A to point B in a blink of an eye.

Here’ are some of the more specific functions our ASICs need to perform:

  1. FIB install: Once the router has calculated its FIB, it’s important the router can access this as quickly as possible. To do so, the ASIC will install (write) this calculated FIB into the TCAM, so any lookups can happen as quickly as possible.
    ASICs at the Edge

  2. Packet forwarding lookups: as we need to know where to send a received packet, we look up this information in TCAM, which is, as we mentioned, incredibly fast.

  3. Stateless Firewall: while a router routes packets between destinations, you also want to ensure that certain packets don’t reach a destination at all. This can be done using either a stateless or stateful firewall. “State” in this case refers to TCP state, so the router would need to understand if a connection is new, or already established. As maintaining state is a complex issue, which requires storing tables, and can quickly consume a lot of memory, most routers will only operate a stateless firewall.
    Instead, stateful firewalls often have their own appliances. At Cloudflare, we’ve opted to move maintaining state to our compute nodes, as that severely reduces the state-table (one router for all state vs X metals for all state combined). A stateless firewall makes use of the TCAM again to store rules on what to do with specific types of packets. For example, one of the rules we employ at our edge is DENY-BOGON-RANGES , in which we discard traffic sourced from RFC1918 space (and other unroutable space). As this makes use of TCAM, it can all be done at line rate (the maximum speed of the interface).

  4. Advanced features, such as GRE encapsulation: modern networking isn’t just packet switching and packet routing anymore, and more advanced features are needed. One of these is encapsulation. With packet encapsulation, a system will put a data packet into another data packet. Using this technique, it’s possible to build a network on top of an existing network (an overlay). Overlays can be used to build a virtual backbone for example, in which multiple locations can be virtually connected through the Internet.
    While you can encapsulate packets on a CPU (we do this for Magic Transit), there are considerable challenges in doing so in software. As such, the ASIC can have built-in functionality to encapsulate a packet in a multitude of protocols, such as GRE. You may not want encapsulated packets to have to take a second trip through your entire pipeline, as this adds latency, so these shortcuts can also be built into the chip.

  5. MPLS, EVPN, VXLAN, SDWAN, SDN, …: I ran out of buzzwords to enumerate here, but while MPLS isn’t new (the first RFC was created in 2001), it’s a rather advanced requirement, just as the others listed, which means not all ASIC vendors will implement this for all their chips due to the increased complexity.

Vendor Landscape

At Cloudflare, we interact with both hardware and software vendors on a daily basis while operating our global network. As we’re talking about ASICs today, we’ll explore the hardware landscape, but some hardware vendors also have their own NOS (Network Operating System).
There’s a vast selection of hardware out there, all with different features and pricing. It can become incredibly hard to see the wood for the trees, so we’ll focus on 4 important distinguishing factors: Throughput (how many bits can the ASIC push through), buffer size (how many bits can the ASIC store in memory in case of resource contention), programmability (how easy is it for a third party programmer like Cloudflare to interact directly with the ASIC), feature set (how many advanced things outside of routing/switching can the ASIC do).

The landscape is so varied because different companies have different requirements. A company like Cloudflare has different expectations for its network hardware than your typical corner shop. Even within our own network we’ll have different requirements for the different layers that make up our network.


The elephant in the networking room (or is it the jumbo frame in the switch?) is Broadcom. Broadcom is a semiconductor company, with their primary revenue in the wired infrastructure segment (over 50% of revenue[3]). While they’ve been around since 1991, they’ve become an unstoppable force in the last 10 years, in part due to their reliance on Apple (25% of revenue). As a semiconductor manufacturer, their market dominance is primarily achieved by acquiring other companies. A great example is the acquisition of Dune Networks, which has become an excellent revenue generator as the StrataDNX series of ASIC (Arad, QumranMX, Jericho). As such, they have become the biggest ASIC vendor by far, and own 59% of the entire Ethernet Integrated Circuits market[4].

As such, they supply a lot of merchant silicon to Cisco, Juniper, Arista and others. Up until recently, if you wanted to use the Broadcom SDK to accelerate your packet forwarding, you have to sign so many NDAs you might get a hand cramp, which makes programming them a lot trickier. This changed recently when Broadcom open-sourced their SDK. Let’s have a quick look at some of their products.


The Tomahawk line of ASICs are the bread-and-butter for the enterprise market. They’re cheap and incredibly fast. The first generation of Tomahawk chips did 3.2Tbps linerate, with low-latency switching. The latest generation of this chip (Tomahawk 4) does 25.6Tbps in a 7nm transistor footprint[5]). As you can’t have a cheap, fast, and full feature set for a single package, this means you lose out on features. In this case, you’re missing most of the more advanced networking technologies such as VXLAN, and you have no buffer to speak of.
As an example of a different vendor using this silicon, you can have a look at the Juniper QFX5200 switching platform.

StrataDNX (Arad, QumranMX, Jericho)

These chipsets came through the acquisition of Dune Networks, and are a collection of high-bandwidth, deep buffer (large amount of memory available to store (buffer) packets) chips, allowing them to be deployed in versatile environments, including the Cloudflare edge. The Arista DCS-7280SR that we run in some of our edge locations as edge routers run on the Jericho chipset. Since then, the chips have evolved, and with Jericho2, Broadcom now have a 10Tbps deep buffer chip[6]. With their fabric chip (this links multiple ASICs together), you can build switches with 48x400G ports[7] without much effort.
Cisco built their NCS5500 line of routers using the QumranMX[8].


This ASIC is an upgrade from the Tomahawk chipset, with a complex and extensive feature set, while maintaining high throughput rates. The latest Trident4 does 12.8Tbps at incredibly low latencies[9], making it an incredibly flexible platform. It unfortunately has no buffer space to speak of, which limits its scope for Cloudflare, as we need the buffer space to be able to switch between the different port speeds we have on our edge routers. The Arista 7050X and 7300X are built on top of this.


Intel is well known in the network industry for building stable and high-performance 10G NICs (Network Interface Controller). They’re not known for ASICs. They made an initial attempt with their acquisition of Fulcrum[10], which built the FM6000[11] series of ASIC, but nothing of note was really built with them. Intel decided to try again in 2019 with their acquisition of Barefoot. This small manufacturer is responsible for the Barefoot Tofino ASIC, which may well be a fundamental paradigm shift in the network industry.

Barefoot Tofino

The Tofino[12] is built using a PISA (Protocol Independent Switch Architecture), and using P4 (Programming Protocol-Independent Packet Processors)[13], you can program the data-plane (packet forwarding) as you see fit. It’s a drastic move away from the traditional method of networking, in which direct programming of the ASIC isn’t easily possible, and definitely not through a standard programming language. As an added benefit, P4 also allows you to perform a formal verification of your forwarding program, and be sure that it will do what you expect it to. Caveat: OpenFlow tried this, but unfortunately never really got much traction.
ASICs at the Edge[14]

There are multiple variations of the Tofino 1 available, but the top-end ASIC has a 6.5Tbps linerate capacity. As the ASIC is programmable, its featureset is as rich as you’d want it to be. Unfortunately, the chip does not come with a lot of buffer memory, so we can’t deploy these as edge devices (yet). Both Arista (7170 Series[15]) and Cisco (Nexus 34180YC and 3464C series[16]) have built equipment with the Tofino chip inside.


As some of you may know, Mellanox is the vendor that recently got acquired by Nvidia, which also provides our 25G NICs in our compute nodes. Besides NICs, Mellanox has a well-established line of ASICs, mostly for switching.


The latest iteration of this ASIC, Spectrum 3 offers 12.8Tbps switching capacity, with an extensive featureset, including Deep Packet Inspection and NAT. This chip allows for building dense high-speed port devices, going up to 25.6Tbps[17]. Buffering wise, there’s none to really speak of (64MB). Mellanox also builds their own hardware platforms. Unlike the other vendors below, they aren’t shipped with the Mellanox Operating System, instead, they offer you a variety of choices to run on top, including Cumulus Linux (which was also acquired by Nvidia 🤔).

As mentioned, while we use their NIC technology extensively, we currently don’t have any Mellanox ASIC silicon in our network.


Juniper is a network hardware supplier, and currently the biggest supplier of network equipment for Cloudflare. As previously mentioned in the Broadcom section, Juniper buys some of their silicon from Broadcom, but they also have a significant lineup of home-grown silicon, which can be split into 2 families: Trio and Express.


The Express family is the switching-skewed family, where bandwidth is a priority, while still maintaining a broad range of feature capabilities. These chips live in the same application landscape as the Broadcom StrataDNX chips.

Paradise (Q5)

The Q5 is the new generation of the Juniper switching ASIC[18]. While by itself it doesn’t boast high linerates (500Gbps), when combined into a chassis with a fabric chip (Clos network in this case), they can produce switches (or line cards) with up to 12Tbps of throughput capacity[19]. In addition to allowing for high-throughput, dense network appliances, the chip also comes with a staggering amount of buffer space (4GB per ASIC), provided by external HMC (Hybrid Memory Cube). In this HMC, they’ve also decided to put the FIB, MAC and other tables (so no TCAM).
The Q5 chip is used in their QFX1000 lineup of switches, which include the QFX10002-36Q, QFX10002-60C, QFX10002-72Q and QFX10008, all of which are deployed in our datacenters, as either edge routers or core aggregation switches.

ExpressPlus (ZX)

The ExpressPlus is the more feature-rich and faster evolution of the Paradise chip. It offers double the bandwidth per chip (1Tbps) and is built into a combined Clos-fabric reaching 6Tbps in a 2U form-factor (PTX10002). It also has an increased logical scale, which comes with bigger buffers, larger FIB storage, and more ACL space.

The ExpressPlus drives some of the PTX line of IP routers, together with its newest sibling, Triton.

Triton (BT)

Triton is the latest generation of ASIC in the Express family, with 3.6Tbps of capacity per chip, making way for some truly bandwidth-dense hardware. Both Triton and ExpressPlus are 400GE capable.


The Trio family of chips are primarily used in the feature-heavy MX routing platform, and is currently at its 5th generation.

ASICs at the EdgeA Juniper MPC4E-3D-32XGE line card

Trio Eagle (Trio 4.0) (EA)

The Trio Eagle is the previous generation of the Trio Penta, and can be found on the MPC7E line cards for example. It’s a feature-rich ASIC, with a 400Gbps forwarding capacity, and significant buffer and TCAM capacity (as is to be expected from a routing platform ASIC)

Trio Penta (Trio 5.0) (ZT)

Penta is the new generation routing chip, which is built for the MX platform routers. On top of being a very beefy chip, capable of 500Gbps per ASIC, allowing Juniper to build line cards of up to 4Tbps of capacity, the chip also has a lot of baked in features, offering advanced hardware offloading for for example MACSec, or Layer 3 IPsec.

The Penta chip is packaged on the MPC10E and MPC11E line card, which can be installed in multiple variations of the MX chassis routers (MX480 included).


Last but not least, there’s Cisco. As the saying goes “nobody ever got fired for buying Cisco”, they’re the biggest vendor of network solutions around. Just like Juniper, they have a mixed product fleet of merchant silicon, as well as home-grown. While we used to operate Cisco routers as edge routers (Cisco ASR 9000), this is no longer the case. We do still use them heavily for our ToR (Top-of-Rack) switching needs, utilizing both their Nexus 5000 series and Nexus 9000 series switches.


Bigsur is custom silicon developed for the Nexus 6000 line of switches (confusingly, the switches themselves are called Cisco Nexus 5672UP and Cisco Nexus 6001). In our specific model, the Cisco Nexus 5672UP, there’s 7 of them interconnected, providing 10G and 40G connectivity. Unfortunately Cisco is a lot more tight-lipped about their ASIC capabilities, so I can’t go as deep as I did with the Juniper chips. Feature-wise, there’s not a lot we require from them in our edge network. They’re simple Layer 2 forwarding switches, with no added requirements. Buffer wise, they use a system called Virtual Output Queueing, just like the Juniper Express chip. Unlike the Juniper silicon, the Bigsur ASIC doesn’t come with a lot of TCAM or buffer space.


The Tahoe is the Cisco ASIC found in the Cisco 9300-EX switches, also known as the LSE (Leaf Spine Engine). It offers higher-density port configurations compared to the Bigsur (1.6Tbps)[20]. Overall, this ASIC is a maturation of the Bigsur silicon, offering more advanced features such as advanced VXLAN+EVPN fabrics, greater port flexibility (10G, 25G, 40G and 100G), and increased buffer sizes (40MB). We use this ASIC extensively in both our edge data centers as well as in our core data centers.


A lot of different factors come into play when making the decision to purchase the next generation of Cloudflare network equipment. This post only scratches the surface of technical considerations to be made, and doesn’t come near any other factors, such as ecosystem contributions, openness, interoperability, or pricing. None of this would’ve been possible without the contributions from other network engineers—this post was written on the shoulders of giants. In particular, thanks to the excellent work by Jim Warner at UCSC, the engrossing book on the new MX platforms, written by David Roy (Day One: Inside the MX 5G), as well as the best book on the Juniper QFX lineup: Juniper QFX10000 Series by Douglas Richard Hanks Jr, and to finish it off, the Summary of Network ASICs post by Justin Pietsch.

  1. https://cloud.google.com/tpu/ ↩︎

  2. https://angel.co/company/spacex/jobs/744408-sr-fpga-asic-design-engineer ↩︎

  3. https://marketrealist.com/2017/02/wired-infrastructure-segment-protects-broadcom/ ↩︎

  4. https://www.wsj.com/articles/broadcom-lands-deals-to-place-components-in-apple-smartphones-11579821914 ↩︎

  5. https://www.globenewswire.com/news-release/2019/12/09/1958047/0/en/Broadcom-Ships-Tomahawk-4-Industry-s-Highest-Bandwidth-Ethernet-Switch-Chip-at-25-6-Terabits-per-Second.html ↩︎

  6. https://www.broadcom.com/products/ethernet-connectivity/switching/stratadnx/BCM88690 ↩︎

  7. https://www.ufispace.com/products/telco/core-edge/s9705-48d ↩︎

  8. https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2019/pdf/BRKSPG-2900.pdf ↩︎

  9. https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56880-series ↩︎

  10. https://newsroom.intel.com/news-releases/intel-to-acquire-fulcrum-microsystems/ ↩︎

  11. https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/ethernet-switch-fm5000-fm6000-datasheet.pdf ↩︎

  12. https://barefootnetworks.com/products/brief-tofino/ ↩︎

  13. http://www.sigcomm.org/node/3503 ↩︎

  14. https://github.com/p4lang/p4lang.github.io/blob/master/assets/p4_switch_model-600px.png ↩︎

  15. https://www.arista.com/assets/data/pdf/Whitepapers/7170_White_Paper.pdf ↩︎

  16. https://www.barefootnetworks.com/press-releases/barefoot-networks-to-showcase-technologies-to-build-fast-and-resilient-networks-using-deep-insight-and-tofino-powered-cisco-nexus-switches-at-cisco-live-us-2019/ ↩︎

  17. https://www.mellanox.com/products/ethernet-switches/sn4000 ↩︎

  18. https://www.juniper.net/assets/us/en/local/pdf/whitepapers/2000599-en.pdf ↩︎

  19. https://www.juniper.net/assets/us/en/local/pdf/datasheets/1000531-en.pdf#page=7 ↩︎

  20. https://www.cisco.com/c/dam/global/fr_ch/solutions/data-center-virtualization/pdf/Cisco_Nexus_9300_EX_Platform.pdf#page=8 ↩︎

Investigate VPC flow with Amazon Detective

Post Syndicated from Ross Warren original https://aws.amazon.com/blogs/security/investigate-vpc-flow-with-amazon-detective/

Many Amazon Web Services (AWS) customers need enhanced insight into IP network flow. Traditionally, cost, the complexity of collection, and the time required for analysis has led to incomplete investigations of network flows. Having good telemetry is paramount, and VPC Flow Logs are a very important part of a robust centralized logging architecture. The information that VPC Flow Logs provide is frequently used by security analysts to determine the scope of security issues, to validate that network access rules are working as expected, and to help analysts investigate issues and diagnose network behaviors. Flow logs capture information about the IP traffic going to and from EC2 interfaces in a VPC. Each record describes aspects of the traffic flow, such as where it originated and where it was sent to, what network ports were used, and how many bytes were sent.

Amazon Detective now enables you to interactively examine the details of the virtual private cloud (VPC) network flows of your Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities. Detective automatically collects VPC flow logs from your monitored accounts, aggregates them by EC2 instance, and presents visual summaries and analytics about these network flows. Detective doesn’t require VPC Flow Logs to be configured and doesn’t impact existing flow log collection.

In this blog post, I describe how to use the new VPC flow feature in Detective to investigate an UnauthorizedAccess:EC2/TorClient finding from Amazon GuardDuty. Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3. GuardDuty documentation states that this alert can indicate unauthorized access to your AWS resources with the intent of hiding the unauthorized user’s true identity. I’ll demonstrate how to use Amazon Detective to investigate an instance that was flagged by Amazon GuardDuty to determine whether it is compromised or not.

Starting the investigation in GuardDuty

In my GuardDuty console, I’m going to select the UnauthorizedAccess:EC2/TorClient finding shown in Figure 1, choose the Actions menu, and select Investigate.

Figure 1: Investigating from the GuardDuty console

Figure 1: Investigating from the GuardDuty console

This opens a new browser tab and launches the Amazon Detective console, where I’m presented with the profile page for this finding, shown in Figure 2. You must have Detective enabled to pivot between a GuardDuty finding and Detective. Detective provides profile pages for supported GuardDuty findings and AWS resources (for example, IP address, EC2 instance, user, and role) that include information and data visualizations that summarize observed behaviors and give guidance for interpreting them. Profiles help analysts to determine whether the finding is of genuine concern or a false positive. For resources, profiles provide supporting details for an investigation into a finding or for a general hunt for suspicious activity.

Figure 2: Finding profile panel

Figure 2: Finding profile panel

In this case, the profile page for this GuardDuty UnauthorizedAccess:EC2/TorClient finding provides contextual and behavioral data about the EC2 instance on which GuardDuty has noted the issue. As I dive into this finding, I’m going to be asking questions that help assess whether the instance was in fact accessed unintentionally, such as, “What IP port or network service was in use at that time?,” “Were any large data transfers involved?,” “Was the traffic allowed by my security groups?” Profile pages in Detective organize content that helps security analysts investigate GuardDuty findings, examine unexpected network behavior, and identify other AWS resources that might be affected by a potential security issue.

I begin scrolling down the page and notice the Findings associated with EC2 instance i-9999999999999999 panel. Detective displays related findings to provide analysts with additional evidence and context about potentially related issues. The finding I’m investigating is listed there, as well as an Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding. GuardDuty builds a baseline on your network traffic and will generate findings where there is traffic outside the calculated normal. While we might not investigate every instance of anomalous traffic, having these alerts correlated by Detective provides context for validating the issue. Keeping this in mind as I scroll down, at the bottom of this profile page, I find the Overall VPC flow volume panel. If you choose the Info link next to the panel title, you can see helpful tips that describe how to use the visualizations and provide ideas for questions to ask within your investigation. These info links are available throughout Detective. Check them out!

Investigating VPC flow in Detective

In this investigation, I’m very curious about the two large spikes in inbound traffic that I see in the Overall VPC flow volume panel, which seem to be visually associated with some unusual outbound traffic spikes. It’s most likely that these outbound spikes are related to the Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding I mentioned earlier. To start the investigation, I choose the display details for scope time button, shown circled at the bottom of Figure 2. This expands the VPC Flow Details, shown in Figure 3.

Figure 3: Our first look at VPC Flow Details

Figure 3: Our first look at VPC Flow Details

We now can see that each entry displays the volume of inbound traffic, the volume of outbound traffic, and whether the access request was accepted or rejected. Detective provides annotations on the VPC flows to help guide your investigation. These From finding annotations make it clear which flows and resources were involved in the finding. In this case, we can easily see (in Figure 3) the three IP addresses at the top of the list that triggered this GuardDuty finding.

I’m first going to focus on the spikes in traffic that are above the baseline. When I click on one of the spikes in the graph, the time window for the VPC flow activity now matches the dates of these spikes I’m investigating.

If I choose the Inbound Traffic column header, shown in Figure 4, I can find the flows that contributed to the spike during this time window.

Figure 4: Inbound traffic spikes

Figure 4: Inbound traffic spikes

Note that the two large inbound spikes aren’t associated with the IP address from the UnauthorizedAccess:EC2/TorClient finding, based on the Detective annotation From finding. Let’s check the outbound traffic. If I do a quick sort of the table based on the outbound traffic column, as shown in Figure 5, we can also see the outbound spikes, and it isn’t immediately evident whether the spikes are associated with this finding. I could continue to investigate the spikes (because they are a visual anomaly), or focus just on the VPC flow traffic that GuardDuty and Detective have labeled as associated with this TOR finding.

Figure 5: Outbound traffic spikes

Figure 5: Outbound traffic spikes

Let’s focus on the outbound and inbound spikes and see if we can determine what’s happening. The inbound spikes are on port 443, typically an HTTPS port, or a secure web connection. The outbound spikes are on port 22 (ssh), but go to IP addresses that look to be internal based on their addresses of 172.16.x.x. The port 443 traffic might indicate a web server that’s open to the internet and receiving traffic. With further investigation, we can determine if this idea is valid, and continue hunting for potentially malicious traffic.

A good next step would be to investigate the two specific IP addresses to rule out their involvement in the finding. I can do this by right-clicking on either of the external IP addresses and opening a new tab, where I can focus on investigating these two specific IP addresses. I would take this line of investigation to possibly rule out the involvement of these IP addresses in this finding, determine if they regularly communicate with my resources, find out what instance(s) they’re related to, and see if there are other findings associated with these instances or IP addresses. This deeper investigation is outside the scope of this blog post, but it’s something you should be doing in your own environment.

IP addresses in AWS are ephemeral in nature. The unique identifier in VPC flow logs is the Instance ID. At the time of this investigation, is assigned to the instance related to this finding, so let’s continue to take a look at the internal IP address with 218 MB outbound traffic on port 22. I choose, and Detective opens up the profile page for this specific IP address, as shown in Figure 6. Here we see some interesting correlations: two other GuardDuty findings related to SSH brute-force attacks. These could be related to our outbound port 22 spikes, because they’re certainly in the window of time we’re investigating.

Figure 6: IP address profile panel

Figure 6: IP address profile panel

As part of a deeper investigation, you would investigate the SSH brute-force findings for and but for now I’m interested what this IP is involved in. Detective easily associates this IP address with the instance that was assigned the IP during the scope time. I scroll down to the bottom of the profile page for and investigate the i-9999999999999999 instance by choosing the instance name.

Filtering VPC flow activity

In Detective, as the investigator we are looking at an instance profile panel, similar to the one in Figure 2, and since we’re interested in VPC flow details, I’m going to scroll down and select display details for scope time.

To focus on specific activity, I can filter the activity details by the following values:

  • IP address
  • Local or remote port
  • Direction
  • Protocol
  • Whether the request was accepted or rejected

I’m going to filter these VPC flow details and just look at port 22 (sshd) inbound traffic. I select the Filter check box and select Local Port and 22, as shown in Figure 7. Detective fills in all the available ports for you, making it easy to complete this filter.

Figure 7: Port 22 traffic

Figure 7: Port 22 traffic

The activity details show a few IP addresses related to port 22, and we’re still following the large outbound spikes of traffic. It’s outside the scope of this blog post, but now it would be time to start looking at your security groups and network access control lists (ACLs) and determine why port 22 is open to the internet and sending all this traffic.

Understanding traffic behavior

As an investigator, I now have a good picture of the traffic related to the initial finding, and by diving deeper we’re able to discover other interesting traffic during the same timeframe. While we may not always determine “who has done it,” the goal should be to improve our understanding of the behavior of our environment and gather important technical evidence. Detective helps you identify and investigate anomalies to give you insight into your environment. If we were to continue our investigation into the finding, here are some actions we can take within Detective.

Investigate VPC findings with Detective:

  • Perform ports and utilization analysis
    • Identify service and ephemeral ports
    • Determine whether traffic was accepted or rejected based on security groups and NACL configurations
    • Investigate possible reconnaissance traffic by exploring the significant amount of rejected traffic
  • Correlate EC2 instances to TCP/IP ports and IPs
  • Analyze traffic spikes and anomalies
  • Discover traffic patterns and make behavioral correlations

Explore EC2 instance behavior with Detective:

  • Directional Traffic Analysis
  • Investigate possible data exfiltration events by digging into large transfers
  • Enumerate distinct IP connections and sort and filter by protocol, amount of traffic, and traffic direction
  • Gather data related to a spike in port count from a single IP address (potential brute force) or multiple IP addresses (distributed denial of service (DDoS))

Additional forensics steps to consider

  • Snapshot EC2 Volumes
  • Memory dump of EC2 instance
  • Isolate EC2 instance
  • Review your authentication strategy and assess whether the chosen authentication method is sufficient to protect your asset


Without requiring you to set up infrastructure or spend time configuring log ingestion, Detective collects, organizes, and presents relevant data for your threat analysis and investigations. Security and operations teams will find this new capability helpful for simplifying EC2 traffic analysis, validating security group permissions, and diagnosing EC2 instance behavior. Detective does the heavy lifting of storing, and analyzing VPC flow data so you can focus on quickly answering your investigative questions. VPC network flow details are available now in all Detective supported Regions and are included as part of your service subscription.

To get started, you can enable a 30-day free trial of Amazon Detective. See the AWS Regional Services page for all the regions where Detective is available. To learn more, visit the Amazon Detective product page.

Are you a visual learner? Check out Amazon Detective Overview and Demonstration. This video helps you learn how and when to use Amazon Detective to improve the security of your AWS resources.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Ross Warren

Ross Warren is a Solution Architect at AWS based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. Ross has worked at a handful of startups and has enjoyed the transition to AWS because he can continue to build solutions for customers on today’s most innovative platform.


Jim Miller

Jim is a Solution Architect at AWS based in Connecticut. Jim has worked within cyber security his entire career with areas of focus including cyber security architecture and incident response. At AWS he loves building secure solutions for customers to enable teams to build and innovate with confidence.

Delivering HTTP/2 upload speed improvements

Post Syndicated from Junho Choi original https://blog.cloudflare.com/delivering-http-2-upload-speed-improvements/

Delivering HTTP/2 upload speed improvements

Delivering HTTP/2 upload speed improvements

Cloudflare recently shipped improved upload speeds across our network for clients using HTTP/2. This post describes our journey from troubleshooting an issue to fixing it and delivering faster upload speeds to the global Internet.

We launched speed.cloudflare.com in May 2020 to give our users insight into how well their networks perform. The test provides download, upload and latency tests. Soon after release, we received reports from a small number of users that sometimes upload speeds were underreported. Our investigation determined that it seemed to happen with end users that had high upload bandwidth available (several hundreds Mbps class cable modem or fiber service). Our speed tests are performed via browser JavaScript, and most browsers use HTTP/2 by default. We found that HTTP/2 upload speeds were sometimes much slower than HTTP/1.1 (assuming all TLS) when the user had high available upload bandwidth.

Upload speed is more important than ever, especially for people using home broadband connections. As many people have been forced to work from home they’re using their broadband connections differently than before. Prior to the pandemic broadband traffic was very asymmetric (you downloaded way more than you uploaded… think listening to music, or streaming a movie), but now we’re seeing an increase in uploading as people video conference from home or create content from their home office.

Initial Tests

User reports were focused on particularly fast home networks. I set up a dummynet network simulator to test upload speed in a controlled environment. I launched a linux VM running our code inside my Macbook Pro and set up a dummynet between the VM and Mac host.  Measuring upload speed is simple – create a file and upload using curl to an endpoint which accepts a request body. I ran the same test 20 times and took a median upload speed (Mbps).

% dd if=/dev/urandom of=test.dat bs=1M count=10
% curl --http1.1 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint
% curl --http2 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint

Stepping up to uploading a 10MB object over a network which has 200Mbps available bandwidth and 40ms RTT, the result was surprising. Using our production configuration, HTTP/2 upload speed tested at almost half of the same test conditions using HTTP/1.1 (higher is better).

Delivering HTTP/2 upload speed improvements

The result may differ depending on your network, but the gap is bigger when the network is fast. On a slow network, like my home cable connection (5Mbps upload and 20ms RTT), HTTP/2 upload speed was almost identical to the performance observed with HTTP/1.1.

Receiver Flow Control

Before we get into more detail on this topic, my intuition suggested the issue was related to receiver flow control. Usually the client (browser or any HTTP client) is the receiver of data, but in the case the client is uploading content to the server, the server is the receiver of data. And the receiver needs some type of flow control of the receive buffer.

How we handle receiver flow control differs between HTTP/1.1 and HTTP/2. For example, HTTP/1.1 doesn’t define protocol-level receiver flow control since there is no multiplexing of requests in the connection and it’s up to the TCP layer which handles receiving data. Note that most of the modern OS TCP stacks have auto tuning of the receive buffer (we will revisit that later) and they tune based on the current BDP (bandwidth-delay product).

In the case of HTTP/2, there is a stream-level flow control mechanism because the protocol supports multiplexing of streams. Each HTTP/2 stream has its own flow control window and there is connection level flow control for all streams in the connection. If it’s too tight, the sender will be blocked by the flow control. If it’s too loose we may end up wasting memory for buffering. So keeping it optimal is important when implementing flow control and the most optimal strategy is to keep the receive buffer matching the current BDP. BDP represents the maximum bytes in flight in the given network and can be used as an optimal buffer size.

How NGINX handles the request body buffer

Initially I tried to find a parameter which controls NGINX upload buffering and tried to see if tuning the values improved the result. There are a couple of parameters which are related to uploading a request body.

And this is HTTP/2 specific:

Cloudflare does not use the proxy_buffering directive, so it can be immediately discounted.  client_body_buffer_size is the size of the request body buffer which is used regardless of the protocol, so this one applies to HTTP/1.1 and HTTP/2 as well.

When looking into the code, here is how it works:

Delivering HTTP/2 upload speed improvements

  • HTTP/1.1: use client_body_buffer_size buffer as a buffer between upstream and the client, simply repeating reading and writing using this buffer.
  • HTTP/2: since we need a flow control window update for the HTTP/2 DATA frame, there are two parameters:
    • http2_body_preread_size: it specifies the size of the initial request body read before it starts to send to the upstream.
    • client_body_buffer_size: it specifies the size of the request body buffer.
    • Those two parameters are used for allocating a request body buffer during uploading. Here is a brief summary of how unbuffered upload works:
      • Allocate a single request body buffer which size is a maximum of http2_body_preread_size and client_body_buffer_size. This means if http2_body_preread_size is 64KB and client_body_buffer_size is 128KB, then a 128KB buffer is allocated. We use 128KB for client_body_buffer_size.
      • HTTP/2 Settings INITIAL_WINDOW_SIZE of the stream is set to http2_body_preread_size and we use 64KB as a default (the RFC7540 default value).
      • HTTP/2 module reads up to http2_body_preread_size before sending it to upstream.
      • After flushing the preread buffer, keep reading from the client and write to upstream and send WINDOW_UPDATE frame back to the client when necessary until the request body is fully received.

To summarise what this means: HTTP/1.1 simply uses a single buffer, so TCP socket buffers do the flow control. However with HTTP/2, the application layer also has receiver flow control and NGINX uses a fixed size buffer for the receiver. This limits upload speed when the current link has a BDP larger than the current request body buffer size. So the bottleneck is HTTP/2 flow control when the buffer size is too tight.

We’re going to need a bigger buffer?

In theory, bigger buffer sizes should avoid upload bottlenecks, so I tried a few out by running my tests again. The previous chart result is now labelled “prod” and plotted alongside HTTP/2 tests with client_body_buffer_size set to 256KB, 512KB and 1024KB:

Delivering HTTP/2 upload speed improvements

It appears 512KB is an optimal value for client_body_buffer_size.

What if I test with some other network parameter? Here is when RTT is 10ms, in this case, 256KB looks optimal.

Delivering HTTP/2 upload speed improvements

Both cases look much better than current 128KB and get a similar performance to HTTP/1.1 or even better. However, it seems like the optimal buffer size is a moving target and furthermore having too large a buffer size can hurt the performance: we need a smart way to find the optimal buffer size.

Autotuning request body buffer size

One of the ideas that can help this kind of situation is autotuning. For example, modern TCP stacks autotune their receive buffers automatically. In production, our edge also has TCP receiver buffer autotuning enabled by default.

net.ipv4.tcp_moderate_rcvbuf = 1

But in case of HTTP/2, TCP buffer autotuning is not very effective because the HTTP/2 layer is doing its own flow control and the existing 128KB was too small for a high BDP link. At this point, I decided to pursue autotuning HTTP/2 receive buffer sizing as well, similar to what TCP does.

The basic idea is that NGINX doubles the size of HTTP/2 request body buffer size based on its BDP. Here is an algorithm currently implemented in our version of NGINX:

  • Allocate a request body buffer as explained above.
  • For every RTT (using linux tcp_info), update the current BDP.
  • Double the request body buffer size when the current BDP > (receiver window / 4).

Test Result

Lab Test

Here is a test result when HTTP/2 autotune upload is enabled (still using client_body_buffer_size 128KB). You can see “h2 autotune” is doing pretty well – similar or even slightly faster than HTTP/1.1 speed (that’s the initial goal). It might be slightly worse than a hand-picked optimal buffer size for given conditions, but you can see now NGINX picks up optimal buffer size automatically based on network conditions.

Delivering HTTP/2 upload speed improvements
Delivering HTTP/2 upload speed improvements

Production test

After we deployed this feature, I ran similar tests against our production edge, uploading a 10MB file from well connected client nodes to our edge. I created a Linux VM instance in Google Cloud and ran the upload test where the network is very fast (a few Gbps) and low latency (<10ms).

Here is when I run the test in the Google Cloud Belgium region to our CDG (Paris) PoP which has 7ms RTT. This looks very good with almost 3x improvement.

Delivering HTTP/2 upload speed improvements

I also tested between the Google Cloud Tokyo region and our NRT (Tokyo) PoP, which had a 2.3ms RTT. Although this is not realistic for home users, the results are interesting. A 128KB fixed buffer performs well, but HTTP/2 with buffer autotune outperforms even HTTP/1.1.

Delivering HTTP/2 upload speed improvements


HTTP/2 upload buffer autotuning is now fully deployed in the Cloudflare edge. Customers should now benefit from improved upload performance for all HTTP/2 connections, including speed tests on speed.cloudflare.com. Autotuning upload buffer size logic works well for most cases, and now HTTP/2 upload is much faster than before! When we think about the performance we usually tend to think about download speed or latency reduction, but faster uploading can also help users working from home when they need a large amount of upload, such as photo/video sharing apps, content creation, video conferencing or self broadcasting.

Many thanks to Lucas Pardue and Rustam Lalkaka for great feedback and suggestions on the article.

Introducing Cloudflare Network Interconnect

Post Syndicated from David Tuber original https://blog.cloudflare.com/cloudflare-network-interconnect/

Introducing Cloudflare Network Interconnect

Introducing Cloudflare Network Interconnect

Today we’re excited to announce Cloudflare Network Interconnect (CNI). CNI allows our customers to interconnect branch and HQ locations directly with Cloudflare wherever they are, bringing Cloudflare’s full suite of network functions to their physical network edge. Using CNI to interconnect provides security, reliability, and performance benefits vs. using the public Internet to connect to Cloudflare. And because of Cloudflare’s global network reach, connecting to our network is straightforward no matter where on the planet your infrastructure and employees are.

At its most basic level, an interconnect is a link between two networks. Today, we’re offering customers the following options to interconnect with Cloudflare’s network:

  • Via a private network interconnect (PNI). A physical cable (or a virtual “pseudo-wire”; more on that later) that connects two networks.
  • Over an Internet Exchange (IX). A common switch fabric where multiple Internet Service Providers (ISPs) and Internet networks can interconnect with each other.

To use a real world analogy: Cloudflare over the years has built a network of highways across the Internet to handle all our customers’ traffic. We’re now providing dedicated on-ramps for our customers’ on-prem networks to get onto those highways.

Why interconnect with Cloudflare?

CNI provides more reliable, faster, and more private connectivity between your infrastructure and Cloudflare’s. This delivers benefits across our product suite. Here are some examples of specific products and how you can combine them with CNI:

  • Cloudflare Access: Cloudflare Access replaces corporate VPNs with Cloudflare’s network. Instead of placing internal tools on a private network, teams deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network. CNI allows you to bring your own MPLS network to meet ours, allowing your employees to connect to your network securely and quickly no matter where they are.
  • CDN: Cloudflare’s CDN places content closer to visitors, improving site speed while minimizing origin load. CNI improves cache fill performance and reduces costs.
  • Magic Transit: Magic Transit protects datacenter and branch networks from unwanted attack and malicious traffic. Pairing Magic Transit with CNI decreases jitter and drives throughput improvements, and further hardens infrastructure from attack.
  • Cloudflare Workers: Workers is Cloudflare’s serverless compute platform. Integrating with CNI provides a secure connection to serverless cloud compute that does not traverse the public Internet, allowing customers to use Cloudflare’s unique set of Workers services with tighter network performance tolerances.

Let’s talk more about how CNI delivers these benefits.

Improving performance through interconnection

CNI is a great way to boost performance for many existing Cloudflare products. By utilizing CNI and setting up interconnection with Cloudflare wherever a customer’s origin infrastructure is, customers can get increased performance and security at lower cost than using public transit providers.

CNI makes things faster

As an example of the performance improvements network interconnects can deliver for Cloudflare customers, consider an HTTP application workload which flows through Cloudflare’s CDN and WAF. Many of our customers rely on our CDN to make their HTTP applications more responsive.

Cloudflare caches content very close to end users to provide the best performance possible. But, if content is not in cache, Cloudflare edge PoPs must contact the origin server to retrieve cacheable content. This can be slow, and places more load on an origin server compared to serving directly from cache.

With CNI, these origin pulls can be completed over a dedicated link, improving throughput and reducing overall time needed for origin pulls. Using Argo Tiered Cache, customers can manage tiered cache topologies and specify upstream cache tiers that correspond with locations where network interconnects are in place. Using Tiered Cache in this fashion lowers origin loads and increases cache hit rates, thereby improving performance and reducing origin infrastructures costs.

Here’s anonymized and sampled data from a real Cloudflare customer who recently provisioned interconnections between our network and theirs to further improve performance. Heavy users of our CDN, they were able to shave off precious milliseconds from their origin round trip time (RTT) by adding PNIs in multiple locations.

Introducing Cloudflare Network Interconnect

As an example, their 90th percentile round trip time in Warsaw, Poland decreased by 6.5ms as a result of provisioning a private network interconnect (from 7.5ms to 1ms), which is a performance win of 87%!  The jitter (variation in delay in received packets) on the link decreased from 82.9 to 0.3, which speaks to the dedicated, reliable nature of the link. CNI helps deliver reliable and performant network connectivity to your customers and employees.

Introducing Cloudflare Network Interconnect

Enhanced security through private connectivity

Customers with large on-premise networks want to move to the cloud: it’s cheaper, less hassle, and less overhead and maintenance.  However, customers want to also preserve their existing security and threat models.

Traditionally, CIOs trying to connect their IP networks to the Internet do so in two steps:

  1. Source connectivity to the Internet from transit providers (ISPs).
  2. Purchase, operate, and maintain network function specific hardware appliances. Think hardware load balancers, firewalls, DDoS mitigation equipment, WAN optimization, and more.

CNI allows CIOs to provision security services on Cloudflare and connect their existing networks to Cloudflare in a way that bypasses the public Internet.  Because Cloudflare integrates with on-premise networks and the cloud, customers can enforce security policies across both networks and create a consistent, secure boundary.

CNI increases cloud and network security by providing a private, dedicated link to the Cloudflare network. Since this link is reserved exclusively for the customer that provisions it, the customer’s traffic is isolated and private.

CNI + Magic Transit: Removing public Internet exposure

To use a product-specific example: through CNI’s integration with Magic Transit, customers can take advantage of private connectivity to minimize exposure of their network to the public Internet.

Magic Transit attracts customers’ IP traffic to our data centers by advertising their IP addresses from our edge via BGP. When traffic arrives, it’s filtered and sent along to customers’ data centers. Before CNI, all Magic Transit traffic was sent from Cloudflare to customers via Generic Routing Encapsulation (GRE) tunnels over the Internet. Because GRE endpoints are publicly routable, there is some risk these endpoints could be discovered and attacked, bypassing Cloudflare’s DDoS mitigation and security tools.

Using CNI removes this exposure to the Internet. Advantages of using CNI with Magic Transit include:

  • Reduced threat exposure. Although there are many steps companies can take to increase network security, some risk-sensitive organizations prefer not to expose endpoints to the public Internet at all. CNI allows Cloudflare to absorb that risk and forward only clean traffic (via Magic Transit) through a truly private interface.
  • Increased reliability. Traffic traveling over the public Internet is subject to factors outside of your control, including latency and packet loss on intermediate networks. Removing steps between Cloudflare’s network and yours means that after Magic Transit processes traffic, it’s forwarded directly and reliably to your network.
  • Simplified configuration. Soon, Magic Transit + CNI customers will have the option to skip making MSS (maximum segment size) changes when onboarding, a step that’s required for GRE-over-Internet and can be challenging for customers who need to consider their downstream customers’ MSS as well (eg. service providers).

Example deployment: Penguin Corp uses Cloudflare for Teams, Magic Transit, and CNI to protect branch and core networks, and employees.

Imagine Penguin Corp, a hypothetical company, has a fully connected private MPLS network.  Maintaining their network is difficult and they have a dedicated team of network engineers to do this.  They are currently paying a lot of money to run their own private cloud. To minimize costs, they limit their network egress points to two worldwide.  This creates a major performance problem for their users, whose bits have to travel a long way to accomplish basic tasks while still traversing Penguin’s network boundary.

Introducing Cloudflare Network Interconnect

SASE (Secure Access Service Edge) models look attractive to them, because they can, in theory, move away from their traditional MPLS network and move towards the cloud.  SASE deployments provide firewall, DDoS mitigation, and encryption services at the network edge, and bring security as a service to any cloud deployment, as seen in the diagram below:

Introducing Cloudflare Network Interconnect

CNI allows Penguin to use Cloudflare as their true network edge, hermetically sealing their branch office locations and datacenters from the Internet. Penguin can adapt to a SASE-like model while keeping exposure to the public Internet at zero. Penguin establishes PNIs with Cloudflare from their branch office in San Jose to Cloudflare’s San Jose location to take advantage of Cloudflare for Teams, and from their core colocation facility in Austin to Cloudflare’s Dallas location to use Magic Transit to protect their core networks.

Like Magic Transit, Cloudflare for Teams replaces traditional security hardware on-premise with Cloudflare’s global network. Customers who relied on VPN appliances to reach internal applications can instead connect securely through Cloudflare Access. Organizations maintaining physical web gateway boxes can send Internet-bound traffic to Cloudflare Gateway for filtering and logging.

Cloudflare for Teams services run in every Cloudflare data center, bringing filtering and authentication closer to your users and locations to avoid compromising performance. CNI improves that even further with a direct connection from your offices to Cloudflare. With a simple configuration change, all branch traffic reaches Cloudflare’s edge where Cloudflare for Teams policies can be applied. The link improves speed and reliability for users and replaces the need to backhaul traffic to centralized filtering appliances.

Once interconnected this way, Penguin’s network and employees realize two benefits:

  1. They get to use Cloudflare’s full set of security services without having to provision expensive and centralized physical or virtualized network appliances.
  2. Their security and performance services are running across Cloudflare’s global network in over 200 cities. This brings performance and usability improvements for users by putting security functions closer to them.
Introducing Cloudflare Network Interconnect

Scalable, global, and flexible interconnection options

CNI offers a big benefit to customers because it allows them to take advantage of our global footprint spanning 200+ cities: their branch office and datacenter infrastructure can connect to Cloudflare wherever they are.

This matters for two reasons: our globally distributed network makes it easier to interconnect locally, no matter where a customer’s branches and core infrastructure is, and allows for a globally distributed workforce to interact with our edge network with low latency and improved performance.

Customers don’t have to worry about securely expanding their network footprint: that’s our job.

To this point, global companies need to interconnect at many points around the world. Cloudflare Network Interconnect is priced for global network scale: Cloudflare doesn’t charge anything for enterprise customers to provision CNI. Customers may need to pay for access to an interconnection platform or a datacenter cross-connect. We’ll work with you and any other parties involved to make the ordering and provisioning process as smooth as possible.

In other words, CNI’s pricing is designed to accommodate complicated enterprise network topologies and modern IT budgets.

How to interconnect

Customers can interconnect with Cloudflare in one of three ways: over a private network interconnect (PNI), over an IX, or through one of our interconnection platform partners. We have worked closely with our global partners to meet our customers where they are and how they want.

Private Network Interconnects

Private Network Interconnects are available at any of our listed private peering facilities. Getting a physical connection to Cloudflare is easy: specify where you want to connect, port speeds, and target VLANs. From there, we’ll authorize it, you’ll place the order, and let us do the rest.  Customers should choose PNI as their connectivity option if they want higher throughput than a virtual connection or connection over an IX, or want to eliminate as many intermediaries from an interconnect as possible.

Internet Exchanges

Customers who want to use existing Internet Exchanges can interconnect with us at any of the 235+ Internet Exchanges we participate in. To connect with Cloudflare via an Internet Exchange, follow the IX’s instructions to connect, and Cloudflare will spin up our side of the connection.  Customers should choose Internet Exchanges as their connectivity option if they are either already peered at an IX, or they want to interconnect in a place where an interconnection platform isn’t present.

Interconnection Platform Partners

Cloudflare is proud to be partnering with Equinix, Megaport, PCCW ConsoleConnect, PacketFabric, and Zayo to provide you with easy ways to virtually connect with us in any of the partner-supported locations. Customers should choose to connect with an interconnection platform if they are already using these providers or want a quick and easy way to onboard onto a secure cloud experience.

Introducing Cloudflare Network Interconnect

If you’re interested in learning more, please see this blog post about all the different ways you can interconnect. For all of the interconnect methodologies described above, the BGP session establishment and IP routing are the same. The only thing that is different is the physical way in which we interconnect with other networks.

How do I find the best places to interconnect?

Our product page for CNI includes tools to better understand the right places for your network to interconnect with ours.  Customers can use this data to help figure out the optimal place to interconnect to have the most connectivity with other cloud providers and other ISPs in general.

What’s the difference between CNI and peering?

Technically, peering and CNI use similar mechanisms and technical implementations behind the scenes.

We have had an open peering policy for years with any network and will continue to abide by that policy: it allows us to help build a better Internet for everyone by interconnecting networks together, making the Internet more reliable. Traditional networks use interconnect/peering to drive better performance for their customers and connectivity while driving down costs. With CNI, we are opening up our infrastructure to extend the same benefits to our customers as well.

How do I learn more?

CNI provides customers with better performance, reliability, scalability, and security than using the public Internet. A customer can interconnect with Cloudflare in any of our physical locations today, getting dedicated links to Cloudflare that deliver security benefits and more stable latency, jitter, and available bandwidth through each interconnection point.

Contact our enterprise sales team about adding Cloudflare Network Interconnect to your existing offerings.

DDoS attacks have evolved, and so should your DDoS protection

Post Syndicated from Arun Singh original https://blog.cloudflare.com/ddos-attacks-have-evolved-and-so-should-your-ddos-protection/

DDoS attacks have evolved, and so should your DDoS protection

DDoS attacks have evolved, and so should your DDoS protection

The proliferation of DDoS attacks of varying size, duration, and persistence has made DDoS protection a foundational part of every business and organization’s online presence. However, there are key considerations including network capacity, management capabilities, global distribution, alerting, reporting and support that security and risk management technical professionals need to evaluate when selecting a DDoS protection solution.

Gartner’s view of the DDoS solutions; How did Cloudflare fare?

Gartner recently published the report Solution Comparison for DDoS Cloud Scrubbing Centers (ID G00467346), authored by Thomas Lintemuth, Patrick Hevesi and Sushil Aryal. This report enables customers to view a side-by-side solution comparison of different DDoS cloud scrubbing centers measured against common assessment criteria.  If you have a Gartner subscription, you can view the report here. Cloudflare has received the greatest number of ‘High’ ratings as compared to the 6 other DDoS vendors across 23 assessment criteria in the report.

The vast landscape of DDoS attacks

From our perspective, the nature of DDoS attacks has transformed, as the economics and ease of launching a DDoS attack has changed dramatically. With a rise in cost-effective capabilities of launching a DDoS attack, we have observed a rise in the number of under 10 Gbps DDoS network-level attacks, as shown in the figure below. Even though 10 Gbps from an attack size perspective does not seem that large, it is large enough to significantly affect a majority of the websites existing today.

DDoS attacks have evolved, and so should your DDoS protection

At the same time, larger-sized DDoS attacks are still prevalent and have the capability of crippling the availability of an organization’s infrastructure. In March 2020, Cloudflare observed numerous 300+ Gbps attacks with the largest attack being 550 Gbps in size.

DDoS attacks have evolved, and so should your DDoS protection

In the report Gartner also observes a similar trend, “In speaking with the vendors for this research, Gartner discovered a consistent theme: Clients are experiencing more frequent smaller attacks versus larger volumetric attacks.” In addition, they also observe that “For enterprises with Internet connections up to and exceeding 10 Gbps, frequent but short attacks up to 10 Gbps are still quite disruptive without DDoS protection. Not to say that large attacks have gone away. We haven’t seen a 1-plus Tbps attack since spring 2018, but attacks over 500 Gbps are still common.”

Gartner recommends in the report to “Choose a provider that offers scrubbing capacity of three times the largest documented volumetric attack on your continent.”

From an application-level DDoS attack perspective an interesting DDoS attack observed and mitigated by Cloudflare last year, is shown below. This HTTP DDoS attack had a peak of 1.4M requests per second, which isn’t highly rate-intensive. However, the fact that the 1.1M IPs from which the attack originated were unique and not spoofed made the attack quite interesting. The unique IP addresses were actual clients who were able to complete a TCP and HTTPS handshake.

DDoS attacks have evolved, and so should your DDoS protection

Harness the full power of Cloudflare’s DDoS protection

Cloudflare’s cloud-delivered DDoS solution provides key features that enable security professionals to protect their organizations and customers against even the most sophisticated DDoS attacks. Some of the key features and benefits include:

  • Massive network capacity: With over 35 Tbps of network capacity, Cloudflare ensures that you are protected against even the most sophisticated and largest DDoS attacks. Cloudflare’s network capacity is almost equal to the total scrubbing capacity of the other 6 leading DDoS vendors combined.
  • Globally distributed architecture: Having a few scrubbing centers globally to mitigate DDoS attacks is an outdated approach. As DDoS attacks scale and individual attacks originate from millions of unique IPs worldwide, it’s important to have a DDoS solution that mitigates the attack at the source rather than hauling traffic to a dedicated scrubbing center. With every one of our data centers across 200 cities enabled with full DDoS mitigation capabilities, Cloudflare has more points of presence than the 6 leading DDoS vendors combined.
  • Fast time to mitigation: Automated edge-analyzed and edge-enforced DDoS mitigation capabilities allows us to mitigate attacks at unprecedented speeds. Typical time to mitigate a DDoS attack is less than 10s.
  • Integrated security: A key design tenet while building products at Cloudflare is integration. Our DDoS solution integrates seamlessly with other product offerings including WAF, Bot Management, CDN and many more. A comprehensive and integrated security solution to bolster the security posture while aiding performance. No tradeoffs between security and performance!
  • Unmetered and unlimited mitigation: Cloudflare offers unlimited and unmetered DDoS mitigation. This eliminates the legacy concept of ‘Surge Pricing,’ which is especially painful when a business is under duress and experiencing a DDoS attack. This enables you to avoid unpredictable costs from traffic.

Whether you’re part of a large global enterprise, or use Cloudflare for your personal site, we want to make sure that you’re protected and also have the visibility that you need. DDoS Protection is included as part of every Cloudflare service. Enterprise-level plans include advanced mitigation, detailed reporting, enriched logs, productivity enhancements and fine-grained controls. Enterprise Plan customers also receive access to dedicated customer success and solution engineering.

To learn more about Cloudflare’s DDoS solution contact us or get started.

*Gartner “Solution Comparison for DDoS Cloud Scrubbing Centers,” Thomas Lintemuth,  Patrick Hevesi, Sushil Aryal, 16 April 2020

Remote work, regional lockdowns and migration of Internet usage

Post Syndicated from Vasco Asturiano original https://blog.cloudflare.com/remote-work-regional-lockdowns-and-migration-of-internet-usage/

Remote work, regional lockdowns and migration of Internet usage

The recommendation for social distancing to slow down the spread of COVID-19 has led many companies to adopt a work-from-home policy for their employees in offices around the world, and Cloudflare is no exception.

As a result, a large portion of Internet access shifted from office-focused areas, like city centers and business parks, towards more residential areas like suburbs and outlying towns. We wanted to find out just precisely how broad this geographical traffic migration was, and how different locations were affected by it.

It turns out it is substantial, and the results are quite stunning:

Remote work, regional lockdowns and migration of Internet usage

Gathering the Data

So how can we determine if Internet usage patterns have changed from a geographical perspective?

In each Cloudflare Point of Presence (in more than 200 cities worldwide) there’s an edge router whose responsibility it is to switch Internet traffic to serve the requests of end users in the region.

These edge routers are the network’s entry point and for monitoring and debugging purposes each router samples IP packet information regarding the traffic that traverses them. This data is collected as flow records and contains layer-3 related information, such as the source and destination IP address, port, packet size etc.

These statistical samples allow us to monitor aggregate traffic information, spot unusual activity, and verify that our connections to the greater Internet are working correctly.

By using a geolocation database like Maxmind, we can determine the rough location from which a flow emanates. If the geographic data is too imprecise it is excluded from our set to reduce noise.

By merging together information about flows that come from similar areas, we can infer a geospatial distribution of traffic volume across the globe.

Visual Representation

This distribution of traffic volume is an indication of Internet usage patterns. By plotting a geospatial heatmap of the Internet traffic in New York City for two separate dates four weeks apart (February 19, 2020 and March 18, 2020), we can already notice an erosion of traffic volume from the city center dissipating into the surrounding areas.

All the charts in this blog post show day time traffic in each geography.  This is done to show the difference working from home is having on Internet traffic.

Remote work, regional lockdowns and migration of Internet usage
Remote work, regional lockdowns and migration of Internet usage

But the migration pattern really jumps forward when producing a differential heatmap of these two dates:

Remote work, regional lockdowns and migration of Internet usage

This chart shows a diff of the “before” and “after” scenarios, highlighting only differences in volume between the two dates. The red color represents areas where Internet usage has decreased since February 19, and green where it has increased.

It strongly suggests a migration of Internet usage towards suburban and residential areas.

The data used is comparing working hours periods (1000 to 1600 local time) between two Wednesdays four weeks apart (February 19 and March 18).


We produced  similar visual representations for other impacted locations around the world. All exhibit similar migration patterns.

Seattle, US

Remote work, regional lockdowns and migration of Internet usage

Additional features can be picked out from the chart. The red dot in Silverdale, WA appears to be Internet access from the Kitsap Mall (which is currently closed). And Sea-Tac Airport is seen as a red area between Burien and Renton.

San Francisco Bay Area, US

Remote work, regional lockdowns and migration of Internet usage

Areas where large Silicon Valley businesses have sent employees home are clearly visible.

Paris, France

Remote work, regional lockdowns and migration of Internet usage

The red area in north east Paris appears to be the airport at Le Bourget and the surrounding industrial zone. To the west there’s another red area: Le Château de Versailles.

Berlin, Germany

Remote work, regional lockdowns and migration of Internet usage

In Berlin a red dot near the Tegeler See is Flughafen Berlin-Tegel likely resulting from fewer passengers passing through the airport.

Network Impact

As shown above, geographical Internet usage changes are visible from Internet traffic exchanged between end users and Cloudflare at various locations around the globe.

Besides the geographical migration in various metropolitan areas, the overall volume of traffic in these locations has also increased between 10% and 40% in just a period of four weeks.

It’s interesting to reflect that the Internet was originally conceived as a communications network for humanity during a crisis, and it’s come a long way since then. But in this moment of crisis, it’s being put to use for that original purpose. The Internet was built for this.

Cloudflare helps power a substantial portion of Internet traffic. During this time of increased strain on networks around the world, our team is working to ensure our service continues to run smoothly.

We are also providing our Cloudflare for Teams products at no cost to companies of any size struggling to support their employees working from home: learn more.