All posts by Bryton Herdes

Route leak incident on January 22, 2026

Post Syndicated from Bryton Herdes original https://blog.cloudflare.com/route-leak-incident-january-22-2026/

On January 22, 2026, an automated routing policy configuration error caused us to leak some Border Gateway Protocol (BGP) prefixes unintentionally from a router at our data center in Miami, Florida. While the route leak caused some impact to Cloudflare customers, multiple external parties were also affected because their traffic was accidentally funnelled through our Miami data center location.

The route leak lasted 25 minutes, causing congestion on some of our backbone infrastructure in Miami, elevated loss for some Cloudflare customer traffic, and higher latency for traffic across these links. Additionally, some traffic was discarded by firewall filters on our routers that are designed to only accept traffic for Cloudflare services and our customers.

While we’ve written about route leaks before, we rarely find ourselves causing them. This route leak was the result of an accidental misconfiguration on a router in Cloudflare’s network, and only affected IPv6 traffic. We sincerely apologize to the users, customers, and networks we impacted yesterday as a result of this BGP route leak.

BGP route leaks 

We have written multiple times about BGP route leaks, and we even record route leak events on Cloudflare Radar for anyone to view and learn from. To get a fuller understanding of what route leaks are, you can refer to this detailed background section, or refer to the formal definition within RFC7908

Essentially, a route leak occurs when a network tells the broader Internet to send it traffic that it’s not supposed to forward. Technically, a route leak occurs when a network, or Autonomous System (AS), appears unexpectedly in an AS path. An AS path is what BGP uses to determine the path across the Internet to a final destination. An example of an anomalous AS path indicative of a route leak would be finding a network sending routes received from a peer to a provider.


During this type of route leak, the rules of valley-free routing are violated, as BGP updates are sent from AS64501 to their peer (AS64502), and then unexpectedly up to a provider (AS64503). Oftentimes the leaker, in this case AS64502, is not prepared to handle the amount of traffic they are going to receive and may not even have firewall filters configured to accept all of the traffic coming in their direction. In simple terms, once a route update is sent to a peer or provider, it should only be sent further to customers and not to another peer or provider AS.

During the incident on January 22, we caused a similar kind of route leak, in which we took routes from some of our peers and redistributed them in Miami to some of our peers and providers. According to the route leak definitions in RFC7908, we caused a mixture of Type 3 and Type 4 route leaks on the Internet. 

Timeline

Time (UTC)

Event

2026-01-22 19:52 UTC

A change that ultimately triggers the routing policy bug is merged in our network automation code repository

2026-01-22 20:25 UTC

Automation is run on single Miami edge-router resulting in unexpected advertisements to BGP transit providers and peers

IMPACT START

2026-01-22 20:40 UTC

Network team begins investigating unintended route advertisements from Miami

2026-01-22 20:44 UTC

Incident is raised to coordinate response

2026-01-22 20:50 UTC

The bad configuration change is manually reverted by a network operator, and automation is paused for the router, so it cannot run again

IMPACT STOP

2026-01-22 21:47 UTC

The change that triggered the leak is reverted from our code repository

2026-01-22 22:07 UTC

Automation is confirmed by operators to be healthy to run again on the Miami router, without the routing policy bug

2026-01-22 22:40 UTC

Automation is unpaused on the single router in Miami

What happened: the configuration error

On January 22, 2026, at 20:25 UTC, we pushed a change via our policy automation platform to remove the BGP announcements from Miami for one of our data centers in Bogotá, Colombia. This was purposeful, as we previously forwarded some IPv6 traffic through Miami toward the Bogotá data center, but recent infrastructure upgrades removed the need for us to do so.

This change generated the following diff (a program that compares configuration files in order to determine how or whether they differ):

[edit policy-options policy-statement 6-COGENT-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-COMCAST-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-GTT-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-LEVEL3-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PRIVATE-PEER-ANYCAST-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PUBLIC-PEER-ANYCAST-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PUBLIC-PEER-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-TELEFONICA-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-TELIA-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;

While this policy change looks innocent at a glance, only removing the prefix lists containing BOG04 unicast prefixes resulted in a policy that was too permissive:

policy-options policy-statement 6-TELIA-ACCEPT-EXPORT {
    term ADV-SITELOCAL-GRE-RECEIVER {
        from route-type internal;
        then {
            community add STATIC-ROUTE;
            community add SITE-LOCAL-ROUTE;
            community add MIA01;
            community add NORTH-AMERICA;
            accept;
        }
    }
}

The policy would now mark every prefix of type “internal” as acceptable, and proceed to add some informative communities to all matching prefixes. But more importantly, the policy also accepted the route through the policy filter, which resulted in the prefix — which was intended to be “internal” —  being advertised externally. This is an issue because the “route-type internal” match in JunOS or JunOS EVO (the operating systems used by HPE Juniper Networks devices) will match any non-external route type, such as Internal BGP (IBGP) routes, which is what happened here.

As a result, all IPv6 prefixes that Cloudflare redistributes internally across the backbone were accepted by this policy, and advertised to all our BGP neighbors in Miami. This is unfortunately very similar to the outage we experienced in 2020, on which you can read more on our blog.

When the policy misconfiguration was applied at 20:25 UTC, a series of unintended BGP updates were sent from AS13335 to peers and providers in Miami. These BGP updates are viewable historically by looking at MRT files with the monocle tool or using RIPE BGPlay

➜  ~ monocle search --start-ts 2026-01-22T20:24:00Z --end-ts 2026-01-22T20:30:00Z --as-path ".*13335[ \d$]32934$*"
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f077::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f091::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f16f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f17c::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f26f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f27c::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f33f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f17c::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f27c::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f091::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f091::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f17c::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f27c::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
{trimmed}

In the monocle output seen above, we have the timestamp of our BGP update, followed by the next-hop in the announcement, the ASN of the network feeding a given route-collector, the prefix involved, and the AS path and BGP communities if any are found. At the end of the output per-line, we also find the route-collector instance.

Looking at the first update for prefix 2a03:2880:f077::/48, the AS path is 64112 22850 174 3356 13335 32934. This means we (AS13335) took the prefix received from Meta (AS32934), our peer, and then advertised it toward Lumen (AS3356), one of our upstream transit providers. We know this is a route leak as routes received from peers are only meant to be readvertised to downstream (customer) networks, not laterally to other peers or up to providers.

As a result of the leak and the forwarding of unintended traffic into our Miami router from providers and peers, we experienced congestion on our backbone between Miami and Atlanta, as you can see in the graph below. 


This would have resulted in elevated loss for some Cloudflare customer traffic, and higher latency than usual for traffic traversing these links. In addition to this congestion, the networks whose prefixes we leaked would have had their traffic discarded by firewall filters on our routers that are designed to only accept traffic for Cloudflare services and our customers. At peak, we discarded around 12Gbps of traffic ingressing our router in Miami for these non-downstream prefixes. 


Follow-ups and preventing route leaks 

We are big supporters and active contributors to efforts within the IETF and infrastructure community that strengthen routing security. We know firsthand how easy it is to cause a route leak accidentally, as evidenced by this incident. 

Preventing route leaks will require a multi-faceted approach, but we have identified multiple areas in which we can improve, both short- and long-term.

In terms of our routing policy configurations and automation, we are:

  • Patching the failure in our routing policy automation that caused the route leak, and will mitigate this potential failure and others like it immediately 

  • Implementing additional BGP community-based safeguards in our routing policies that explicitly reject routes that were received from providers and peers on external export policies 

  • Adding automatic routing policy evaluation into our CI/CD pipelines that looks specifically for empty or erroneous policy terms 

  • Improve early detection of issues with network configurations and the negative effects of an automated change

To help prevent route leaks in general, we are: 

  • Validating routing equipment vendors’ implementation of RFC9234 (BGP roles and the Only-to-Customer Attribute) in preparation for our rollout of the feature, which is the only way independent of routing policy to prevent route leaks caused at the local Autonomous System (AS)

  • Encouraging the long term adoption of RPKI Autonomous System Provider Authorization (ASPA), where networks could automatically reject routes that contain anomalous AS paths

Most importantly, we would again like to apologize for the impact we caused users and customers of Cloudflare, as well as any impact felt by external networks.

A closer look at a BGP anomaly in Venezuela

Post Syndicated from Bryton Herdes original https://blog.cloudflare.com/bgp-route-leak-venezuela/

As news unfolds surrounding the U.S. capture and arrest of Venezuelan leader Nicolás Maduro, a cybersecurity newsletter examined Cloudflare Radar data and took note of a routing leak in Venezuela on January 2.

We dug into the data. Since the beginning of December there have been eleven route leak events, impacting multiple prefixes, where AS8048 is the leaker. Although it is impossible to determine definitively what happened on the day of the event, this pattern of route leaks suggests that the CANTV (AS8048) network, a popular Internet Service Provider (ISP) in Venezuela, has insufficient routing export and import policies. In other words, the BGP anomalies observed by the researcher could be tied to poor technical practices by the ISP rather than malfeasance.

In this post, we’ll briefly discuss Border Gateway Protocol (BGP) and BGP route leaks, and then dig into the anomaly observed and what may have happened to cause it. 

Background: BGP route leaks

First, let’s revisit what a BGP route leak is. BGP route leaks cause behavior similar to taking the wrong exit off of a highway. While you may still make it to your destination, the path may be slower and come with delays you wouldn’t otherwise have traveling on a more direct route.

Route leaks were given a formal definition in RFC7908 as “the propagation of routing announcement(s) beyond their intended scope.” Intended scope is defined using pairwise business relationships between networks. The relationships between networks, which in BGP we represent using Autonomous Systems (ASes), can be one of the following: 

  • customer-provider: A customer pays a provider network to connect them and their own downstream customers to the rest of the Internet

  • peer-peer: Two networks decide to exchange traffic between one another, to each others’ customers, settlement-free (without payment)

In a customer-provider relationship, the provider will announce all routes to the customer. The customer, on the other hand, will advertise only the routes from their own customers and originating from their network directly.

In a peer-peer relationship, each peer will advertise to one another only their own routes and the routes of their downstream customers


These advertisements help direct traffic in expected ways: from customers upstream to provider networks, potentially across a single peering link, and then potentially back down to customers on the far end of the path from their providers. 

A valid path would look like the following that abides by the valley-free routing rule: 


A route leak is a violation of valley-free routing where an AS takes routes from a provider or peer and redistributes them to another provider or peer. For example, a BGP path should never go through a “valley” where traffic goes up to a provider, and back down to a customer, and then up to a provider again. There are different types of route leaks defined in RFC7908, but a simple one is the Type 1: Hairpin route leak between two provider networks by a customer. 


In the figure above, AS64505 takes routes from one of its providers and redistributes them to their other provider. This is unexpected, since we know providers should not use their customer as an intermediate IP transit network. AS64505 would become overwhelmed with traffic, as a smaller network with a smaller set of backbone and network links than its providers. This can become very impactful quickly. 

Route leak by AS8048 (CANTV)

Now that we have reminded ourselves what a route leak is in BGP, let’s examine what was hypothesized  in the newsletter post. The post called attention to a few route leak anomalies on Cloudflare Radar involving AS8048. On the Radar page for this leak, we see this information:


We see the leaker AS, which is AS8048 — CANTV, Venezuela’s state-run telephone and Internet Service Provider. We observe that routes were taken from one of their providers AS6762 (Sparkle, an Italian telecom company) and then redistributed to AS52320 (V.tal GlobeNet, a Colombian network service provider). This is definitely a route leak. 

The newsletter suggests “BGP shenanigans” and posits that such a leak could be exploited to collect intelligence useful to government entities. 

While we can’t say with certainty what caused this route leak, our data suggests that its likely cause was more mundane. That’s in part because BGP route leaks happen all of the time, and they have always been part of the Internet — most often for reasons that aren’t malicious.

To understand more, let’s look closer at the impacted prefixes and networks. The prefixes involved in the leak were all originated by AS21980 (Dayco Telecom, a Venezuelan company):


The prefixes are also all members of the same 200.74.224.0/20 subnet, as noted by the newsletter author. Much more intriguing than this, though, is the relationship between the originating network AS21980 and the leaking network AS8048: AS8048 is a provider of AS21980. 

The customer-provider relationship between AS8048 and AS21980 is visible in both Cloudflare Radar and bgp.tools AS relationship interference data. We can also get a confidence score of the AS relationship using the monocle tool from BGPKIT, as you see here: 

➜  ~ monocle as2rel 8048 21980
Explanation:
- connected: % of 1813 peers that see this AS relationship
- peer: % where the relationship is peer-to-peer
- as1_upstream: % where ASN1 is the upstream (provider)
- as2_upstream: % where ASN2 is the upstream (provider)

Data source: https://data.bgpkit.com/as2rel/as2rel-latest.json.bz2

╭──────┬───────┬───────────┬──────┬──────────────┬──────────────╮
│ asn1 │ asn2  │ connected │ peer │ as1_upstream │ as2_upstream │
├──────┼───────┼───────────┼──────┼──────────────┼──────────────┤
│ 8048 │ 21980 │ 9.9%   │ 0.6% │ 9.4%    │ 0.0%         │
╰──────┴───────┴───────────┴──────┴──────────────┴──────────────╯

While only 9.9% of route collectors see these two ASes as adjacent, almost all of the paths containing them reflect AS8048 as an upstream provider for AS21980, meaning confidence is high in the provider-customer relationship between the two.

Many of the leaked routes were also heavily prepended with AS8048, meaning it would have been potentially less attractive for routing when received by other networks. Prepending is the padding of an AS more than one time in an outbound advertisement by a customer or peer, to attempt to switch traffic away from a particular circuit to another. For example, many of the paths during the leak by AS8048 looked like this: “52320,8048,8048,8048,8048,8048,8048,8048,8048,8048,23520,1299,269832,21980”. 

You can see that AS8048 has sent their AS multiple times in an advertisement to AS52320, because by means of BGP loop prevention the path would never actually travel in and out of AS8048 multiple times in a row. A non-prepended path would look like this: “52320,8048,23520,1299,269832,21980”. 

If AS8048 was intentionally trying to become a man-in-the-middle (MITM) for traffic, why would they make the BGP advertisement less attractive instead of more attractive? Also, why leak prefixes to try and MITM traffic when you’re already a provider for the downstream AS anyway? That wouldn’t make much sense. 

The leaks from AS8048 also surfaced in multiple separate announcements, each around an hour apart on January 2, 2026 between 15:30 and 17:45 UTC, suggesting they may have been having network issues that surfaced in a routing policy issue or a convergence-based mishap. 


It is also noteworthy that these leak events begin over twelve hours prior to the U.S. military strikes in Venezuela. Leaks that impact South American networks are common, and we have no reason to believe, based on timing or the other factors I have discussed, that the leak is related to the capture of Maduro several hours later.

In fact, looking back the past two months, we can see plenty of leaks by AS8048 that are just like this one, meaning this is not a new BGP anomaly:


You can see above in the history of Cloudflare Radar’s route leak alerting pipeline that AS8048 is no stranger to Type 1 hairpin route leaks. Since the beginning of December alone there have been eleven route leak events where AS8048 is the leaker.

From this we can draw a more innocent possible explanation about the route leak: AS8048 may have configured too loose of export policies facing at least one of their providers, AS52320. And because of that, redistributed routes belong to their customer even when the direct customer BGP routes were missing. If their export policy toward AS52320 only matched on IRR-generated prefix list and not a customer BGP community tag, for example, it would make sense why an indirect path toward AS6762 was leaked back upstream by AS8048. 

These types of policy errors are something RFC9234 and the Only-to-Customer (OTC) attribute would help with considerably, by coupling BGP more tightly to customer-provider and peer-peer roles, when supported by all routing vendors. I will save the more technical details on RFC9234 for a follow-up blog post.

The difference between origin and path validation

The newsletter also calls out as “notable” that Sparkle (AS6762) does not implement RPKI (Resource Public Key Infrastructure) Route Origin Validation (ROV). While it is true that AS6762 appears to have an incomplete deployment of ROV and is flagged as “unsafe” on isbgpsafeyet.com because of it, origin validation would not have prevented this BGP anomaly in Venezuela. 

It is important to separate BGP anomalies into two categories: route misoriginations, and path-based anomalies. Knowing the difference between the two helps to understand the solution for each. Route misoriginations, often called BGP hijacks, are meant to be fixed by RPKI Route Origin Validation (ROV) by making sure the originator of a prefix is who rightfully owns it. In the case of the BGP anomaly described in this post, the origin AS was correct as AS21980 and only the path was anomalous. This means ROV wouldn’t help here.

Knowing that, we need path-based validation. This is what Autonomous System Provider Authorization (ASPA), an upcoming draft standard in the IETF, is going to provide. The idea is similar to RPKI Route Origin Authorizations (ROAs) and ROV: create an ASPA object that defines a list of authorized providers (upstreams) for our AS, and everyone will use this to invalidate route leaks on the Internet at various vantage points. Using a concrete example, AS6762 is a Tier-1 transit-free network, and they would use the special reserved “AS0” member in their ASPA signed object to communicate to the world that they have no upstream providers, only lateral peers and customers. Then, AS52320, the other provider of AS8048, would see routes from their customer with “6762” in the path and reject them by performing an ASPA verification process.

ASPA is based on RPKI and is exactly what would help prevent route leaks similar to the one we observed in Venezuela.

A safer BGP, built together 

We felt it was important to offer an alternative explanation for the BGP route leak by AS8048 in Venezuela that was observed on Cloudflare Radar. It is helpful to understand that route leaks are an expected side effect of BGP historically being based entirely on trust and carefully-executed business relationship-driven intent. 

While route leaks could be done with malicious intent, the data suggests this event may have been an accident caused by a lack of routing export and import policies that would prevent it. This is why to have a safer BGP and Internet we need to work together and drive adoption of RPKI-based ASPA, for which RIPE recently released object creation, on the wide Internet. It will be a collaborative effort, just like RPKI has been for origin validation, but it will be worth it and prevent BGP incidents such as the one in Venezuela. 

In addition to ASPA, we can all implement simpler mechanisms such as Peerlock and Peerlock-lite as operators, which sanity-checks received paths for obvious leaks. One especially promising initiative is the adoption of RFC9234, which should be used in addition to ASPA for preventing route leaks with the establishing of BGP roles and a new Only-To-Customer (OTC) attribute. If you haven’t already asked your routing vendors for an implementation of RFC9234 to be on their roadmap: please do. You can help make a big difference.

BGP zombies and excessive path hunting

Post Syndicated from Bryton Herdes original https://blog.cloudflare.com/going-bgp-zombie-hunting/

Here at Cloudflare, we’ve been celebrating Halloween with some zombie hunting of our own. The zombies we’d like to remove are those that disrupt the core framework responsible for how the Internet routes traffic: BGP (Border Gateway Protocol).

A BGP zombie is a silly name for a route that has become stuck in the Internet’s Default-Free Zone (aka the DFZ: the collection of all internet routers that do not require a default route, potentially due to a missed or lost prefix withdrawal).

The underlying root cause of a zombie could be multiple things, spanning from buggy software in routers or just general route processing slowness. It’s when a BGP prefix is meant to be gone from the Internet, but for one reason or another it becomes a member of the undead and hangs around for some period of time.

The longer these zombies linger, the more they create operational impact and become a real headache for network operators. Zombies can lead packets astray, either by trapping them inside of route loops or by causing them to take an excessively scenic route. Today, we’d like to celebrate Halloween by covering how BGP zombies form and how we can lessen the likelihood that they wreak havoc on Internet traffic.

Path hunting

To understand the slowness that can often lead to BGP zombies, we need to talk about path hunting. Path hunting occurs when routers running BGP exhaustively search for the best path to a prefix as determined by Longest Prefix Matching (LPM) and BGP routing attributes like path length and local preference. This becomes relevant in our observations of exactly how routes become stuck, for how long they become stuck, and how visible they are on the Internet.

For example, path hunting happens when a more-specific BGP prefix is withdrawn from the global routing table, and networks need to fallback to a less-specific BGP advertisement. In this example, we use 2001:db8::/48 for the more-specific BGP announcement and 2001:db8::/32 for the less-specific prefix. When the /48 is withdrawn by the originating Autonomous System (AS), BGP routers have to recognize that route as missing and begin routing traffic to IPs such as 2001:db8::1 via the 2001:db8::/32 route, which still remains while the prefix 2001:db8::/48 is gone. 

Let’s see what this could look like in action with a few diagrams. 


Diagram 1: Active 2001:db8::/48 route

In this initial state, 2001:db8::/48 is used actively for traffic forwarding, which all flows through AS13335 on the way to AS64511. In this case, AS64511 would be a BYOIP customer of Cloudflare. AS64511 also announces a backup route to another Internet Service Provider (ISP), AS64510, but this route is not active even in AS64510’s routing table for forwarding to 2001:db8::1 because 2001:db8::/48 is a longer prefix match when compared to 2001:db8::/32.

Things get more interesting when AS64510 signals for 2001:db8::/48 to be withdrawn by Cloudflare (AS13335), perhaps because a DDoS attack is over and the customer opts to use Cloudflare only when they are actively under attack.

When the customer signals to Cloudflare (via BGP Control or API call) to withdraw the 2001:db8::/48 announcement, all BGP routers have to converge upon this update, which involves path hunting. AS13335 sends a BGP withdrawal message for 2001:db8::/48 to its directly-connected BGP neighbors. While the news of withdrawal may travel quickly from AS13335 to the other networks, news may travel more quickly to some of the neighbors than others. This means that until everyone has received and processed the withdrawal, networks may try routing through one another to reach the 2001:db8::/48 prefix – even after AS13335 has withdrawn it. 


Diagram 2: 2001:db8::/48 route withdrawn via AS13335

Imagine AS64501 is a little slower than the rest – perhaps due to using older hardware, hardware being overloaded, a software bug, specific configuration settings, poor luck, or some other factor – and still has not processed the withdrawal of the /48. This in itself could be a BGP zombie, since the route is stuck for a small period. Our pings toward 2001:db8::1 are never able to actually reach AS64511, because AS13335 knows the /48 is meant to be withdrawn, but some routers carrying a full table have not yet converged upon that result.

The length of time spent path hunting is amplified by something called the Minimum Route Advertisement Interval (MRAI). The MRAI specifies the minimum amount of time between BGP advertisement messages from a BGP router, meaning it introduces a purposeful number of seconds of delay between each BGP advertisement update. RFC4271 recommends an MRAI value of 30-seconds for eBGP updates, and while this can cut down on the chattiness of BGP, or even potential oscillation of updates, it also makes path hunting take longer. 

At the next cycle of path hunting, even AS64501, which was previously still pointing toward a nonexistent /48 route from AS13335, should find the /32 advertisement is all that is left toward 2001:db8::1. Once it has done so, the traffic flow will become the following: 


Diagram 3: Routing fallback to 2001:db8::/32 and 2001:db8::/48 is gone from DFZ

This would mean BGP path hunting is over, and the Internet has realized that the 2001:db8::/32 is the best route available toward 2001:db8::1, and that 2001:db8::/48 is really gone. While in this example we’ve purposely made path hunting only last two cycles, in reality it can be far more, especially with how highly connected AS13335 is to thousands of peer networks and Tier-1’s globally. 

Now that we’ve discussed BGP path hunting and how it works, you can probably already see how a BGP zombie outbreak can begin and how routing tables can become stuck for a lengthy period of time. Excessive BGP path hunting for a previously-known more-specific prefix can be an early indicator that a zombie could follow.

Spawning a zombie

Zombies have captured our attention more recently as they were noticed by some of our customers leveraging Bring-Your-Own-IP (BYOIP) on-demand advertisement for Magic Transit. BYOIP may be configured in two modes: “always-on”, in which a prefix is continuously announced, or “on-demand”, where a prefix is announced only when a customer chooses to. For some on-demand customers, announcement and withdrawal cycles may be a more frequent occurrence, which can lead to an increase in BGP zombies.

With that in mind and also knowing how path hunting works, let’s spawn our own zombie onto the Internet. To do so, we’ll take a spare block of IPv4 and IPv6 and announce them like so:


Once the routes are announced and stable, we’ll then proceed to withdraw the more specific routes advertised via Cloudflare globally. With a few quick clicks, we’ve successfully re-animated the dead.

Variant A: Ghoulish Gateways

One place zombies commonly occur is between upstream ISPs. When one router in a given ISP’s network is a little slower to update, routes can become stuck. 

Take, for example, the following loop we observed between two of our upstream partners:

7. be2431.ccr31.sjc04.atlas.cogentco.com
8. tisparkle.sjc04.atlas.cogentco.com
9. 213.144.177.184
10. 213.144.177.184
11. 89.221.32.227
12. (waiting for reply)
13. be2749.rcr71.goa01.atlas.cogentco.com
14. be3219.ccr31.mrs02.atlas.cogentco.com
15. be2066.agr21.mrs02.atlas.cogentco.com
16. telecomitalia.mrs02.atlas.cogentco.com
17. 213.144.177.186
18. 89.221.32.227

Or this loop – observed on the same withdrawal test – between two different providers:

15. if-bundle-12-2.qcore2.pvu-paris.as6453.net
16. if-bundle-56-2.qcore1.fr0-frankfurt.as6453.net
17. if-bundle-15-2.qhar1.fr0-frankfurt.as6453.net
18. 195.219.223.11
19. 213.144.177.186
20. 195.22.196.137
21. 213.144.177.186
22. 195.22.196.137

Variant B: Undead LAN (Local Area Network)

Simultaneously, zombies can occur entirely within a given network. When a route is withdrawn from Cloudflare’s network, each device in our network must individually begin the process of withdrawing the route. While this is generally a smooth process, things can still become stuck.

Take, for instance, a situation where one router inside of our network has not yet fully processed the withdrawal. Connectivity partners will continue routing traffic towards that router (as they have not yet received the withdrawal) while no host remains behind the router which is capable of actually processing the traffic. The result is an internal-only looping path:

10. 192.0.2.112
11. 192.0.2.113
12. 192.0.2.112
13. 192.0.2.113
14. 192.0.2.112
15. 192.0.2.113
16. 192.0.2.112
17. 192.0.2.113
18. 192.0.2.112
19. 192.0.2.113

Unlike most fictionally-depicted hoards of the walking dead, our highly-visible zombie has a limited lifetime in most major networks – in this instance, only around around 6 minutes, after which most had re-converged around the less-specific as the best path. Sadly, this is on the shorter side – in some cases, we have seen long-lived zombies cause reachability issues for more than 10 minutes. It’s safe to say this is longer than most network operators would expect BGP convergence to take in a normal situation. 

But, you may ask – is this the excessive path hunting we talked about earlier, or a BGP zombie? Really, it depends on the expectation and tolerance around how long BGP convergence should take to process the prefix withdrawal. In any case, even over 30 minutes after our withdrawal of our more-specific prefix, we are able to see zombie routes in the route-views public collectors easily:

~ % monocle search --start-ts 2025-10-28T12:40:13Z --end-ts 2025-10-28T13:00:13Z --prefix 198.18.0.0/24
A|1761656125.550447|206.82.105.116|54309|198.18.0.0/24|54309 13335 395747|IGP|206.82.104.31|0|0|54309:111|false|||route-views.ny

You might argue that six to eleven minutes (or more) is a reasonable time for worst-case BGP convergence in the Tier-1 network layer, though that itself seems like a stretch. Even setting that aside, our data shows that very real BGP zombies exist in the global routing table, and they will negatively impact traffic. Curiously, we observed the path hunting delay is worse on IPv4, with the longest observed IPv6 impact in major (Tier-1) networks being just over 4 minutes. One could speculate this is in part due to the much higher number of IPv4 prefixes in the Internet global routing table than the IPv6 global table, and how BGP speakers handle them separately.


Source: RIPEstat’s BGPlay

Part of the delay appears to originate from how interconnected AS13335 is; being heavily peered with a large portion of the Internet increases the likelihood of a route becoming stuck in a given location. Given that, perhaps a zombie would be shorter-lived if we operated in the opposite direction: announcing a less-specific persistently to 13335 and announcing more specifics via our local ISP during normal operation. Since the withdrawal will come from what is likely a less well-peered network, the time-to-convergence may be shorter:


Indeed, as predicted, we still get a stuck route, and it only lives for around 20 seconds in the Tier-1 network layer:

19. be12488.ccr42.ams03.atlas.cogentco.com
20. 38.88.214.142
21. be2020.ccr41.ams03.atlas.cogentco.com
22. 38.88.214.142
23. (waiting for reply)
24. 38.88.214.142
25. (waiting for reply)
26. 38.88.214.142

Unfortunately, that 20 seconds is still an impactful 20 seconds – while better, it’s not where we want to be. The exact length of time will depend on the native ISP networks one is connected with, and it could certainly ease into the minutes worth of stuck routing. 

In both cases, the initial time-to-announce yielded no loss, nor was a zombie created, as both paths remained valid for the entirety of their initial lifetime. Zombies were only created when a more specific prefix was fully withdrawn. A newly-announced route is not subject to path hunting in the same way a withdrawn more-specific route is. As they say, good (new) news travels fast.

Lessening the zombie outbreak

Our findings lead us to believe that the withdrawal of a more-specific prefix may lead to zombies running rampant for longer periods of time. Because of this, we are exploring some improvements that make the consequences of BGP zombie routing less impactful for our customers relying on our on-demand BGP functionality.

For the traffic that does reach Cloudflare with stuck routes, we will introduce some BGP traffic forwarding improvements internally that allow for a more graceful withdrawal of traffic, even if routes are erroneously pointing toward us. In many ways, this will closely resemble the BGP well-known no-export community’s functionality from our servers running BGP. This means even if we receive traffic from external parties due to stuck routing, we will still have the opportunity to deliver traffic to our far-end customers over a tunneled connection or via a Cloudflare Network Interconnect (CNI). We look forward to reporting back the positive impact after making this improvement for a more graceful draining of traffic by default. 

For the traffic that does not reach Cloudflare’s edge, and instead loops between network providers, we need to use a different approach. Since we know more-specific to less-specific prefix routing fallback is more prone to BGP zombie outbreak, we are encouraging customers to instead use a multi-step draining process when they want traffic drained from the Cloudflare edge for an on-demand prefix without introducing route loops or blackhole events. The draining process when removing traffic for a BYOIP prefix from Cloudflare should look like this: 

  1. The customer is already announcing an example prefix from Cloudflare, ex. 198.18.0.0/24

  2. The customer begins natively announcing the prefix 198.18.0.0/24 (i.e. the same-length as the prefix they are advertising via Cloudflare) from their network to the Internet Service Providers that they wish to fail over traffic to.

  3. After a few minutes, the customer signals BGP withdrawal from Cloudflare for the 198.18.0.0/24 prefix.

The result is a clean cut over: impactful zombies are avoided because the same-length prefix (198.18.0.0/24) remains in the global routing table. Excessive path hunting is avoided because instead of routers needing to aggressively seek out a missing more-specific prefix match, they can fallback to the same-length announcement that persists in the routing table from the natively-originated path to the customer’s network.


Source: RIPEstat’s BGPlay

What next?

We are going to continue to refine our methods of measuring BGP zombies, so you can look forward to more insights in the future. There is also work from others in the community around zombie measurement that is interesting and producing useful data. In terms of combatting the software bugs around BGP zombie creation, routing vendors should implement RFC9687, the BGP SendHoldTimer. The general idea is that a local router can detect via the SendHoldTimer if the far-end router stops processing BGP messages unexpectedly, which lowers the possibility of zombies becoming stuck for long periods of time. 

In addition, it’s worth keeping in mind our observations made in this post about more-specific prefix announcements and excessive path hunting. If as a network operator you rely on more-specific BGP prefix announcements for failover, or for traffic engineering, you need to be aware that routes could become stuck for a longer period of time before full BGP convergence occurs.

If you’re interested in problems like BGP zombies, consider coming to work at Cloudflare or applying for an internship. Together we can help build a better Internet!

Cloudflare’s perspective of the October 30 OVHcloud outage

Post Syndicated from Bryton Herdes original https://blog.cloudflare.com/cloudflare-perspective-of-the-october-30-2024-ovhcloud-outage

On October 30, 2024, cloud hosting provider OVHcloud (AS16276) suffered a brief but significant outage. According to their incident report, the problem started at 13:23 UTC, and was described simply as “An incident is in progress on our backbone infrastructure.” OVHcloud noted that the incident ended 17 minutes later, at 13:40 UTC. As a major global cloud hosting provider, some customers use OVHcloud as an origin for sites delivered by Cloudflare — if a given content asset is not in our cache for a customer’s site, we retrieve the asset from OVHcloud.

We observed traffic starting to drop at 13:21 UTC, just ahead of the reported start time. By 13:28 UTC, it was approximately 95% lower than pre-incident levels. Recovery appeared to start at 13:31 UTC, and by 13:40 UTC, the reported end time of the incident, it had reached approximately 50% of pre-incident levels.


Traffic from OVHcloud (AS16276) to Cloudflare

Cloudflare generally exchanges most of our traffic with OVHcloud over peering links. However, as shown below, peered traffic volume during the incident fell significantly. It appears that some small amount of traffic briefly began to flow over transit links from Cloudflare to OVHcloud due to sudden changes in which Cloudflare data centers we were receiving OVHcloud requests. (Peering is a direct connection between two network providers for the purpose of exchanging traffic. Transit is when one network pays an intermediary network to carry traffic to the destination network.) 


Because we peer directly, we exchange most traffic over our private peering sessions with OVHcloud. Instead, we found OVHcloud routing to Cloudflare dropped entirely for a few minutes, then switched to just a single Internet Exchange port in Amsterdam, and finally normalized globally minutes later.

As the graphs below illustrate, we normally see the largest amount of traffic from OVHcloud in our Frankfurt and Paris data centers, as OVHcloud has large data center presences in these regions. However, in that shift to transit, and the shift to an Amsterdam Internet Exchange peering point, we saw a spike in traffic routed to our Amsterdam data center. We suspect the routing shifts are the earliest signs of either internal BGP reconvergence, or general network recovery within AS12676, starting with their presence nearest our Amsterdam peering point.


The postmortem published by OVHcloud noted that the incident was caused by “an issue in a network configuration mistakenly pushed by one of our peering partner[s]” and that “We immediately reconfigured our network routes to restore traffic.” One possible explanation for the backbone incident may be a BGP route leak by the mentioned peering partner, where OVHcloud could have accepted a full Internet table from the peer and therefore overwhelmed their network or the peering partner’s network with traffic, or caused unexpected internal BGP route updates within AS12676.

Upon investigating what route leak may have caused this incident impacting OVHcloud, we found evidence of a maximum prefix-limit threshold being breached on our peering with Worldstream (AS49981) in Amsterdam. 

Oct 30 13:16:53  edge02.ams01 rpd[9669]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 141.101.65.53 (External AS 49981) changed state from Established to Idle (event PrefixLimitExceeded) (instance master)

As the number of received prefixes exceeded the limits configured for our peering session with Worldstream, the BGP session automatically entered an idle state. This prevented the route leak from impacting Cloudflare’s network. In analyzing BGP Monitoring Protocol (BMP) data from AS49981 prior to the automatic session shutdown, we were able to confirm Worldstream was sending advertisements with AS paths that contained their upstream Tier 1 transit provider.

During this time, we also detected over 500,000 BGP announcements from AS49981, as Worldstream was announcing routes to many of their peers, visible on Cloudflare Radar.


Worldsteam later posted a notice on their status page, indicating that their network experienced a route leak, causing routes to be unintentionally advertised to all peers:

“Due to a configuration error on one of the core routers, all routes were briefly announced to all our peers. As a result, we pulled in more traffic than expected, leading to congestion on some paths. To address this, we temporarily shut down these BGP sessions to locate the issue and stabilize the network. We are sorry for the inconvenience.”

We believe Worldstream also leaked routes on an OVHcloud peering session in Amsterdam, which caused today’s impact.

Conclusion

Cloudflare has written about impactful route leaks before, and there are multiple methods available to prevent BGP route leaks from impacting your network. One is setting max prefix-limits for a peer, so the BGP session is automatically torn down when a peer sends more prefixes than they are expected to. Other forward-looking measures are Autonomous System Provider Authorization (ASPA) for BGP, where Resource Public Key Infrastructure (RPKI) helps protect a network from accepting BGP routes with an invalid AS path, or RFC9234, which prevents leaks by tying strict customer and provider relationships to BGP updates. For improved Internet resilience, we recommend that network operators follow recommendations defined within MANRS for Network Operators.

Cloudflare 1.1.1.1 incident on June 27, 2024

Post Syndicated from Bryton Herdes original https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024


Introduction

On June 27, 2024, 1.1.1.1 became unreachable or heavily degraded from over 300 networks in 70 countries. The root cause was a mix of BGP (Border Gateway Protocol) hijacking and a route leak.

Cloudflare was an early adopter of Resource Public Key Infrastructure (RPKI) for route origin validation (ROV). With RPKI, IP prefix owners can store and share ownership information securely, and other operators can validate BGP announcements by comparing received BGP routes with what is stored in the form of Route Origin Authorizations (ROAs). When Route Origin Validation is enforced by networks properly and prefixes are signed via ROA, the impact of a BGP hijack is greatly limited. Despite increased adoption of RPKI over the past several years and 1.1.1.0/24 being a signed resource, during the incident 1.1.1.1/32 was originated by ELETRONET S.A. (AS267613) and accepted by multiple networks, including at least one Tier 1 provider who accepted 1.1.1.1/32 as a blackhole route. This ultimately caused immediate unreachability for the DNS resolver address from much of the world.

Route leaks are something Cloudflare has written and talked about before, and unfortunately there are only best-effort safeguards in wide deployment today, such as IRR (Internet Routing Registry) prefix-list filtering by providers. During the same period of time as the 1.1.1.1/32 hijack, 1.1.1.0/24 was erroneously leaked upstream by Nova Rede de Telecomunicações Ltda (AS262504). The leak was further and widely propagated by Peer-1 Global Internet Exchange (AS1031), which also contributed to the impact felt by customers during the incident.

We apologize for the impact felt by users of 1.1.1.1, and take the operation of the service very seriously. Although the root cause of the impact was external to Cloudflare, we will continue to improve the detection methods in place to yield quicker response times, and will use our stance within the Internet community to further encourage adoption of RPKI-based hijack and leak prevention mechanisms such as Route Origin Validation (ROV) and Autonomous Systems Provider Authorization (ASPA) objects for BGP.

Background

Cloudflare introduced the 1.1.1.1 public DNS resolver service in 2018. Since the announcement, 1.1.1.1 has become one of the most popular resolver IP addresses that is free-to-use by anyone. Along with the popularity and easily recognized IP address comes some operational difficulties. The difficulties stem from historical use of 1.1.1.1 by networks in labs or as a testing IP address, resulting in some residual unexpected traffic or blackholed routing behavior. Because of this, Cloudflare is no stranger to dealing with the effects of BGP misrouting traffic, two of which are covered below.

BGP hijacks

Some of the difficulty comes from potential routing hijacks of 1.1.1.1. For example, if some fictitious FooBar Networks (AS65001) assigns 1.1.1.1/32 to one of their routers and shares this prefix within their internal network, their customers will have difficulty routing to the 1.1.1.1 DNS service. If they advertise the 1.1.1.1/32 prefix outside their immediate network, the impact can be even greater. The reason 1.1.1.1/32 would be selected instead of the 1.1.1.0/24 BGP-announced by Cloudflare is due to Longest Prefix Matching (LPM). While many prefixes in a route table could match the 1.1.1.1 address, such as 1.1.1.0/24, 1.1.1.0/29, and 1.1.1.1/32, 1.1.1.1.1/32 is considered the “longest match” by the LPM algorithm because it has the highest number of identical bits and longest subnet mask while matching the 1.1.1.1 address. In simple terms, we would call 1.1.1.1/32 the “most specific” route available to 1.1.1.1.

Instead of traffic toward 1.1.1.1 routing to Cloudflare via anycast and landing on one of our servers globally, it will instead land somewhere on a device within FooBar Networks where 1.1.1.1 is terminated, and a legitimate response will fail to be served back to clients. This would be considered a hijack of requests to 1.1.1.1, either done purposefully or accidentally by network operators within FooBar Networks.

BGP route leaks

Another source of impact we sometimes face for 1.1.1.1 is BGP route leaks. A route leak occurs when a network becomes an upstream, in terms of BGP announcement, for a network it shouldn’t be an upstream provider for.

Here is an example of a route leak where a customer forward routes learned from one provider to another, causing a type 1 leak (defined in RFC7908).

If enough networks within the Default-Free Zone (DFZ) accept a route leak, it may be used widely for forwarding traffic along the bad path. Often this will cause the network leaking the prefixes to overload, as they aren’t prepared for the amount of global traffic they are now attracting. We wrote about a wide-scale route leak that knocked off a large portion of the Internet, when a provider in Pennsylvania attracted traffic toward global destinations it would have typically never transited traffic for. Even though Cloudflare interconnects with over 13,000 networks globally, the BGP local-preference assigned to a leaked route could be higher than the route received by a network directly from Cloudflare. This sounds counterproductive, but unfortunately it can happen.

To explain why this happens, it helps to think of BGP as a business policy engine along with the routing protocol for the Internet. A transit provider has customers who pay them to transport their data, so logically they assign a higher BGP local-preference than connections with either private or Internet Exchange (IX) peers, so the connection being paid for is most utilized. Think of local-preference as a way of influencing priority of which outgoing connection to send traffic to. Different networks also may choose to prefer Private Network Interconnects (PNIs) over Internet Exchange (IX) received routes. Part of the reason for this is reliability, as a private connection can be viewed as a point-to-point connection between two networks with no third-party managed fabric in between to worry about. Another reason could be cost efficiency, as if you’ve gone to the trouble to allocate a router port and purchase a cross connect between yourself and another peer, you’d like to make use of it to get the best return on your investment.

It is worth noting that both BGP hijacks and route leaks can happen to any IP and prefix on the Internet, not just 1.1.1.1. But as mentioned earlier, 1.1.1.1 is such a recognizable and historically misappropriated address that it tends to be more prone to accidental hijacks or leaks than other IP resources.

During the Cloudflare 1.1.1.1 incident that happened on June 27, 2024, we ended up fighting the impact caused by a combination of both BGP hijacking and a route leak. One alone is enough to cripple a service, but the two paired together can be significantly more impactful.

Incident timeline and impact

All timestamps are in UTC.

2024-06-27 18:51:00 AS267613 (Eletronet) begins announcing 1.1.1.1/32 to peers, providers, and customers. 1.1.1.1/32 is announced with the AS267613 origin AS

2024-06-27 18:52:00 AS262504 (Nova) leaks 1.1.1.0/24, also received from AS267613, upstream to AS1031 (PEER 1 Global Internet Exchange) with AS path “1031 262504 267613 13335”

2024-06-27 18:52:00 AS1031 (upstream of Nova) propagates 1.1.1.0/24 to various Internet Exchange peers and route-servers, widening impact of the leak

2024-06-27 18:52:00 One tier 1 provider receives the 1.1.1.1/32 announcement from AS267613 as a RTBH (Remote Triggered Blackhole) route, causing blackholed traffic for all the tier 1’s customers

2024-06-27 20:03:00 Cloudflare raises internal incident for 1.1.1.1 reachability issues from various countries

2024-06-27 20:08:00 Cloudflare disables a partner peering location with AS267613 that is receiving traffic toward 1.1.1.0/24

2024-06-27 20:08:00 Cloudflare team engages peering partner AS267613 about the incident

2024-06-27 20:10:00 AS262504 leaks 1.1.1.0/24 with a new AS path, “262504 53072 7738 13335” which is also redistributed by AS1031. Traffic is being delivered successfully to Cloudflare when along this path, but with high latency for affected clients

2024-06-27 20:17:00 Cloudflare engages AS262504 regarding the route leak of 1.1.1.0/24 to their upstream providers

2024-06-27 21:56:00 Cloudflare engineers disable a second peering point with AS267613 that is receiving traffic meant for 1.1.1.0/24 from multiple sources not in Brazil

2024-06-27 22:16:00 AS262504 leaks 1.1.1.0/24 again, attracting some traffic to a Cloudflare peering with AS267613 in São Paulo. Some 1.1.1.1 requests as a result are returned with higher latency, but the hijack of 1.1.1.1/32 and traffic blackholing appears resolved

2024-06-28 02:28:00 AS262504 fully resolves the route leak of 1.1.1.0/24

The impact to customers surfaced in one of two ways: unable to reach 1.1.1.1 at all; Able to reach 1.1.1.1, but with high latency per request.

Since AS267613 was hijacking the 1.1.1.1/32 address somewhere within their network, many requests failed at some device in their autonomous system. There were intermittent periods, or flaps, during the incident where they successfully routed requests toward 1.1.1.1 to Cloudflare data centers, albeit with high latency.

Looking at two source countries during the incident, Germany and the United States, impacted traffic to 1.1.1.1 looked like this:

Source Country of Users:

Cloudflare Data Center city:

Looking at the graphs, requests to 1.1.1.1 were landing in Brazilian data centers. The gaps between the spikes are when 1.1.1.1 requests were blackholed prior to or within AS267613, and the spikes themselves are when traffic was delivered to Cloudflare with high latency invoked on the request and response. The brief spikes of traffic successfully carried to the Cloudflare peering location with AS267613 could be explained by the 1.1.1.1/32 route flapping within their network, occasionally letting traffic through to Cloudflare instead of it dropping somewhere in the intermediate path.

Technical description of the error and how it happened

Normally, a request to 1.1.1.1 from users routes to the nearest data center via BGP anycast. During the incident, AS267613 (Eletronet) advertised 1.1.1.1/32 to their peers and upstream providers, and AS262504 leaked 1.1.1.0/24 upstream, changing the normal path of BGP anycast for multiple eyeball networks drastically.

With public route collectors and the monocle tool, we can search for the rogue BGP updates.

monocle search --start-ts 2024-06-27T18:51:00Z --end-ts 2024-06-27T18:55:00Z --prefix '1.1.1.1/32'

A|1719514377.130203|206.126.236.209|398465|1.1.1.1/32|398465 267613|IGP|206.126.236.209|0|0||false|||route-views.eqix
–
A|1719514377.681932|206.82.104.185|398465|1.1.1.1/32|398465 267613|IGP|206.82.104.185|0|0|13538:1|false|||route-views.ny
–
A|1719514388.996829|198.32.132.129|13760|1.1.1.1/32|13760 267613|IGP|198.32.132.129|0|0||false|||route-views.telxatl

We see above that AS398465 and AS13760 reported to the route-views collectors that they received 1.1.1.1/32 from AS267613 around the time impact begins. Normally, the longest IPv4 prefix accepted in the Default-Free-Zone (DFZ) is a /24, but in this case we observed multiple networks using the 1.1.1.1/32 route from AS267613 for forwarding, made apparent by the blackholing of traffic that never arrived at a Cloudflare POP (Point of Presence). The origination of 1.1.1.1/32 by AS267613 is a BGP route hijack. They were announcing the prefix with origin AS267613 even though the Route Origin Authorization (ROA) is only signed for origin AS13335 (Cloudflare) with a maximum prefix length of /24.

We even saw BGP updates for 1.1.1.1/32 when looking at our own BMP (BGP Monitoring Protocol) data at Cloudflare. From at least a couple different route servers, we received our own 1.1.1.1/32 announcement via BGP. Thankfully, Cloudflare rejects these routes on import as both RPKI Invalid and DFZ Invalid due to invalid AS origin and prefix length. The BMP data display is pre-policy, meaning even though we rejected the route we can see where we receive the BGP update over a peering session.

So not only are multiple networks accepting prefixes that should not exist in the global routing table, but they are also accepting an RPKI (Resource Public Key Infrastructure) Invalid route. To make matters worse, one Tier-1 transit provider accepted the 1.1.1.1/32 announcement as a RTBH (Remote-Triggered Blackhole) route from AS267613, discarding all traffic at their edge that would normally route to Cloudflare. This alone caused wide impact, as any networks leveraging this particular Tier-1 provider in routing to 1.1.1.1 would have been unable to reach the IP address during the incident.

For those unfamiliar with Remote-Triggered Blackholing, it is a method of signaling to a provider a set of destinations you would like traffic to be dropped for within their network. It exists as a blunt method of fighting off DDoS attacks. When you are being attacked on a specific IP or prefix, you can tell your upstream provider to absorb all traffic toward that destination IP address or prefix by discarding it before it comes to your network port. The problem during this incident was AS267613 was unauthorized to blackhole 1.1.1.1/32. Cloudflare only should have the sole right to leverage RTBH for discarding of traffic destined for AS13335, which is something we would in reality never do.

Looking now at BGP updates for 1.1.1.0/24 multiple networks received the prefix from AS262504 and accepted it.

~> monocle search --start-ts 2024-06-27T20:10:00Z --end-ts 2024-06-27T20:13:00Z --prefix '1.1.1.0/24' --as-path ".* 267613 13335" --include-sub

.. some advertisements removed for brevity ..

A|1719519011.378028|187.16.217.158|1031|1.1.1.0/24|1031 262504 267613 13335|IGP|187.16.217.158|0|0|1031:1031 1031:4209 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views2.saopaulo
–
A|1719519011.629398|45.184.147.17|1031|1.1.1.0/24|1031 262504 267613 13335|IGP|45.184.147.17|0|0|1031:1031 1031:4209 1031:4259 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views.fortaleza
–
A|1719519036.943174|80.249.210.99|50763|1.1.1.0/24|50763 1031 262504 267613 13335|IGP|80.249.210.99|0|0|1031:1031 50763:400|false|13335|162.158.177.1|route-views.amsix
–
A|1719519037|80.249.210.99|50763|1.1.1.0/24|50763 1031 262504 267613 13335|IGP|80.249.210.99|0|0|1031:1031 50763:400|false|13335|162.158.177.1|rrc03
–
A|1719519087.4546|45.184.146.59|199524|1.1.1.0/24|199524 1031 262504 267613 13335|IGP|45.184.147.17|0|0||false|13335|162.158.177.1|route-views.fortaleza
A|1719519087.464375|45.184.147.74|264409|1.1.1.0/24|264409 1031 262504 267613 13335|IGP|45.184.147.74|0|0|65100:7010|false|13335|162.158.177.1|route-views.fortaleza
–
A|1719519096.059558|190.15.124.18|61568|1.1.1.0/24|61568 262504 267613 13335|IGP|190.15.124.18|0|0|1031:1031 1031:4209 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views3
–
A|1719519128.843415|190.15.124.18|61568|1.1.1.0/24|61568 262504 267613 13335|IGP|190.15.124.18|0|0|1031:1031 1031:4209 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views3

Here we pay attention to the AS path again. This time, AS13335 is the origin AS at the very end of the announcements. This BGP announcement is RPKI Valid, because the origin is correctly AS13335, but this is a route leak of 1.1.1.0/24 because the path itself is invalid.

How do we know it’s a route leak?

Looking at an example path, “199524 1031 262504 267613 13335”, AS267613 is functionally a peer of AS13335 and should not share the 1.1.1.0/24 announcement with their peers or upstreams, only their customers (AS Cone). AS262504 is a customer of AS267613 and the next adjacent ASN in the path, so that particular announcement is fine up until this point. Where the 1.1.1.0/24 goes wrong is AS262504, when they announce the prefix to their upstream AS1031. Furthermore, AS1031 redistributed the advertisement to their peers.

This means AS262504 is the leaking network. AS1031 accepted the leak from their customer, AS262504, and caused wide impact by distributing the route in multiple peering locations globally. AS1031 (Peer-1 Global Internet Exchange) advertises themselves as a global peering exchange. Cloudflare is not a customer of AS1031, so 1.1.1.0/24 should have never been redistributed to peers, route-servers, or upstreams of AS1031. It appears that AS1031 does not perform any extensive filtering for customer BGP sessions, and instead just matches on adjacency (in this case, AS262504) and redistributes everything that meets this criteria. Unfortunately, this is irresponsible of AS1031 and causes direct impact to 1.1.1.1 and potentially other services that fall victim to the unguarded route propagation. While the original leaking network was AS262504, impact was greatly amplified by AS1031 and others when they accepted the hijack or leak and further distributed the announcements.

During the majority of the incident, the leak by AS262504 eventually landed requests within AS267613, which was discarding 1.1.1.1/32 traffic somewhere in their network. To that end, AS262504 really just amplified the impact in terms of 1.1.1.1 unreachability by leaking routes upstream.

To limit impact of the route leak, Cloudflare disabled peering in multiple locations with AS267613. The problem did not completely go away, as AS262504 was still leaking a stale path pointing to São Paulo. Requests landing in São Paulo were able to be served, albeit with a high round-trip time back to users. Cloudflare has been engaging with all networks mentioned throughout this post in regard to the leak and future prevention mechanisms, as well as at least one Tier 1 transit provider who accepted 1.1.1.1/32 from AS267613 as a blackhole route that was unauthorized by Cloudflare and caused widespread impact.

Remediation and follow-up steps

BGP hijacks

RPKI origin validation
RPKI has recently reached a major milestone at 50% deployment in terms of prefixes signed by Route Origin Authorization (ROA). While RPKI certainly helps limit the spread of a hijacked BGP prefix throughout the Internet, we need all networks to do their part, especially major networks with a large sum of downstream Autonomous Systems (AS’s). During the hijack of 1.1.1.1/32, multiple networks accepted and used the route announced by AS267613 for traffic forwarding.

RPKI and Remote-Triggered Blackholing (RTBH)
A significant amount of the impact caused during this incident was due to a Tier 1 provider accepting 1.1.1.1/32 as a blackhole route from a third party that is not Cloudflare. This in itself is a hijack of 1.1.1.1, and a very dangerous one. RTBH is a useful tool used by many networks when desperate for a mitigation against large DDoS attacks. The problem is the BGP filtering used for RTBH is loose in nature, relying often only on AS-SET objects found in Internet Routing Registries. Relying on Route Origin Authorization (ROA) would be infeasible for RTBH filtering, as that would require thousands of potential ROAs be created for the network the size of Cloudflare. Not only this, but creating specific /32 entries opens up the potential for an individual address such as 1.1.1.1/32 being announced by someone pretending to be AS13335, becoming the best route to 1.1.1.1 on the Internet and causing severe impact.

AS-SET filtering is not representative of authority to blackhole a route, such as 1.1.1.1/32. Only Cloudflare should be able to blackhole a destination it has the rights to operate. A potential way to fix the lenient filtering of providers on RTBH sessions would again be leveraging an RPKI. Using an example from the IETF, the expired draft-spaghetti-sidrops-rpki-doa-00 proposal specified a Discard Origin Authorization (DOA) object that would be used to authorize only specific origins to authorize a blackhole action for a prefix. If such an object was signed, and RTBH requests validated against the object, the unauthorized blackhole attempt of 1.1.1.1/32 by AS267613 would have been invalid instead of accepted by the Tier 1 provider.

BGP best practices
Simply following BGP best practices laid out by MANRS, and rejecting IPv4 prefixes that are longer than a /24 in the Default-Free Zone (DFZ) would have reduced impact to 1.1.1.1. Rejecting invalid prefix lengths within the wider Internet should be part of a standard BGP policy for all networks.

BGP route leaks

Route leak detection

While route leaks are not unavoidable for Cloudflare today, because the Internet inherently relies on trust for interconnection, there are some steps we will take to limit impact.

We have expanded data sources to use for our route leak detection system to cover more networks and are in the process of incorporating real-time data into the detection system to allow more timely response toward similar events in the future.

ASPA for BGP

We will continue advocating for the adoption of RPKI into AS Path based leak route leak prevention. Autonomous System Provider Authorization (ASPA) objects are similar to ROAs, except instead of signing prefixes with an authorized origin AS, the AS itself is signed with a list of provider networks that are allowed to propagate their routes. So, in the case of Cloudflare, only valid upstream transit providers would be signed as authorized to advertise AS13335 prefixes such as 1.1.1.0/24 upstream.

In the route leak example where AS262504 (customer of AS267613) shared 1.1.1.0/24 upstream, BGP ASPA would see this as an invalid valley leak if AS267613 had signed their authorized providers and AS1031 had validated paths against that list. Similar to RPKI origin validation, however, this will be a long-term effort and dependent on networks, especially large providers, rejecting invalid AS paths as based on ASPA objects.

Other potential approaches

There are alternative approaches to ASPA that do exist, in various stages of adoption that may be worth noting. There is no guarantee that the following make it to a stage of wide Internet deployment, however.

RFC9234, for example, uses a concept of peer roles within BGP capabilities and attributes, and depending on the configuration of routers along a path for updates, an “Only-To-Customer” (OTC) attribute can be added to prefixes that will prevent the upstream spread of a prefix such as 1.1.1.0/24 along a leaked path. The downside is BGP configuration needs to be completed to assign the various roles to each peering session, and vendor adoption still has to be fully ironed out to make this feasible for actual use in production with positive results.

Like all approaches to solving route leaks, cooperation amongst network operators on the Internet is required for success.

Conclusion

Cloudflare’s 1.1.1.1 DNS resolver service fell victim to a simultaneous BGP hijack and BGP route leak event. While the actions of external networks are outside of Cloudflare’s direct control, we intend to take every step within both the Internet community and internally at Cloudflare to detect impact more quickly and lessen impact to our users.

Long term, Cloudflare continues to support adoption of RPKI-based origin validation, as well as AS path validation. The former exists with deployment across a wide array of the world’s largest networks, and the latter is still in draft phase at the IETF (Internet Engineering Task Force). In the meantime, to check if your ISP is enforcing RPKI origin validation, you can always visit isbgpsafeyet.com and Test Your ISP.