All posts by Mingwei Zhang

Monitoring AS-SETs and why they matter

2025-09-26 Mingwei Zhang

Post Syndicated from Mingwei Zhang original https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/

Introduction to AS-SETs

An AS-SET, not to be confused with the recently deprecated BGP AS_SET, is an Internet Routing Registry (IRR) object that allows network operators to group related networks together. AS-SETs have been used historically for multiple purposes such as grouping together a list of downstream customers of a particular network provider. For example, Cloudflare uses the AS13335:AS-CLOUDFLARE AS-SET to group together our list of our own Autonomous System Numbers (ASNs) and our downstream Bring-Your-Own-IP (BYOIP) customer networks, so we can ultimately communicate to other networks whose prefixes they should accept from us.

In other words, an AS-SET is currently the way on the Internet that allows someone to attest the networks for which they are the provider. This system of provider authorization is completely trust-based, meaning it’s not reliable at all, and is best-effort. The future of an RPKI-based provider authorization system is coming in the form of ASPA (Autonomous System Provider Authorization), but it will take time for standardization and adoption. Until then, we are left with AS-SETs.

Because AS-SETs are so critical for BGP routing on the Internet, network operators need to be able to monitor valid and invalid AS-SET memberships for their networks. Cloudflare Radar now introduces a transparent, public listing to help network operators in our routing page per ASN.

AS-SETs and building BGP route filters

AS-SETs are a critical component of BGP policies, and often paired with the expressive Routing Policy Specification Language (RPSL) that describes how a particular BGP ASN accepts and propagates routes to other networks. Most often, networks use AS-SET to express what other networks should accept from them, in terms of downstream customers.

Back to the AS13335:AS-CLOUDFLARE example AS-SET, this is published clearly on PeeringDB for other peering networks to reference and build filters against.

When turning up a new transit provider service, we also ask the provider networks to build their route filters using the same AS-SET. Because BGP prefixes are also created in IRR registries using the route or route6 objects, peers and providers now know what BGP prefixes they should accept from us and deny the rest. A popular tool for building prefix-lists based on AS-SETs and IRR databases is bgpq4, and it’s one you can easily try out yourself.

For example, to generate a Juniper router’s IPv4 prefix-list containing prefixes that AS13335 could propagate for Cloudflare and its customers, you may use:

% bgpq4 -4Jl CLOUDFLARE-PREFIXES -m24 AS13335:AS-CLOUDFLARE | head -n 10
policy-options {
replace:
 prefix-list CLOUDFLARE-PREFIXES {
    1.0.0.0/24;
    1.0.4.0/22;
    1.1.1.0/24;
    1.1.2.0/24;
    1.178.32.0/19;
    1.178.32.0/20;
    1.178.48.0/20;

^{Restricted to 10 lines, actual output of prefix-list would be much greater}

This prefix list would be applied within an eBGP import policy by our providers and peers to make sure AS13335 is only able to propagate announcements for ourselves and our customers.

How accurate AS-SETs prevent route leaks

Let’s see how accurate AS-SETs can help prevent route leaks with a simple example. In this example, AS64502 has two providers – AS64501 and AS64503. AS64502 has accidentally messed up their BGP export policy configuration toward the AS64503 neighbor, and is exporting all routes, including those it receives from their AS64501 provider. This is a typical Type 1 Hairpin route leak.

Fortunately, AS64503 has implemented an import policy that they generated using IRR data including AS-SETs and route objects. By doing so, they will only accept the prefixes that originate from the AS Cone of AS64502, since they are their customer. Instead of having a major reachability or latency impact for many prefixes on the Internet because of this route leak propagating, it is stopped in its tracks thanks to the responsible filtering by the AS64503 provider network. Again it is worth keeping in mind the success of this strategy is dependent upon data accuracy for the fictional AS64502:AS-CUSTOMERS AS-SET.

Monitoring AS-SET misuse

Besides using AS-SETs to group together one’s downstream customers, AS-SETs can also represent other types of relationships, such as peers, transits, or IXP participations.

For example, there are 76 AS-SETs that directly include one of the Tier-1 networks, Telecom Italia / Sparkle (AS6762). Judging from the names of the AS-SETs, most of them are representing peers and transits of certain ASNs, which includes AS6762. You can view this output yourself at https://radar.cloudflare.com/routing/as6762#irr-as-sets

There is nothing wrong with defining AS-SETs that contain one’s peers or upstreams as long as those AS-SETs are not submitted upstream for customer->provider BGP session filtering. In fact, an AS-SET for upstreams or peer-to-peer relationships can be useful for defining a network’s policies in RPSL.

However, some AS-SETs in the AS6762 membership list such as AS-10099 look to attest customer relationships.

% whois -h rr.ntt.net AS-10099 | grep "descr"
descr:          CUHK Customer

We know AS6762 is transit free and this customer membership must be invalid, so it is a prime example of AS-SET misuse that would ideally be cleaned up. Many Internet Service Providers and network operators are more than happy to correct an invalid AS-SET entry when asked to. It is reasonable to look at each AS-SET membership like this as a potential risk of having higher route leak propagation to major networks and the Internet when they happen.

AS-SET information on Cloudflare Radar

Cloudflare Radar is a hub that showcases global Internet traffic, attack, and technology trends and insights. Today, we are adding IRR AS-SET information to Radar’s routing section, freely available to the public via both website and API access. To view all AS-SETs an AS is a member of, directly or indirectly via other AS-SETs, a user can visit the corresponding AS’s routing page. For example, the AS-SETs list for Cloudflare (AS13335) is available at https://radar.cloudflare.com/routing/as13335#irr-as-sets

The AS-SET data on IRR contains only limited information like the AS members and AS-SET members. Here at Radar, we also enhance the AS-SET table with additional useful information as follows.

Inferred ASN shows the AS number that is inferred to be the creator of the AS-SET. We use PeeringDB AS-SET information match if available. Otherwise, we parse the AS-SET name to infer the creator.
IRR Sources shows which IRR databases we see the corresponding AS-SET. We are currently using the following databases: AFRINIC, APNIC, ARIN, LACNIC, RIPE, RADB, ALTDB, NTTCOM, and TC.
AS Members and AS-SET members show the count of the corresponding types of members.
AS Cone is the count of the unique ASNs that are included by the AS-SET directly or indirectly.
Upstreams is the count of unique AS-SETs that includes the corresponding AS-SET.

Users can further filter the table by searching for a specific AS-SET name or ASN. A toggle to show only direct or indirect AS-SETs is also available.

In addition to listing AS-SETs, we also provide a tree-view to display how an AS-SET includes a given ASN. For example, the following screenshot shows how as-delta indirectly includes AS6762 through 7 additional other AS-SETs. Users can copy or download this tree-view content in the text format, making it easy to share with others.

We built this Radar feature using our publicly available API, the same way other Radar websites are built. We have also experimented using this API to build additional features like a full AS-SET tree visualization. We encourage developers to give this API (and other Radar APIs) a try, and tell us what you think!

Looking ahead

We know AS-SETs are hard to keep clean of error or misuse, and even though Radar is making them easier to monitor, the mistakes and misuse will continue. Because of this, we as a community need to push forth adoption of RFC9234 and implementations of it from the major vendors. RFC9234 embeds roles and an Only-To-Customer (OTC) attribute directly into the BGP protocol itself, helping to detect and prevent route leaks in-line. In addition to BGP misconfiguration protection with RFC9234, Autonomous System Provider Authorization (ASPA) is still making its way through the IETF and will eventually help offer an authoritative means of attesting who the actual providers are per BGP Autonomous System (AS).

If you are a network operator and manage an AS-SET, you should seriously consider moving to hierarchical AS-SETs if you have not already. A hierarchical AS-SET looks like AS13335:AS-CLOUDFLARE instead of AS-CLOUDFLARE, but the difference is very important. Only a proper maintainer of the AS13335 ASN can create AS13335:AS-CLOUDFLARE, whereas anyone could create AS-CLOUDFLARE in an IRR database if they wanted to. In other words, using hierarchical AS-SETs helps guarantee ownership and prevent the malicious poisoning of routing information.

While keeping track of AS-SET memberships seems like a chore, it can have significant payoffs in preventing BGP-related incidents such as route leaks. We encourage all network operators to do their part in making sure the AS-SETs you submit to your providers and peers to communicate your downstream customer cone are accurate. Every small adjustment or clean-up effort in AS-SETs could help lessen the impact of a BGP incident later.

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (X), https://noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Bringing connections into view: real-time BGP route visibility on Cloudflare Radar

2025-05-21 Mingwei Zhang

Post Syndicated from Mingwei Zhang original https://blog.cloudflare.com/bringing-connections-into-view-real-time-bgp-route-visibility-on-cloudflare/

The Internet relies on the Border Gateway Protocol (BGP) to exchange IP address reachability information. This information outlines the path a sender or router can use to reach a specific destination. These paths, conveyed in BGP messages, are sequences of Autonomous System Numbers (ASNs), with each ASN representing an organization that operates its own segment of Internet infrastructure.

Throughout this blog post, we’ll use the terms “BGP routes” or simply “routes” to refer to these paths. In essence, BGP functions by enabling autonomous systems to exchange routes to IP address blocks (“IP prefixes”), allowing different entities across the Internet to construct their routing tables.

When network operators debug reachability issues or assess a resource’s global reach, BGP routes are often the first thing they examine. Therefore, it’s critical to have an up-to-date view of the routes toward the IP prefixes of interest. Some networks provide tools called “looking glasses” — public routing information services offering data directly from their own BGP routers. These allow external operators to examine routes from that specific network’s perspective. Furthermore, services like bgp.tools, bgp.he.net, RouteViews, or the NLNOG RING looking glass offer aggregated, looking glass-like lookup capabilities, drawing on data sources from multiple organizations rather than just a single one.

However, individual looking glass instances offer a limited scope, typically restricted to the infrastructure of the service provider’s network. While aggregated routing information services provide broader vantage points, they often lack the API access necessary for building automated tools on top of them. For example, systems designed for automated tasks, such as BGP leak or hijack detection, depend on programmatic API access.

We’re excited to introduce Cloudflare Radar’s new real-time BGP route lookup service, described below. Built using public data sources, this service provides visualizations of real-time routes directly on the corresponding IP prefix pages within Radar (see the page for 1.1.1.0/24 as an example). We are also offering API access through our free-to-use Cloudflare Radar API, empowering developers to leverage this data to build their own innovative systems and tools.

Cloudflare Radar provides real-time routes

We are excited to announce the launch of our new real-time BGP route lookup service, now accessible through both Cloudflare Radar web interface and the Cloudflare Radar API. This enhancement provides users with a near instantaneous view into global BGP routing data.

Cloudflare Radar prefix pages

Cloudflare Radar’s real-time routes feature now offers a Sankey diagram illustrating the BGP routes for a given prefix. To minimize visual complexity, the visualization displays routes directed towards the Tier 1 networks. For example, the diagram below shows that 1.1.1.0/24 is announced by AS13335 (Cloudflare) and that Cloudflare has direct connections to almost all U.S.-based and international Tier 1 network providers.

Expanding on this more concise view, users also have the option to ‘Show full paths’ and visualize every BGP route from the prefix of interest to the collectors. (The role of the collectors in gathering this data is discussed below.) The interactive view allows panning and zooming, and hovering over the links provides tooltip information on which collector saw the route and when it was last updated.

For both views, the prefix origin table is displayed above the route’s visualization. The table shows the originating Autonomous System (AS), the visibility percentage (representing the proportion of route collectors observing the origin ASN announcement), and RPKI validation outcomes.

During a recently detected BGP misconfiguration, we saw two origin ASNs for a prefix, with AS3 incorrectly used instead of the intended origin being prepended three times. The visualization reveals AS3 as RPKI invalid with low visibility, indicating limited network acceptance. Operators can analyze these issues visually or in the table and monitor real-time corrections by refreshing the page.

Whether facing network outages, implementing new deployments, or investigating route leaks, users can leverage this feature for any scenario where a clear, global understanding of a prefix’s routing paths is essential.

To allow easier access to this information, users can now search for any prefix using the Radar search bar and navigate to the corresponding prefix routing pages. Prefixes involved in BGP route leak and origin hijack events are also linked to this enhanced routing information page, helping operators debug BGP anomalies in real-time.

Cloudflare Routes API

Cloudflare Radar real-time route data is also accessible via the Radar API, and users can follow this guide to get started.

The following example shows an HTTP GET request to query all the current routes for a prefix of interest:

curl -X GET 
"https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix=1.1.1.0/24" -H 
"Authorization: Bearer <API_TOKEN>"

With the help of JSON data processing tools like jq, users can further filter data results by routes containing a certain ASN. In the following example, we make a request to ask for all current routes toward the prefix 1.1.1.0/24 and filter all routes with AS paths containing AS174:

curl -X GET
 "https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix=1.1.1.0/24" \
    -H "Authorization: Bearer <API_TOKEN>" | \
jq '.result.routes[]|select(.as_path | contains([174]))'

The command output is a JSON array of route objects. Each object details a route that includes AS174 in its AS path. Additional information provided for each route includes the BGP route collector, BGP community values, and the timestamp of the last update.

{
  "as_path": [
    3130,
    174,
    13335
  ],
  "collector": "route-views2",
  "communities": [
    "174:21001",
    "174:22013",
    "3130:394"
  ],
  "peer_asn": 3130,
  "prefix": "1.1.1.0/24",
  "timestamp": "2025-05-14T00:00:00Z"
}
{
  "as_path": [
    263237,
    174,
    13335
  ],
  "collector": "rrc15",
  "communities": [
    "174:21001",
    "174:22013",
    "65237:174"
  ],
  "peer_asn": 263237,
  "prefix": "1.1.1.0/24",
  "timestamp": "2025-05-14T01:39:52Z"
}

The API also offers supplementary metadata alongside BGP route information, including insights into BGP route collector status and aggregated prefix-to-origin data. Recalling the earlier example of an AS path prepending misconfiguration, the RPKI invalid AS3 origin is now directly visible to users and API clients in the JSON response, showing that just 9% of all collectors observed its announcements.

"meta": {
  "collectors": [
    {
      "latest_realtime_ts": "2025-05-19T21:35:40Z",
      "latest_rib_ts": "2025-05-19T20:00:00Z",
      "latest_updates_ts": "2025-05-19T21:15:00Z",
      "peers_count": 24,
      "peers_v4_count": 0,
      "peers_v6_count": 24,
      "collector": "route-views6"
    },
  ],
  "prefix_origins": [
    {
      "origin": 3,
      "prefix": "2804:4e28::/32",
      "rpki_validation": "invalid",
      "total_peers": 121,
      "total_visible": 11,
      "visibility": 0.09090909090909091
    },
    {
      "origin": 268243,
      "prefix": "2804:4e28::/32",
      "rpki_validation": "valid",
      "total_peers": 121,
      "total_visible": 94,
      "visibility": 0.7768595041322314
    }
  ],
}

From archives to real-time

Architecture overview

Cloudflare Radar uses RIPE RIS and the University of Oregon’s RouteViews as our primary BGP data sources for services like the routing statistics widget, anomaly detection, and announced address space graphs. We have previously discussed in detail on how we use the data archives from these two providers to build Cloudflare Radar’s routing pages, and our route leak and hijack detection systems.

In brief, RIPE RIS and RouteViews maintain several BGP route collectors, each connected to BGP routers across a diverse set of networks. These routers forward BGP messages to the collectors, which generate periodic data dumps for public access. These data dumps include both collections of BGP message updates and full routing table snapshots (RIB dump files).

For services monitoring stable routing information, like global routing statistics, we process RIB dump files from the archives as they become available. Conversely, for detecting dynamic events such as route leaks and hijacks, we process periodic BGP update files in batches. Services depending on this historical BGP data may experience processing delays of 10 to 30 minutes at the route collectors.

For the new real-time BGP routes feature, we aim to reduce the data delay from minutes or tens of minutes down to seconds. With the real-time stream capability provided by the BGP archiver services — RIS Live WebSocket from RIPE RIS and OpenBMP Kafka stream from RouteViews — we designed an additional real-time data stream component that enhances the routes snapshots built with MRT archive files by constantly updating the snapshots.

System design

At its core, the system enables a user to look up a prefix’s routes stored in BGP routes snapshots. The BGP routes snapshots serve as a queryable data repository, organized hierarchically. The snapshots use a trie structure to allow for the retrieval of route information (such as AS paths and community values) associated with specific address prefixes. Each node in the hierarchy stores routing information from different peering routers, providing a consolidated global view. To handle the large data volumes from multiple BGP route collectors, the system partitions routing data into separate BGP routes snapshots, where each snapshot receives data streamed from its corresponding collector. This architecture enables horizontal scalability, allowing for dynamic adjustment of data sources by selecting which independent collectors’ data to include or exclude.

Because the collectors’ BGP route information is maintained independently, to query for a global status, we apply the actor model software architecture for the implementation. Each collector is considered an actor that runs completely independently and communicates with a central controller via a dedicated communication channel. The central controller starts all actors by sending a signal to each of them, triggering actors to start collecting archival and real-time BGP data, on their separate threads.

Upon queries from users, the central controller will relay the query to all running actors via a query message. The actors will retrieve the corresponding route information on its prefix-trie and return the results to the controller with another message. The controller aggregates all messages from the actors and compiles them into a reply response to the user. During the whole process, the real-time BGP streaming and snapshots’ updating processes continue to run in the background without interruptions.

Our actor-model implementation enables a single node to efficiently store hundreds of full routing tables in its memory. Our current deployment uses eight route collectors, housing a total of 261 full routing tables. This in-memory system operating on a single node consumes approximately 45 GB of memory, which translates to about 170 MB per full routing table.

Summary

Cloudflare Radar now offers a real-time BGP route lookup service, providing near-instantaneous insights into global Internet routing. This feature leverages real-time data streams from RouteViews and RIPE RIS, moving beyond historical archives to deliver up-to-the-minute information. Users can now visualize routes in real time on Cloudflare Radar’s prefix pages with intuitive Sankey diagrams that detail complete route information. Furthermore, the Cloudflare Radar API provides programmatic access to this data, allowing for seamless integration into custom tools and workflows.

Visit Cloudflare Radar for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, and Internet quality. Follow us on social media at @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via email.

Cloudflare Radar’s new BGP origin hijack detection system

2023-07-28 Mingwei Zhang

Post Syndicated from Mingwei Zhang original http://blog.cloudflare.com/bgp-highjack-detection/

Cloudflare Radar's new BGP origin hijack detection system

Border Gateway Protocol (BGP) is the de facto inter-domain routing protocol used on the Internet. It enables networks and organizations to exchange reachability information for blocks of IP addresses (IP prefixes) among each other, thus allowing routers across the Internet to forward traffic to its destination. BGP was designed with the assumption that networks do not intentionally propagate falsified information, but unfortunately that’s not a valid assumption on today’s Internet.

Malicious actors on the Internet who control BGP routers can perform BGP hijacks by falsely announcing ownership of groups of IP addresses that they do not own, control, or route to. By doing so, an attacker is able to redirect traffic destined for the victim network to itself, and monitor and intercept its traffic. A BGP hijack is much like if someone were to change out all the signs on a stretch of freeway and reroute automobile traffic onto incorrect exits.

You can learn more about BGP and BGP hijacking and its consequences in our learning center.

At Cloudflare, we have long been monitoring suspicious BGP anomalies internally. With our recent efforts, we are bringing BGP origin hijack detection to the Cloudflare Radar platform, sharing our detection results with the public. In this blog post, we will explain how we built our detection system and how people can use Radar and its APIs to integrate our data into their own workflows.

What is BGP origin hijacking?

Services and devices on the Internet locate each other using IP addresses. Blocks of IP addresses are called an IP prefix (or just prefix for short), and multiple prefixes from the same organization are aggregated into an autonomous system (AS).

Using the BGP protocol, ASes announce which routes can be imported or exported to other ASes and routers from their routing tables. This is called the AS routing policy. Without this routing information, operating the Internet on a large scale would quickly become impractical: data packets would get lost or take too long to reach their destinations.

During a BGP origin hijack, an attacker creates fake announcements for a targeted prefix, falsely identifying an autonomous systems (AS) under their control as the origin of the prefix.

In the following graph, we show an example where AS 4 announces the prefix P that was previously originated by AS 1. The receiving parties, i.e. AS 2 and AS 3, accept the hijacked routes and forward traffic toward prefix P to AS 4 instead.

As you can see, the normal and hijacked traffic flows back in the opposite direction of the BGP announcements we receive.

If successful, this type of attack will result in the dissemination of the falsified prefix origin announcement throughout the Internet, causing network traffic previously intended for the victim network to be redirected to the AS controlled by the attacker. As an example of a famous BGP hijack attack, in 2018 someone was able to convince parts of the Internet to reroute traffic for AWS to malicious servers where they used DNS to redirect MyEtherWallet.com, a popular crypto wallet, to a hacked page.

Prevention mechanisms and why they’re not perfect (yet)

The key difficulty in preventing BGP origin hijacks is that the BGP protocol itself does not provide a mechanism to validate the announcement content. In other words, the original BGP protocol does not provide any authentication or ownership safeguards; any route can be originated and announced by any random network, independent of its rights to announce that route.

To address this problem, operators and researchers have proposed the Resource Public Key Infrastructure (RPKI) to store and validate prefix-to-origin mapping information. With RPKI, operators can prove the ownership of their network resources and create ROAs, short for Route Origin Authorisations, cryptographically signed objects that define which Autonomous System (AS) is authorized to originate a specific prefix.

Cloudflare committed to support RPKI since the early days of the RFC. With RPKI, IP prefix owners can store and share the ownership information securely, and other operators can validate BGP announcements by checking the prefix origin to the information stored on RPKI. Any hijacking attempt to announce an IP prefix with an incorrect origin AS will result in invalid validation results, and such invalid BGP messages will be discarded. This validation process is referred to as route origin validation (ROV).

In order to further advocate for RPKI deployment and filtering of RPKI invalid announcements, Cloudflare has been providing a RPKI test service, Is BGP Safe Yet?, allowing users to test whether their ISP filters RPKI invalid announcements. We also provide rich information with regard to the RPKI status of individual prefixes and ASes at https://rpki.cloudflare.com/.

However, the effectiveness of RPKI on preventing BGP origin hijacks depends on two factors:

The ratio of prefix owners register their prefixes on RPKI;
The ratio of networks performing route origin validation.

Unfortunately, neither ratio is at a satisfactory level yet. As of today, July 27, 2023, only about 45% of the IP prefixes routable on the Internet are covered by some ROA on RPKI. The remaining prefixes are highly vulnerable to BGP origin hijacks. Even for the 45% prefix that are covered by some ROA, origin hijack attempts can still affect them due to the low ratio of networks that perform route origin validation (ROV). Based on our recent study, only 6.5% of the Internet users are protected by ROV from BGP origin hijacks.

Despite the benefits of RPKI and RPKI ROAs, their effectiveness in preventing BGP origin hijacks is limited by the slow adoption and deployment of these technologies. Until we achieve a high rate of RPKI ROA registration and RPKI invalid filtering, BGP origin hijacks will continue to pose a significant threat to the daily operations of the Internet and the security of everyone connected to it. Therefore, it’s also essential to prioritize developing and deploying BGP monitoring and detection tools to enhance the security and stability of the Internet's routing infrastructure.

Design of Cloudflare’s BGP hijack detection system

Our system comprises multiple data sources and three distinct modules that work together to detect and analyze potential BGP hijack events: prefix origin change detection, hijack detection and the storage and notification module.

The Prefix Origin Change Detection module provides the data, the Hijack Detection module analyzes the data, and the Alerts Storage and Delivery module stores and provides access to the results. Together, these modules work in tandem to provide a comprehensive system for detecting and analyzing potential BGP hijack events.

Prefix origin change detection module

At its core, the BGP protocol involves:

Exchanging prefix reachability (routing) information;
Deciding where to forward traffic based on the reachability information received.

The reachability change information is encoded in BGP update messages while the routing decision results are encoded as a route information base (RIB) on the routers, also known as the routing table.

In our origin hijack detection system, we focus on investigating BGP update messages that contain changes to the origin ASes of any IP prefixes. There are two types of BGP update messages that could indicate prefix origin changes: announcements and withdrawals.

Announcements include an AS-level path toward one or more prefixes. The path tells the receiving parties through which sequence of networks (ASes) one can reach the corresponding prefixes. The last hop of an AS path is the origin AS. In the following diagram, AS 1 is the origin AS of the announced path.

Withdrawals, on the other hand, simply inform the receiving parties that the prefixes are no longer reachable.

Both types of messages are stateless. They inform us of the current route changes, but provide no information about the previous states. As a result, detecting origin changes is not as straightforward as one may think. Our system needs to keep track of historical BGP updates and build some sort of state over time so that we can verify if a BGP update contains origin changes.

We didn't want to deal with a complex system like a database to manage the state of all the prefixes we see resulting from all the BGP updates we get from them. Fortunately, there's this thing called prefix trie in computer science that you can use to store and look up string-indexed data structures, which is ideal for our use case. We ended up developing a fast Rust-based custom IP prefix trie that we use to hold the relevant information such as the origin ASN and the AS path for each IP prefix and allows information to be updated based on BGP announcements and withdrawals.

The example figure below shows an example of the AS path information for prefix 192.0.2.0/24 stored on a prefix trie. When updating the information on the prefix trie, if we see a change of origin ASN for any given prefix, we record the BGP message as well as the change and create an Origin Change Signal.

The prefix origin changes detection module collects and processes live-stream and historical BGP data from various sources. For live streams, our system applies a thin layer of data processing to translate BGP messages into our internal data structure. At the same time, for historical archives, we use a dedicated deployment of the BGPKIT broker and parser to convert MRT files from RouteViews and RIPE RIS into BGP message streams as they become available.

After the data is collected, consolidated and normalized it then creates, maintains and destroys the prefix tries so that we can know what changed from previous BGP announcements from the same peers. Based on these calculations we then send enriched messages downstream to be analyzed.

Hijack detection module

Determining whether BGP messages suggest a hijack is a complex task, and no common scoring mechanism can be used to provide a definitive answer. Fortunately, there are several types of data sources that can collectively provide a relatively good idea of whether a BGP announcement is legitimate or not. These data sources can be categorized into two types: inter-AS relationships and prefix-origin binding.

The inter-AS relationship datasets include AS2org and AS2rel datasets from CAIDA/UCSD, AS2rel datasets from BGPKIT, AS organization datasets from PeeringDB, and per-prefix AS relationship data built at Cloudflare. These datasets provide information about the relationship between autonomous systems, such as whether they are upstream or downstream from one another, or if the origins of any change signal belong to the same organization.

Prefix-to-origin binding datasets include live RPKI validated ROA payload (VRP) from the Cloudflare RPKI portal, daily Internet Routing Registry (IRR) dumps curated and cleaned up by MANRS, and prefix and AS bogon lists (private and reserved addresses defined by RFC 1918, RFC 5735, and RFC 6598). These datasets provide information about the ownership of prefixes and the ASes that are authorized to originate them.

By combining all these data sources, it is possible to collect information about each BGP announcement and answer questions programmatically. For this, we have a scoring function that takes all the evidence gathered for a specific BGP event as the input and runs that data through a sequence of checks. Each condition returns a neutral, positive, or negative weight that keeps adding to the final score. The higher the score, the more likely it is that the event is a hijack attempt.

The following diagram illustrates this sequence of checks:

As you can see, for each event, several checks are involved that help calculate the final score: RPKI, Internet Routing Registry (IRR), bogon prefixes and ASNs lists, AS relationships, and AS path.

Our guiding principles are: if the newly announced origins are RPKI or IRR invalid, it’s more likely that it’s a hijack, but if the old origins are also invalid, then it’s less likely. We discard events about private and reserved ASes and prefixes. If the new and old origins have a direct business relationship, then it’s less likely that it’s a hijack. If the new AS path indicates that the traffic still goes through the old origin, then it’s probably not a hijack.

Signals that are deemed legitimate are discarded, while signals with a high enough confidence score are flagged as potential hijacks and sent downstream for further analysis.

It's important to reiterate that the decision is not binary but a score. There will be situations where we find false negatives or false positives. The advantage of this framework is that we can easily monitor the results, learn from additional datasets and conduct the occasional manual inspection, which allows us to adjust the weights, add new conditions and continue improving the score precision over time.

Aggregating BGP hijack events

Our BGP hijack detection system provides fast response time and requires minimal resources by operating on a per-message basis.

However, when a hijack is happening, the number of hijack signals can be overwhelming for operators to manage. To address this issue, we designed a method to aggregate individual hijack messages into BGP hijack events, thereby reducing the number of alerts triggered.

An event aggregates BGP messages that are coming from the same hijacker related to prefixes from the same victim. The start date is the same as the date of the first suspicious signal. To calculate the end of an event we look for one of the following conditions:

A BGP withdrawn message for the hijacked prefix: regardless of who sends the withdrawal, the route towards the prefix is no longer via the hijacker, and thus this hijack message is considered finished.
A new BGP announcement message with the previous (legitimate) network as the origin: this indicates that the route towards the prefix is reverted to the state before the hijack, and the hijack is therefore considered finished.

If all BGP messages for an event have been withdrawn or reverted, and there are no more new suspicious origin changes from the hijacker ASN for six hours, we mark the event as finished and set the end date.

Hijack events can capture both small-scale and large-scale attacks. Alerts are then based on these aggregated events, not individual messages, making it easier for operators to manage and respond appropriately.

Alerts, Storage and Notifications module

This module provides access to detected BGP hijack events and sends out notifications to relevant parties. It handles storage of all detected events and provides a user interface for easy access and search of historical events. It also generates notifications and delivers them to the relevant parties, such as network administrators or security analysts, when a potential BGP hijack event is detected. Additionally, this module can build dashboards to display high-level information and visualizations of detected events to facilitate further analysis.

Lightweight and portable implementation

Our BGP hijack detection system is implemented as a Rust-based command line application that is lightweight and portable. The whole detection pipeline runs off a single binary application that connects to a PostgreSQL database and essentially runs a complete self-contained BGP data pipeline. And if you are wondering, yes, the full system, including the database, can run well on a laptop.

The runtime cost mainly comes from maintaining the in-memory prefix tries for each full-feed router, each costing roughly 200 MB RAM. For the beta deployment, we use about 170 full-feed peers and the whole system runs well on a single 32 GB node with 12 threads.

Using the BGP Hijack Detection

The BGP Hijack Detection results are now available on both the Cloudflare Radar website and the Cloudflare Radar API.

Cloudflare Radar

Under the “Security & Attacks” section of the Cloudflare Radar for both global and ASN view, we now display the BGP origin hijacks table. In this table, we show a list of detected potential BGP hijack events with the following information:

The detected and expected origin ASes;
The start time and event duration;
The number of BGP messages and route collectors peers that saw the event;
The announced prefixes;
Evidence tags and confidence level (on the likelihood of the event being a hijack).

For each BGP event, our system generates relevant evidence tags to indicate why the event is considered suspicious or not. These tags are used to inform the confidence score assigned to each event. Red tags indicate evidence that increases the likelihood of a hijack event, while green tags indicate the opposite.

For example, the red tag "RPKI INVALID" indicates an event is likely a hijack, as it suggests that the RPKI validation failed for the announcement. Conversely, the tag "SIBLING ORIGINS" is a green tag that indicates the detected and expected origins belong to the same organization, making it less likely for the event to be a hijack.

Users can now access the BGP hijacks table in the following ways:

Global view under Security & Attacks page without location filters. This view lists the most recent 150 detected BGP hijack events globally.
When filtered by a specific ASN, the table will appear on Overview, Traffic, and Traffic & Attacks tabs.

Cloudflare Radar API

We also provide programmable access to the BGP hijack detection results via the Cloudflare Radar API, which is freely available under CC BY-NC 4.0 license. The API documentation is available at the Cloudflare API portal.

The following curl command fetches the most recent 10 BGP hijack events relevant to AS64512.

curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?invlovedAsn=64512&format=json&per_page=10" \
    -H "Authorization: Bearer <API_TOKEN>"

Users can further filter events with high confidence by specifying the minConfidence parameter with a 0-10 value, where a higher value indicates higher confidence of the events being a hijack. The following example expands on the previous example by adding the minimum confidence score of 8 to the query:

curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?invlovedAsn=64512&format=json&per_page=10&minConfidence=8" \
    -H "Authorization: Bearer <API_TOKEN>"

Additionally, users can also quickly build custom hijack alerters using a Cloudflare Workers + KV combination. We have a full tutorial on building alerters that send out webhook-based messages or emails (with Email Routing) available on the Cloudflare Radar documentation site.

More routing security on Cloudflare Radar

As we continue improving Cloudflare Radar, we are planning to introduce additional Internet routing and security data. For example, Radar will soon get a dedicated routing section to provide digestible BGP information for given networks or regions, such as distinct routable prefixes, RPKI valid/invalid/unknown routes, distribution of IPv4/IPv6 prefixes, etc. Our goal is to provide the best data and tools for routing security to the community, so that we can build a better and more secure Internet together.

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Routing information now on Cloudflare Radar

2023-07-27 Mingwei Zhang

Post Syndicated from Mingwei Zhang original http://blog.cloudflare.com/radar-routing/

Routing information now on Cloudflare Radar

Routing is one of the most critical operations of the Internet. Routing decides how and where the Internet traffic should flow from the source to the destination, and can be categorized into two major types: intra-domain routing and inter-domain routing. Intra-domain routing handles making decisions on how individual packets should be routed among the servers and routers within an organization/network. When traffic reaches the edge of a network, the inter-domain routing kicks in to decide what the next hop is and forward the traffic along to the corresponding networks. Border Gateway Protocol (BGP) is the de facto inter-domain routing protocol used on the Internet.

Today, we are introducing another section on Cloudflare Radar: the Routing page, which focuses on monitoring the BGP messages exchanged to extract and present insights on the IP prefixes, individual networks, countries, and the Internet overall. The new routing data allows users to quickly examine routing status of the Internet, examine secure routing protocol deployment for a country, identify routing anomalies, validate IP block reachability and much more from globally distributed vantage points.

It’s a detailed view of how the Internet itself holds together.

Collecting routing statistics

The Internet consists of tens of thousands of interconnected organizations. Each organization manages its own internal networking infrastructure autonomously, and is referred to as an autonomous system (AS). ASes establish connectivity among each other and exchange routing information via BGP messages to form the current Internet.

When we open the Radar Routing page the “Routing Statistics” block provides a quick glance on the sizes and status of an autonomous system (AS), a country, or the Internet overall. The routing statistics component contains the following count information:

The number of ASes on the Internet or registered from a given country;
The number of distinct prefixes and the routes toward them observed on the global routing table, worldwide, by country, or by AS;
The number of routes categorized by Resource Public Key Infrastructure (RPKI) validation results (valid, invalid, or unknown).

We also show the breakdown of these numbers for IPv4 and IPv6 separately, so users may have a better understanding of such information with respect to different IP protocols.

For a given network, we also show the BGP announcements volume chart for the past week as well as other basic information like network name, registration country, estimated user count, and sibling networks.

Identifying routing anomalies

BGP as a routing protocol suffers from a number of security weaknesses. In the new Routing page we consolidate the BGP route leaks and BGP hijacks detection results in one single place, showing the relevant detected events for any given network or globally.

The BGP Route Leaks table shows the detected BGP route leak events. Each entry in the table contains the information about the related ASes of the leak event, start and end time, as well as other numeric statistics that reflect the scale and impact of the event. The BGP Origin Hijacks table shows the detected potential BGP origin hijacks. Apart from the relevant ASes, time, and impact information, we also show the key evidence that we collected for each event to provide additional context on why and how likely one event being a BGP hijack.

With this release, we introduce another anomaly detection: RPKI Invalid Multiple Origin AS (MOAS) is one type of routing conflict where multiple networks (ASes) originate the same IP prefixes at the same time, which goes against the best practice recommendation. Our system examines the most recent global routing tables and identifies MOASes on the routing tables. With the help of Resource Public Key Infrastructure (RPKI), we can further identify MOAS events that have origins that were proven RPKI invalid, which are less likely to be legitimate cases. Users and operators can quickly identify such anomalies relevant to the networks of interest and take actions accordingly.

The Routing page will be the permanent home for all things BGP and routing data in the future; we will gradually introduce more anomaly detections and improve our pipeline to provide more security insights.

Examining routing assets and connectivity

Apart from examining the overall routing statistics and anomalies, we also gather information on the routing assets (IP prefixes for a network and networks for a country) and networks’ connectivity.

Tens of thousands autonomous systems (ASes) connect to each other to form the current Internet. The ASes differ in size and operate in different geolocations. Generally, larger networks are more well-connected and considered “upstream” and smaller networks are less connected and considered “downstream” on the Internet. Below is an example connectivity diagram showing how two smaller networks may connect to each other. AS1 announces its IP prefixes to its upstream providers and propagates upwards until it reaches the large networks AS3 and AS4, and then the route propagates downstream to smaller networks until it reaches AS6.

In the routing page, we examine what IP prefixes any given AS originates, as well as the interconnections among ASes. We show the full list of IP prefixes originated for any given AS, including the breakdown lists by RPKI validation status. We also show the detected connectivity among other ASes categorized into upstream, downstream and peering connections. Users can easily search for any ASes upstream, downstream, or peers.

For a given country, we show the full list of networks registered in the country, sorted by the number of IP prefixes originated from the corresponding networks. This allows users to quickly glance and find networks from any given country. The table is also searchable by network name or AS number.

Routing data API access

Like all the other data, the Cloudflare Radar Routing data is powered by our developer API. The data API is freely available under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. In the following table, we list all of our data APIs available at launch. As we improve the routing section, we will introduce more APIs in the future.

API	Type
Get BGP origin hijack events	Anomaly detection
Get BGP route leak events	Anomaly detection
Get MOASes	Anomaly detection
Get BGP routing table stats	Routing information
Get prefix-to-origin mapping	Routing information
Get all ASes registered in a country	Routing information
Get AS-level relationship	Routing information

Example 1: lookup origin AS for a given prefix with cURL

The Cloudflare Radar prefix-to-origin mapping API returns the matching prefix-origin pairs observed on the global routing tables, allowing users to quickly examine the networks that originate a given prefix or listing all the prefixes a network originates.

In this example, we ask of “which network(s) originated the prefix 1.1.1.0/24?” using the following cURL command:

curl --request GET \
  --url "https://api.cloudflare.com/client/v4/radar/bgp/routes/pfx2as?prefix=1.1.1.0/24" \
  --header 'Content-Type: application/json' \
--header "Authorization: Bearer YOUR_TOKEN"

The returned JSON result shows that Cloudflare (AS13335) originates the prefix 1.1.1.0/24 and it is a RPKI valid origin. It also returns the meta information such as the UTC timestamp of the query as well as when the dataset is last updated (data_time field).

{
  "success": true,
  "errors": [],
  "result": {
    "prefix_origins": [
      {
        "origin": 13335,
        "peer_count": 82,
        "prefix": "1.1.1.0/24",
        "rpki_validation": "Valid"
      }
    ],
    "meta": {
      "data_time": "2023-07-24T16:00:00",
      "query_time": "2023-07-24T18:04:55",
      "total_peers": 82
    }
  }
}

Example 2: integrate Radar API into command-line tool

BGPKIT monocle is an open-source command-line application that provides multiple utility functions like searching BGP messages on public archives, network lookup by name, RPKI validation status for a given IP prefix, etc.

By integrating Cloudflare Radar APIs into monocle, users can now quickly lookup routing statistics or prefix-to-origin mapping by running monocle radar stats [QUERY] and monocle radar pfx2as commands.

➜  monocle radar stats   
┌─────────────┬─────────┬──────────┬─────────────────┬───────────────┬─────────────────┐
│ scope       │ origins │ prefixes │ rpki_valid      │ rpki_invalid  │ rpki_unknown    │
├─────────────┼─────────┼──────────┼─────────────────┼───────────────┼─────────────────┤
│ global      │ 81769   │ 1204488  │ 551831 (45.38%) │ 15652 (1.29%) │ 648462 (53.33%) │
├─────────────┼─────────┼──────────┼─────────────────┼───────────────┼─────────────────┤
│ global ipv4 │ 74990   │ 1001973  │ 448170 (44.35%) │ 11879 (1.18%) │ 550540 (54.48%) │
├─────────────┼─────────┼──────────┼─────────────────┼───────────────┼─────────────────┤
│ global ipv6 │ 31971   │ 202515   │ 103661 (50.48%) │ 3773 (1.84%)  │ 97922 (47.68%)  │
└─────────────┴─────────┴──────────┴─────────────────┴───────────────┴─────────────────┘

➜  monocle radar pfx2as 1.1.1.0/24
┌────────────┬─────────┬───────┬───────────────┐
│ prefix     │ origin  │ rpki  │ visibility    │
├────────────┼─────────┼───────┼───────────────┤
│ 1.1.1.0/24 │ as13335 │ valid │ high (98.78%) │
└────────────┴─────────┴───────┴───────────────┘

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

How we detect route leaks and our new Cloudflare Radar route leak service

2022-11-23 Mingwei Zhang

Post Syndicated from Mingwei Zhang original https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/

How we detect route leaks and our new Cloudflare Radar route leak service

Today we’re introducing Cloudflare Radar’s route leak data and API so that anyone can get information about route leaks across the Internet. We’ve built a comprehensive system that takes in data from public sources and Cloudflare’s view of the Internet drawn from our massive global network. The system is now feeding route leak data on Cloudflare Radar’s ASN pages and via the API.

This blog post is in two parts. There’s a discussion of BGP and route leaks followed by details of our route leak detection system and how it feeds Cloudflare Radar.

About BGP and route leaks

Inter-domain routing, i.e., exchanging reachability information among networks, is critical to the wellness and performance of the Internet. The Border Gateway Protocol (BGP) is the de facto routing protocol that exchanges routing information among organizations and networks. At its core, BGP assumes the information being exchanged is genuine and trust-worthy, which unfortunately is no longer a valid assumption on the current Internet. In many cases, networks can make mistakes or intentionally lie about the reachability information and propagate that to the rest of the Internet. Such incidents can cause significant disruptions of the normal operations of the Internet. One type of such disruptive incident is route leaks.

We consider route leaks as the propagation of routing announcements beyond their intended scope (RFC7908). Route leaks can cause significant disruption affecting millions of Internet users, as we have seen in many past notable incidents. For example, in June 2016 a misconfiguration in a small network in Pennsylvania, US (AS396531 – Allegheny Technologies Inc) accidentally leaked a Cloudflare prefix to Verizon, which proceeded to propagate the misconfigured route to the rest of its peers and customers. As a result, the traffic of a large portion of the Internet was squeezed through the limited-capacity links of a small network. The resulting congestion caused most of Cloudflare traffic to and from the affected IP range to be dropped.

A similar incident in November 2018 caused widespread unavailability of Google services when a Nigerian ISP (AS37282 – Mainone) accidentally leaked a large number of Google IP prefixes to its peers and providers violating the valley-free principle.

These incidents illustrate not only that route leaks can be very impactful, but also the snowball effects that misconfigurations in small regional networks can have on the global Internet.

Despite the criticality of detecting and rectifying route leaks promptly, they are often detected only when users start reporting the noticeable effects of the leaks. The challenge with detecting and preventing route leaks stems from the fact that AS business relationships and BGP routing policies are generally undisclosed, and the affected network is often remote to the root of the route leak.

In the past few years, solutions have been proposed to prevent the propagation of leaked routes. Such proposals include RFC9234 and ASPA, which extends the BGP to annotate sessions with the relationship type between the two connected AS networks to enable the detention and prevention of route leaks.

An alternative proposal to implement similar signaling of BGP roles is through the use of BGP Communities; a transitive attribute used to encode metadata in BGP announcements. While these directions are promising in the long term, they are still in very preliminary stages and are not expected to be adopted at scale soon.

At Cloudflare, we have developed a system to detect route leak events automatically and send notifications to multiple channels for visibility. As we continue our efforts to bring more relevant data to the public, we are happy to announce that we are starting an open data API for our route leak detection results today and integrate results to Cloudflare Radar pages.

Route leak definition and types

Before we jump into how we design our systems, we will first do a quick primer on what a route leak is, and why it is important to detect it.

We refer to the published IETF RFC7908 document “Problem Definition and Classification of BGP Route Leaks” to define route leaks.

> A route leak is the propagation of routing announcement(s) beyond their intended scope.

The intended scope is often concretely defined as inter-domain routing policies based on business relationships between Autonomous Systems (ASes). These business relationships are broadly classified into four categories: customers, transit providers, peers and siblings, although more complex arrangements are possible.

In a customer-provider relationship the customer AS has an agreement with another network to transit its traffic to the global routing table. In a peer-to-peer relationship two ASes agree to free bilateral traffic exchange, but only between their own IPs and the IPs of their customers. Finally, ASes that belong under the same administrative entity are considered siblings, and their traffic exchange is often unrestricted. The image below illustrates how the three main relationship types translate to export policies.

By categorizing the types of AS-level relationships and their implications on the propagation of BGP routes, we can define multiple phases of a prefix origination announcements during propagation:

upward: all path segments during this phase are customer to provider
peering: one peer-peer path segment
downward: all path segments during this phase are provider to customer

An AS path that follows valley-free routing principle will have upward, peering, downward phases, all optional but have to be in that order. Here is an example of an AS path that conforms with valley-free routing.

In RFC7908, “Problem Definition and Classification of BGP Route Leaks”, the authors define six types of route leaks, and we refer to these definitions in our system design. Here are illustrations of each of the route leak types.

Type 1: Hairpin Turn with Full Prefix

> A multihomed AS learns a route from one upstream ISP and simply propagates it to another upstream ISP (the turn essentially resembling a hairpin). Neither the prefix nor the AS path in the update is altered.

An AS path that contains a provider-customer and customer-provider segment is considered a type 1 leak. The following example: AS4 → AS5 → AS6 forms a type 1 leak.

Type 1 is the most recognized type of route leaks and is very impactful. In many cases, a customer route is preferable to a peer or a provider route. In this example, AS6 will likely prefer sending traffic via AS5 instead of its other peer or provider routes, causing AS5 to unintentionally become a transit provider. This can significantly affect the performance of the traffic related to the leaked prefix or cause outages if the leaking AS is not provisioned to handle a large influx of traffic.

In June 2015, Telekom Malaysia (AS4788), a regional ISP, leaked over 170,000 routes learned from its providers and peers to its other provider Level3 (AS3549, now Lumin). Level3 accepted the routes and further propagated them to its downstream networks, which in turn caused significant network issues globally.

Type 2: Lateral ISP-ISP-ISP Leak

Type 2 leak is defined as propagating routes obtained from one peer to another peer, creating two or more consecutive peer-to-peer path segments.

Here is an example: AS3 → AS4 → AS5 forms a type 2 leak.

One example of such leaks is more than three very large networks appearing in sequence. Very large networks (such as Verizon and Lumin) do not purchase transit from each other, and having more than three such networks on the path in sequence is often an indication of a route leak.

However, in the real world, it is not unusual to see multiple small peering networks exchanging routes and passing on to each other. Legit business reasons exist for having this type of network path. We are less concerned about this type of route leak as compared to type 1.

Type 3 and 4: Provider routes to peer; peer routes to provider

These two types involve propagating routes from a provider or a peer not to a customer, but to another peer or provider. Here are the illustrations of the two types of leaks:

As in the previously mentioned example, a Nigerian ISP who peers with Google accidentally leaked its route to its provider AS4809, and thus generated a type 4 route leak. Because routes via customers are usually preferred to others, the large provider (AS4809) rerouted its traffic to Google via its customer, i.e. the leaking ASN, overwhelmed the small ISP and took down Google for over one hour.

Route leak summary

So far, we have looked at the four types of route leaks defined in RFC7908. The common thread of the four types of route leaks is that they’re all defined using AS-relationships, i.e., peers, customers, and providers. We summarize the types of leaks by categorizing the AS path propagation based on where the routes are learned from and propagate to. The results are shown in the following table.

Routes from / propagates to	To provider	To peer	To customer
From provider	Type 1	Type 3	Normal
From peer	Type 4	Type 2	Normal
From customer	Normal	Normal	Normal

We can summarize the whole table into one single rule: routes obtained from a non-customer AS can only be propagated to customers.

Note: Type 5 and type 6 route leaks are defined as prefix re-origination and announcing of private prefixes. Type 5 is more closely related to prefix hijackings, which we plan to expand our system to as the next steps, while type 6 leaks are outside the scope of this work. Interested readers can refer to sections 3.5 and 3.6 of RFC7908 for more information.

The Cloudflare Radar route leak system

Now that we know what a route leak is, let’s talk about how we designed our route leak detection system.

From a very high level, we compartmentalize our system into three different components:

Raw data collection module: responsible for gathering BGP data from multiple sources and providing BGP message stream to downstream consumers.
Leak detection module: responsible for determining whether a given AS-level path is a route leak, estimate the confidence level of the assessment, aggregating and providing all external evidence needed for further analysis of the event.
Storage and notification module: responsible for providing access to detected route leak events and sending out notifications to relevant parties. This could also include building a dashboard for easy access and search of the historical events and providing the user interface for high-level analysis of the event.

Data collection module

There are three types of data input we take into consideration:

Historical: BGP archive files for some time range in the past
a. RouteViews and RIPE RIS BGP archives
Semi-real-time: BGP archive files as soon as they become available, with a 10-30 minute delay.
a. RouteViews and RIPE RIS archives with data broker that checks new files periodically (e.g. BGPKIT Broker)
Real-time: true real-time data sources
a. RIPE RIS Live
b. Cloudflare internal BGP sources

For the current version, we use the semi-real-time data source for the detection system, i.e., the BGP updates files from RouteViews and RIPE RIS. For data completeness, we process data from all public collectors from these two projects (a total of 63 collectors and over 2,400 collector peers) and implement a pipeline that’s capable of handling the BGP data processing as the data files become available.

For data files indexing and processing, we deployed an on-premises BGPKIT Broker instance with Kafka feature enabled for message passing, and a custom concurrent MRT data processing pipeline based on BGPKIT Parser Rust SDK. The data collection module processes MRT files and converts results into a BGP messages stream at over two billion BGP messages per day (roughly 30,000 messages per second).

Route leak detection

The route leak detection module works at the level of individual BGP announcements. The detection component investigates one BGP message at a time, and estimates how likely a given BGP message is a result of a route leak event.

We base our detection algorithm mainly on the valley-free model, which we believe can capture most of the notable route leak incidents. As mentioned previously, the key to having low false positives for detecting route leaks with the valley-free model is to have accurate AS-level relationships. While those relationship types are not publicized by every AS, there have been over two decades of research on the inference of the relationship types using publicly observed BGP data.

While state-of-the-art relationship inference algorithms have been shown to be highly accurate, even a small margin of errors can still incur inaccuracies in the detection of route leaks. To alleviate such artifacts, we synthesize multiple data sources for inferring AS-level relationships, including CAIDA/UCSD’s AS relationship data and our in-house built AS relationship dataset. Building on top of the two AS-level relationships, we create a much more granular dataset at the per-prefix and per-peer levels. The improved dataset allows us to answer the question like what is the relationship between AS1 and AS2 with respect to prefix P observed by collector peer X. This eliminates much of the ambiguity for cases where networks have multiple different relationships based on prefixes and geo-locations, and thus helps us reduce the number of false positives in the system. Besides the AS-relationships datasets, we also apply the AS Hegemony dataset from IHR IIJ to further reduce false positives.

Route leak storage and presentation

After processing each BGP message, we store the generated route leak entries in a database for long-term storage and exploration. We also aggregate individual route leak BGP announcements and group relevant leaks from the same leak ASN within a short period together into route-leak events. The route leak events will then be available for consumption by different downstream applications like web UIs, an API, or alerts.

Route leaks on Cloudflare Radar

At Cloudflare, we aim to help build a better Internet, and that includes sharing our efforts on monitoring and securing Internet routing. Today, we are releasing our route leak detection system as public beta.

Starting today, users going to the Cloudflare Radar ASN pages will now find the list of route leaks that affect that AS. We consider that an AS is being affected when the leaker AS is one hop away from it in any direction, before or after.

The Cloudflare Radar ASN page is directly accessible via https://radar.cloudflare.com/as{ASN}. For example, one can navigate to https://radar.cloudflare.com/as174 to view the overview page for Cogent AS174. ASN pages now show a dedicated card for route leaks detected relevant to the current ASN within the selected time range.

Users can also start using our public data API to lookup route leak events with regards to any given ASN. Our API supports filtering route leak results by time ranges, and ASes involved. Here is a screenshot of the route leak events API documentation page on the newly updated API docs site.

More to come on routing security

There is a lot more we are planning to do with route-leak detection. More features like a global view page, route leak notifications, more advanced APIs, custom automations scripts, and historical archive datasets will begin to ship on Cloudflare Radar over time. Your feedback and suggestions are also very important for us to continue improving on our detection results and serve better data to the public.

Furthermore, we will continue to expand our work on other important topics of Internet routing security, including global BGP hijack detection (not limited to our customer networks), RPKI validation monitoring, open-sourcing tools and architecture designs, and centralized routing security web gateway. Our goal is to provide the best data and tools for routing security to the communities so that we can build a better and more secure Internet together.

In the meantime, we opened a Radar room on our Developers Discord Server. Feel free to join and talk to us; the team is eager to receive feedback and answer questions.

Visit Cloudflare Radar for more Internet insights. You can also follow us on Twitter for more Radar updates.