Post Syndicated from Mingwei Zhang original http://blog.cloudflare.com/bgp-highjack-detection/
Border Gateway Protocol (BGP) is the de facto inter-domain routing protocol used on the Internet. It enables networks and organizations to exchange reachability information for blocks of IP addresses (IP prefixes) among each other, thus allowing routers across the Internet to forward traffic to its destination. BGP was designed with the assumption that networks do not intentionally propagate falsified information, but unfortunately that’s not a valid assumption on today’s Internet.
Malicious actors on the Internet who control BGP routers can perform BGP hijacks by falsely announcing ownership of groups of IP addresses that they do not own, control, or route to. By doing so, an attacker is able to redirect traffic destined for the victim network to itself, and monitor and intercept its traffic. A BGP hijack is much like if someone were to change out all the signs on a stretch of freeway and reroute automobile traffic onto incorrect exits.
At Cloudflare, we have long been monitoring suspicious BGP anomalies internally. With our recent efforts, we are bringing BGP origin hijack detection to the Cloudflare Radar platform, sharing our detection results with the public. In this blog post, we will explain how we built our detection system and how people can use Radar and its APIs to integrate our data into their own workflows.
What is BGP origin hijacking?
Services and devices on the Internet locate each other using IP addresses. Blocks of IP addresses are called an IP prefix (or just prefix for short), and multiple prefixes from the same organization are aggregated into an autonomous system (AS).
Using the BGP protocol, ASes announce which routes can be imported or exported to other ASes and routers from their routing tables. This is called the AS routing policy. Without this routing information, operating the Internet on a large scale would quickly become impractical: data packets would get lost or take too long to reach their destinations.
During a BGP origin hijack, an attacker creates fake announcements for a targeted prefix, falsely identifying an autonomous systems (AS) under their control as the origin of the prefix.
In the following graph, we show an example where
AS 4 announces the prefix
P that was previously originated by
AS 1. The receiving parties, i.e.
AS 2 and
AS 3, accept the hijacked routes and forward traffic toward prefix
AS 4 instead.
As you can see, the normal and hijacked traffic flows back in the opposite direction of the BGP announcements we receive.
If successful, this type of attack will result in the dissemination of the falsified prefix origin announcement throughout the Internet, causing network traffic previously intended for the victim network to be redirected to the AS controlled by the attacker. As an example of a famous BGP hijack attack, in 2018 someone was able to convince parts of the Internet to reroute traffic for AWS to malicious servers where they used DNS to redirect MyEtherWallet.com, a popular crypto wallet, to a hacked page.
Prevention mechanisms and why they’re not perfect (yet)
The key difficulty in preventing BGP origin hijacks is that the BGP protocol itself does not provide a mechanism to validate the announcement content. In other words, the original BGP protocol does not provide any authentication or ownership safeguards; any route can be originated and announced by any random network, independent of its rights to announce that route.
To address this problem, operators and researchers have proposed the Resource Public Key Infrastructure (RPKI) to store and validate prefix-to-origin mapping information. With RPKI, operators can prove the ownership of their network resources and create ROAs, short for Route Origin Authorisations, cryptographically signed objects that define which Autonomous System (AS) is authorized to originate a specific prefix.
Cloudflare committed to support RPKI since the early days of the RFC. With RPKI, IP prefix owners can store and share the ownership information securely, and other operators can validate BGP announcements by checking the prefix origin to the information stored on RPKI. Any hijacking attempt to announce an IP prefix with an incorrect origin AS will result in invalid validation results, and such invalid BGP messages will be discarded. This validation process is referred to as route origin validation (ROV).
In order to further advocate for RPKI deployment and filtering of RPKI invalid announcements, Cloudflare has been providing a RPKI test service, Is BGP Safe Yet?, allowing users to test whether their ISP filters RPKI invalid announcements. We also provide rich information with regard to the RPKI status of individual prefixes and ASes at https://rpki.cloudflare.com/.
However, the effectiveness of RPKI on preventing BGP origin hijacks depends on two factors:
- The ratio of prefix owners register their prefixes on RPKI;
- The ratio of networks performing route origin validation.
Unfortunately, neither ratio is at a satisfactory level yet. As of today, July 27, 2023, only about 45% of the IP prefixes routable on the Internet are covered by some ROA on RPKI. The remaining prefixes are highly vulnerable to BGP origin hijacks. Even for the 45% prefix that are covered by some ROA, origin hijack attempts can still affect them due to the low ratio of networks that perform route origin validation (ROV). Based on our recent study, only 6.5% of the Internet users are protected by ROV from BGP origin hijacks.
Despite the benefits of RPKI and RPKI ROAs, their effectiveness in preventing BGP origin hijacks is limited by the slow adoption and deployment of these technologies. Until we achieve a high rate of RPKI ROA registration and RPKI invalid filtering, BGP origin hijacks will continue to pose a significant threat to the daily operations of the Internet and the security of everyone connected to it. Therefore, it’s also essential to prioritize developing and deploying BGP monitoring and detection tools to enhance the security and stability of the Internet's routing infrastructure.
Design of Cloudflare’s BGP hijack detection system
Our system comprises multiple data sources and three distinct modules that work together to detect and analyze potential BGP hijack events: prefix origin change detection, hijack detection and the storage and notification module.
The Prefix Origin Change Detection module provides the data, the Hijack Detection module analyzes the data, and the Alerts Storage and Delivery module stores and provides access to the results. Together, these modules work in tandem to provide a comprehensive system for detecting and analyzing potential BGP hijack events.
Prefix origin change detection module
At its core, the BGP protocol involves:
- Exchanging prefix reachability (routing) information;
- Deciding where to forward traffic based on the reachability information received.
The reachability change information is encoded in BGP update messages while the routing decision results are encoded as a route information base (RIB) on the routers, also known as the routing table.
In our origin hijack detection system, we focus on investigating BGP update messages that contain changes to the origin ASes of any IP prefixes. There are two types of BGP update messages that could indicate prefix origin changes: announcements and withdrawals.
Announcements include an AS-level path toward one or more prefixes. The path tells the receiving parties through which sequence of networks (ASes) one can reach the corresponding prefixes. The last hop of an AS path is the origin AS. In the following diagram, AS 1 is the origin AS of the announced path.
Withdrawals, on the other hand, simply inform the receiving parties that the prefixes are no longer reachable.
Both types of messages are stateless. They inform us of the current route changes, but provide no information about the previous states. As a result, detecting origin changes is not as straightforward as one may think. Our system needs to keep track of historical BGP updates and build some sort of state over time so that we can verify if a BGP update contains origin changes.
We didn't want to deal with a complex system like a database to manage the state of all the prefixes we see resulting from all the BGP updates we get from them. Fortunately, there's this thing called prefix trie in computer science that you can use to store and look up string-indexed data structures, which is ideal for our use case. We ended up developing a fast Rust-based custom IP prefix trie that we use to hold the relevant information such as the origin ASN and the AS path for each IP prefix and allows information to be updated based on BGP announcements and withdrawals.
The example figure below shows an example of the AS path information for prefix
192.0.2.0/24 stored on a prefix trie. When updating the information on the prefix trie, if we see a change of origin ASN for any given prefix, we record the BGP message as well as the change and create an
Origin Change Signal.
The prefix origin changes detection module collects and processes live-stream and historical BGP data from various sources. For live streams, our system applies a thin layer of data processing to translate BGP messages into our internal data structure. At the same time, for historical archives, we use a dedicated deployment of the BGPKIT broker and parser to convert MRT files from RouteViews and RIPE RIS into BGP message streams as they become available.
After the data is collected, consolidated and normalized it then creates, maintains and destroys the prefix tries so that we can know what changed from previous BGP announcements from the same peers. Based on these calculations we then send enriched messages downstream to be analyzed.
Hijack detection module
Determining whether BGP messages suggest a hijack is a complex task, and no common scoring mechanism can be used to provide a definitive answer. Fortunately, there are several types of data sources that can collectively provide a relatively good idea of whether a BGP announcement is legitimate or not. These data sources can be categorized into two types: inter-AS relationships and prefix-origin binding.
The inter-AS relationship datasets include AS2org and AS2rel datasets from CAIDA/UCSD, AS2rel datasets from BGPKIT, AS organization datasets from PeeringDB, and per-prefix AS relationship data built at Cloudflare. These datasets provide information about the relationship between autonomous systems, such as whether they are upstream or downstream from one another, or if the origins of any change signal belong to the same organization.
Prefix-to-origin binding datasets include live RPKI validated ROA payload (VRP) from the Cloudflare RPKI portal, daily Internet Routing Registry (IRR) dumps curated and cleaned up by MANRS, and prefix and AS bogon lists (private and reserved addresses defined by RFC 1918, RFC 5735, and RFC 6598). These datasets provide information about the ownership of prefixes and the ASes that are authorized to originate them.
By combining all these data sources, it is possible to collect information about each BGP announcement and answer questions programmatically. For this, we have a scoring function that takes all the evidence gathered for a specific BGP event as the input and runs that data through a sequence of checks. Each condition returns a neutral, positive, or negative weight that keeps adding to the final score. The higher the score, the more likely it is that the event is a hijack attempt.
The following diagram illustrates this sequence of checks:
As you can see, for each event, several checks are involved that help calculate the final score: RPKI, Internet Routing Registry (IRR), bogon prefixes and ASNs lists, AS relationships, and AS path.
Our guiding principles are: if the newly announced origins are RPKI or IRR invalid, it’s more likely that it’s a hijack, but if the old origins are also invalid, then it’s less likely. We discard events about private and reserved ASes and prefixes. If the new and old origins have a direct business relationship, then it’s less likely that it’s a hijack. If the new AS path indicates that the traffic still goes through the old origin, then it’s probably not a hijack.
Signals that are deemed legitimate are discarded, while signals with a high enough confidence score are flagged as potential hijacks and sent downstream for further analysis.
It's important to reiterate that the decision is not binary but a score. There will be situations where we find false negatives or false positives. The advantage of this framework is that we can easily monitor the results, learn from additional datasets and conduct the occasional manual inspection, which allows us to adjust the weights, add new conditions and continue improving the score precision over time.
Aggregating BGP hijack events
Our BGP hijack detection system provides fast response time and requires minimal resources by operating on a per-message basis.
However, when a hijack is happening, the number of hijack signals can be overwhelming for operators to manage. To address this issue, we designed a method to aggregate individual hijack messages into BGP hijack events, thereby reducing the number of alerts triggered.
An event aggregates BGP messages that are coming from the same hijacker related to prefixes from the same victim. The start date is the same as the date of the first suspicious signal. To calculate the end of an event we look for one of the following conditions:
- A BGP withdrawn message for the hijacked prefix: regardless of who sends the withdrawal, the route towards the prefix is no longer via the hijacker, and thus this hijack message is considered finished.
- A new BGP announcement message with the previous (legitimate) network as the origin: this indicates that the route towards the prefix is reverted to the state before the hijack, and the hijack is therefore considered finished.
If all BGP messages for an event have been withdrawn or reverted, and there are no more new suspicious origin changes from the hijacker ASN for six hours, we mark the event as finished and set the end date.
Hijack events can capture both small-scale and large-scale attacks. Alerts are then based on these aggregated events, not individual messages, making it easier for operators to manage and respond appropriately.
Alerts, Storage and Notifications module
This module provides access to detected BGP hijack events and sends out notifications to relevant parties. It handles storage of all detected events and provides a user interface for easy access and search of historical events. It also generates notifications and delivers them to the relevant parties, such as network administrators or security analysts, when a potential BGP hijack event is detected. Additionally, this module can build dashboards to display high-level information and visualizations of detected events to facilitate further analysis.
Lightweight and portable implementation
Our BGP hijack detection system is implemented as a Rust-based command line application that is lightweight and portable. The whole detection pipeline runs off a single binary application that connects to a PostgreSQL database and essentially runs a complete self-contained BGP data pipeline. And if you are wondering, yes, the full system, including the database, can run well on a laptop.
The runtime cost mainly comes from maintaining the in-memory prefix tries for each full-feed router, each costing roughly 200 MB RAM. For the beta deployment, we use about 170 full-feed peers and the whole system runs well on a single 32 GB node with 12 threads.
Using the BGP Hijack Detection
Under the “Security & Attacks” section of the Cloudflare Radar for both global and ASN view, we now display the BGP origin hijacks table. In this table, we show a list of detected potential BGP hijack events with the following information:
- The detected and expected origin ASes;
- The start time and event duration;
- The number of BGP messages and route collectors peers that saw the event;
- The announced prefixes;
- Evidence tags and confidence level (on the likelihood of the event being a hijack).
For each BGP event, our system generates relevant evidence tags to indicate why the event is considered suspicious or not. These tags are used to inform the confidence score assigned to each event. Red tags indicate evidence that increases the likelihood of a hijack event, while green tags indicate the opposite.
For example, the red tag "RPKI INVALID" indicates an event is likely a hijack, as it suggests that the RPKI validation failed for the announcement. Conversely, the tag "SIBLING ORIGINS" is a green tag that indicates the detected and expected origins belong to the same organization, making it less likely for the event to be a hijack.
Users can now access the BGP hijacks table in the following ways:
- Global view under Security & Attacks page without location filters. This view lists the most recent 150 detected BGP hijack events globally.
- When filtered by a specific ASN, the table will appear on Overview, Traffic, and Traffic & Attacks tabs.
Cloudflare Radar API
We also provide programmable access to the BGP hijack detection results via the Cloudflare Radar API, which is freely available under CC BY-NC 4.0 license. The API documentation is available at the Cloudflare API portal.
curl command fetches the most recent 10 BGP hijack events relevant to AS64512.
curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?invlovedAsn=64512&format=json&per_page=10" \ -H "Authorization: Bearer <API_TOKEN>"
Users can further filter events with high confidence by specifying the
minConfidence parameter with a 0-10 value, where a higher value indicates higher confidence of the events being a hijack. The following example expands on the previous example by adding the minimum confidence score of 8 to the query:
curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?invlovedAsn=64512&format=json&per_page=10&minConfidence=8" \ -H "Authorization: Bearer <API_TOKEN>"
Additionally, users can also quickly build custom hijack alerters using a Cloudflare Workers + KV combination. We have a full tutorial on building alerters that send out webhook-based messages or emails (with Email Routing) available on the Cloudflare Radar documentation site.
More routing security on Cloudflare Radar
As we continue improving Cloudflare Radar, we are planning to introduce additional Internet routing and security data. For example, Radar will soon get a dedicated routing section to provide digestible BGP information for given networks or regions, such as distinct routable prefixes, RPKI valid/invalid/unknown routes, distribution of IPv4/IPv6 prefixes, etc. Our goal is to provide the best data and tools for routing security to the community, so that we can build a better and more secure Internet together.
Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.