Massive route leak impacts major parts of the Internet, including Cloudflare
Today at 10:30UTC, the Internet had a small heart attack. A small company in Northern Pennsylvania became a preferred path of many Internet routes through Verizon (AS701), a major Internet transit provider. This was the equivalent of Waze routing an entire freeway down a neighborhood street — resulting in many websites on Cloudflare, and many other providers, to be unavailable from large parts of the Internet. This should never have happened because Verizon should never have forwarded those routes to the rest of the Internet. To understand why, read on.
We have blogged about these unfortunate events in the past, as they are not uncommon. This time, the damage was seen worldwide. What exacerbated the problem today was the involvement of a “BGP Optimizer” product from Noction. This product has a feature that splits up received IP prefixes into smaller, contributing parts (called more-specifics). For example, our own IPv4 route 126.96.36.199/20 was turned into 188.8.131.52/21 and 184.108.40.206/21. It’s as if the road sign directing traffic to “Pennsylvania” was replaced by two road signs, one for “Pittsburgh, PA” and one for “Philadelphia, PA”. By splitting these major IP blocks into smaller parts, a network has a mechanism to steer traffic within their network but that split should never have been announced to the world at large. When it was it caused today’s outage.
To explain what happened next, here’s a quick summary of how the underlying “map” of the Internet works. “Internet” literally means a network of networks and it is made up of networks called Autonomous Systems (AS), and each of these networks has a unique identifier, its AS number. All of these networks are interconnected using a protocol called Border Gateway Protocol (BGP). BGP joins these networks together and build the Internet “map” that enables traffic to travel from, say, your ISP to a popular website on the other side of the globe.
Using BGP, networks exchange route information: how to get to them from wherever you are. These routes can either be specific, similar to finding a specific city on your GPS, or very general, like pointing your GPS to a state. This is where things went wrong today.
An Internet Service Provider in Pennsylvania (AS33154 – DQE Communications) was using a BGP optimizer in their network, which meant there were a lot of more specific routes in their network. Specific routes override more general routes (in the Waze analogy a route to, say, Buckingham Palace is more specific than a route to London).
DQE announced these specific routes to their customer (AS396531 – Allegheny Technologies Inc). All of this routing information was then sent to their other transit provider (AS701 – Verizon), who proceeded to tell the entire Internet about these “better” routes. These routes were supposedly “better” because they were more granular, more specific.
The leak should have stopped at Verizon. However, against numerous best practices outlined below, Verizon’s lack of filtering turned this into a major incident that affected many Internet services such as Amazon, Fastly, Linode and Cloudflare.
What this means is that suddenly Verizon, Allegheny, and DQE had to deal with a stampede of Internet users trying to access those services through their network. None of these networks were suitably equipped to deal with this drastic increase in traffic, causing disruption in service. Even if they had sufficient capacity DQE, Allegheny and Verizon were not allowed to say they had the best route to Cloudflare, Amazon, Fastly, Linode, etc…
During the incident, we observed a loss, at the worst of the incident, of about 15% of our global traffic.
How could this leak have been prevented?
There are multiple ways this leak could have been avoided:
A BGP session can be configured with a hard limit of prefixes to be received. This means a router can decide to shut down a session if the number of prefixes goes above the threshold. Had Verizon had such a prefix limit in place, this would not have occurred. It is a best practice to have such limits in place. It doesn’t cost a provider like Verizon anything to have such limits in place. And there’s no good reason, other than sloppiness or laziness, that they wouldn’t have such limits in place
A different way network operators can prevent leaks like this one is by implementing IRR-based filtering. IRR is the Internet Routing Registry, and networks can add entries to these distributed databases. Other network operators can then use these IRR records to generate specific prefix lists for the BGP sessions with their peers. If IRR filtering had been used, none of the networks involved would have accepted the faulty more-specifics. What’s quite shocking is that it appears that Verizon didn’t implement any of this filtering in their BGP session with Allegheny Technologies, even though IRR filtering has been around (and well documented) for over 24 years. IRR filtering would not have increased Verizon’s costs or limited their service in any way. Again, the only explanation we can conceive of why it wasn’t in place is sloppiness or laziness.
The RPKI framework that we implemented and deployed globally last year is designed to prevent this type of leak. It enables filtering on origin network and prefix size. The prefixes Cloudflare announces are signed for a maximum size of 20. RPKI then indicates any more-specific prefix should not be accepted, no matter what the path is. In order for this mechanism to take action, a network needs to enable BGP Origin Validation. Many providers like AT&T have already enabled it successfully in their network.
If Verizon had used RPKI, they would have seen that the advertised routes were not valid, and the routes could have been automatically dropped by the router.
Cloudflare encourages all network operators to deploy RPKI now!
The network team at Cloudflare reached out to the networks involved, AS33154 (DQE Communications) and AS701 (Verizon). We had difficulties reaching either network, this may have been due to the time of the incident as it was still early on the East Coast of the US when the route leak started.
One of our network engineers made contact with DQE Communications quickly and after a little delay they were able to put us in contact with someone who could fix the problem. DQE worked with us on the phone to stop advertising these “optimized” routes to Allegheny Technologies Inc. We’re grateful for their help. Once this was done, the Internet stabilized, and things went back to normal.
It is unfortunate that while we tried both e-mail and phone calls to reach out to Verizon, at the time of writing this article (over 8 hours after the incident), we have not heard back from them, nor are we aware of them taking action to resolve the issue.
At Cloudflare, we wish that events like this never take place, but unfortunately the current state of the Internet does very little to prevent incidents such as this one from occurring. It’s time for the industry to adopt better routing security through systems like RPKI. We hope that major providers will follow the lead of Cloudflare, Amazon, and AT&T and start validating routes. And, in particular, we’re looking at you Verizon — and still waiting on your reply.
Despite this being caused by events outside our control, we’re sorry for the disruption. Our team cares deeply about our service and we had engineers in the US, UK, Australia, and Singapore online minutes after this problem was identified.
Trust on the Internet is underpinned by the Public Key Infrastructure (PKI). PKI grants servers the ability to securely serve websites by issuing digital certificates, providing the foundation for encrypted and authentic communication.
Certificates make HTTPS encryption possible by using the public key in the certificate to verify server identity. HTTPS is especially important for websites that transmit sensitive data, such as banking credentials or private messages. Thankfully, modern browsers, such as Google Chrome, flag websites not secured using HTTPS by marking them “Not secure,” allowing users to be more security conscious of the websites they visit.
Now that we know what certificates are used for, let’s talk about where they come from.
Certificate Authorities (CAs) are the institutions responsible for issuing certificates.
When issuing a certificate for any given domain, they use Domain Control Validation (DCV) to verify that the entity requesting a certificate for the domain is the legitimate owner of the domain. With DCV the domain owner:
creates a DNS resource record for a domain;
uploads a document to the web server located at that domain; OR
proves ownership of the domain’s administrative email account.
The DCV process prevents adversaries from obtaining private-key and certificate pairs for domains not owned by the requestor.
Preventing adversaries from acquiring this pair is critical: if an incorrectly issued certificate and private-key pair wind up in an adversary’s hands, they could pose as the victim’s domain and serve sensitive HTTPS traffic. This violates our existing trust of the Internet, and compromises private data on a potentially massive scale.
For example, an adversary that tricks a CA into mis-issuing a certificate for gmail.com could then perform TLS handshakes while pretending to be Google, and exfiltrate cookies and login information to gain access to the victim’s Gmail account. The risks of certificate mis-issuance are clearly severe.
Domain Control Validation
To prevent attacks like this, CAs only issue a certificate after performing DCV. One way of validating domain ownership is through HTTP validation, done by uploading a text file to a specific HTTP endpoint on the webserver they want to secure. Another DCV method is done using email verification, where an email with a validation code link is sent to the administrative contact for the domain.
Suppose Alice buys the domain name aliceswonderland.com and wants to get a dedicated certificate for this domain. Alice chooses to use Let’s Encrypt as their certificate authority. First, Alice must generate their own private key and create a certificate signing request (CSR). She sends the CSR to Let’s Encrypt, but the CA won’t issue a certificate for that CSR and private key until they know Alice owns aliceswonderland.com. Alice can then choose to prove that she owns this domain through HTTP validation.
When Let’s Encrypt performs DCV over HTTP, they require Alice to place a randomly named file in the /.well-known/acme-challenge path for her website. The CA must retrieve the text file by sending an HTTP GET request to http://aliceswonderland.com/.well-known/acme-challenge/<random_filename>. An expected value must be present on this endpoint for DCV to succeed.
For HTTP validation, Alice would upload a file to http://aliceswonderland.com/.well-known/acme-challenge/YnV0dHNz
where the body contains:
HTTP/1.1 200 OK
The CA instructs them to use the Base64 token YnV0dHNz. TEST_CLIENT_KEY in an account-linked key that only the certificate requestor and the CA know. The CA uses this field combination to verify that the certificate requestor actually owns the domain. Afterwards, Alice can get her certificate for her website!
Another way users can validate domain ownership is to add a DNS TXT record containing a verification string or token from the CA to their domain’s resource records. For example, here’s a domain for an enterprise validating itself towards Google:
$ dig TXT aliceswonderland.com
aliceswonderland.com. 28 IN TXT "google-site-verification=COanvvo4CIfihirYW6C0jGMUt2zogbE_lC6YBsfvV-U"
Here, Alice chooses to create a TXT DNS resource record with a specific token value. A Google CA can verify the presence of this token to validate that Alice actually owns her website.
Types of BGP Hijacking Attacks
Certificate issuance is required for servers to securely communicate with clients. This is why it’s so important that the process responsible for issuing certificates is also secure. Unfortunately, this is not always the case.
Researchers at Princeton University recently discovered that common DCV methods are vulnerable to attacks executed by network-level adversaries. If Border Gateway Protocol (BGP) is the “postal service” of the Internet responsible for delivering data through the most efficient routes, then Autonomous Systems (AS) are individual post office branches that represent an Internet network run by a single organization. Sometimes network-level adversaries advertise false routes over BGP to steal traffic, especially if that traffic contains something important, like a domain’s certificate.
Bamboozling Certificate Authorities with BGP highlights five types of attacks that can be orchestrated during the DCV process to obtain a certificate for a domain the adversary does not own. After implementing these attacks, the authors were able to (ethically) obtain certificates for domains they did not own from the top five CAs: Let’s Encrypt, GoDaddy, Comodo, Symantec, and GlobalSign. But how did they do it?
Attacking the Domain Control Validation Process
There are two main approaches to attacking the DCV process with BGP hijacking:
These attacks create a vulnerability when an adversary sends a certificate signing request for a victim’s domain to a CA. When the CA verifies the network resources using an HTTP GET request (as discussed earlier), the adversary then uses BGP attacks to hijack traffic to the victim’s domain in a way that the CA’s request is rerouted to the adversary and not the domain owner. To understand how these attacks are conducted, we first need to do a little bit of math.
Every device on the Internet uses an IP (Internet Protocol) address as a numerical identifier. IPv4 addresses contain 32 bits and follow a slash notation to indicate the size of the prefix. So, in the network address 220.127.116.11/24, “/24” refers to how many bits the network contains. This means that there are 8 bits left that contain the host addresses, for a total of 256 host addresses. The smaller the prefix number, the more host addresses remain in the network. With this knowledge, let’s jump into the attacks!
Attack one: Sub-Prefix Attack
When BGP announces a route, the router always prefers to follow the more specific route. So if 18.104.22.168/8 and 22.214.171.124/24 are advertised, the router will use the latter as it is the more specific prefix. This becomes a problem when an adversary makes a BGP announcement to a specific IP address while using the victim’s domain IP address. Let’s say the IP address for our victim, leagueofentropy.com, is 126.96.36.199/8. If an adversary announces the prefix 188.8.131.52/24, then they will capture the victim’s traffic, launching a sub-prefix hijack attack.
For example, in an attack during April 2018, routes were announced with the more specific /24 vs. the existing /23. In the diagram below, /23 is Texas and /24 is the more specific Austin, Texas. The new (but nefarious) routes overrode the existing routes for portions of the Internet. The attacker then ran a nefarious DNS server on the normal IP addresses with DNS records pointing at some new nefarious web server instead of the existing server. This attracted the traffic destined for the victim’s domain within the area the nefarious routes were being propagated. The reason this attack was successful was because a more specific prefix is always preferred by the receiving routers.
Attack two: Equally-Specific-Prefix Attack
In the last attack, the adversary was able to hijack traffic by offering a more specific announcement, but what if the victim’s prefix is /24 and a sub-prefix attack is not viable? In this case, an attacker would launch an equally-specific-prefix hijack, where the attacker announces the same prefix as the victim. This means that the AS chooses the preferred route between the victim and the adversary’s announcements based on properties like path length. This attack only ever intercepts a portion of the traffic.
There are more advanced attacks that are covered in more depth in the paper. They are fundamentally similar attacks but are more stealthy.
Once an attacker has successfully obtained a bogus certificate for a domain that they do not own, they can perform a convincing attack where they pose as the victim’s domain and are able to decrypt and intercept the victim’s TLS traffic. The ability to decrypt the TLS traffic allows the adversary to completely Monster-in-the-Middle (MITM) encrypted TLS traffic and reroute Internet traffic destined for the victim’s domain to the adversary. To increase the stealthiness of the attack, the adversary will continue to forward traffic through the victim’s domain to perform the attack in an undetected manner.
Another way an adversary can gain control of a domain is by spoofing DNS traffic by using a source IP address that belongs to a DNS nameserver. Because anyone can modify their packets’ outbound IP addresses, an adversary can fake the IP address of any DNS nameserver involved in resolving the victim’s domain, and impersonate a nameserver when responding to a CA.
This attack is more sophisticated than simply spamming a CA with falsified DNS responses. Because each DNS query has its own randomized query identifiers and source port, a fake DNS response must match the DNS query’s identifiers to be convincing. Because these query identifiers are random, making a spoofed response with the correct identifiers is extremely difficult.
Adversaries can fragment User Datagram Protocol (UDP) DNS packets so that identifying DNS response information (like the random DNS query identifier) is delivered in one packet, while the actual answer section follows in another packet. This way, the adversary spoofs the DNS response to a legitimate DNS query.
Say an adversary wants to get a mis-issued certificate for victim.com by forcing packet fragmentation and spoofing DNS validation. The adversary sends a DNS nameserver for victim.com a DNS packet with a small Maximum Transmission Unit, or maximum byte size. This gets the nameserver to start fragmenting DNS responses. When the CA sends a DNS query to a nameserver for victim.com asking for victim.com’s TXT records, the nameserver will fragment the response into the two packets described above: the first contains the query ID and source port, which the adversary cannot spoof, and the second one contains the answer section, which the adversary can spoof. The adversary can continually send a spoofed answer to the CA throughout the DNS validation process, in the hopes of sliding their spoofed answer in before the CA receives the real answer from the nameserver.
In doing so, the answer section of a DNS response (the important part!) can be falsified, and an adversary can trick a CA into mis-issuing a certificate.
At first glance, one could think a Certificate Transparency log could expose a mis-issued certificate and allow a CA to quickly revoke it. CT logs, however, can take up to 24 hours to include newly issued certificates, and certificate revocation can be inconsistently followed among different browsers. We need a solution that allows CAs to proactively prevent this attacks, not retroactively address them.
We’re excited to announce that Cloudflare provides CAs a free API to leverage our global network to perform DCV from multiple vantage points around the world. This API bolsters the DCV process against BGP hijacking and off-path DNS attacks.
Given that Cloudflare runs 175+ datacenters around the world, we are in a unique position to perform DCV from multiple vantage points. Each datacenter has a unique path to DNS nameservers or HTTP endpoints, which means that successful hijacking of a BGP route can only affect a subset of DCV requests, further hampering BGP hijacks. And since we use RPKI, we actually sign and verify BGP routes.
This DCV checker additionally protects CAs against off-path, DNS spoofing attacks. An additional feature that we built into the service that helps protect against off-path attackers is DNS query source IP randomization. By making the source IP unpredictable to the attacker, it becomes more challenging to spoof the second fragment of the forged DNS response to the DCV validation agent.
By comparing multiple DCV results collected over multiple paths, our DCV API makes it virtually impossible for an adversary to mislead a CA into thinking they own a domain when they actually don’t. CAs can use our tool to ensure that they only issue certificates to rightful domain owners.
Our multipath DCV checker consists of two services:
DCV agents responsible for performing DCV out of a specific datacenter, and
a DCV orchestrator that handles multipath DCV requests from CAs and dispatches them to a subset of DCV agents.
When a CA wants to ensure that DCV occurred without being intercepted, it can send a request to our API specifying the type of DCV to perform and its parameters.
The DCV orchestrator then forwards each request to a random subset of over 20 DCV agents in different datacenters. Each DCV agent performs the DCV request and forwards the result to the DCV orchestrator, which aggregates what each agent observed and returns it to the CA.
This approach can also be generalized to performing multipath queries over DNS records, like Certificate Authority Authorization (CAA) records. CAA records authorize CAs to issue certificates for a domain, so spoofing them to trick unauthorized CAs into issuing certificates is another attack vector that multipath observation prevents.
As we were developing our multipath checker, we were in contact with the Princeton research group that introduced the proof-of-concept (PoC) of certificate mis-issuance through BGP hijacking attacks. Prateek Mittal, coauthor of the Bamboozling Certificate Authorities with BGP paper, wrote:
“Our analysis shows that domain validation from multiple vantage points significantly mitigates the impact of localized BGP attacks. We recommend that all certificate authorities adopt this approach to enhance web security. A particularly attractive feature of Cloudflare’s implementation of this defense is that Cloudflare has access to a vast number of vantage points on the Internet, which significantly enhances the robustness of domain control validation.”
Our DCV checker follows our belief that trust on the Internet must be distributed, and vetted through third-party analysis (like that provided by Cloudflare) to ensure consistency and security. This tool joins our pre-existing Certificate Transparency monitor as a set of services CAs are welcome to use in improving the accountability of certificate issuance.
An Opportunity to Dogfood
Building our multipath DCV checker also allowed us to dogfood multiple Cloudflare products.
The DCV orchestrator as a simple fetcher and aggregator was a fantastic candidate for Cloudflare Workers. We implemented the orchestrator in TypeScript using this post as a guide, and created a typed, reliable orchestrator service that was easy to deploy and iterate on. Hooray that we don’t have to maintain our own dcv-orchestrator server!
We use Argo Tunnel to allow Cloudflare Workers to contact DCV agents. Argo Tunnel allows us to easily and securely expose our DCV agents to the Workers environment. Since Cloudflare has approximately 175 datacenters running DCV agents, we expose many services through Argo Tunnel, and have had the opportunity to load test Argo Tunnel as a power user with a wide variety of origins. Argo Tunnel readily handled this influx of new origins!
Getting Access to the Multipath DCV Checker
If you and/or your organization are interested in trying our DCV checker, email [email protected] and let us know! We’d love to hear more about how multipath querying and validation bolsters the security of your certificate issuance.
As a new class of BGP and IP spoofing attacks threaten to undermine PKI fundamentals, it’s important that website owners advocate for multipath validation when they are issued certificates. We encourage all CAs to use multipath validation, whether it is Cloudflare’s or their own. Jacob Hoffman-Andrews, Tech Lead, Let’s Encrypt, wrote:
“BGP hijacking is one of the big challenges the web PKI still needs to solve, and we think multipath validation can be part of the solution. We’re testing out our own implementation and we encourage other CAs to pursue multipath as well”
Hopefully in the future, website owners will look at multipath validation support when selecting a CA.
The Internet is an extraordinarily complex and evolving ecosystem. Its constituent protocols range from the ancient and archaic (hello FTP) to the modern and sleek (meet WireGuard), with a fair bit of everything in between. This evolution is ongoing, and as one of the most connected networks on the Internet, Cloudflare has a duty to be a good steward of this ecosystem. We take this responsibility to heart: Cloudflare’s mission is to help build a better Internet. In this spirit, we are very proud to announce Crypto Week 2019.
Every day this week we’ll announce a new project or service that uses modern cryptography to build a more secure, trustworthy Internet. Everything we release this week will be free and immediately useful. This blog is a fun exploration of the themes of the week.
Monday: Coming Soon
Tuesday: Coming Soon
Wednesday: Coming Soon
Thursday: Coming Soon
Friday: Coming Soon
The Internet of the Future
Many pieces of the Internet in use today were designed in a different era with different assumptions. The Internet’s success is based on strong foundations that support constant reassessment and improvement. Sometimes these improvements require deploying new protocols.
Performing an upgrade on a system as large and decentralized as the Internet can’t be done by decree;
There are too many economic, cultural, political, and technological factors at play.
Changes must be compatible with existing systems and protocols to even be considered for adoption.
To gain traction, new protocols must provide tangible improvements for users. Nobody wants to install an update that doesn’t improve their experience!
The last time the Internet had a complete reboot and upgrade was during TCP/IP flag dayin 1983. Back then, the Internet (called ARPANET) had fewer than ten thousand hosts! To have an Internet-wide flag day today to switch over to a core new protocol is inconceivable; the scale and diversity of the components involved is way too massive. Too much would break. It’s challenging enough to deprecate outmoded functionality. In some ways, the open Internet is a victim of its own success. The bigger a system grows and the longer it stays the same, the harder it is to change. The Internet is like a massive barge: it takes forever to steer in a different direction and it’s carrying a lot of garbage.
As you would expect, many of the warts of the early Internet still remain. Both academic security researchers and real-life adversaries are still finding and exploiting vulnerabilities in the system. Many vulnerabilities are due to the fact that most of the protocols in use on the Internet have a weak notion of trust inherited from the early days. With 50 hosts online, it’s relatively easy to trust everyone, but in a world-scale system, that trust breaks down in fascinating ways. The primary tool to scale trust is cryptography, which helps provide some measure of accountability, though it has its own complexities.
In an ideal world, the Internet would provide a trustworthy substrate for human communication and commerce. Some people naïvely assume that this is the natural direction the evolution of the Internet will follow. However, constant improvement is not a given. It’s possible that the Internet of the future will actually be worse than the Internet today: less open, less secure, less private, less trustworthy. There are strong incentives to weaken the Internet on a fundamental level by Governments, by businesses such as ISPs, and even by the financial institutions entrusted with our personal data.
In a system with as many stakeholders as the Internet, real change requires principled commitment from all invested parties. At Cloudflare, we believe everyone is entitled to an Internet built on a solid foundation of trust. Crypto Week is our way of helping nudge the Internet’s evolution in a more trust-oriented direction. Each announcement this week helps bring the Internet of the future to the present in a tangible way.
Ongoing Internet Upgrades
Before we explore the Internet of the future, let’s explore some of the previous and ongoing attempts to upgrade the Internet’s fundamental protocols.
As we highlighted in last year’s Crypto Weekone of the weak links on the Internet is routing. Not all networks are directly connected.
To send data from one place to another, you might have to rely on intermediary networks to pass your data along. A packet sent from one host to another may have to be passed through up to a dozen of these intermediary networks.No single network knows the full path the data will have to take to get to its destination, it only knows which network to pass it to next.The protocol that determines how packets are routed is called the Border Gateway Protocol (BGP.) Generally speaking, networks use BGP to announce to each other which addresses they know how to route packets for and (dependent on a set of complex rules) these networks share what they learn with their neighbors.
Unfortunately, BGP is completely insecure:
Any network can announce any set of addresses to any other network, even addresses they don’t control. This leads to a phenomenon called BGP hijacking, where networks are tricked into sending data to the wrong network.
A BGP hijack ismost often caused by accidental misconfiguration, but can also be the result of malice on the network operator’s part.
During a BGP hijack, a network inappropriately announces a set of addresses to other networks, which results in packets destined for the announced addresses to be routed through the illegitimate network.
Understanding the risk
If the packets represent unencrypted data, this can be a big problemas it allows the hijacker to read or even change the data:
The Resource Public Key Infrastructure (RPKI) system helps bring some trust to BGP by enabling networks to utilize cryptography to digitally sign network routes with certificates, making BGP hijacking much more difficult.
This enables participants of the network to gain assurances about the authenticity of route advertisements. Certificate Transparency (CT) is a tool that enables additional trust for certificate-based systems. Cloudflare operates the Cirrus CT log to support RPKI.
Since we announced our support of RPKI last year, routing security has made big strides. More routes are signed, more networks validate RPKI, and the software ecosystem has matured, but this work is not complete. Most networks are still vulnerable to BGP hijacking. For example, Pakistan knocked YouTube offline with a BGP hijack back in 2008, and could likely do the same today. Adoption here is driven less by providing a benefit to users, but rather by reducing systemic risk, which is not the strongest motivating factor for adopting a complex new technology. Full routing security on the Internet could take decades.
The Domain Name System (DNS) is the phone book of the Internet. Or, for anyone under 25 who doesn’t remember phone books, it’s the system that takes hostnames (like cloudflare.com or facebook.com) and returns the Internet address where that host can be found. For example, as of this publication, www.cloudflare.com is 184.108.40.206 and 220.127.116.11 (IPv4) and 2606:4700::c629:d7a2, 2606:4700::c629:d6a2 (IPv6). Like BGP, DNS is completely insecure. Queries and responses sent unencrypted over the Internet are modifiable by anyone on the path.
There are many ongoing attempts to add security to DNS, such as:
DNSSEC that adds a chain of digital signatures to DNS responses
DoT/DoH that wraps DNS queries in the TLS encryption protocol (more on that later)
Both technologies are slowly gaining adoption, but have a long way to go.
Just like RPKI, securing DNS comes with a performance cost, making it less attractive to users. However,
This performance improvement makes it appealing for customers of privacy-conscious applications, like Firefox and Cloudflare’s 18.104.22.168 app, to adopt secure DNS.
Transport Layer Security (TLS) is a cryptographic protocol that gives two parties the ability to communicate over an encrypted and authenticated channel.TLS protects communications from eavesdroppers even in the event of a BGP hijack. TLS is what puts the “S” in HTTPS. TLS protects web browsing against multiple types of network adversaries.
The adoption of TLS on the web is partially driven by the fact that:
Browsers make TLS adoption appealing to website operators by only supporting new web features such as HTTP/2 over HTTPS.
This has led to the rapid adoption of HTTPS over the last five years.
To further that adoption, TLS recently got an upgrade in TLS 1.3, making it faster and more secure (a combination we love). It’s taking over the Internet!
Despite this fantastic progress in the adoption of security for routing, DNS, and the web, there are still gaps in the trust model of the Internet. There are other things needed to help build the Internet of the future. To find and identify these gaps, we lean on research experts.
Research Farm to Table
Cryptographic security on the Internet is a hot topic and there have been many flaws and issues recently pointed out in academic journals. Researchers often study the vulnerabilities of the past and ask:
What other critical components of the Internet have the same flaws?
What underlying assumptions can subvert trust in these existing systems?
The answers to these questions help us decide what to tackle next. Some recent research topics we’ve learned about include:
Attacks on Time Synchronization
DNS attacks affecting Certificate issuance
Scaling distributed trust
Cloudflare keeps abreast of these developments and we do what we can to bring these new ideas to the Internet at large. In this respect, we’re truly standing on the shoulders of giants.
Future-proofing Internet Cryptography
The new protocols we are currently deploying (RPKI, DNSSEC, DoT/DoH, TLS 1.3) use relatively modern cryptographic algorithms published in the 1970s and 1980s.
If you can solve the hard problem, you can crack the code. Using a bigger key makes the problem harder, making it more difficult to break, but also slows performance.
Modern Internet protocols typically pick keys large enough to make it infeasible to break with classical computers, but no larger. The sweet spot is around 128-bits of security;meaning a computer has to do approximately 2¹²⁸ operations to break it.
Arjen Lenstra and others created a useful measure of security levels by comparing the amount of energy it takes to break a key to the amount of water you can boil using that much energy. You can think of this as the electric bill you’d get if you run a computer long enough to crack the key.
35-bit securityis “Teaspoon security” — It takes about the same amount of energy to break a 35-bit key as it does to boil a teaspoon of water (pretty easy).
65 bits gets you up to “Pool security” – The energy needed to boil the average amount of water in a swimming pool.
105 bits is “Sea Security” – The energy needed to boil the Mediterranean Sea.
114-bits is “Global Security” – The energy needed to boil all water on Earth.
128-bit security is safely beyond thatof Global Security – Anything larger is overkill.
256-bit security corresponds to “Universal Security” – The estimated mass-energy of the observable universe. So, if you ever hear someone suggest 256-bit AES, you know they mean business.
Post-Quantum of Solace
As far as we know, the algorithms we use for cryptography are functionally uncrackable with all known algorithms that classical computers can run. Quantum computers change this calculus. Instead of transistors and bits, a quantum computer uses the effects of quantum mechanics to perform calculations that just aren’t possible with classical computers. As you can imagine, quantum computers are very difficult to build. However, despite large-scale quantum computers not existing quite yet, computer scientists have already developed algorithms that can only run efficiently on quantum computers. Surprisingly, it turns out that with a sufficiently powerful quantum computer, most of the hard mathematical problems we rely on for Internet security become easy!
Although there are still quantum-skeptics out there, some expertsestimate that within 15-30 years these large quantum computers will exist, which poses a risk to every security protocol online. Progress is moving quickly; every few months a more powerful quantum computer is announced.
Luckily, there are cryptography algorithms that rely on different hard math problems that seem to be resistant to attack from quantum computers. These math problems form the basis of so-called quantum-resistant (or post-quantum) cryptography algorithms that can run on classical computers. These algorithms can be used as substitutes for most of our current quantum-vulnerable algorithms.
Some quantum-resistant algorithms (such as McEliece and Lamport Signatures) were invented decades ago, but there’s a reason they aren’t in common use: they lack some of the nice properties of the algorithms we’re currently using, such as key size and efficiency.
Some quantum-resistant algorithms require much larger keys to provide 128-bit security
Some are very CPU intensive,
And some just haven’t been studied enough to know if they’re secure.
It is possible to swap our current set of quantum-vulnerable algorithms with new quantum-resistant algorithms, but it’s a daunting engineering task. With widely deployed protocols, it is hard to make the transition from something fast and small to something slower, bigger or more complicated without providing concrete user benefits. When exploring new quantum-resistant algorithms, minimizing user impact is of utmost importance to encourage adoption. This is a big deal, because almost all the protocols we use to protect the Internet are vulnerable to quantum computers.
Cryptography-breaking quantum computing is still in the distant future, but we must start the transition to ensure that today’s secure communications are safe from tomorrow’s quantum-powered onlookers; however, that’s not the most timely problem with the Internet. We haven’t addressed that…yet.
Just like DNS, BGP, and HTTP, the Network Time Protocol (NTP) is fundamental to how the Internet works. And like these other protocols, it is completely insecure.
Last year, Cloudflare introduced Roughtime as a mechanism for computers to access the current time from a trusted server in an authenticated way.
Roughtime is powerful because it provides a way to distribute trust among multiple time servers so that if one server attempts to lie about the time, it will be caught.
However, Roughtime is not exactly a secure drop-in replacement for NTP.
Roughtime lacks the complex mechanisms of NTP that allow it to compensate for network latency and yet maintain precise time, especially if the time servers are remote. This leads to imprecise time.
Roughtime also involves expensive cryptography that can further reduce precision. This lack of precision makes Roughtime useful for browsers and other systems that need coarse time to validate certificates (most certificates are valid for 3 months or more), but some systems (such as those used for financial trading) require precision to the millisecond or below.
With Roughtime we supported the time protocol of the future, but there are things we can do to help improve the health of security online today.
Some academic researchers, including Aanchal Malhotra of Boston University, have demonstrated a variety of attacks against NTP, including BGP hijacking and off-path User Datagram Protocol (UDP) attacks.
Some of these attacks can be avoided by connecting to an NTP server that is close to you on the Internet.
However, to bring cryptographic trust to time while maintaining precision, we need something in between NTP and Roughtime.
To solve this, it’s natural to turn to the same system of trust that enabled us to patch HTTP and DNS: Web PKI.
Attacking the Web PKI
The Web PKI is similar to the RPKI, but is more widely visible since it relates to websites rather than routing tables.
If you’ve ever clicked the lock icon on your browser’s address bar, you’ve interacted with it.
The PKI relies on a set of trusted organizations called Certificate Authorities (CAs) to issue certificates to websites and web services.
While we were all patting ourselves on the back for moving the web to HTTPS, someresearchers managed to find and exploit a weakness in the system: the process for getting HTTPS certificates.
Certificate Authorities (CAs) use a process called domain control validation (DCV) to ensure that they only issue certificates to websites owners who legitimately request them.
Some CAs do this validation manually, which is secure, but can’t scale to the total number of websites deployed today.
More progressive CAs have automated this validation process, but rely on insecure methods (HTTP and DNS) to validate domain ownership.
Without ubiquitous cryptography in place (DNSSEC may never reach 100% deployment), there is no completely secure way to bootstrap this system. So, let’s look at how to distribute trust using other methods.
One tool at our disposal is the distributed nature of the Cloudflare network.
Cloudflare is global. We have locations all over the world connected to dozens of networks. That means we have different vantage points, resulting in different ways to traverse networks. This diversity can prove an advantage when dealing with BGP hijacking, since an attacker would have to hijack multiple routes from multiple locations to affect all the traffic between Cloudflare and other distributed parts of the Internet. The natural diversity of the network raises the cost of the attacks.
A distributed set of connections to the Internet and using them as a quorum is a mighty paradigm to distribute trust, with or without cryptography.
This idea of distributing the source of trust is powerful. Last year we announced the Distributed Web Gateway that
Enables users to access content on the InterPlanetary File System (IPFS), a network structured to reduce the trust placed in any single party.
Even if a participant of the network is compromised, it can’t be used to distribute compromised content because the network is content-addressed.
However, using content-based addressing is not the only way to distribute trust between multiple independent parties.
Another way to distribute trust is to literally split authority between multiple independent parties. We’ve explored this topic before. In the context of Internet services, this means ensuring that no single server can authenticate itself to a client on its own. For example,
In HTTPS the server’s private key is the lynchpin of its security. Compromising the owner of the private key (by hook or by crook) gives an attacker the ability to impersonate (spoof) that service. This single point of failure puts services at risk. You can mitigate this risk by distributing the authority to authenticate the service between multiple independently-operated services.
The Internet barge is old and slow, and we’ve only been able to improve it through the meticulous process of patching it piece by piece. Another option is to build new secure systems on top of this insecure foundation. IPFS is doing this, and IPFS is not alone in its design. There has been more research into secure systems with decentralized trust in the last ten years than ever before.
The result is radical new protocols and designs that use exotic new algorithms. These protocols do not supplant those at the core of the Internet (like TCP/IP), but instead, they sit on top of the existing Internet infrastructure, enabling new applications, much like HTTP did for the web.
Some of the most innovative technical projects were considered failures because they couldn’t attract users. New technology has to bring tangible benefits to users to sustain it: useful functionality, content, and a decent user experience. Distributed projects, such as IPFS and others, are gaining popularity, but have not found mass adoption. This is a chicken-and-egg problem. New protocols have a high barrier to entry—users have to install new software—and because of the small audience, there is less incentive to create compelling content. Decentralization and distributed trust are nice security features to have, but they are not products. Users still need to get some benefit out of using the platform.
An example of a system to break this cycle is the web. In 1992 the web was hardly a cornucopia of awesomeness. What helped drive the dominance of the web was its users.
The growth of the user base meant more incentive for people to build services, and the availability of more services attracted more users. It was a virtuous cycle.
It’s hard for a platform to gain momentum, but once the cycle starts, a flywheel effect kicks in to help the platform grow.
The Distributed Web Gateway project Cloudflare launched last year in Crypto Week is our way of exploring what happens if we try to kickstart that flywheel. By providing a secure, reliable, and fast interface from the classic web with its two billion users to the content on the distributed web, we give the fledgling ecosystem an audience.
If the advantages provided by building on the distributed web are appealing to users, then the larger audience will help these services grow in popularity.
This is somewhat reminiscent of how IPv6 gained adoption. It started as a niche technology only accessible using IPv4-to-IPv6 translation services.
IPv6 adoption has now grown so much that it is becoming a requirement for new services. For example,Apple is requiring that all apps work in IPv6-only contexts.
Eventually, as user-side implementations of distributed web technologies improve, people may move to using the distributed web natively rather than through an HTTP gateway. Or they may not! By leveraging Cloudflare’s global network to give users access to new technologies based on distributed trust, we give these technologies a better chance at gaining adoption.
Happy Crypto Week
At Cloudflare, we always support new technologies that help make the Internet better. Part of helping make a better Internet is scaling the systems of trust that underpin web browsing and protect them from attack. We provide the tools to create better systems of assurance with fewer points of vulnerability. We work with academic researchers of security to get a vision of the future and engineer away vulnerabilities before they can become widespread. It’s a constant journey.
Cloudflare knows that none of this is possible without the work of researchers. From award-winning researcher publishing papers in top journals to the blog posts of clever hobbyists, dedicated and curious people are moving the state of knowledge of the world forward. However, the push to publish new and novel research sometimes holds researchers back from committing enough time and resources to fully realize their ideas. Great research can be powerful on its own, but it can have an even broader impact when combined with practical applications. We relish the opportunity to stand on the shoulders of these giants and use our engineering know-how and global reach to expand on their work to help build a better Internet.
So, to all of you dedicated researchers, thank you for your work! Crypto Week is yours as much as ours. If you’re working on something interesting and you want help to bring the results of your research to the broader Internet, please contact us at [email protected]. We want to help you realize your dream of making the Internet safe and trustworthy.
Members of 3ve (pronounced “eve”) used their large reservoir of trusted IP addresses to conceal a fraud that otherwise would have been easy for advertisers to detect. The scheme employed a thousand servers hosted inside data centers to impersonate real human beings who purportedly “viewed” ads that were hosted on bogus pages run by the scammers themselves — who then received a check from ad networks for these billions of fake ad impressions. Normally, a scam of this magnitude coming from such a small pool of server-hosted bots would have stuck out to defrauded advertisers. To camouflage the scam, 3ve operators funneled the servers’ fraudulent page requests through millions of compromised IP addresses.
About one million of those IP addresses belonged to computers, primarily based in the US and the UK, that attackers had infected with botnet software strains known as Boaxxe and Kovter. But at the scale employed by 3ve, not even that number of IP addresses was enough. And that’s where the BGP hijacking came in. The hijacking gave 3ve a nearly limitless supply of high-value IP addresses. Combined with the botnets, the ruse made it seem like millions of real people from some of the most affluent parts of the world were viewing the ads.
Lots of details in the article.
An aphorism I often use in my talks is “expertise flows downhill: today’s top-secret NSA programs become tomorrow’s PhD theses and the next day’s hacking tools.” This is an example of that. BGP hacking — known as “traffic shaping” inside the NSA — has long been a tool of national intelligence agencies. Now it is being used by cybercriminals.
EDITED TO ADD (1/2): Classified NSA presentation on “network shaping.” I don’t know if there is a difference inside the NSA between the two terms.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.