Quantum computing is inevitable; cryptography prepares for the future
Quantum computing began in the early 1980s. It operates on principles of quantum physics rather than the limitations of circuits and electricity, which is why it is capable of processing highly complex mathematical problems so efficiently. Quantum computing could one day achieve things that classical computing simply cannot.
The evolution of quantum computers has been slow. Still, work is accelerating, thanks to the efforts of academic institutions such as Oxford, MIT, and the University of Waterloo, as well as companies like IBM, Microsoft, Google, and Honeywell. IBM has held a leadership role in this innovation push and has named optimization the most likely application for consumers and organizations alike. Honeywell expects to release what it calls the “world’s most powerful quantum computer” for applications like fraud detection, optimization for trading strategies, security, machine learning, and chemistry and materials science.
In 2019, the Google Quantum Artificial Intelligence (AI) team announced that their 53-qubit (analogous to bits in classical computing) machine had achieved “quantum supremacy.” This was the first time a quantum computer was able to solve a problem faster than any classical computer in existence. This was considered a significant milestone.
Quantum computing will change the face of Internet security forever — particularly in the realm of cryptography, which is the way communications and information are secured across channels like the Internet. Cryptography is critical to almost every aspect of modern life, from banking to cellular communications to connected refrigerators and systems that keep subways running on time. This ultra-powerful, highly sophisticated new generation of computing has the potential to unravel decades of work that have been put into developing the cryptographic algorithms and standards we use today.
Quantum computers will crack modern cryptographic algorithms
Quantum computers can take a very large integer and find out its prime factor extremely rapidly by using Shor’s algorithm. Why is this so important in the context of cryptographic security?
Most cryptography today is based on algorithms that incorporate difficult problems from number theory, like factoring. The forerunner of nearly all modern cryptographic schemes is RSA (Rivest-Shamir-Adleman), which was devised back in 1976. Basically, every participant of a public key cryptography system like RSA has both a public key and a private key. To send a secure message, data is encoded as a large number and scrambled using the public key of the person you want to send it to. The person on the receiving end can decrypt it with their private key. In RSA, the public key is a large number, and the private key is its prime factors. With Shor’s algorithm, a quantum computer with enough qubits could factor large numbers. For RSA, someone with a quantum computer can take a public key and factor it to get the private key, which allows them to read any message encrypted with that public key. This ability to factor numbers breaks nearly all modern cryptography. Since cryptography is what provides pervasive security for how we communicate and share information online, this has significant implications.
Theoretically, if an adversary were to gain control of a quantum computer, they could create total chaos. They could create cryptographic certificates and impersonate banks to steal funds, disrupt Bitcoin, break into digital wallets, and access and decrypt confidential communications. Some liken this to Y2K. But, unlike Y2K, there’s no fixed date as to when existing cryptography will be rendered insecure. Researchers have been preparing and working hard to get ahead of the curve by building quantum-resistant cryptography solutions.
When will a quantum computer be built that is powerful enough to break all modern cryptography? By some estimates, it may take 10 to 15 years. Companies and universities have made a commitment to innovation in the field of quantum computing, and progress is certainly being made. Unlike classical computers, quantum computers rely on quantum effects, which only happen at the atomic scale. To instantiate a qubit, you need a particle that exhibits quantum effects like an electron or a photon. These particles are extremely small and hard to manage, so one of the biggest hurdles to the realization of quantum computers is how to keep the qubits stable long enough to do the expensive calculations involved in cryptographic algorithms.
Both quantum computing and quantum-resistant cryptography are works in progress
It takes a long time for hardware technology to develop and mature. Similarly, new cryptographic techniques take a long time to discover and refine. To protect today’s data from tomorrow’s quantum adversaries, we need new cryptographic techniques that are not vulnerable to Shor’s algorithm.
The National Institute of Standards and Technology (NIST) is leading the charge in defining post-quantum cryptography algorithms to replace RSA and ECC. There is a project currently underway to test and select a set of post-quantum computing-resistant algorithms that go beyond existing public-key cryptography. NIST plans to make a recommendation sometime between 2022 and 2024 for two to three algorithms for both encryption and digital signatures. As Dustin Moody, NIST mathematician points out, the organization wants to cover as many bases as possible: “If some new attack is found that breaks all lattices, we’ll still have something to fall back on.”
We’re following closely. The participants of NIST have developed high-speed implementations of post-quantum algorithms on different computer architectures. We’ve taken some of these algorithms and tested them in Cloudflare’s systems in various capacities. Last year, Cloudflare and Google performed the TLS Post-Quantum Experiment, which involved implementing and supporting new key exchange mechanisms based on post-quantum cryptography for all Cloudflare customers for a period of a few months. As an edge provider, Cloudflare was well positioned to turn on post-quantum algorithms for millions of websites to measure performance and use these algorithms to provide confidentiality in TLS connections. This experiment led us to some useful insights around which algorithms we should focus on for TLS and which we should not (sorry, SIDH!).
More recently, we have been working with researchers from the University of Waterloo and Radboud University on a new protocol called KEMTLS, which will be presented at Real World Crypto 2021. In our last TLS experiment, we replaced the key negotiation part of TLS with quantum-safe alternatives but continued to rely on digital signatures. KEMTLS is designed to be fully post-quantum and relies only on public-key encryption.
On the implementation side, Cloudflare team members including Armando Faz Hernandez and visiting researcher Bas Westerbaan have developed high-speed assembly versions of several of the NIST finalists (Kyber, Dilithium), as well as other relevant post-quantum algorithms (CSIDH, SIDH) in our CIRCL cryptography library written in Go.
Post-quantum security, coming soon?
Everything that is encrypted with today’s public key cryptography can be decrypted with tomorrow’s quantum computers. Imagine waking up one day, and everyone’s diary from 2020 is suddenly public. Although it’s impossible to find enough storage to record keep all the ciphertext sent over the Internet, there are current and active efforts to collect a lot of it. This makes deploying post-quantum cryptography as soon as possible a pressing privacy concern.
Cloudflare is taking steps to accelerate this transition. First, we endeavor to use post-quantum cryptography for most internal services by the end of 2021. Second, we plan to be among the first services to offer post-quantum cipher suites to customers as standards emerge. We’re optimistic that collaborative efforts among NIST, Microsoft, Cloudflare, and other computing companies will yield a robust, standards-based solution. Although powerful quantum computers are likely in our future, Cloudflare is helping to make sure the Internet is ready for when they arrive.
Over the last ten years, Cloudflare has become an important part of Internet infrastructure, powering websites, APIs, and web services to help make them more secure and efficient. The Internet is growing in terms of its capacity and the number of people using it and evolving in terms of its design and functionality. As a player in the Internet ecosystem, Cloudflare has a responsibility to help the Internet grow in a way that respects and provides value for its users. Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy.
Developing a superior protocol forpasswordauthentication, OPAQUE, that makes password breaches less likely to occur
Each of these projects impacts an aspect of the Internet that influences our online lives and digital footprints. Whether we know it or not, there is a lot of private information about us and our lives floating around online. This is something we can help fix.
For over a year, we have been working through standards bodies like the IETF and partnering with the biggest names in Internet technology (including Mozilla, Google, Equinix, and more) to design, deploy, and test these new privacy-preserving protocols at Internet scale. Each of these three protocols touches on a critical aspect of our online lives, and we expect them to help make real improvements to privacy online as they gain adoption.
A continuing tradition at Cloudflare
One of Cloudflare’s core missions is to support and develop technology that helps build a better Internet. As an industry, we’ve made exceptional progress in making the Internet more secure and robust. Cloudflare is proud to have played a part in this progress through multiple initiatives over the years.
Here are a few highlights:
Universal SSL™. We’ve been one of the driving forces for encrypting the web. We launched Universal SSL in 2014 to give website encryption to our customers for free and have actively been working along with certificate authorities like Let’s Encrypt, web browsers, and website operators to help remove mixed content. Before Universal SSL launched to give all Cloudflare customers HTTPS for free, only 30% of connections to websites were encrypted. Through the industry’s efforts, that number is now 80% — and a much more significant proportion of overall Internet traffic. Along with doing our part to encrypt the web, we have supported the Certificate Transparency project via Nimbus and Merkle Town, which has improved accountability for the certificate ecosystem HTTPS relies on for trust.
TLS 1.3 and QUIC. We’ve also been a proponent of upgrading existing security protocols. Take Transport Layer Security (TLS), the underlying protocol that secures HTTPS. Cloudflare engineers helped contribute to the design of TLS 1.3, the latest version of the standard, and in 2016 we launched support for an early version of the protocol. This early deployment helped lead to improvements to the final version of the protocol. TLS 1.3 is now the most widely used encryption protocol on the web and a vital component of the emerging QUIC standard, of which we were also early adopters.
Continuing to improve the security of the systems of trust online is critical to the Internet’s growth. However, there is a more fundamental principle at play: respect. The infrastructure underlying the Internet should be designed to respect its users.
Building an Internet that respects users
Without encryption, Internet browsing information is implicitly shared with countless third parties online as information passes between networks. Without secure routing, users’ traffic can be hijacked and disrupted. Without privacy-preserving protocols, users’ online life is not as private as they would think or expect. The infrastructure of the Internet wasn’t built in a way that reflects their expectations.
The good news is that the Internet is continuously evolving. One of the groups that help guide that evolution is the Internet Architecture Board (IAB). The IAB provides architectural oversight to the Internet Engineering Task Force (IETF), the Internet’s main standard-setting body. The IAB recently published RFC 8890, which states that individual end-users should be prioritized when designing Internet protocols. It says that if there’s a conflict between the interests of end-users and the interest of service providers, corporations, or governments, IETF decisions should favor end users. One of the prime interests of end-users is the right to privacy, and the IAB published RFC 6973 to indicate how Internet protocols should take privacy into account.
Today’s technical blog posts are about improvements to the Internet designed to respect user privacy. Privacy is a complex topic that spans multiple disciplines, so it’s essential to clarify what we mean by “improving privacy.” We are specifically talking about changing the protocols that handle privacy-sensitive information exposed “on-the-wire” and modifying them so that this data is exposed to fewer parties. This data continues to exist. It’s just no longer available or visible to third parties without building a mechanism to collect it at a higher layer of the Internet stack, the application layer. These changes go beyond website encryption; they go deep into the design of the systems that are foundational to making the Internet what it is.
The toolbox: cryptography and secure proxies
Two tools for making sure data can be used without being seen are cryptography and secureproxies.
Cryptography allows information to be transformed into a format that a very limited number of people (those with the key) can understand. Some describe cryptography as a tool that transforms data security problems into key management problems. This is a humorous but fair description. Cryptography makes it easier to reason about privacy because only key holders can view data.
Both these tools are useful individually, but they can be even more effective if combined. Onion routing (the cryptographic technique underlying Tor) is one example of how proxies and encryption can be used in tandem to enforce strong privacy. Broadly, if party A wants to send data to party B, they can encrypt the data with party B’s key and encrypt the metadata with a proxy’s key and send it to the proxy.
Platforms and services built on top of the Internet can build in consent systems, like privacy policies presented through user interfaces. The infrastructure of the Internet relies on layers of underlying protocols. Because these layers of the Internet are so far below where the user interacts with them, it’s almost impossible to build a concept of user consent. In order to respect users and protect them from privacy issues, the protocols that glue the Internet together should be designed with privacy enabled by default.
Data vs. metadata
The transition from a mostly unencrypted web to an encrypted web has done a lot for end-user privacy. For example, the “coffeeshop stalker” is no longer an issue for most sites. When accessing the majority of sites online, users are no longer broadcasting every aspect of their web browsing experience (search queries, browser versions, authentication cookies, etc.) over the Internet for any participant on the path to see. Suppose a site is configured correctly to use HTTPS. In that case, users can be confident their data is secure from onlookers and reaches only the intended party because their connections are both encrypted and authenticated.
However, HTTPS only protects the content of web requests. Even if you only browse sites over HTTPS, that doesn’t mean that your browsingpatterns are private. This is because HTTPS fails to encrypt a critical aspect of the exchange: the metadata. When you make a phone call, the metadata is the phone number, not the call’s contents. Metadata is the data about the data.
To illustrate the difference and why it matters, here’s a diagram of what happens when you visit a website like an imageboard. Say you’re going to a specific page on that board (https://<imageboard>.com/room101/) that has specific embedded images hosted on <embarassing>.com.
The space inside the dotted line here represents the part of the Internet that your data needs to transit. They include your local area network or coffee shop, your ISP, an Internet transit provider, and it could be the network portion of the cloud provider that hosts the server. Users often don’t have a relationship with these entities or a contract to prevent these parties from doing anything with the user’s data. And even if those entities don’t look at the data, a well-placed observer intercepting Internet traffic could see anything sent unencrypted. It would be best if they just didn’t see it at all. In this example, the fact that the user visited <imageboard>.com can be seen by an observer, which is expected. However, though page content is encrypted, it’s possible to learn which specific page you’ve visited can be seen since <embarassing>.com is also visible.
It’s a general rule that if data is available to on-path parties on the Internet, some of these on-path parties will use this data. It’s also true that these on-path parties need some metadata in order to facilitate the transport of this data. This balance is explored in RFC 8558, which explains how protocols should be designed thoughtfully with respect to the balance between too much metadata (bad for privacy) and too little metadata (bad for operations).
In an ideal world, Internet protocols would be designed with the principle of least privilege. They would provide the minimum amount of information needed for the on-path parties (the pipes) to do the job of transporting the data to the right place and keep everything else confidential by default. Current protocols, including TLS 1.3 and QUIC, are important steps towards this ideal but fall short with respect to metadata privacy.
Knowing both who you are and what you do online can lead to profiling
Today’s announcements reflect two metadata protection levels: the first involves limiting the amount of metadata available to third-party observers (like ISPs). The second involves restricting the amount of metadata that users share with service providers themselves.
Hostnames are an example of metadata that needs to be protected from third-party observers, which DoH and ECH intend to do. However, it doesn’t make sense to hide the hostname from the site you’re visiting. It also doesn’t make sense to hide it from a directory service like DNS. A DNS server needs to know which hostname you’re resolving to resolve it for you!
A privacy issue arises when a service provider knows about both what sites you’re visiting and who you are. Individual websites do not have this dangerous combination of information (except in the case of third party cookies, which are going away soon in browsers), but DNS providers do. Thankfully, it’s not actually necessary for a DNS resolver to know *both* the hostname of the service you’re going to and which IP you’re coming from. Disentangling the two, which is the goal of ODoH, is good for privacy.
The Internet is part of ‘our’ Infrastructure
Roads should be well-paved, well lit, have accurate signage, and be optimally connected. They aren’t designed to stop a car based on who’s inside it. Nor should they be! Like transportation infrastructure, Internet infrastructure is responsible for getting data where it needs to go, not looking inside packets, and making judgments. But the Internet is made of computers and software, and software tends to be written to make decisions based on the data it has available to it.
Privacy-preserving protocols attempt to eliminate the temptation for infrastructure providers and others to peek inside and make decisions based on personal data. A non-privacy preserving protocol like HTTP keeps data and metadata, like passwords, IP addresses, and hostnames, as explicit parts of the data sent over the wire. The fact that they are explicit means that they are available to any observer to collect and act on. A protocol like HTTPS improves upon this by making some of the data (such as passwords and site content) invisible on the wire using encryption.
The three protocols we are exploring today extend this concept.
ECH takes most of the unencrypted metadata in TLS (including the hostname) and encrypts it with a key that was fetched ahead of time.
ODoH (a new variant of DoH co-designed by Apple, Cloudflare, and Fastly engineers) uses proxies and onion-like encryption to make the source of a DNS query invisible to the DNS resolver. This protects the user’s IP address when resolving hostnames.
OPAQUE uses a new cryptographic technique to keep passwords hidden even from the server. Utilizing a construction called an Oblivious Pseudo-Random Function (as seen in Privacy Pass), the server does not learn the password; it only learns whether or not the user knows the password.
By making sure Internet infrastructure acts more like physical infrastructure, user privacy is more easily protected. The Internet is more private if private data can only be collected where the user has a chance to consent to its collection.
Doing it together
As much as we’re excited about working on new ways to make the Internet more private, innovation at a global scale doesn’t happen in a vacuum. Each of these projects is the output of a collaborative group of individuals working out in the open in organizations like the IETF and the IRTF. Protocols must come about through a consensus process that involves all the parties that make up the interconnected set of systems that power the Internet. From browser builders to cryptographers, from DNS operators to website administrators, this is truly a global team effort.
We also recognize that sweeping technical changes to the Internet will inevitably also impact the technical community. Adopting these new protocols may have legal and policy implications. We are actively working with governments and civil society groups to help educate them about the impact of these potential changes.
We’re looking forward to sharing our work today and hope that more interested parties join in developing these protocols. The projects we are announcing today were designed by experts from academia, industry, and hobbyists together and were built by engineers from Cloudflare Research (including the work of interns, which we will highlight) with everyone’s support Cloudflare.
Time flies. The Heartbleed vulnerability was discovered just over five and a half years ago. Heartbleed became a household name not only because it was one of the first bugs with its own web page and logo, but because of what it revealed about the fragility of the Internet as a whole. With Heartbleed, one tiny bug in a cryptography library exposed the personal data of the users of almost every website online.
Heartbleed is an example of an underappreciated class of bugs: remote memory disclosure vulnerabilities. High profile examples other than Heartbleed include Cloudbleed and most recently NetSpectre. These vulnerabilities allow attackers to extract secrets from servers by simply sending them specially-crafted packets. Cloudflare recently completed a multi-year project to make our platform more resilient against this category of bug.
For the last five years, the industry has been dealing with the consequences of the design that led to Heartbleed being so impactful. In this blog post we’ll dig into memory safety, and how we re-designed Cloudflare’s main product to protect private keys from the next Heartbleed.
Perfect security is not possible for businesses with an online component. History has shown us that no matter how robust their security program, an unexpected exploit can leave a company exposed. One of the more famous recent incidents of this sort is Heartbleed, a vulnerability in a commonly used cryptography library called OpenSSL that exposed the inner details of millions of web servers to anyone with a connection to the Internet. Heartbleed made international news, caused millions of dollars of damage, and still hasn’t been fully resolved.
Typical web services only return data via well-defined public-facing interfaces called APIs. Clients don’t typically get to see what’s going on under the hood inside the server, that would be a huge privacy and security risk. Heartbleed broke that paradigm: it enabled anyone on the Internet to get access to take a peek at the operating memory used by web servers, revealing privileged data usually not exposed via the API. Heartbleed could be used to extract the result of previous data sent to the server, including passwords and credit cards. It could also reveal the inner workings and cryptographic secrets used inside the server, including TLS certificate private keys.
Heartbleed let attackers peek behind the curtain, but not too far. Sensitive data could be extracted, but not everything on the server was at risk. For example, Heartbleed did not enable attackers to steal the content of databases held on the server. You may ask: why was some data at risk but not others? The reason has to do with how modern operating systems are built.
A simplified view of process isolation
Most modern operating systems are split into multiple layers. These layers are analogous to security clearance levels. So-called user-space applications (like your browser) typically live in a low-security layer called user space. They only have access to computing resources (memory, CPU, networking) if the lower, more credentialed layers let them.
User-space applications need resources to function. For example, they need memory to store their code and working memory to do computations. However, it would be risky to give an application direct access to the physical RAM of the computer they’re running on. Instead, the raw computing elements are restricted to a lower layer called the operating system kernel. The kernel only runs specially-designed applications designed to safely manage these resources and mediate access to them for user-space applications.
When a new user space application process is launched, the kernel gives it a virtual memory space. This virtual memory space acts like real memory to the application but is actually a safely guarded translation layer the kernel uses to protect the real memory. Each application’s virtual memory space is like a parallel universe dedicated to that application. This makes it impossible for one process to view or modify another’s, the other applications are simply not addressable.
Heartbleed, Cloudbleed and the process boundary
Heartbleed was a vulnerability in the OpenSSL library, which was part of many web server applications. These web servers run in user space, like any common applications. This vulnerability caused the web server to return up to 2 kilobytes of its memory in response to a specially-crafted inbound request.
Cloudbleed was also a memory disclosure bug, albeit one specific to Cloudflare, that got its name because it was so similar to Heartbleed. With Cloudbleed, the vulnerability was not in OpenSSL, but instead in a secondary web server application used for HTML parsing. When this code parsed a certain sequence of HTML, it ended up inserting some process memory into the web page it was serving.
It’s important to note that both of these bugs occurred in applications running in user space, not kernel space. This means that the memory exposed by the bug was necessarily part of the virtual memory of the application. Even if the bug were to expose megabytes of data, it would only expose data specific to that application, not other applications on the system.
In order for a web server to serve traffic over the encrypted HTTPS protocol, it needs access to the certificate’s private key, which is typically kept in the application’s memory. These keys were exposed to the Internet by Heartbleed. The Cloudbleed vulnerability affected a different process, the HTML parser, which doesn’t do HTTPS and therefore doesn’t keep the private key in memory. This meant that HTTPS keys were safe, even if other data in the HTML parser’s memory space wasn’t.
The fact that the HTML parser and the web server were different applications saved us from having to revoke and re-issue our customers’ TLS certificates. However, if another memory disclosure vulnerability is discovered in the web server, these keys are again at risk.
Moving keys out of Internet-facing processes
Not all web servers keep private keys in memory. In some deployments, private keys are held in a separate machine called a Hardware Security Module (HSM). HSMs are built to withstand physical intrusion and tampering and are often built to comply with stringent compliance requirements. They can often be bulky and expensive. Web servers designed to take advantage of keys in an HSM connect to them over a physical cable and communicate with a specialized protocol called PKCS#11. This allows the web server to serve encrypted content while being physically separated from the private key.
At Cloudflare, we built our own way to separate a web server from a private key: Keyless SSL. Rather than keeping the keys in a separate physical machine connected to the server with a cable, the keys are kept in a key server operated by the customer in their own infrastructure (this can also be backed by an HSM).
More recently, we launched Geo Key Manager, a service that allows users to store private keys in only select Cloudflare locations. Connections to locations that do not have access to the private key use Keyless SSL with a key server hosted in a datacenter that does have access.
In both Keyless SSL and Geo Key Manager, private keys are not only not part of the web server’s memory space, they’re often not even in the same country! This extreme degree of separation is not necessary to protect against the next Heartbleed. All that is needed is for the web server and the key server to not be part of the same application. So that’s what we did. We call this Keyless Everywhere.
Keyless SSL is coming from inside the house
Repurposing Keyless SSL for Cloudflare-held private keys was easy to conceptualize, but the path from ideation to live in production wasn’t so straightforward. The core functionality of Keyless SSL comes from the open source gokeyless which customers run on their infrastructure, but internally we use it as a library and have replaced the main package with an implementation suited to our requirements (we’ve creatively dubbed it gokeyless-internal).
As with all major architecture changes, it’s prudent to start with testing out the model with something new and low risk. In our case, the test bed was our experimental TLS 1.3 implementation. In order to quickly iterate through draft versions of the TLS specification and push releases without affecting the majority of Cloudflare customers, we re-wrote our custom nginx web server in Go and deployed it in parallel to our existing infrastructure. This server was designed to never hold private keys from the start and only leverage gokeyless-internal. At this time there was only a small amount of TLS 1.3 traffic and it was all coming from the beta versions of browsers, which allowed us to work through the initial kinks of gokeyless-internal without exposing the majority of visitors to security risks or outages due to gokeyless-internal.
The first step towards making TLS 1.3 fully keyless was identifying and implementing the new functionality we needed to add to gokeyless-internal. Keyless SSL was designed to run on customer infrastructure, with the expectation of supporting only a handful of private keys. But our edge must simultaneously support millions of private keys, so we implemented the same lazy loading logic we use in our web server, nginx. Furthermore, a typical customer deployment would put key servers behind a network load balancer, so they could be taken out of service for upgrades or other maintenance. Contrast this with our edge, where it’s important to maximize our resources by serving traffic during software upgrades. This problem is solved by the excellent tableflip package we use elsewhere at Cloudflare.
The next project to go Keyless was Spectrum, which launched with default support for gokeyless-internal. With these small victories in hand, we had the confidence necessary to attempt the big challenge, which was porting our existing nginx infrastructure to a fully keyless model. After implementing the new functionality, and being satisfied with our integration tests, all that’s left is to turn this on in production and call it a day, right? Anyone with experience with large distributed systems knows how far “working in dev” is from “done,” and this story is no different. Thankfully we were anticipating problems, and built a fallback into nginx to complete the handshake itself if any problems were encountered with the gokeyless-internal path. This allowed us to expose gokeyless-internal to production traffic without risking downtime in the event that our reimplementation of the nginx logic was not 100% bug-free.
When rolling back the code doesn’t roll back the problem
Our deployment plan was to enable Keyless Everywhere, find the most common causes of fallbacks, and then fix them. We could then repeat this process until all sources of fallbacks had been eliminated, after which we could remove access to private keys (and therefore the fallback) from nginx. One of the early causes of fallbacks was gokeyless-internal returning ErrKeyNotFound, indicating that it couldn’t find the requested private key in storage. This should not have been possible, since nginx only makes a request to gokeyless-internal after first finding the certificate and key pair in storage, and we always write the private key and certificate together. It turned out that in addition to returning the error for the intended case of the key truly not found, we were also returning it when transient errors like timeouts were encountered. To resolve this, we updated those transient error conditions to return ErrInternal, and deployed to our canary datacenters. Strangely, we found that a handful of instances in a single datacenter started encountering high rates of fallbacks, and the logs from nginx indicated it was due to a timeout between nginx and gokeyless-internal. The timeouts didn’t occur right away, but once a system started logging some timeouts it never stopped. Even after we rolled back the release, the fallbacks continued with the old version of the software! Furthermore, while nginx was complaining about timeouts, gokeyless-internal seemed perfectly healthy and was reporting reasonable performance metrics (sub-millisecond median request latency).
To debug the issue, we added detailed logging to both nginx and gokeyless, and followed the chain of events backwards once timeouts were encountered.
➜ ~ grep 'timed out' nginx.log | grep Keyless | head -5
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015157 Keyless SSL request/response timed out while reading Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015231 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015271 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015280 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:50.000 29m41 2018/07/25 05:30:50 [error] 4525#0: *1015289 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
You can see the first request to log a timeout had id 1015157. Also interesting that the first log line was “timed out while reading,” but all the others are “timed out while waiting,” and this latter message is the one that continues forever. Here is the matching request in the gokeyless log:
Aha! That SNI value is clearly invalid (SNIs are like Host headers, i.e. they are domains, not URL paths), and it’s also quite long. Our storage system indexes certificates based on two indices: which SNI they correspond to, and which IP addresses they correspond to (for older clients that don’t support SNI). Our storage interface uses the memcached protocol, and the client library that gokeyless-internal uses rejects requests for keys longer than 250 characters (memcached’s maximum key length), whereas the nginx logic is to simply ignore the invalid SNI and treat the request as if only had an IP. The change in our new release had shifted this condition from ErrKeyNotFound to ErrInternal, which triggered cascading problems in nginx. The “timeouts” it encountered were actually a result of throwing away all in-flight requests multiplexed on a connection which happened to return ErrInternalfor a single request. These requests were retried, but once this condition triggered, nginx became overloaded by the number of retried requests plus the continuous stream of new requests coming in with bad SNI, and was unable to recover. This explains why rolling back gokeyless-internal didn’t fix the problem.
This discovery finally brought our attention to nginx, which thus far had escaped blame since it had been working reliably with customer key servers for years. However, communicating over localhost to a multitenant key server is fundamentally different than reaching out over the public Internet to communicate with a customer’s key server, and we had to make the following changes:
Instead of a long connection timeout and a relatively short response timeout for customer key servers, extremely short connection timeouts and longer request timeouts are appropriate for a localhost key server.
Similarly, it’s reasonable to retry (with backoff) if we timeout waiting on a customer key server response, since we can’t trust the network. But over localhost, a timeout would only occur if gokeyless-internal were overloaded and the request were still queued for processing. In this case a retry would only lead to more total work being requested of gokeyless-internal, making the situation worse.
Most significantly, nginx must not throw away all requests multiplexed on a connection if any single one of them encounters an error, since a single connection no longer represents a single customer.
CPU at the edge is one of our most precious assets, and it’s closely guarded by our performance team (aka CPU police). Soon after turning on Keyless Everywhere in one of our canary datacenters, they noticed gokeyless using ~50% of a core per instance. We were shifting the sign operations from nginx to gokeyless, so of course it would be using more CPU now. But nginx should have seen a commensurate reduction in CPU usage, right?
Although Go 1.11 includes optimizations for RSA math operations, we needed more speed. Well-tuned assembly code is required to match the performance of BoringSSL, so Armando Faz from our Crypto team helped claw back some of the lost CPU by reimplementing parts of the math/big package with platform-dependent assembly in an internal fork of Go. The recent assembly policy of Go prefers the use of Go portable code instead of assembly, so these optimizations were not upstreamed. There is still room for more optimizations, and for that reason we’re still evaluating moving to cgo + BoringSSL for sign operations, despite cgo’s many downsides.
Changing our tooling
Process isolation is a powerful tool for protecting secrets in memory. Our move to Keyless Everywhere demonstrates that this is not a simple tool to leverage. Re-architecting an existing system such as nginx to use process isolation to protect secrets was time-consuming and difficult. Another approach to memory safety is to use a memory-safe language such as Rust.
Rust was originally developed by Mozilla but is starting to be used much more widely. The main advantage that Rust has over C/C++ is that it has memory safety features without a garbage collector.
Re-writing an existing application in a new language such as Rust is a daunting task. That said, many new Cloudflare features, from the powerful Firewall Rules feature to our 22.214.171.124 with WARP app, have been written in Rust to take advantage of its powerful memory-safety properties. We’re really happy with Rust so far and plan on using it even more in the future.
The harrowing aftermath of Heartbleed taught the industry a lesson that should have been obvious in retrospect: keeping important secrets in applications that can be accessed remotely via the Internet is a risky security practice. In the following years, with a lot of work, we leveraged process separation and Keyless SSL to ensure that the next Heartbleed wouldn’t put customer keys at risk.
However, this is not the end of the road. Recently memory disclosure vulnerabilities such as NetSpectre have been discovered which are able to bypass application process boundaries, so we continue to actively explore new ways to keep keys secure.
Today we’re happy to announce support for a new cryptographic protocol that helps make it possible to deploy encrypted services in a global network while still maintaining fast performance and tight control of private keys: Delegated Credentials for TLS. We have been working with partners from Facebook, Mozilla, and the broader IETF community to define this emerging standard. We’re excited to share the gory details today in this blog post.
Deploying TLS globally
Many of the technical problems we face at Cloudflare are widely shared problems across the Internet industry. As gratifying as it can be to solve a problem for ourselves and our customers, it can be even more gratifying to solve a problem for the entire Internet. For the past three years, we have been working with peers in the industry to solve a specific shared problem in the TLS infrastructure space: How do you terminate TLS connections while storing keys remotely and maintaining performance and availability? Today we’re announcing that Cloudflare now supports Delegated Credentials, the result of this work.
Cloudflare’s TLS/SSL features are among the top reasons customers use our service. Configuring TLS is hard to do without internal expertise. By automating TLS, web site and web service operators gain the latest TLS features and the most secure configurations by default. It also reduces the risk of outages or bad press due to misconfigured or insecure encryption settings. Customers also gain early access to unique features like TLS 1.3, post-quantum cryptography, and OCSP stapling as they become available.
Unfortunately, for web services to authorize a service to terminate TLS for them, they have to trust the service with their private keys, which demands a high level of trust. For services with a global footprint, there is an additional level of nuance. They may operate multiple data centers located in places with varying levels of physical security, and each of these needs to be trusted to terminate TLS.
To tackle these problems of trust, Cloudflare has invested in two technologies: Keyless SSL, which allows customers to use Cloudflare without sharing their private key with Cloudflare; and Geo Key Manager, which allows customers to choose the datacenters in which Cloudflare should keep their keys. Both of these technologies are able to be deployed without any changes to browsers or other clients. They also come with some downsides in the form of availability and performance degradation.
Keyless SSL introduces extra latency at the start of a connection. In order for a server without access to a private key to establish a connection with a client, that servers needs to reach out to a key server, or a remote point of presence, and ask them to do a private key operation. This not only adds additional latency to the connection, causing the content to load slower, but it also introduces some troublesome operational constraints on the customer. Specifically, the server with access to the key needs to be highly available or the connection can fail. Sites often use Cloudflare to improve their site’s availability, so having to run a high-availability key server is an unwelcome requirement.
Turning a pull into a push
The reason services like Keyless SSL that rely on remote keys are so brittle is their architecture: they are pull-based rather than push-based. Every time a client attempts a handshake with a server that doesn’t have the key, it needs to pull the authorization from the key server. An alternative way to build this sort of system is to periodically push a short-lived authorization key to the server and use that for handshakes. Switching from a pull-based model to a push-based model eliminates the additional latency, but it comes with additional requirements, including the need to change the client.
Enter the new TLS feature of Delegated Credentials (DCs). A delegated credential is a short-lasting key that the certificate’s owner has delegated for use in TLS. They work like a power of attorney: your server authorizes our server to terminate TLS for a limited time. When a browser that supports this protocol connects to our edge servers we can show it this “power of attorney”, instead of needing to reach back to a customer’s server to get it to authorize the TLS connection. This reduces latency and improves performance and reliability.
A fresh delegated credential can be created and pushed out to TLS servers long before the previous credential expires. Momentary blips in availability will not lead to broken handshakes for clients that support delegated credentials. Furthermore, a Delegated Credentials-enabled TLS connection is just as fast as a standard TLS connection: there’s no need to connect to the key server for every handshake. This removes the main drawback of Keyless SSL for DC-enabled clients.
Delegated credentials are intended to be an Internet Standard RFC that anyone can implement and use, not a replacement for Keyless SSL. Since browsers will need to be updated to support the standard, proprietary mechanisms like Keyless SSL and Geo Key Manager will continue to be useful. Delegated credentials aren’t just useful in our context, which is why we’ve developed it openly and with contributions from across industry and academia. Facebook has integrated them into their own TLS implementation, and you can read more about how they view the security benefits here. When it comes to improving the security of the Internet, we’re all on the same team.
"We believe delegated credentials provide an effective way to boost security by reducing certificate lifetimes without sacrificing reliability. This will soon become an Internet standard and we hope others in the industry adopt delegated credentials to help make the Internet ecosystem more secure."
— Subodh Iyengar, software engineer at Facebook
Extensibility beyond the PKI
At Cloudflare, we’re interested in pushing the state of the art forward by experimenting with new algorithms. In TLS, there are three main areas of experimentation: ciphers, key exchange algorithms, and authentication algorithms. Ciphers and key exchange algorithms are only dependent on two parties: the client and the server. This freedom allows us to deploy exciting new choices like ChaCha20-Poly1305 or post-quantum key agreement in lockstep with browsers. On the other hand, the authentication algorithms used in TLS are dependent on certificates, which introduces certificate authorities and the entire public key infrastructure into the mix.
Unfortunately, the public key infrastructure is very conservative in its choice of algorithms, making it harder to adopt newer cryptography for authentication algorithms in TLS. For instance, EdDSA, a highly-regarded signature scheme, is not supported by certificate authorities, and root programs limit the certificates that will be signed. With the emergence of quantum computing, experimenting with new algorithms is essential to determine which solutions are deployable and functional on the Internet.
Since delegated credentials introduce the ability to use new authentication key types without requiring changes to certificates themselves, this opens up a new area of experimentation. Delegated credentials can be used to provide a level of flexibility in the transition to post-quantum cryptography, by enabling new algorithms and modes of operation to coexist with the existing PKI infrastructure. It also enables tiny victories, like the ability to use smaller, faster Ed25519 signatures in TLS.
A delegated credential contains a public key and an expiry time. This bundle is then signed by a certificate along with the certificate itself, binding the delegated credential to the certificate for which it is acting as “power of attorney”. A supporting client indicates its support for delegated credentials by including an extension in its Client Hello.
A server that supports delegated credentials composes the TLS Certificate Verify and Certificate messages as usual, but instead of signing with the certificate’s private key, it includes the certificate along with the DC, and signs with the DC’s private key. Therefore, the private key of the certificate only needs to be used for the signing of the DC.
Certificates used for signing delegated credentials require a special X.509 certificate extension. This requirement exists to avoid breaking assumptions people may have about the impact of temporary access to their keys on security, particularly in cases involving HSMs and the still unfixed Bleichbacher oracles in older TLS versions. Temporary access to a key can enable signing lots of delegated credentials which start far in the future, and as a result support was made opt-in. Early versions of QUIC had similar issues, and ended up adopting TLS to fix them. Protocol evolution on the Internet requires working well with already existing protocols and their flaws.
Delegated Credentials at Cloudflare and Beyond
Currently we use delegated credentials as a performance optimization for Geo Key Manager and Keyless SSL. Customers can update their certificates to include the special extension for delegated credentials, and we will automatically create delegated credentials and distribute them to the edge through the Keyless SSL or Geo Key Manager. For more information, see the documentation. It also enables us to be more conservative about where we keep keys for customers, improving our security posture.
Delegated Credentials would be useless if it wasn’t also supported by browsers and other HTTP clients. Christopher Patton, a former intern at Cloudflare, implemented support in Firefox and its underlying NSS security library. This feature is now in the Nightly versions of Firefox. You can turn it on by activating the configuration option security.tls.enable_delegated_credentials at about:config. Studies are ongoing on how effective this will be in a wider deployment. There also is support for Delegated Credentials in BoringSSL.
"At Mozilla we welcome ideas that help to make the Web PKI more robust. The Delegated Credentials feature can help to provide secure and performant TLS connections for our users, and we’re happy to work with Cloudflare to help validate this feature."
— Thyla van der Merwe, Cryptography Engineering Manager at Mozilla
One open issue is the question of client clock accuracy. Until we have a wide-scale study we won’t know how many connections using delegated credentials will break because of the 24 hour time limit that is imposed. Some clients, in particular mobile clients, may have inaccurately set clocks, the root cause of one third of all certificate errors in Chrome. Part of the way that we’re aiming to solve this problem is through standardizing and improving Roughtime, so web browsers and other services that need to validate certificates can do so independent of the client clock.
Cloudflare’s global scale means that we see connections from every corner of the world, and from many different kinds of connection and device. That reach enables us to find rare problems with the deployability of protocols. For example, our early deployment helped inform the development of the TLS 1.3 standard. As we enable developing protocols like delegated credentials, we learn about obstacles that inform and affect their future development.
As new protocols emerge, we’ll continue to play a role in their development and bring their benefits to our customers. Today’s announcement of a technology that overcomes some limitations of Keyless SSL is just one example of how Cloudflare takes part in improving the Internet not just for our customers, but for everyone. During the standardization process of turning the draft into an RFC, we’ll continue to maintain our implementation and come up with new ways to apply delegated credentials.
Halloween season is upon us. This week we’re sharing a series of blog posts about work being done at Cloudflare involving cryptography, one of the spookiest technologies around. So bookmark this page and come back every day for tricks, treats, and deep technical content.
A long-term mission
Cryptography is one of the most powerful technological tools we have, and Cloudflare has been at the forefront of using cryptography to help build a better Internet. Of course, we haven’t been alone on this journey. Making meaningful changes to the way the Internet works requires time, effort, experimentation, momentum, and willing partners. Cloudflare has been involved with several multi-year efforts to leverage cryptography to help make the Internet better.
Here are some highlights to expect this week:
We’re renewing Cloudflare’s commitment to privacy-enhancing technologies by sharing some of the recent work being done on Privacy Pass
We’re helping forge a path to a quantum-safe Internet by sharing some of the results of the Post-quantum Cryptography experiment
We’re doing a deep dive into the technical details of Encrypted DNS
We’re announcing support for a new technique we developed with industry partners to help keep TLS private keys more secure
The milestones we’re sharing this week would not be possible without partnerships with companies, universities, and individuals working in good faith to help build a better Internet together. Hopefully, this week provides a fun peek into the future of the Internet.
Cloudflare’s mission is to help build a better Internet. One of the tools used in pursuit of this goal is computer science research. We’ve learned that some of the difficult problems to solve are best approached through research and experimentation to understand the solution before engineering it at scale. This research-focused approach to solving the big problems of the Internet is exemplified by the work of the Cryptography Research team, which leverages research to help build a safer, more secure and more performant Internet. Over the years, the team has worked on more than just cryptography, so we’re taking the model we’ve developed and expanding the scope of the team to include more areas of computer science research. Cryptography Research at Cloudflare is now Cloudflare Research. I am excited to share some of the insights we’ve learned over the years in this blog post.
Cloudflare’s research model
Hybrid approach. We have a program that allows research engineers to be embedded into product and operations teams for temporary assignments. This gives people direct exposure to practical problems.
Impact-focused. We use our expertise and the expertise of partners in industry and academia to select projects that have the potential to make a big impact, and for which existing solutions are insufficient or not yet popularized.
Open collaboration. Popularizing winning ideas through public outreach, working with industry partners to promote standardization, and implementing ideas at scale to show they’re effective.
The hybrid approach to research
“Super-ambitious goals tend to be unifying and energizing to people; but only if they believe there’s a chance of success.” – Peter Diamandis
Given the scale and reach of Cloudflare, research problems (and opportunities) present themselves all the time. Our approach to research is a practical one. We choose to tackle projects that have the potential to make a big impact, and for which existing solutions are insufficient. This stems from a belief that the interconnected systems that make up the Internet can be changed and improved in a fundamental way. While some research problems are solvable in a few months, some may take years. We don’t shy away from long-term projects, but the Internet moves fast, so it’s important to break down long-term projects into smaller, independently-valuable pieces in order to continually provide value while pursuing a bigger vision.
Successful technological innovation is not purely about technical accomplishments. New creations need the social and political scaffolding to support it while being built, and the momentum and support to gain popularity. We are better able to innovate if grounded in a deep understanding of the current day-to-day. To stay grounded, our research team members spend part of their time solving practical problems that affect Cloudflare and our customers right now.
Cloudflare employs a hybrid research model similar to the model pioneered by Google. Innovation can come from everywhere in a company, so teams are encouraged to find the right balance between research and engineering activities. The research team works with the same tools, systems, and constraints as the rest of the engineering organization.
Research engineers are expected to write production-quality code and contribute to engineering activities. This enables researchers to leverage the rich data provided by Cloudflare’s production environment for experiments. To further break down silos, we have a program that allows research engineers to be embedded into product and operations teams for temporary assignments. This gives people direct exposure to practical problems.
Continuing a successful tradition (our tradition)
“Skate to where the puck is going, not where it has been.” – Wayne Gretzky
The output of the research team is both new knowledge and technology that can lead to innovative products. Research works hand-in-hand with both product and engineering to help drive long-term positive outcomes for both Cloudflare and the Internet at large.
An example of a long-term project that requires both research and engineering is helping the Internet migrate from insecure to secure network protocols. To tackle the problem, we pursued several smaller projects with discrete and measurable outcomes. This included:
and many other smaller projects. Each step along the way contributed something concrete to help make the Internet more secure.
This year’s Crypto Week is a great example of the type of impact an effective hybrid research organization can make. Every day that week, a new announcement was made that helped take research results and realize their practical impact. From the League of Entropy, which is based on fundamental work by researchers at EPFL, to Cloudflare Time Services, which helps address time security issues raised in papers by former Cloudflare intern Aanchal Malhotra, to our own (currently running) post-quantum experiment with Google Chrome, engineers at Cloudflare combined research with building large-scale production systems to help solve some unsolved problems on the Internet.
Open collaboration, open standards, and open source
“We reject kings, presidents and voting. We believe in rough consensus and running code.” – Dave Clark
Effective research requires:
Choosing interesting problems to solve
Popularizing the ideas discovered while studying the solution space
Implementing the ideas at scale to show they’re effective
Cloudflare’s massive popularity puts us in a very privileged position. We can research, implement and deploy experiments at a scale that simply can’t be done by most organizations. This makes Cloudflare an attractive research partner for universities and other research institutions who have domain knowledge but not data. We rely on our own expertise along with that of peers in both academia and industry to decide which problems to tackle in order to achieve common goals and make new scientific progress. Our middlebox detection project, proposed by researchers at the University of Michigan, is an example of such a problem.
We’re not purists who are only interested in pursuing our own ideas. Some interesting problems have already been solved, but the solution isn’t widely known or implemented. In this situation, we contribute our efforts to help elevate the best ideas and make them available to the public in an accessible way. Our early work popularizing elliptic curves on the Internet is such an example.
Popularizing an idea and implementing the idea at scale are two different things. Along with popularizing winning ideas, we want to ensure these ideas stick and provide benefits to Internet users. To promote the widespread deployment of useful ideas, we work on standards and deploy newly emerging standards early on. Doing so helps the industry easily adopt innovations and supports interoperability. For example, the work done for Crypto Week 2019 has helped the development of international technical standards. Aspects of the League of Entropy are now being standardized at the CFRG, Roughtime is now being considered for adoption as an IETF standard, and we are presenting our post-quantum results as part of NIST’s post-quantum cryptography standardization effort.
Open source software is another key aspect of scaling the implementation of an idea. We open source associated code whenever possible. The research team collaborates with the wider research world as well as internally with other teams at Cloudflare.
Focus areas going forward
Doing research, sharing it in an accessible way, working with top experts to validate it, and working on standardization has several benefits. It provides an opportunity to educate the public, further scientific understanding, and improve the state of the art; but it’s also a great way to attract candidates. Great engineers want to work on interesting projects and great researchers want to see their work have an impact. This hybrid research approach is attractive to both types of candidates.
Computer science is a vast arena, so the areas we’re currently focusing on are:
Security and privacy
Low-level networking and operating systems
Emerging networking paradigms
Here are some highlights of publications we’ve co-authored over the last few years in these areas. We’ll be building on this tradition going forward.
Today has been a big day for Cloudflare, as we became a public company on the New York Stock Exchange (NYSE: NET). To mark the occasion, we decided to bring our favorite entropy machines to the floor of the NYSE. Footage of these lava lamps is being used as an additional seed to our entropy-generation system LavaRand — bolstering Internet encryption for over 20 million Internet properties worldwide.
(This is mostly for fun. But when’s the last time you saw a lava lamp on the trading floor of the New York Stock Exchange?)
A little context: generating truly random numbers using computers is impossible, because code is inherently deterministic (i.e. predictable). To compensate for this, engineers draw from pools of randomness created by entropy generators, which is a fancy term for “things that are truly unpredictable”.
It turns out that lava lamps are fantastic sources of entropy, as was first shown by Silicon Graphics in the 1990s. It’s a torch we’ve been proud to carry forward: today, Cloudflare uses lava lamps to generate entropy that helps make millions of Internet properties more secure.
Housed in our San Francisco headquarters is a wall filled with dozens of lava lamps, undulating with mesmerizing randomness. We capture these lava lamps on video via a camera mounted across the room, and feed the resulting footage into an algorithm — called LavaRand — that amplifies the pure randomness of these lava lamps to dizzying extremes (computers can’t create seeds of pure randomness, but they can massively amplify them).
Shortly before we rang the opening bell this morning, we recorded footage of our lava lamps in operation on the trading room floor of the New York Stock Exchange, and we’re ingesting the footage into our LavaRand system. The resulting entropy is mixed with the myriad additional sources of entropy that we leverage every day, creating a cryptographically-secure source of randomness — fortified by Wall Street.
We recently took our enthusiasm for randomness a step further by facilitating the League of Entropy, a consortium of global organizations and individual contributors, generating verifiable randomness via a globally distributed network. As one of the founding members of the League, LavaRand (pictured above) plays a key role in empowering developers worldwide with a pool of randomness with extreme entropy and high reliability.
And today, she’s enjoying the view from the podium!
One caveat: the lava lamps we run in our San Francisco headquarters are recorded in real-time, 24/7, giving us an ongoing stream of entropy. For reasons that are understandable, the NYSE doesn’t allow for live video feeds from the exchange floor while it is in operation. But this morning they did let us record footage of the lava lamps operating shortly before the opening bell. The video was recorded and we’re ingesting it into our LavaRand system (alongside many other entropy generators, including the lava lamps back in San Francisco).
The Internet is an extraordinarily complex and evolving ecosystem. Its constituent protocols range from the ancient and archaic (hello FTP) to the modern and sleek (meet WireGuard), with a fair bit of everything in between. This evolution is ongoing, and as one of the most connected networks on the Internet, Cloudflare has a duty to be a good steward of this ecosystem. We take this responsibility to heart: Cloudflare’s mission is to help build a better Internet. In this spirit, we are very proud to announce Crypto Week 2019.
Every day this week we’ll announce a new project or service that uses modern cryptography to build a more secure, trustworthy Internet. Everything we release this week will be free and immediately useful. This blog is a fun exploration of the themes of the week.
Monday: Coming Soon
Tuesday: Coming Soon
Wednesday: Coming Soon
Thursday: Coming Soon
Friday: Coming Soon
The Internet of the Future
Many pieces of the Internet in use today were designed in a different era with different assumptions. The Internet’s success is based on strong foundations that support constant reassessment and improvement. Sometimes these improvements require deploying new protocols.
Performing an upgrade on a system as large and decentralized as the Internet can’t be done by decree;
There are too many economic, cultural, political, and technological factors at play.
Changes must be compatible with existing systems and protocols to even be considered for adoption.
To gain traction, new protocols must provide tangible improvements for users. Nobody wants to install an update that doesn’t improve their experience!
The last time the Internet had a complete reboot and upgrade was during TCP/IP flag dayin 1983. Back then, the Internet (called ARPANET) had fewer than ten thousand hosts! To have an Internet-wide flag day today to switch over to a core new protocol is inconceivable; the scale and diversity of the components involved is way too massive. Too much would break. It’s challenging enough to deprecate outmoded functionality. In some ways, the open Internet is a victim of its own success. The bigger a system grows and the longer it stays the same, the harder it is to change. The Internet is like a massive barge: it takes forever to steer in a different direction and it’s carrying a lot of garbage.
As you would expect, many of the warts of the early Internet still remain. Both academic security researchers and real-life adversaries are still finding and exploiting vulnerabilities in the system. Many vulnerabilities are due to the fact that most of the protocols in use on the Internet have a weak notion of trust inherited from the early days. With 50 hosts online, it’s relatively easy to trust everyone, but in a world-scale system, that trust breaks down in fascinating ways. The primary tool to scale trust is cryptography, which helps provide some measure of accountability, though it has its own complexities.
In an ideal world, the Internet would provide a trustworthy substrate for human communication and commerce. Some people naïvely assume that this is the natural direction the evolution of the Internet will follow. However, constant improvement is not a given. It’s possible that the Internet of the future will actually be worse than the Internet today: less open, less secure, less private, less trustworthy. There are strong incentives to weaken the Internet on a fundamental level by Governments, by businesses such as ISPs, and even by the financial institutions entrusted with our personal data.
In a system with as many stakeholders as the Internet, real change requires principled commitment from all invested parties. At Cloudflare, we believe everyone is entitled to an Internet built on a solid foundation of trust. Crypto Week is our way of helping nudge the Internet’s evolution in a more trust-oriented direction. Each announcement this week helps bring the Internet of the future to the present in a tangible way.
Ongoing Internet Upgrades
Before we explore the Internet of the future, let’s explore some of the previous and ongoing attempts to upgrade the Internet’s fundamental protocols.
As we highlighted in last year’s Crypto Weekone of the weak links on the Internet is routing. Not all networks are directly connected.
To send data from one place to another, you might have to rely on intermediary networks to pass your data along. A packet sent from one host to another may have to be passed through up to a dozen of these intermediary networks.No single network knows the full path the data will have to take to get to its destination, it only knows which network to pass it to next.The protocol that determines how packets are routed is called the Border Gateway Protocol (BGP.) Generally speaking, networks use BGP to announce to each other which addresses they know how to route packets for and (dependent on a set of complex rules) these networks share what they learn with their neighbors.
Unfortunately, BGP is completely insecure:
Any network can announce any set of addresses to any other network, even addresses they don’t control. This leads to a phenomenon called BGP hijacking, where networks are tricked into sending data to the wrong network.
A BGP hijack ismost often caused by accidental misconfiguration, but can also be the result of malice on the network operator’s part.
During a BGP hijack, a network inappropriately announces a set of addresses to other networks, which results in packets destined for the announced addresses to be routed through the illegitimate network.
Understanding the risk
If the packets represent unencrypted data, this can be a big problemas it allows the hijacker to read or even change the data:
The Resource Public Key Infrastructure (RPKI) system helps bring some trust to BGP by enabling networks to utilize cryptography to digitally sign network routes with certificates, making BGP hijacking much more difficult.
This enables participants of the network to gain assurances about the authenticity of route advertisements. Certificate Transparency (CT) is a tool that enables additional trust for certificate-based systems. Cloudflare operates the Cirrus CT log to support RPKI.
Since we announced our support of RPKI last year, routing security has made big strides. More routes are signed, more networks validate RPKI, and the software ecosystem has matured, but this work is not complete. Most networks are still vulnerable to BGP hijacking. For example, Pakistan knocked YouTube offline with a BGP hijack back in 2008, and could likely do the same today. Adoption here is driven less by providing a benefit to users, but rather by reducing systemic risk, which is not the strongest motivating factor for adopting a complex new technology. Full routing security on the Internet could take decades.
The Domain Name System (DNS) is the phone book of the Internet. Or, for anyone under 25 who doesn’t remember phone books, it’s the system that takes hostnames (like cloudflare.com or facebook.com) and returns the Internet address where that host can be found. For example, as of this publication, www.cloudflare.com is 126.96.36.199 and 188.8.131.52 (IPv4) and 2606:4700::c629:d7a2, 2606:4700::c629:d6a2 (IPv6). Like BGP, DNS is completely insecure. Queries and responses sent unencrypted over the Internet are modifiable by anyone on the path.
There are many ongoing attempts to add security to DNS, such as:
DNSSEC that adds a chain of digital signatures to DNS responses
DoT/DoH that wraps DNS queries in the TLS encryption protocol (more on that later)
Both technologies are slowly gaining adoption, but have a long way to go.
Just like RPKI, securing DNS comes with a performance cost, making it less attractive to users. However,
This performance improvement makes it appealing for customers of privacy-conscious applications, like Firefox and Cloudflare’s 184.108.40.206 app, to adopt secure DNS.
Transport Layer Security (TLS) is a cryptographic protocol that gives two parties the ability to communicate over an encrypted and authenticated channel.TLS protects communications from eavesdroppers even in the event of a BGP hijack. TLS is what puts the “S” in HTTPS. TLS protects web browsing against multiple types of network adversaries.
The adoption of TLS on the web is partially driven by the fact that:
Browsers make TLS adoption appealing to website operators by only supporting new web features such as HTTP/2 over HTTPS.
This has led to the rapid adoption of HTTPS over the last five years.
To further that adoption, TLS recently got an upgrade in TLS 1.3, making it faster and more secure (a combination we love). It’s taking over the Internet!
Despite this fantastic progress in the adoption of security for routing, DNS, and the web, there are still gaps in the trust model of the Internet. There are other things needed to help build the Internet of the future. To find and identify these gaps, we lean on research experts.
Research Farm to Table
Cryptographic security on the Internet is a hot topic and there have been many flaws and issues recently pointed out in academic journals. Researchers often study the vulnerabilities of the past and ask:
What other critical components of the Internet have the same flaws?
What underlying assumptions can subvert trust in these existing systems?
The answers to these questions help us decide what to tackle next. Some recent research topics we’ve learned about include:
Attacks on Time Synchronization
DNS attacks affecting Certificate issuance
Scaling distributed trust
Cloudflare keeps abreast of these developments and we do what we can to bring these new ideas to the Internet at large. In this respect, we’re truly standing on the shoulders of giants.
Future-proofing Internet Cryptography
The new protocols we are currently deploying (RPKI, DNSSEC, DoT/DoH, TLS 1.3) use relatively modern cryptographic algorithms published in the 1970s and 1980s.
If you can solve the hard problem, you can crack the code. Using a bigger key makes the problem harder, making it more difficult to break, but also slows performance.
Modern Internet protocols typically pick keys large enough to make it infeasible to break with classical computers, but no larger. The sweet spot is around 128-bits of security;meaning a computer has to do approximately 2¹²⁸ operations to break it.
Arjen Lenstra and others created a useful measure of security levels by comparing the amount of energy it takes to break a key to the amount of water you can boil using that much energy. You can think of this as the electric bill you’d get if you run a computer long enough to crack the key.
35-bit securityis “Teaspoon security” — It takes about the same amount of energy to break a 35-bit key as it does to boil a teaspoon of water (pretty easy).
65 bits gets you up to “Pool security” – The energy needed to boil the average amount of water in a swimming pool.
105 bits is “Sea Security” – The energy needed to boil the Mediterranean Sea.
114-bits is “Global Security” – The energy needed to boil all water on Earth.
128-bit security is safely beyond thatof Global Security – Anything larger is overkill.
256-bit security corresponds to “Universal Security” – The estimated mass-energy of the observable universe. So, if you ever hear someone suggest 256-bit AES, you know they mean business.
Post-Quantum of Solace
As far as we know, the algorithms we use for cryptography are functionally uncrackable with all known algorithms that classical computers can run. Quantum computers change this calculus. Instead of transistors and bits, a quantum computer uses the effects of quantum mechanics to perform calculations that just aren’t possible with classical computers. As you can imagine, quantum computers are very difficult to build. However, despite large-scale quantum computers not existing quite yet, computer scientists have already developed algorithms that can only run efficiently on quantum computers. Surprisingly, it turns out that with a sufficiently powerful quantum computer, most of the hard mathematical problems we rely on for Internet security become easy!
Although there are still quantum-skeptics out there, some expertsestimate that within 15-30 years these large quantum computers will exist, which poses a risk to every security protocol online. Progress is moving quickly; every few months a more powerful quantum computer is announced.
Luckily, there are cryptography algorithms that rely on different hard math problems that seem to be resistant to attack from quantum computers. These math problems form the basis of so-called quantum-resistant (or post-quantum) cryptography algorithms that can run on classical computers. These algorithms can be used as substitutes for most of our current quantum-vulnerable algorithms.
Some quantum-resistant algorithms (such as McEliece and Lamport Signatures) were invented decades ago, but there’s a reason they aren’t in common use: they lack some of the nice properties of the algorithms we’re currently using, such as key size and efficiency.
Some quantum-resistant algorithms require much larger keys to provide 128-bit security
Some are very CPU intensive,
And some just haven’t been studied enough to know if they’re secure.
It is possible to swap our current set of quantum-vulnerable algorithms with new quantum-resistant algorithms, but it’s a daunting engineering task. With widely deployed protocols, it is hard to make the transition from something fast and small to something slower, bigger or more complicated without providing concrete user benefits. When exploring new quantum-resistant algorithms, minimizing user impact is of utmost importance to encourage adoption. This is a big deal, because almost all the protocols we use to protect the Internet are vulnerable to quantum computers.
Cryptography-breaking quantum computing is still in the distant future, but we must start the transition to ensure that today’s secure communications are safe from tomorrow’s quantum-powered onlookers; however, that’s not the most timely problem with the Internet. We haven’t addressed that…yet.
Just like DNS, BGP, and HTTP, the Network Time Protocol (NTP) is fundamental to how the Internet works. And like these other protocols, it is completely insecure.
Last year, Cloudflare introduced Roughtime as a mechanism for computers to access the current time from a trusted server in an authenticated way.
Roughtime is powerful because it provides a way to distribute trust among multiple time servers so that if one server attempts to lie about the time, it will be caught.
However, Roughtime is not exactly a secure drop-in replacement for NTP.
Roughtime lacks the complex mechanisms of NTP that allow it to compensate for network latency and yet maintain precise time, especially if the time servers are remote. This leads to imprecise time.
Roughtime also involves expensive cryptography that can further reduce precision. This lack of precision makes Roughtime useful for browsers and other systems that need coarse time to validate certificates (most certificates are valid for 3 months or more), but some systems (such as those used for financial trading) require precision to the millisecond or below.
With Roughtime we supported the time protocol of the future, but there are things we can do to help improve the health of security online today.
Some academic researchers, including Aanchal Malhotra of Boston University, have demonstrated a variety of attacks against NTP, including BGP hijacking and off-path User Datagram Protocol (UDP) attacks.
Some of these attacks can be avoided by connecting to an NTP server that is close to you on the Internet.
However, to bring cryptographic trust to time while maintaining precision, we need something in between NTP and Roughtime.
To solve this, it’s natural to turn to the same system of trust that enabled us to patch HTTP and DNS: Web PKI.
Attacking the Web PKI
The Web PKI is similar to the RPKI, but is more widely visible since it relates to websites rather than routing tables.
If you’ve ever clicked the lock icon on your browser’s address bar, you’ve interacted with it.
The PKI relies on a set of trusted organizations called Certificate Authorities (CAs) to issue certificates to websites and web services.
While we were all patting ourselves on the back for moving the web to HTTPS, someresearchers managed to find and exploit a weakness in the system: the process for getting HTTPS certificates.
Certificate Authorities (CAs) use a process called domain control validation (DCV) to ensure that they only issue certificates to websites owners who legitimately request them.
Some CAs do this validation manually, which is secure, but can’t scale to the total number of websites deployed today.
More progressive CAs have automated this validation process, but rely on insecure methods (HTTP and DNS) to validate domain ownership.
Without ubiquitous cryptography in place (DNSSEC may never reach 100% deployment), there is no completely secure way to bootstrap this system. So, let’s look at how to distribute trust using other methods.
One tool at our disposal is the distributed nature of the Cloudflare network.
Cloudflare is global. We have locations all over the world connected to dozens of networks. That means we have different vantage points, resulting in different ways to traverse networks. This diversity can prove an advantage when dealing with BGP hijacking, since an attacker would have to hijack multiple routes from multiple locations to affect all the traffic between Cloudflare and other distributed parts of the Internet. The natural diversity of the network raises the cost of the attacks.
A distributed set of connections to the Internet and using them as a quorum is a mighty paradigm to distribute trust, with or without cryptography.
This idea of distributing the source of trust is powerful. Last year we announced the Distributed Web Gateway that
Enables users to access content on the InterPlanetary File System (IPFS), a network structured to reduce the trust placed in any single party.
Even if a participant of the network is compromised, it can’t be used to distribute compromised content because the network is content-addressed.
However, using content-based addressing is not the only way to distribute trust between multiple independent parties.
Another way to distribute trust is to literally split authority between multiple independent parties. We’ve explored this topic before. In the context of Internet services, this means ensuring that no single server can authenticate itself to a client on its own. For example,
In HTTPS the server’s private key is the lynchpin of its security. Compromising the owner of the private key (by hook or by crook) gives an attacker the ability to impersonate (spoof) that service. This single point of failure puts services at risk. You can mitigate this risk by distributing the authority to authenticate the service between multiple independently-operated services.
The Internet barge is old and slow, and we’ve only been able to improve it through the meticulous process of patching it piece by piece. Another option is to build new secure systems on top of this insecure foundation. IPFS is doing this, and IPFS is not alone in its design. There has been more research into secure systems with decentralized trust in the last ten years than ever before.
The result is radical new protocols and designs that use exotic new algorithms. These protocols do not supplant those at the core of the Internet (like TCP/IP), but instead, they sit on top of the existing Internet infrastructure, enabling new applications, much like HTTP did for the web.
Some of the most innovative technical projects were considered failures because they couldn’t attract users. New technology has to bring tangible benefits to users to sustain it: useful functionality, content, and a decent user experience. Distributed projects, such as IPFS and others, are gaining popularity, but have not found mass adoption. This is a chicken-and-egg problem. New protocols have a high barrier to entry—users have to install new software—and because of the small audience, there is less incentive to create compelling content. Decentralization and distributed trust are nice security features to have, but they are not products. Users still need to get some benefit out of using the platform.
An example of a system to break this cycle is the web. In 1992 the web was hardly a cornucopia of awesomeness. What helped drive the dominance of the web was its users.
The growth of the user base meant more incentive for people to build services, and the availability of more services attracted more users. It was a virtuous cycle.
It’s hard for a platform to gain momentum, but once the cycle starts, a flywheel effect kicks in to help the platform grow.
The Distributed Web Gateway project Cloudflare launched last year in Crypto Week is our way of exploring what happens if we try to kickstart that flywheel. By providing a secure, reliable, and fast interface from the classic web with its two billion users to the content on the distributed web, we give the fledgling ecosystem an audience.
If the advantages provided by building on the distributed web are appealing to users, then the larger audience will help these services grow in popularity.
This is somewhat reminiscent of how IPv6 gained adoption. It started as a niche technology only accessible using IPv4-to-IPv6 translation services.
IPv6 adoption has now grown so much that it is becoming a requirement for new services. For example,Apple is requiring that all apps work in IPv6-only contexts.
Eventually, as user-side implementations of distributed web technologies improve, people may move to using the distributed web natively rather than through an HTTP gateway. Or they may not! By leveraging Cloudflare’s global network to give users access to new technologies based on distributed trust, we give these technologies a better chance at gaining adoption.
Happy Crypto Week
At Cloudflare, we always support new technologies that help make the Internet better. Part of helping make a better Internet is scaling the systems of trust that underpin web browsing and protect them from attack. We provide the tools to create better systems of assurance with fewer points of vulnerability. We work with academic researchers of security to get a vision of the future and engineer away vulnerabilities before they can become widespread. It’s a constant journey.
Cloudflare knows that none of this is possible without the work of researchers. From award-winning researcher publishing papers in top journals to the blog posts of clever hobbyists, dedicated and curious people are moving the state of knowledge of the world forward. However, the push to publish new and novel research sometimes holds researchers back from committing enough time and resources to fully realize their ideas. Great research can be powerful on its own, but it can have an even broader impact when combined with practical applications. We relish the opportunity to stand on the shoulders of these giants and use our engineering know-how and global reach to expand on their work to help build a better Internet.
So, to all of you dedicated researchers, thank you for your work! Crypto Week is yours as much as ours. If you’re working on something interesting and you want help to bring the results of your research to the broader Internet, please contact us at [email protected]. We want to help you realize your dream of making the Internet safe and trustworthy.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.