At least one presidential candidate has a policy about quantum computing and encryption.

It has two basic planks. One: fund quantum-resistant encryption standards. (Note: NIST is alreadydoingthis.) Two, fund quantum computing. (Unlike many far more pressing computer security problems, the market seems to be doing this on its own quite nicely.)

Okay, so not the greatest policy — but at least one candidate has a policy. Do any of the other candidates have anything else in this area?

“I believe that blockchain needs to be a big part of our future,” Yang told a crowded room at the Consensus conference in New York, where he gave a keynote address Wednesday. “If I’m in the White House, oh boy are we going to have some fun in terms of the crypto currency community.”

Okay, so that’s not so great, either. But again, I don’t think anyone else talks about this.

Note: this is not an invitation to talk more general politics. Not even an invitation to explain how good or bad Andrew Yang’s chances are. Or anyone else’s. Please.

Private Join and Compute uses a 1970s methodology known as “commutative encryption” to allow data in the data sets to be encrypted with multiple keys, without it mattering which order the keys are used in. This is helpful for multiparty computation, where you need to apply and later peel away multiple layers of encryption without affecting the computations performed on the encrypted data. Crucially, Private Join and Compute also uses methods first developed in the ’90s that enable a system to combine two encrypted data sets, determine what they have in common, and then perform mathematical computations directly on this encrypted, unreadable data through a technique called homomorphic encryption.

True homomorphic encryption isn’t possible, and my guess is that it will never be feasible for most applications. But limited application tricks like this have been around for decades, and sometimes they’re useful.

Yubico is recalling a line of security keys used by the U.S. government due to a firmware flaw. The company issued a security advisory today that warned of an issue in YubiKey FIPS Series devices with firmware versions 4.4.2 and 4.4.4 that reduced the randomness of the cryptographic keys it generates. The security keys are used by thousands of federal employees on a daily basis, letting them securely log-on to their devices by issuing one-time passwords.

The problem in question occurs after the security key powers up. According to Yubico, a bug keeps “some predictable content” inside the device’s data buffer that could impact the randomness of the keys generated. Security keys with ECDSA signatures are in particular danger. A total of 80 of the 256 bits generated by the key remain static, meaning an attacker who gains access to several signatures could recreate the private key.

MongoDB calls the new feature Field Level Encryption. It works kind of like end-to-end encrypted messaging, which scrambles data as it moves across the internet, revealing it only to the sender and the recipient. In such a “client-side” encryption scheme, databases utilizing Field Level Encryption will not only require a system login, but will additionally require specific keys to process and decrypt specific chunks of data locally on a user’s device as needed. That means MongoDB itself and cloud providers won’t be able to access customer data, and a database’s administrators or remote managers don’t need to have access to everything either.

For regular users, not much will be visibly different. If their credentials are stolen and they aren’t using multifactor authentication, an attacker will still be able to access everything the victim could. But the new feature is meant to eliminate single points of failure. With Field Level Encryption in place, a hacker who steals an administrative username and password, or finds a software vulnerability that gives them system access, still won’t be able to use these holes to access readable data.

This is a guest post by Aanchal Malhotra, a Graduate Research Assistant at Boston University and former Cloudflare intern on the Cryptography team.

Cloudflare has always been a leader in deploying secure versions of insecure Internet protocols and making them available for free for anyone to use. In 2014, we launched one of the world’s first free, secure HTTPS service (Universal SSL) to go along with our existing free HTTP plan. When we launched the 1.1.1.1 DNS resolver, we also supported the new secure versions of DNS (DNS over HTTPS and DNS over TLS). Today, we are doing the same thing for the Network Time Protocol (NTP), the dominant protocol for obtaining time over the Internet.

This announcement is personal for me. I’ve spent the last four years identifying and fixing vulnerabilities in time protocols. Today I’m proud to help introduce a service that would have made my life from 2015 through 2019 a whole lot harder: time.cloudflare.com, a free time service that supports both NTP and the emerging Network Time Security (NTS) protocol for securing NTP. Now, anyone can get time securely from all our datacenters in 180 cities around the world.

You can use time.cloudflare.com as the source of time for all your devices today with NTP, while NTS clients are still under development. NTPsec includes experimental support for NTS. If you’d like to get updates about NTS client development, email us asking to join at [email protected]. To use NTS to secure time synchronization, reach out to your vendors and inquire about NTS support.

A small tale of “time” first

Back in 2015, as a fresh graduate student interested in Internet security, I came across this mostly esoteric Internet protocol called the Network Time Protocol (NTP). NTP was designed to synchronize time between computer systems communicating over unreliable and variable-latency network paths. I was actually studying Internet routing security, in particular attacks against the Resource Public Key Infrastructure (RPKI), and kept hitting a dead end because of a cache-flushing issue. As a last-ditch effort I decided to roll back the time on my computer manually, and the attack worked.

I had discovered the importance of time to computer security. Most cryptography uses timestamps to limit certificate and signature validity periods. When connecting to a website, knowledge of the correct time ensures that the certificate you see is current and is not compromised by an attacker. When looking at logs, time synchronization makes sure that events on different machines can be correlated accurately. Certificates and logging infrastructure can break with minutes, hours or months of time difference. Other applications like caching and Bitcoin are sensitive to even very small differences in time on the order of seconds.

Two factor authentication using rolling numbers also rely on accurate clocks. This then creates the need for computer clocks to have access to reasonably accurate time that is securely delivered. NTP is the most commonly used protocol for time synchronization on the Internet. If an attacker can leverage vulnerabilities in NTP to manipulate time on computer clocks, they can undermine the security guarantees provided by these systems.

Motivated by the severity of the issue, I decided to look deeper into NTP and its security. Since the need for synchronizing time across networks was visible early on, NTP is a very old protocol. The first standardized version of NTP dates back to 1985, while the latest NTP version 4 was completed in 2010 (see RFC5905).

In its most common mode, NTP works by having a client send a query packet out to an NTP server that then responds with its clock time. The client then computes an estimate of the difference between its clock and the remote clock and attempts to compensate for network delay in this. NTP client queries multiple servers and implements algorithms to select the best estimate, and rejects clearly wrong answers.

Surprisingly enough, research on NTP and its security was not very active at the time. Before this, in late 2013 and early 2014, high-profile Distributed Denial of Service (DDoS) attacks were carried out by amplifying traffic from NTP servers; attackers able to spoof a victim’s IP address were able to funnel copious amounts of traffic overwhelming the targeted domains. This caught the attention of some researchers. However, these attacks did not exploit flaws in the fundamental protocol design. The attackers simply used NTP as a boring bandwidth multiplier. Cloudflare wrote extensively about these attacks and you can read about it here, here, and here.

I found several flaws in the core NTP protocol design and its implementation that can be exploited by network attackers to launch much more devastating attacks by shifting time or denying service to NTP clients. What is even more concerning was that these attackers do not need to be a Monster-In-The-Middle (MITM), where an attacker can modify traffic between the client and the server, to mount these attacks. A set of recent papers authored by one of us showed that an off-path attacker present anywhere on the network can shift time or deny service to NTP clients. One of the ways this is done is by abusing IP fragmentation.

Fragmentation is a feature of the IP layer where a large packet is chopped into several smaller fragments so that they can pass through the networks that do not support large packets. Basically, any random network element on the path between the client and the server can send a special “ICMP fragmentation needed” packet to the server telling them to fragment the packet to say X bytes. Since the server is not expected to know the IP addresses of all the network elements on its path, this packet can be sent from any source IP.

In our attack, the attacker exploits this feature to make the NTP server fragment its NTP response packet for the victim NTP client. The attacker then spoofs carefully crafted overlapping response fragments from off-path that contain the attacker’s timestamp values. By further exploiting the reassembly policies for overlapping fragments the attacker fools the client into assembling a packet with legitimate fragments and the attacker’s insertions. This evades the authenticity checks that rely on values in the original parts of the packet.

NTP’s past and future

At the time of NTP’s creation back in 1985, there were two main design goals for the service provided by NTP. First, they wanted it to be robust enough to handle networking errors and equipment failures. So it was designed as a service where client can gather timing samples from multiple peers over multiple communication paths and then average them to get more accurate measurement.

The second goal was load distribution. While every client would like to talk to time servers which are directly attached to high precision time-keeping devices like atomic clocks, GPS, etc, and thus have more accurate time, the capacity of those devices is only so much. So, to reduce protocol load on the network, the service was designed in a hierarchical manner. At the top of the hierarchy are servers connected to non-NTP time sources, that distribute time to other servers, that further distribute time to even more servers. Most computers connect to either these second or third level servers.

The original specification (RFC 958) also states the “non-goals” of the protocol, namely peer authentication and data integrity. Security wasn’t considered critical in the relatively small and trusting early Internet, and the protocols and applications that rely on time for security didn’t exist then. Securing NTP came second to improving the protocol and implementation.

As the Internet has grown, more and more core Internet protocols have been secured through cryptography to protect against abuse: TLS, DNSSEC, RPKI are all steps toward ensuring the security of all communications on the Internet. These protocols use “time” to provide security guarantees. Since security of Internet hinges on the security of NTP, it becomes even more important to secure NTP.

This research perspicuously showed the need for securing NTP. As a result, there was more work in the standards body for Internet Protocols, the Internet Engineering Task Force (IETF) towards cryptographically authenticating NTP. At the time, even though NTPv4 supported both symmetric and asymmetric cryptographic authentication, it was rarely used in practice due to limitations of both approaches.

NTPv4’s symmetric approach to securing synchronization doesn’t scale as the symmetric key must be pre-shared and configured manually: imagine if every client on earth needed a special secret key with the servers they wanted to get time from, the organizations that run those servers would have to do a great deal of work managing keys. This makes this solution quite cumbersome for public servers that must accept queries from arbitrary clients. For context, NIST operates important public time servers and distributes symmetric keys only to users that register, once per year, via US mail or facsimile; the US Naval Office does something similar.

The first attempt to solve the problem of key distribution was the Autokey protocol, described in RFC 5906. Many public NTP servers do not support Autokey (e.g., the NIST and USNO time servers, and many servers in pool.ntp.org). The protocol is badly broken as any network attacker can trivially retrieve the secret key shared between the client and server. The authentication mechanisms are non-standard and quite idiosyncratic.

The future of the Internet is a secure Internet, which means an authenticated and encrypted Internet. But until now NTP remains mostly insecure, despite continuing protocol development. In the meantime more and more services depended on it.

Fixing the problem

Following the release of our paper, there was a lot more enthusiasm in the NTP community at standards body for Internet Protocols, the Internet Engineering Task Force (IETF) and outside for improving the state of NTP security. As a short-term fix, the ntpd reference implementation software was patched for several vulnerabilities that we found. And for a long-term solution, the community realized the dire need for a secure, authenticated time synchronization protocol based on public-key cryptography, which enables encryption and authentication without requiring the sharing of key material beforehand. Today we have a Network Time Security (NTS) draft at the IETF, thanks to the work of dozens of dedicated individuals at the NTP working group.

In a nutshell, the NTS protocol is divided into two-phases. The first phase is the NTS key exchange that establishes the necessary key material between the NTP client and the server. This phase uses the Transport Layer Security (TLS) handshake and relies on the same public key infrastructure as the web. Once the keys are exchanged, the TLS channel is closed and the protocol enters the second phase. In this phase the results of that TLS handshake are used to authenticate NTP time synchronization packets via extension fields. The interested reader can find more information in the Internet draft.

Cloudflare’s new service

Today, Cloudflare announces its free time service to anyone on the Internet. We intend to solve the limitations with the existing public time services, in particular by increasing availability, robustness and security.

We use our global network to provide an advantage in latency and accuracy. Our 180 locations around the world all use anycast to automatically route your packets to our closest server. All of our servers are synchronized with stratum 1 time service providers, and then offer NTP to the general public, similar to how other public NTP providers function. The biggest source of inaccuracy for time synchronization protocols is the network asymmetry, leading to a difference in travel times between the client and server and back from the server to the client. However, our servers’ proximity to users means there will be less jitter — a measurement of variance in latency on the network — and possible asymmetry in packet paths. We also hope that in regions with a dearth of NTP servers our service significantly improves the capacity and quality of the NTP ecosystem.

Cloudflare servers obtain authenticated time by using a shared symmetric key with our stratum 1 upstream servers. These upstream servers are geographically spread and ensure that our servers have accurate time in our datacenters. But this approach to securing time doesn’t scale. We had to exchange emails individually with the organizations that run stratum 1 servers, as well as negotiate permission to use them. While this is a solution for us, it isn’t a solution for everyone on the Internet.

As a secure time service provider Cloudflare is proud to announce that we are among the first to offer a free and secure public time service based on Network Time Security. We have implemented the latest NTS IETF draft. As this draft progresses through the Internet standards process we are committed to keeping our service current.

Most NTP implementations are currently working on NTS support, and we expect that the next few months will see broader introduction as well as advancement of the current draft protocol to an RFC. Currently we have interoperability with NTPsec who have implemented draft 18 of NTS. We hope that our service will spur faster adoption of this important improvement to Internet security. Because this is a new service with no backwards compatibility requirements, we are requiring the use of TLS v1.3 with it to promote adoption of the most secure version of TLS.

Use it

If you have an NTS client, point it at time.cloudflare.com:1234. Otherwise point your NTP client at time.cloudflare.com. More details on configuration are available in the developer docs.

Conclusion

From our Roughtime service to Universal SSL Cloudflare has played a role in expanding the availability and use of secure protocols. Now with our free public time service we provide a trustworthy, widely available alternative to another insecure legacy protocol. It’s all a part of our mission to help make a faster, reliable, and more secure Internet for everyone.

This is a guest post by Aanchal Malhotra, a Graduate Research Assistant at Boston University and former Cloudflare intern on the Cryptography team.

Cloudflare has always been a leader in deploying secure versions of insecure Internet protocols and making them available for free for anyone to use. In 2014, we launched one of the world’s first free, secure HTTPS service (Universal SSL) to go along with our existing free HTTP plan. When we launched the 1.1.1.1 DNS resolver, we also supported the new secure versions of DNS (DNS over HTTPS and DNS over TLS). Today, as part of Crypto Week 2019, we are doing the same thing for the Network Time Protocol (NTP), the dominant protocol for obtaining time over the Internet.

This announcement is personal for me. I’ve spent the last four years identifying and fixing vulnerabilities in time protocols. Today I’m proud to help introduce a service that would have made my life from 2015 through 2019 a whole lot harder: time.cloudflare.com, a free time service that supports both NTP and the emerging Network Time Security (NTS) protocol for securing NTP. Now, anyone can get time securely from all our datacenters in 180 cities around the world.

You can use time.cloudflare.com as the source of time for all your devices today with NTP, while NTS clients are still under development. NTPsec includes experimental support for NTS. If you’d like to get updates about NTS client development, email us asking to join at [email protected]. To use NTS to secure time synchronization, reach out to your vendors and inquire about NTS support.

A small tale of “time” first

Back in 2015, as a fresh graduate student interested in Internet security, I came across this mostly esoteric Internet protocol called the Network Time Protocol (NTP). NTP was designed to synchronize time between computer systems communicating over unreliable and variable-latency network paths. I was actually studying Internet routing security, in particular attacks against the Resource Public Key Infrastructure (RPKI), and kept hitting a dead end because of a cache-flushing issue. As a last-ditch effort I decided to roll back the time on my computer manually, and the attack worked.

I had discovered the importance of time to computer security. Most cryptography uses timestamps to limit certificate and signature validity periods. When connecting to a website, knowledge of the correct time ensures that the certificate you see is current and is not compromised by an attacker. When looking at logs, time synchronization makes sure that events on different machines can be correlated accurately. Certificates and logging infrastructure can break with minutes, hours or months of time difference. Other applications like caching and Bitcoin are sensitive to even very small differences in time on the order of seconds.

Two factor authentication using rolling numbers also rely on accurate clocks. This then creates the need for computer clocks to have access to reasonably accurate time that is securely delivered. NTP is the most commonly used protocol for time synchronization on the Internet. If an attacker can leverage vulnerabilities in NTP to manipulate time on computer clocks, they can undermine the security guarantees provided by these systems.

Motivated by the severity of the issue, I decided to look deeper into NTP and its security. Since the need for synchronizing time across networks was visible early on, NTP is a very old protocol. The first standardized version of NTP dates back to 1985, while the latest NTP version 4 was completed in 2010 (see RFC5905).

In its most common mode, NTP works by having a client send a query packet out to an NTP server that then responds with its clock time. The client then computes an estimate of the difference between its clock and the remote clock and attempts to compensate for network delay in this. NTP client queries multiple servers and implements algorithms to select the best estimate, and rejects clearly wrong answers.

Surprisingly enough, research on NTP and its security was not very active at the time. Before this, in late 2013 and early 2014, high-profile Distributed Denial of Service (DDoS) attacks were carried out by amplifying traffic from NTP servers; attackers able to spoof a victim’s IP address were able to funnel copious amounts of traffic overwhelming the targeted domains. This caught the attention of some researchers. However, these attacks did not exploit flaws in the fundamental protocol design. The attackers simply used NTP as a boring bandwidth multiplier. Cloudflare wrote extensively about these attacks and you can read about it here, here, and here.

I found several flaws in the core NTP protocol design and its implementation that can be exploited by network attackers to launch much more devastating attacks by shifting time or denying service to NTP clients. What is even more concerning was that these attackers do not need to be a Monster-In-The-Middle (MITM), where an attacker can modify traffic between the client and the server, to mount these attacks. A set of recent papers authored by one of us showed that an off-path attacker present anywhere on the network can shift time or deny service to NTP clients. One of the ways this is done is by abusing IP fragmentation.

Fragmentation is a feature of the IP layer where a large packet is chopped into several smaller fragments so that they can pass through the networks that do not support large packets. Basically, any random network element on the path between the client and the server can send a special “ICMP fragmentation needed” packet to the server telling them to fragment the packet to say X bytes. Since the server is not expected to know the IP addresses of all the network elements on its path, this packet can be sent from any source IP.

In our attack, the attacker exploits this feature to make the NTP server fragment its NTP response packet for the victim NTP client. The attacker then spoofs carefully crafted overlapping response fragments from off-path that contain the attacker’s timestamp values. By further exploiting the reassembly policies for overlapping fragments the attacker fools the client into assembling a packet with legitimate fragments and the attacker’s insertions. This evades the authenticity checks that rely on values in the original parts of the packet.

NTP’s past and future

At the time of NTP’s creation back in 1985, there were two main design goals for the service provided by NTP. First, they wanted it to be robust enough to handle networking errors and equipment failures. So it was designed as a service where client can gather timing samples from multiple peers over multiple communication paths and then average them to get more accurate measurement.

The second goal was load distribution. While every client would like to talk to time servers which are directly attached to high precision time-keeping devices like atomic clocks, GPS, etc, and thus have more accurate time, the capacity of those devices is only so much. So, to reduce protocol load on the network, the service was designed in a hierarchical manner. At the top of the hierarchy are servers connected to non-NTP time sources, that distribute time to other servers, that further distribute time to even more servers. Most computers connect to either these second or third level servers.

The original specification (RFC 958) also states the “non-goals” of the protocol, namely peer authentication and data integrity. Security wasn’t considered critical in the relatively small and trusting early Internet, and the protocols and applications that rely on time for security didn’t exist then. Securing NTP came second to improving the protocol and implementation.

As the Internet has grown, more and more core Internet protocols have been secured through cryptography to protect against abuse: TLS, DNSSEC, RPKI are all steps toward ensuring the security of all communications on the Internet. These protocols use “time” to provide security guarantees. Since security of Internet hinges on the security of NTP, it becomes even more important to secure NTP.

This research perspicuously showed the need for securing NTP. As a result, there was more work in the standards body for Internet Protocols, the Internet Engineering Task Force (IETF) towards cryptographically authenticating NTP. At the time, even though NTPv4 supported both symmetric and asymmetric cryptographic authentication, it was rarely used in practice due to limitations of both approaches.

NTPv4’s symmetric approach to securing synchronization doesn’t scale as the symmetric key must be pre-shared and configured manually: imagine if every client on earth needed a special secret key with the servers they wanted to get time from, the organizations that run those servers would have to do a great deal of work managing keys. This makes this solution quite cumbersome for public servers that must accept queries from arbitrary clients. For context, NIST operates important public time servers and distributes symmetric keys only to users that register, once per year, via US mail or facsimile; the US Naval Office does something similar.

The first attempt to solve the problem of key distribution was the Autokey protocol, described in RFC 5906. Many public NTP servers do not support Autokey (e.g., the NIST and USNO time servers, and many servers in pool.ntp.org). The protocol is badly broken as any network attacker can trivially retrieve the secret key shared between the client and server. The authentication mechanisms are non-standard and quite idiosyncratic.

The future of the Internet is a secure Internet, which means an authenticated and encrypted Internet. But until now NTP remains mostly insecure, despite continuing protocol development. In the meantime more and more services depended on it.

Fixing the problem

Following the release of our paper, there was a lot more enthusiasm in the NTP community at standards body for Internet Protocols, the Internet Engineering Task Force (IETF) and outside for improving the state of NTP security. As a short-term fix, the ntpd reference implementation software was patched for several vulnerabilities that we found. And for a long-term solution, the community realized the dire need for a secure, authenticated time synchronization protocol based on public-key cryptography, which enables encryption and authentication without requiring the sharing of key material beforehand. Today we have a Network Time Security (NTS) draft at the IETF, thanks to the work of dozens of dedicated individuals at the NTP working group.

In a nutshell, the NTS protocol is divided into two-phases. The first phase is the NTS key exchange that establishes the necessary key material between the NTP client and the server. This phase uses the Transport Layer Security (TLS) handshake and relies on the same public key infrastructure as the web. Once the keys are exchanged, the TLS channel is closed and the protocol enters the second phase. In this phase the results of that TLS handshake are used to authenticate NTP time synchronization packets via extension fields. The interested reader can find more information in the Internet draft.

Cloudflare’s new service

Today, Cloudflare announces its free time service to anyone on the Internet. We intend to solve the limitations with the existing public time services, in particular by increasing availability, robustness and security.

We use our global network to provide an advantage in latency and accuracy. Our 180 locations around the world all use anycast to automatically route your packets to our closest server. All of our servers are synchronized with stratum 1 time service providers, and then offer NTP to the general public, similar to how other public NTP providers function. The biggest source of inaccuracy for time synchronization protocols is the network asymmetry, leading to a difference in travel times between the client and server and back from the server to the client. However, our servers’ proximity to users means there will be less jitter — a measurement of variance in latency on the network — and possible asymmetry in packet paths. We also hope that in regions with a dearth of NTP servers our service significantly improves the capacity and quality of the NTP ecosystem.

Cloudflare servers obtain authenticated time by using a shared symmetric key with our stratum 1 upstream servers. These upstream servers are geographically spread and ensure that our servers have accurate time in our datacenters. But this approach to securing time doesn’t scale. We had to exchange emails individually with the organizations that run stratum 1 servers, as well as negotiate permission to use them. While this is a solution for us, it isn’t a solution for everyone on the Internet.

As a secure time service provider Cloudflare is proud to announce that we are among the first to offer a free and secure public time service based on Network Time Security. We have implemented the latest NTS IETF draft. As this draft progresses through the Internet standards process we are committed to keeping our service current.

Most NTP implementations are currently working on NTS support, and we expect that the next few months will see broader introduction as well as advancement of the current draft protocol to an RFC. Currently we have interoperability with NTPsec who have implemented draft 18 of NTS. We hope that our service will spur faster adoption of this important improvement to Internet security. Because this is a new service with no backwards compatibility requirements, we are requiring the use of TLS v1.3 with it to promote adoption of the most secure version of TLS.

Use it

If you have an NTS client, point it at time.cloudflare.com:1234. Otherwise point your NTP client at time.cloudflare.com. More details on configuration are available in the developer docs.

Conclusion

From our Roughtime service to Universal SSL Cloudflare has played a role in expanding the availability and use of secure protocols. Now with our free public time service we provide a trustworthy, widely available alternative to another insecure legacy protocol. It’s all a part of our mission to help make a faster, reliable, and more secure Internet for everyone.

If you haven’t already been inspired by the description above, let me phrase the question you ought to be asking: how is this system going to avoid being a massive privacy nightmare?

Let me count the concerns:

If your device is constantly emitting a BLE signal that uniquely identifies it, the whole world is going to have (yet another) way to track you. Marketers already use WiFi and Bluetooth MAC addresses to do this: Find My could create yet another tracking channel.

It also exposes the phones who are doing the tracking. These people are now going to be sending their current location to Apple (which they may or may not already be doing). Now they’ll also be potentially sharing this information with strangers who “lose” their devices. That could go badly.

Scammers might also run active attacks in which they fake the location of your device. While this seems unlikely, people will always surprise you.

The good news is that Apple claims that their system actually does provide strong privacy, and that it accomplishes this using clever cryptography. But as is typical, they’ve declined to give out the details how they’re going to do it. Andy Greenberg talked me through an incomplete technical description that Apple provided to Wired, so that provides many hints. Unfortunately, what Apple provided still leaves huge gaps. It’s into those gaps that I’m going to fill in my best guess for what Apple is actually doing.

Over the last few decades, the word ‘quantum’ has become increasingly popular. It is common to find articles, reports, and many people interested in quantum mechanics and the new capabilities and improvements it brings to the scientific community. This topic not only concerns physics, since the development of quantum mechanics impacts on several other fields such as chemistry, economics, artificial intelligence, operations research, and undoubtedly, cryptography.

This post begins a trio of blogs describing the impact of quantum computing on cryptography, and how to use stronger algorithms resistant to the power of quantum computing.

This post introduces quantum computing and describes the main aspects of this new computing model and its devastating impact on security standards; it summarizes some approaches to securing information using quantum-resistant algorithms.

Due to the relevance of this matter, we present our experiments on a large-scale deployment of quantum-resistant algorithms.

Our third post introduces CIRCL, open-source Go library featuring optimized implementations of quantum-resistant algorithms and elliptic curve-based primitives.

All of this is part of Cloudflare’s Crypto Week 2019, now fasten your seatbelt and get ready to make a quantum leap.

What is Quantum Computing?

Back in 1981, Richard Feynman raised the question about what kind of computers can be used to simulate physics. Although some physical systems can be simulated in a classical computer, the amount of resources used by such a computer can grow exponentially. Then, he conjectured the existence of a computer model that behaves under quantum mechanics rules, which opened a field of research now called quantum computing. To understand the basics of quantum computing, it is necessary to recall how classical computers work, and from that shine a spotlight on the differences between these computational models.

In 1936, Alan Turing and Emil Post independently described models that gave rise to the foundation of the computing model known as the Post-Turing machine, which describes how computers work and allowed further determination of limits for solving problems.

In this model, the units of information are bits, which store one of two possible values, usually denoted by 0 and 1. A computing machine contains a set of bits and performs operations that modify the values of the bits, also known as the machine’s state. Thus, a machine with N bits can be in one of 2ᴺ possible states. With this in mind, the Post-Turing computing model can be abstractly described as a machine of states, in which running a program is translated as machine transitions along the set of states.

A paper David Deutsch published in 1985 describes a computing model that extends the capabilities of a Turing machine based on the theory of quantum mechanics. This computing model introduces several advantages over the Turing model for processing large volumes of information. It also presents unique properties that deviate from the way we understand classical computing. Most of these properties come from the nature of quantum mechanics. We’re going to dive into these details before approaching the concept of quantum computing.

Superposition

One of the most exciting properties of quantum computing that provides an advantage over the classical computing model is superposition. In physics, superposition is the ability to produce valid states from the addition or superposition of several other states that are part of a system.

Applying these concepts to computing information, it means that there is a system in which it is possible to generate a machine state that represents a (weighted) sum of the states 0 and 1; in this case, the term weighted means that the state can keep track of “the quantity of” 0 and 1 present in the state. In the classical computation model, one bit can only store either the state of 0 or 1, not both; even using two bits, they cannot represent the weighted sum of these states. Hence, to make a distinction from the basic states, quantum computing uses the concept of a quantum bit (qubit) — a unit of information to denote the superposition of two states. This is a cornerstone concept of quantum computing as it provides a way of tracking more than a single state per unit of information, making it a powerful tool for processing information.

So, a qubit represents the sum of two parts: the 0 or 1 state plus the amount each 0/1 state contributes to produce the state of the qubit.

In mathematical notation, qubit \( | \Psi \rangle \) is an explicit sum indicating that a qubit represents the superposition of the states 0 and 1. This is the Dirac notation used to describe the value of a qubit \( | \Psi \rangle = A | 0 \rangle +B | 1 \rangle \), where, A and B are complex numbers known as the amplitude of the states 0 and 1, respectively. The value of the basic states is represented by qubits as \( | 0 \rangle = 1 | 0 \rangle + 0 | 1 \rangle \) and \( | 1 \rangle = 0 | 0 \rangle + 1 | 1 \rangle \), respectively. The right side of the term contains the abbreviated notation for these special states.

Measurement

In a classical computer, the values 0 and 1 are implemented as digital signals. Measuring the current of the signal automatically reveals the status of a bit. This means that at any moment the value of the bit can be observed or measured.

The state of a qubit is maintained in a physically closed system, meaning that the properties of the system, such as superposition, require no interaction with the environment; otherwise any interaction, like performing a measurement, can cause interference on the state of a qubit.

Measuring a qubit is a probabilistic experiment. The result is a bit of information that depends on the state of the qubit. The bit, obtained by measuring \( | \Psi \rangle = A | 0 \rangle +B | 1 \rangle \), will be equal to 0 with probability \( |A|^2 \), and equal to 1 with probability \( |B|^2 \), where \( |x| \) represents the absolute value of \(x\).

From Statistics, we know that the sum of probabilities of all possible events is always equal to 1, so it must hold that \( |A|^2 +|B|^2 =1 \). This last equation motivates to represent qubits as the points of a circle of radius one, and more generally, as the points on the surface of a sphere of radius one, which is known as the Bloch Sphere.

Let’s break it down: If you measure a qubit you also destroy the superposition of the qubit, resulting in a superposition state collapse, where it assumes one of the basics states, providing your final result.

Another way to think about superposition and measurement is through the coin tossing experiment.

Toss a coin in the air and you give people a random choice between two options: heads or tails. Now, don’t focus on the randomness of the experiment, instead note that while the coin is rotating in the air, participants are uncertain which side will face up when the coin lands. Conversely, once the coin stops with a random side facing up, participants are 100% certain of the status.

How does it relate? Qubits are similar to the participants. When a qubit is in a superposition of states, it is tracking the probability of heads or tails, which is the participants’ uncertainty quotient while the coin is in the air. However, once you start to measure the qubit to retrieve its value, the superposition vanishes, and a classical bit value sticks: heads or tails. Measurement is that moment when the coin is static with only one side facing up.

A fair coin is a coin that is not biased. Each side (assume 0=heads and 1=tails) of a fair coin has the same probability of sticking after a measurement is performed. The qubit \( \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle \) describes the probabilities of tossing a fair coin. Note that squaring either of the amplitudes results in ½, indicating that there is a 50% chance either heads or tails sticks.

It would be interesting to be able to charge a fair coin at will while it is in the air. Although this is the magic of a professional illusionist, this task, in fact, can be achieved by performing operations over qubits. So, get ready to become the next quantum magician!

Quantum Gates

A logic gate represents a Boolean function operating over a set of inputs (on the left) and producing an output (on the right). A logic circuit is a set of connected logic gates, a convenient way to represent bit operations.

Other gates are AND, OR, XOR, and NAND, and more. A set of gates is universal if it can generate other gates. For example, NOR and NAND gates are universal since any circuit can be constructed using only these gates.

Quantum computing also admits a description using circuits. Quantum gates operate over qubits, modifying the superposition of the states. For example, there is a quantum gate analogous to the NOT gate, the X gate.

The X quantum gate interchanges the amplitudes of the states of the input qubit.

The Z quantum gate flips the sign’s amplitude of state 1:

Another quantum gate is the Hadamard gate, which generates an equiprobable superposition of the basic states.

Using our coin tossing analogy, the Hadamard gate has the action of tossing a fair coin to the air. In quantum circuits, a triangle represents measuring a qubit, and the resulting bit is indicated by a double-wire.

Other gates, such as the CNOT gate, Pauli’s gates, Toffoli gate, Deutsch gate, are slightly more advanced. Quirk, the open-source playground, is a fun sandbox where you can construct quantum circuits using all of these gates.

Reversibility

An operation is reversible if there exists another operation that rolls back the output state to the initial state. For instance, a NOT gate is reversible since applying a second NOT gate recovers the initial input.

In contrast, AND, OR, NAND gates are not reversible. This means that some classical computations cannot be reversed by a classic circuit that uses only the output bits. However, if you insert additional bits of information, the operation can be reversed.

Quantum computing mainly focuses on reversible computations, because there’s always a way to construct a reversible circuit to perform an irreversible computation. The reversible version of a circuit could require the use of ancillary qubits as auxiliary (but not temporary) variables.

Due to the nature of composed systems, it could be possible that these ancillas (extra qubits) correlate to qubits of the main computation. This correlation makes it infeasible to reuse ancillas since any modification could have the side-effect on the operation of a reversible circuit. This is like memory assigned to a process by the operating system: the process cannot use memory from other processes or it could cause memory corruption, and processes cannot release their assigned memory to other processes. You could use garbage collection mechanisms for ancillas, but performing reversible computations increases your qubit budget.

Composed Systems

In quantum mechanics, a single qubit can be described as a single closed system: a system that has no interaction with the environment nor other qubits. Letting qubits interact with others leads to a composed system where more states are represented. The state of a 2-qubit composite system is denoted as \(A_0|00\rangle+A_1|01\rangle+A_2|10\rangle+A_3|11\rangle \), where, \( A_i \) values correspond to the amplitudes of the four basic states 00, 01, 10, and 11. This qubit \( \tfrac{1}{2}|00\rangle+\tfrac{1}{2}|01\rangle+\tfrac{1}{2}|10\rangle+\tfrac{1}{2}|11\rangle \) represents the superposition of these basic states, both having the same probability obtained after measuring the two qubits.

In the classical case, the state of N bits represents only one of 2ᴺ possible states, whereas a composed state of N qubits represents all the 2ᴺ states but in superposition. This is one big difference between these computing models as it carries two important properties: entanglement and quantum parallelism.

Entanglement

According to the theory behind quantum mechanics, some composed states can be described through the description of its constituents. However, there are composed states where no description is possible, known as entangled states.

The entanglement phenomenon was pointed out by Einstein, Podolsky, and Rosen in the so-called EPR paradox. Suppose there is a composed system of two entangled qubits, in which by performing a measurement in one qubit causes interference in the measurement of the second. This interference occurs even when qubits are separated by a long distance, which means that some information transfer happens faster than the speed of light. This is how quantum entanglement conflicts with the theory of relativity, where information cannot travel faster than the speed of light. The EPR paradox motivated further investigation for deriving new interpretations about quantum mechanics and aiming to resolve the paradox.

Quantum entanglement can help to transfer information at a distance by following a communication protocol. The following protocol examples rely on the fact that Alice and Bob separately possess one of two entangled qubits:

The superdense coding protocol allows Alice to communicate a 2-bit message \(m_0,m_1\) to Bob using a quantum communication channel, for example, using fiber optics to transmit photons. All Alice has to do is operate on her qubit according to the value of the message and send the resulting qubit to Bob. Once Bob receives the qubit, he measures both qubits, noting that the collapsed 2-bit state corresponds to Alice’s message.

The quantum teleportation protocol allows Alice to transmit a qubit to Bob without using a quantum communication channel. Alice measures the qubit to send Bob and her entangled qubit resulting in two bits. Alice sends these bits to Bob, who operates on his entangled qubit according to the bits received and notes that the result state matches the original state of Alice’s qubit.

Quantum Parallelism

Composed systems of qubits allow representation of more information per composed state. Note that operating on a composed state of N qubits is equivalent to operating over a set of 2ᴺ states in superposition. This procedure is quantum parallelism. In this setting, operating over a large volume of information gives the intuition of performing operations in parallel, like in the parallel computing paradigm; one big caveat is that superposition is not equivalent to parallelism.

Remember that a composed state is a superposition of several states so, a computation that takes a composed state of inputs will result in a composed state of outputs. The main divergence between classical and quantum parallelism is that quantum parallelism can obtain only one of the processed outputs. Observe that a measurement in the output of a composed state causes that the qubits collapse to only one of the outputs, making it unattainable to calculate all computed values.

Although quantum parallelism does not match precisely with the traditional notion of parallel computing, you can still leverage this computational power to get related information.

Deutsch-Jozsa Problem: Assume \(F\) is a function that takes as input N bits, outputs one bit, and is either constant (always outputs the same value for all inputs) or balanced (outputs 0 for half of the inputs and 1 for the other half). The problem is to determine if \(F\) is constant or balanced.

The quantum algorithm that solves the Deutsch-Jozsa problem uses quantum parallelism. First, N qubits are initialized in a superposition of 2ᴺ states. Then, in a single shot, it evaluates \(F\) for all of these states.

The result of applying \(F\) appears (in the exponent) of the amplitude of the all-zero state. Note that only when \(F\) is constant is this amplitude, either +1 or -1. If the result of measuring the N qubit is an all-zeros bitstring, then there is a 100% certainty that \(F\) is constant. Any other result indicates that \(F\) is balanced.

A deterministic classical algorithm solves this problem using \( 2^{N-1}+1\) evaluations of \(F\) in the worst case. Meanwhile, the quantum algorithm requires only one evaluation. The Deutsch-Jozsa problem exemplifies the exponential advantage of a quantum algorithm over classical algorithms.

Quantum Computers

The theory of quantum computing is supported by investigations in the field of quantum mechanics. However, constructing a quantum machine requires a physical system that allows representing qubits and manipulation of states in a reliable and precise way.

The DiVincenzo Criteria require that a physical implementation of a quantum computer must:

Be scalable and have well-defined qubits.

Be able to initialize qubits to a state.

Have long decoherence times to apply quantum error-correcting codes. Decoherence of a qubit happens when the qubit interacts with the environment, for example, when a measurement is performed.

Use a universal set of quantum gates.

Be able to measure single qubits without modifying others.

Quantum computer physical implementations face huge engineering obstacles to satisfy these requirements. The most important challenge is to guarantee low error rates during computation and measurement. Lowering these rates require techniques for error correction, which add a significant number of qubits specialized on this task. For this reason, the number of qubits of a quantum computer should not be regarded as for classical systems. In a classical computer, the bits of a computer are all effective for performing a calculation, whereas the number of qubits is the sum of the effective qubits (those used to make calculations) plus the ancillas (used for reversible computations) plus the error correction qubits.

Current implementations of quantum computers partially satisfy the DiVincenzo criteria. Quantum adiabatic computers fit in this category since they do not operate using quantum gates. For this reason, they are not considered to be universal quantum computers.

Quantum Adiabatic Computers

A recurrent problem in optimization is to find the global minimum of an objective function. For example, a route-traffic control system can be modeled as a function that reduces the cost of routing to a minimum. Simulated annealing is a heuristic procedure that provides a good solution to these types of problems. Simulated annealing finds the solution state by slowly introducing changes (the adiabatic process) on the variables that govern the system.

Quantum annealing is the analogous quantum version of simulated annealing. A qubit is initialized into a superposition of states representing all possible solutions to the problem. Here is used the Hamiltonian operator, which is the sum of vectors of potential and kinetic energies of the system. Hence, the objective function is encoded using this operator describing the evolution of the system in correspondence with time. Then, if the system is allowed to evolve very slowly, it will eventually land on a final state representing the optimal value of the objective function.

Currently, there exist adiabatic computers in the market, such as the D-Wave and IBM Q systems, featuring hundreds of qubits; however, their capabilities are somewhat limited to some problems that can be modeled as optimization problems. The limits of adiabatic computers were studied by van Dam et al, showing that despite solving local searching problems and even some instances of the max-SAT problem, there exists harder searching problems this computing model cannot efficiently solve.

Nuclear Magnetic Resonance

Nuclear Magnetic Resonance (NMR) is a physical phenomena that can be used to represent qubits. The spin of atomic nucleus of molecules is perturbed by an oscillating magnetic field. A 2001 report describes successful implementation of Shor’s algorithm in a 7-qubit NMR quantum computer. An iconic result since this computer was able to factor the number 15.

Superconducting Quantum Computers

One way to physically construct qubits is based on superconductors, materials that conduct electric current with zero resistance when exposed to temperatures close to absolute zero.

The Josephson effect, in which current flows across the junction of two superconductors separated by a non-superconducting material, is used to physically implement a superposition of states.

When a magnetic flux is applied to this junction, the current flows continuously in one direction. But, depending on the quantity of magnetic flux applied, the current can also flow in the opposite direction. There exists a quantum superposition of currents going both clockwise and counterclockwise leading to a physical implementation of a qubit called flux qubit. The complete device is known as the Superconducting Quantum Interference Device (SQUID) and can be easily coupled scaling the number of qubits. Thus, SQUIDs are like the transistors of a quantum computer.

Examples of superconducting computers are:

D-wave’s adiabatic computers process quantum annealing for solving diverse optimization problems.

IBM’s IBM-Q Tokyo, a 20-qubit adiabatic computer, and IBM Q Experience, a cloud-based system for exploring quantum circuits.

IBM Q System

The Imminent Threat of Quantum Algorithms

The quantum zoo website tracks problems that can be solved using quantum algorithms. As of mid-2018, more than 60 problems appear on this list, targeting diverse applications in the area of number theory, approximation, simulation, and searching. As terrific as it sounds, some easily-solvable problems by quantum computing are surrounding the security of information.

Grover’s Algorithm

Tales of a quantum detective (fragment). A couple of detectives have the mission of finding one culprit in a group of suspects that always respond to this question honestly: “are you guilty?”. The detective C follows a classic interrogative method and interviews every person one at a time, until finding the first one that confesses. The detective Q proceeds in a different way, First gather all suspects in a completely dark room, and after that, the detective Q asks them — are you guilty? — A steady sound comes from the room saying “No!” while at the same time, a single voice mixed in the air responds “Yes!.” Since everybody is submerged in darkness, the detective cannot see the culprit. However, detective Q knows that, as long as the interrogation advances, the culprit will feel desperate and start to speak louder and louder, and so, he continues asking the same question. Suddenly, detective Q turns on the lights, enters into the room, and captures the culprit. How did he do it?

The task of the detective can be modeled as a searching problem. Given a Boolean function \( f\) that takes N bits and produces one bit, to find the unique input \(x\) such that \( f(x)=1\).

A classical algorithm (detective C) finds \(x\) using \(2^N-1\) function evaluations in the worst case. However, the quantum algorithm devised by Grover, corresponding to detective Q, searches quadratically faster using around \(2^{N/2}\) function evaluations.

The key intuition of Grover’s algorithm is increasing the amplitude of the state that represents the solution while maintaining the other states in a lower amplitude. In this way, a system of N qubits, which is a superposition of 2ᴺ possible inputs, can be continuously updated using this intuition until the solution state has an amplitude closer to 1. Hence, after updating the qubits many times, there will be a high probability to measure the solution state.

Initially, a superposition of 2ᴺ states (horizontal axis) is set, each state has an amplitude (vertical axis) close to 0. The qubits are updated so that the amplitude of the solution state increases more than the amplitude of other states. By repeating the update step, the amplitude of the solution state gets closer to 1, which boosts the probability of collapsing to the solution state after measuring.

Grover’s Algorithm (pseudo-code):

Prepare an N qubit \(|x\rangle \) as a uniform superposition of 2ᴺ states.

Update the qubits by performing this core operation. $$ |x\rangle \mapsto (-1)^{f(x)} |x\rangle $$ The result of \( f(x) \) only flips the amplitude of the searched state.

Negate the N qubit over the average of the amplitudes.

Repeat Step 2 and 3 for \( (\tfrac{\pi}{4}) 2^{ N/2} \) times.

Measure the qubit and return the bits obtained.

Alternatively, the second step can be better understood as a conditional statement:

IF f(x) = 1 THEN
Negate the amplitude of the solution state.
ELSE
/* nothing */
ENDIF

Grover’s algorithm considers function \(f\) a black box, so with slight modifications, the algorithm can also be used to find collisions on the function. This implies that Grover’s algorithm can find a collision using an asymptotically less number of operations than using a brute-force algorithm.

The power of Grover’s algorithm can be turned against cryptographic hash functions. For instance, a quantum computer running Grover’s algorithm could find a collision on SHA256 performing only 2¹²⁸ evaluations of a reversible circuit of SHA256. The natural protection for hash functions is to increase the output size to double. More generally, most of symmetric key encryption algorithms will survive to the power of Grover’s algorithm by doubling the size of keys.

The scenario for public-key algorithms is devastating in face of Peter Shor’s algorithm.

Shor’s Algorithm

Multiplying integers is an easy task to accomplish, however, finding the factors that compose an integer is difficult. The integer factorization problem is to decompose a given integer number into its prime factors. For example, 42 has three factors 2, 3, and 7 since \( 2\times 3\times 7 = 42\). As the numbers get bigger, integer factorization becomes more difficult to solve, and the hardest instances of integer factorization are when the factors are only two different large primes. Thus, given an integer number \(N\), to find primes \(p\) and \(q\) such that \( N = p \times q\), is known as integer splitting.

Factoring integers is like cutting wood, and the specific task of splitting integers is analogous to using an axe for splitting the log in two parts. There exist many different tools (algorithms) for accomplishing each task.

For integer factorization, trial division, the Rho method, the elliptic curve method are common algorithms. Fermat’s method, the quadratic- and rational-sieve, leads to the (general) number field sieve (NFS) algorithm for integer splitting. The latter relies on finding a congruence of squares, that is, splitting \(N\) as a product of squares such that $$ N = x^2 – y^2 = (x+y)\times(x-y) $$ The complexity of NFS is mainly attached to the number of pairs \((x, y)\) that must be examined before getting a pair that factors \(N\). The NFS algorithm has subexponential complexity on the size of \(N\), meaning that the time required for splitting an integer increases significantly as the size of \(N\) grows. For large integers, the problem becomes intractable for classical computers.

The Axe of Thor Shor

The many different guesses of the NFS algorithm are analogous to hitting the log using a dulled axe; after subexponential many tries, the log is cut by half. However, using a sharper axe allows you to split the log faster. This sharpened axe is the quantum algorithm proposed by Shor in 1994.

Let \(x\) be an integer less than \(N\) and of the order \(k\). Then, if \(k\) is even, there exists an integer \(q\) so \(qN\) can be factored as follows.

This approach has some issues. For example, the factorization could correspond to \(q\) not \(N\) and the order of \(x\) is unknown, and here is where Shor’s algorithm enters the picture, finding the order of \(x\).

The internals of Shor’s algorithm rely on encoding the order \(k\) into a periodic function, so that its period can be obtained using the quantum version of the Fourier transform (QFT). The order of \(x\) can be found using a polynomial number quantum evaluations of Shor’s algorithm. Therefore, splitting integers using this quantum approach has polynomial complexity on the size of \(N\).

Shor’s algorithm carries strong implications on the security of the RSA encryption scheme because its security relies on integer factorization. A large-enough quantum computer can efficiently break RSA for current instances.

Alternatively, one may recur to elliptic curves, used in cryptographic protocols like ECDSA or ECDH. Moreover, all TLS ciphersuites use a combination of elliptic curve groups, large prime groups, and RSA and DSA signatures. Unfortunately, these algorithms all succumb to Shor’s algorithm. It only takes a few modifications for Shor’s algorithm to solve the discrete logarithm problem on finite groups. This sounds like a catastrophic story where all of our encrypted data and privacy are no longer secure with the advent of a quantum computer, and in some sense this is true.

On one hand, it is a fact that the quantum computers constructed as of 2019 are not large enough to run, for instance, Shor’s algorithm for the RSA key sizes used in standard protocols. For example, a 2018 report shows experiments on the factorization of a 19-bit number using 94 qubits, they also estimate that 147456 qubits are needed for factoring a 768-bit number. Hence, there numbers indicates that we are still far from breaking RSA.

What if we increment RSA key sizes to be resistant to quantum algorithms, just like for symmetric algorithms?

Bernstein et al. estimated that RSA public keys should be as large as 1 terabyte to maintain secure RSA even in the presence of quantum factoring algorithms. So, for public-key algorithms, increasing the size of keys does not help.

A recent investigation by Gidney and Ekerá shows improvements that accelerate the evaluation of quantum factorization. In their report, the cost of factoring 2048-bit integers is estimated to take a few hours using a quantum machine of 20 million qubits, which is far from any current development. Something worth noting is that the number of qubits needed is two orders of magnitude smaller than the estimated numbers given in previous works developed in this decade. Under these estimates, current encryption algorithms will remain secure several more years; however, consider the following not-so-unrealistic situation.

Information currently encrypted with for example, RSA, can be easily decrypted with a quantum computer in the future. Now, suppose that someone records encrypted information and stores them until a quantum computer is able to decrypt ciphertexts. Although this could be as far as 20 years from now, the forward-secrecy principle is violated. A 20-year gap to the future is sometimes difficult to imagine. So, let’s think backwards, what would happen if all you did on the Internet at the end of the 1990s can be revealed 20 years later — today. How does this impact the security of your personal information? What if the ciphertexts were company secrets or business deals? In 1999, most of us were concerned about the effects of the Y2K problem, now we’re facing Y2Q (years to quantum): the advent of quantum computers.

Post-Quantum Cryptography

Although the current capacity of the physical implementation of quantum computers is far from a real threat to secure communications, a transition to use stronger problems to protect information has already started. This wave emerged as post-quantum cryptography (PQC). The core idea of PQC is finding algorithms difficult enough that no quantum (and classical) algorithm can solve them.

A recurrent question is: How does it look like a problem that even a quantum computer can not solve?

These so-called quantum-resistant algorithms rely on different hard mathematical assumptions; some of them as old as RSA, others more recently proposed. For example, McEliece cryptosystem, formulated in the late 70s, relies on the hardness of decoding a linear code (in the sense of coding theory). The practical use of this cryptosystem didn’t become widespread, since with the passing of time, other cryptosystems superseded in efficiency. Fortunately, McEliece cryptosystem remains immune to Shor’s algorithm, gaining it relevance in the post-quantum era.

As of 2017, the NIST started an evaluation process that tracks possible alternatives for next-generation secure algorithms. From a practical perspective, all candidates present different trade-offs in implementation and usage. The time and space requirements are diverse; at this moment, it’s too early to define which will succeed RSA and elliptic curves. An initial round collected 70 algorithms for deploying key encapsulation mechanisms and digital signatures. As of early 2019, 28 of these survive and are currently in the analysis, investigation, and experimentation phase.

Cloudflare’s mission is to help build a better Internet. As a proactive action, our cryptography team is preparing experiments on the deployment of post-quantum algorithms at Cloudflare scale. Watch our blog post for more details.

Over the last few decades, the word ‘quantum’ has become increasingly popular. It is common to find articles, reports, and many people interested in quantum mechanics and the new capabilities and improvements it brings to the scientific community. This topic not only concerns physics, since the development of quantum mechanics impacts on several other fields such as chemistry, economics, artificial intelligence, operations research, and undoubtedly, cryptography.

This post begins a trio of blogs describing the impact of quantum computing on cryptography, and how to use stronger algorithms resistant to the power of quantum computing.

This post introduces quantum computing and describes the main aspects of this new computing model and its devastating impact on security standards; it summarizes some approaches to securing information using quantum-resistant algorithms.

Due to the relevance of this matter, we present our experiments on a large-scale deployment of quantum-resistant algorithms.

Our third post introduces CIRCL, open-source Go library featuring optimized implementations of quantum-resistant algorithms and elliptic curve-based primitives.

All of this is part of Cloudflare’s Crypto Week 2019, now fasten your seatbelt and get ready to make a quantum leap.

What is Quantum Computing?

Back in 1981, Richard Feynman raised the question about what kind of computers can be used to simulate physics. However, some physical phenomena, such as quantum mechanics, cannot be simulated using a classical computer. Then, he conjectured the existence of a computer model that behaves under quantum mechanics rules, which opened a field of research now called quantum computing. To understand the basics of quantum computing, it is necessary to recall how classical computers work, and from that shine a spotlight on the differences between these computational models.

In 1936, Alan Turing and Emil Post independently described models that gave rise to the foundation of the computing model known as the Post-Turing machine, which describes how computers work and allowed further determination of limits for solving problems.

In this model, the units of information are bits, which store one of two possible values, usually denoted by 0 and 1. A computing machine contains a set of bits and performs operations that modify the values of the bits, also known as the machine’s state. Thus, a machine with N bits can be in one of 2ᴺ possible states. With this in mind, the Post-Turing computing model can be abstractly described as a machine of states, in which running a program is translated as machine transitions along the set of states.

A paper David Deutsch published in 1985 describes a computing model that extends the capabilities of a Turing machine based on the theory of quantum mechanics. This computing model introduces several advantages over the Turing model for processing large volumes of information. It also presents unique properties that deviate from the way we understand classical computing. Most of these properties come from the nature of quantum mechanics. We’re going to dive into these details before approaching the concept of quantum computing.

Superposition

One of the most exciting properties of quantum computing that provides an advantage over the classical computing model is superposition. In physics, superposition is the ability to produce valid states from the addition or superposition of several other states that are part of a system.

Applying these concepts to computing information, it means that there is a system in which it is possible to generate a machine state that represents a (weighted) sum of the states 0 and 1; in this case, the term weighted means that the state can keep track of “the quantity of” 0 and 1 present in the state. In the classical computation model, one bit can only store either the state of 0 or 1, not both; even using two bits, they cannot represent the weighted sum of these states. Hence, to make a distinction from the basic states, quantum computing uses the concept of a quantum bit (qubit) — a unit of information to denote the superposition of two states. This is a cornerstone concept of quantum computing as it provides a way of tracking more than a single state per unit of information, making it a powerful tool for processing information.

So, a qubit represents the sum of two parts: the 0 or 1 state plus the amount each 0/1 state contributes to produce the state of the qubit.

In mathematical notation, qubit \( | \Psi \rangle \) is an explicit sum indicating that a qubit represents the superposition of the states 0 and 1. This is the Dirac notation used to describe the value of a qubit \( | \Psi \rangle = A | 0 \rangle +B | 1 \rangle \), where, A and B are complex numbers known as the amplitude of the states 0 and 1, respectively. The value of the basic states is represented by qubits as \( | 0 \rangle = 1 | 0 \rangle + 0 | 1 \rangle \) and \( | 1 \rangle = 0 | 0 \rangle + 1 | 1 \rangle \), respectively. The right side of the term contains the abbreviated notation for these special states.

Measurement

In a classical computer, the values 0 and 1 are implemented as digital signals. Measuring the current of the signal automatically reveals the status of a bit. This means that at any moment the value of the bit can be observed or measured.

The state of a qubit is maintained in a physically closed system, meaning that the properties of the system, such as superposition, require no interaction with the environment; otherwise any interaction, like performing a measurement, can cause interference on the state of a qubit.

Measuring a qubit is a probabilistic experiment. The result is a bit of information that depends on the state of the qubit. The bit, obtained by measuring \( | \Psi \rangle = A | 0 \rangle +B | 1 \rangle \), will be equal to 0 with probability \( |A|^2 \), and equal to 1 with probability \( |B|^2 \), where \( |x| \) represents the absolute value of \(x\).

From Statistics, we know that the sum of probabilities of all possible events is always equal to 1, so it must hold that \( |A|^2 +|B|^2 =1 \). This last equation motivates to represent qubits as the points of a circle of radius one, and more generally, as the points on the surface of a sphere of radius one, which is known as the Bloch Sphere.

Let’s break it down: If you measure a qubit you also destroy the superposition of the qubit, resulting in a superposition state collapse, where it assumes one of the basics states, providing your final result.

Another way to think about superposition and measurement is through the coin tossing experiment.

Toss a coin in the air and you give people a random choice between two options: heads or tails. Now, don’t focus on the randomness of the experiment, instead note that while the coin is rotating in the air, participants are uncertain which side will face up when the coin lands. Conversely, once the coin stops with a random side facing up, participants are 100% certain of the status.

How does it relate? Qubits are similar to the participants. When a qubit is in a superposition of states, it is tracking the probability of heads or tails, which is the participants’ uncertainty quotient while the coin is in the air. However, once you start to measure the qubit to retrieve its value, the superposition vanishes, and a classical bit value sticks: heads or tails. Measurement is that moment when the coin is static with only one side facing up.

A fair coin is a coin that is not biased. Each side (assume 0=heads and 1=tails) of a fair coin has the same probability of sticking after a measurement is performed. The qubit \( \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle \) describes the probabilities of tossing a fair coin. Note that squaring either of the amplitudes results in ½, indicating that there is a 50% chance either heads or tails sticks.

It would be interesting to be able to charge a fair coin at will while it is in the air. Although this is the magic of a professional illusionist, this task, in fact, can be achieved by performing operations over qubits. So, get ready to become the next quantum magician!

Quantum Gates

A logic gate represents a Boolean function operating over a set of inputs (on the left) and producing an output (on the right). A logic circuit is a set of connected logic gates, a convenient way to represent bit operations.

Other gates are AND, OR, XOR, and NAND, and more. A set of gates is universal if it can generate other gates. For example, NOR and NAND gates are universal since any circuit can be constructed using only these gates.

Quantum computing also admits a description using circuits. Quantum gates operate over qubits, modifying the superposition of the states. For example, there is a quantum gate analogous to the NOT gate, the X gate.

The X quantum gate interchanges the amplitudes of the states of the input qubit.

The Z quantum gate flips the sign’s amplitude of state 1:

Another quantum gate is the Hadamard gate, which generates an equiprobable superposition of the basic states.

Using our coin tossing analogy, the Hadamard gate has the action of tossing a fair coin to the air. In quantum circuits, a triangle represents measuring a qubit, and the resulting bit is indicated by a double-wire.

Other gates, such as the CNOT gate, Pauli’s gates, Toffoli gate, Deutsch gate, are slightly more advanced. Quirk, the open-source playground, is a fun sandbox where you can construct quantum circuits using all of these gates.

Reversibility

An operation is reversible if there exists another operation that rolls back the output state to the initial state. For instance, a NOT gate is reversible since applying a second NOT gate recovers the initial input.

In contrast, AND, OR, NAND gates are not reversible. This means that some classical computations cannot be reversed by a classic circuit that uses only the output bits. However, if you insert additional bits of information, the operation can be reversed.

Quantum computing mainly focuses on reversible computations, because there’s always a way to construct a reversible circuit to perform an irreversible computation. The reversible version of a circuit could require the use of ancillary qubits as auxiliary (but not temporary) variables.

Due to the nature of composed systems, it could be possible that these ancillas (extra qubits) correlate to qubits of the main computation. This correlation makes it infeasible to reuse ancillas since any modification could have the side-effect on the operation of a reversible circuit. This is like memory assigned to a process by the operating system: the process cannot use memory from other processes or it could cause memory corruption, and processes cannot release their assigned memory to other processes. You could use garbage collection mechanisms for ancillas, but performing reversible computations increases your qubit budget.

Composed Systems

In quantum mechanics, a single qubit can be described as a single closed system: a system that has no interaction with the environment nor other qubits. Letting qubits interact with others leads to a composed system where more states are represented. The state of a 2-qubit composite system is denoted as \(A_0|00\rangle+A_1|01\rangle+A_2|10\rangle+A_3|11\rangle \), where, \( A_i \) values correspond to the amplitudes of the four basic states 00, 01, 10, and 11. This qubit \( \tfrac{1}{2}|00\rangle+\tfrac{1}{2}|01\rangle+\tfrac{1}{2}|10\rangle+\tfrac{1}{2}|11\rangle \) represents the superposition of these basic states, both having the same probability obtained after measuring the two qubits.

In the classical case, the state of N bits represents only one of 2ᴺ possible states, whereas a composed state of N qubits represents all the 2ᴺ states but in superposition. This is one big difference between these computing models as it carries two important properties: entanglement and quantum parallelism.

Entanglement

According to the theory behind quantum mechanics, some composed states can be described through the description of its constituents. However, there are composed states where no description is possible, known as entangled states.

The entanglement phenomenon was pointed out by Einstein, Podolsky, and Rosen in the so-called EPR paradox. Suppose there is a composed system of two entangled qubits, in which by performing a measurement in one qubit causes interference in the measurement of the second. This interference occurs even when qubits are separated by a long distance, which means that some information transfer happens faster than the speed of light. This is how quantum entanglement conflicts with the theory of relativity, where information cannot travel faster than the speed of light. The EPR paradox motivated further investigation for deriving new interpretations about quantum mechanics and aiming to resolve the paradox.

Quantum entanglement can help to transfer information at a distance by following a communication protocol. The following protocol examples rely on the fact that Alice and Bob separately possess one of two entangled qubits:

The superdense coding protocol allows Alice to communicate a 2-bit message \(m_0,m_1\) to Bob using a quantum communication channel, for example, using fiber optics to transmit photons. All Alice has to do is operate on her qubit according to the value of the message and send the resulting qubit to Bob. Once Bob receives the qubit, he measures both qubits, noting that the collapsed 2-bit state corresponds to Alice’s message.

The quantum teleportation protocol allows Alice to transmit a qubit to Bob without using a quantum communication channel. Alice measures the qubit to send Bob and her entangled qubit resulting in two bits. Alice sends these bits to Bob, who operates on his entangled qubit according to the bits received and notes that the result state matches the original state of Alice’s qubit.

Quantum Parallelism

Composed systems of qubits allow representation of more information per composed state. Note that operating on a composed state of N qubits is equivalent to operating over a set of 2ᴺ states in superposition. This procedure is quantum parallelism. In this setting, operating over a large volume of information gives the intuition of performing operations in parallel, like in the parallel computing paradigm; one big caveat is that superposition is not equivalent to parallelism.

Remember that a composed state is a superposition of several states so, a computation that takes a composed state of inputs will result in a composed state of outputs. The main divergence between classical and quantum parallelism is that quantum parallelism can obtain only one of the processed outputs. Observe that a measurement in the output of a composed state causes that the qubits collapse to only one of the outputs, making it unattainable to calculate all computed values.

Although quantum parallelism does not match precisely with the traditional notion of parallel computing, you can still leverage this computational power to get related information.

Deutsch-Jozsa Problem: Assume \(F\) is a function that takes as input N bits, outputs one bit, and is either constant (always outputs the same value for all inputs) or balanced (outputs 0 for half of the inputs and 1 for the other half). The problem is to determine if \(F\) is constant or balanced.

The quantum algorithm that solves the Deutsch-Jozsa problem uses quantum parallelism. First, N qubits are initialized in a superposition of 2ᴺ states. Then, in a single shot, it evaluates \(F\) for all of these states.

The result of applying \(F\) appears (in the exponent) of the amplitude of the all-zero state. Note that only when \(F\) is constant is this amplitude, either +1 or -1. If the result of measuring the N qubit is an all-zeros bitstring, then there is a 100% certainty that \(F\) is constant. Any other result indicates that \(F\) is balanced.

A deterministic classical algorithm solves this problem using \( 2^{N-1}+1\) evaluations of \(F\) in the worst case. Meanwhile, the quantum algorithm requires only one evaluation. The Deutsch-Jozsa problem exemplifies the exponential advantage of a quantum algorithm over classical algorithms.

Quantum Computers

The theory of quantum computing is supported by investigations in the field of quantum mechanics. However, constructing a quantum machine requires a physical system that allows representing qubits and manipulation of states in a reliable and precise way.

The DiVincenzo Criteria require that a physical implementation of a quantum computer must:

Be scalable and have well-defined qubits.

Be able to initialize qubits to a state.

Have long decoherence times to apply quantum error-correcting codes. Decoherence of a qubit happens when the qubit interacts with the environment, for example, when a measurement is performed.

Use a universal set of quantum gates.

Be able to measure single qubits without modifying others.

Quantum computer physical implementations face huge engineering obstacles to satisfy these requirements. The most important challenge is to guarantee low error rates during computation and measurement. Lowering these rates require techniques for error correction, which add a significant number of qubits specialized on this task. For this reason, the number of qubits of a quantum computer should not be regarded as for classical systems. In a classical computer, the bits of a computer are all effective for performing a calculation, whereas the number of qubits is the sum of the effective qubits (those used to make calculations) plus the ancillas (used for reversible computations) plus the error correction qubits.

Current implementations of quantum computers partially satisfy the DiVincenzo criteria. Quantum adiabatic computers fit in this category since they do not operate using quantum gates. For this reason, they are not considered to be universal quantum computers.

Quantum Adiabatic Computers

A recurrent problem in optimization is to find the global minimum of an objective function. For example, a route-traffic control system can be modeled as a function that reduces the cost of routing to a minimum. Simulated annealing is a heuristic procedure that provides a good solution to these types of problems. Simulated annealing finds the solution state by slowly introducing changes (the adiabatic process) on the variables that govern the system.

Quantum annealing is the analogous quantum version of simulated annealing. A qubit is initialized into a superposition of states representing all possible solutions to the problem. Here is used the Hamiltonian operator, which is the sum of vectors of potential and kinetic energies of the system. Hence, the objective function is encoded using this operator describing the evolution of the system in correspondence with time. Then, if the system is allowed to evolve very slowly, it will eventually land on a final state representing the optimal value of the objective function.

Currently, there exist adiabatic computers in the market, such as the D-Wave and IBM Q systems, featuring hundreds of qubits; however, their capabilities are somewhat limited to some problems that can be modeled as optimization problems. The limits of adiabatic computers were studied by van Dam et al, showing that despite solving local searching problems and even some instances of the max-SAT problem, there exists harder searching problems this computing model cannot efficiently solve.

Nuclear Magnetic Resonance

Nuclear Magnetic Resonance (NMR) is a physical phenomena that can be used to represent qubits. The spin of atomic nucleus of molecules is perturbed by an oscillating magnetic field. A 2001 report describes successful implementation of Shor’s algorithm in a 7-qubit NMR quantum computer. An iconic result since this computer was able to factor the number 15.

Superconducting Quantum Computers

One way to physically construct qubits is based on superconductors, materials that conduct electric current with zero resistance when exposed to temperatures close to absolute zero.

The Josephson effect, in which current flows across the junction of two superconductors separated by a non-superconducting material, is used to physically implement a superposition of states.

When a magnetic flux is applied to this junction, the current flows continuously in one direction. But, depending on the quantity of magnetic flux applied, the current can also flow in the opposite direction. There exists a quantum superposition of currents going both clockwise and counterclockwise leading to a physical implementation of a qubit called flux qubit. The complete device is known as the Superconducting Quantum Interference Device (SQUID) and can be easily coupled scaling the number of qubits. Thus, SQUIDs are like the transistors of a quantum computer.

Examples of superconducting computers are:

D-wave’s adiabatic computers process quantum annealing for solving diverse optimization problems.

IBM’s IBM-Q Tokyo, a 20-qubit adiabatic computer, and IBM Q Experience, a cloud-based system for exploring quantum circuits.

IBM Q System

The Imminent Threat of Quantum Algorithms

The quantum zoo website tracks problems that can be solved using quantum algorithms. As of mid-2018, more than 60 problems appear on this list, targeting diverse applications in the area of number theory, approximation, simulation, and searching. As terrific as it sounds, some easily-solvable problems by quantum computing are surrounding the security of information.

Grover’s Algorithm

Tales of a quantum detective (fragment). A couple of detectives have the mission of finding one culprit in a group of suspects that always respond to this question honestly: “are you guilty?”. The detective C follows a classic interrogative method and interviews every person one at a time, until finding the first one that confesses. The detective Q proceeds in a different way, First gather all suspects in a completely dark room, and after that, the detective Q asks them — are you guilty? — A steady sound comes from the room saying “No!” while at the same time, a single voice mixed in the air responds “Yes!.” Since everybody is submerged in darkness, the detective cannot see the culprit. However, detective Q knows that, as long as the interrogation advances, the culprit will feel desperate and start to speak louder and louder, and so, he continues asking the same question. Suddenly, detective Q turns on the lights, enters into the room, and captures the culprit. How did he do it?

The task of the detective can be modeled as a searching problem. Given a Boolean function \( f\) that takes N bits and produces one bit, to find the unique input \(x\) such that \( f(x)=1\).

A classical algorithm (detective C) finds \(x\) using \(2^N-1\) function evaluations in the worst case. However, the quantum algorithm devised by Grover, corresponding to detective Q, searches quadratically faster using around \(2^{N/2}\) function evaluations.

The key intuition of Grover’s algorithm is increasing the amplitude of the state that represents the solution while maintaining the other states in a lower amplitude. In this way, a system of N qubits, which is a superposition of 2ᴺ possible inputs, can be continuously updated using this intuition until the solution state has an amplitude closer to 1. Hence, after updating the qubits many times, there will be a high probability to measure the solution state.

Initially, a superposition of 2ᴺ states (horizontal axis) is set, each state has an amplitude (vertical axis) close to 0. The qubits are updated so that the amplitude of the solution state increases more than the amplitude of other states. By repeating the update step, the amplitude of the solution state gets closer to 1, which boosts the probability of collapsing to the solution state after measuring.

Grover’s Algorithm (pseudo-code):

Prepare an N qubit \(|x\rangle \) as a uniform superposition of 2ᴺ states.

Update the qubits by performing this core operation. $$ |x\rangle \mapsto (-1)^{f(x)} |x\rangle $$ The result of \( f(x) \) only flips the amplitude of the searched state.

Negate the N qubit over the average of the amplitudes.

Repeat Step 2 and 3 for \( (\tfrac{\pi}{4}) 2^{ N/2} \) times.

Measure the qubit and return the bits obtained.

Alternatively, the second step can be better understood as a conditional statement:

IF f(x) = 1 THEN
Negate the amplitude of the solution state.
ELSE
/* nothing */
ENDIF

Grover’s algorithm considers function \(f\) a black box, so with slight modifications, the algorithm can also be used to find collisions on the function. This implies that Grover’s algorithm can find a collision using an asymptotically less number of operations than using a brute-force algorithm.

The power of Grover’s algorithm can be turned against cryptographic hash functions. For instance, a quantum computer running Grover’s algorithm could find a collision on SHA256 performing only 2¹²⁸ evaluations of a reversible circuit of SHA256. The natural protection for hash functions is to increase the output size to double. More generally, most of symmetric key encryption algorithms will survive to the power of Grover’s algorithm by doubling the size of keys.

The scenario for public-key algorithms is devastating in face of Peter Shor’s algorithm.

Shor’s Algorithm

Multiplying integers is an easy task to accomplish, however, finding the factors that compose an integer is difficult. The integer factorization problem is to decompose a given integer number into its prime factors. For example, 42 has three factors 2, 3, and 7 since \( 2\times 3\times 7 = 42\). As the numbers get bigger, integer factorization becomes more difficult to solve, and the hardest instances of integer factorization are when the factors are only two different large primes. Thus, given an integer number \(N\), to find primes \(p\) and \(q\) such that \( N = p \times q\), is known as integer splitting.

Factoring integers is like cutting wood, and the specific task of splitting integers is analogous to using an axe for splitting the log in two parts. There exist many different tools (algorithms) for accomplishing each task.

For integer factorization, trial division, the Rho method, the elliptic curve method are common algorithms. Fermat’s method, the quadratic- and rational-sieve, leads to the (general) number field sieve (NFS) algorithm for integer splitting. The latter relies on finding a congruence of squares, that is, splitting \(N\) as a product of squares such that $$ N = x^2 – y^2 = (x+y)\times(x-y) $$ The complexity of NFS is mainly attached to the number of pairs \((x, y)\) that must be examined before getting a pair that factors \(N\). The NFS algorithm has subexponential complexity on the size of \(N\), meaning that the time required for splitting an integer increases significantly as the size of \(N\) grows. For large integers, the problem becomes intractable for classical computers.

The Axe of Thor Shor

The many different guesses of the NFS algorithm are analogous to hitting the log using a dulled axe; after subexponential many tries, the log is cut by half. However, using a sharper axe allows you to split the log faster. This sharpened axe is the quantum algorithm proposed by Shor in 1994.

Let \(x\) be an integer less than \(N\) and of the order \(k\). Then, if \(k\) is even, there exists an integer \(q\) so \(qN\) can be factored as follows.

This approach has some issues. For example, the factorization could correspond to \(q\) not \(N\) and the order of \(x\) is unknown, and here is where Shor’s algorithm enters the picture, finding the order of \(x\).

The internals of Shor’s algorithm rely on encoding the order \(k\) into a periodic function, so that its period can be obtained using the quantum version of the Fourier transform (QFT). The order of \(x\) can be found using a polynomial number quantum evaluations of Shor’s algorithm. Therefore, splitting integers using this quantum approach has polynomial complexity on the size of \(N\).

Shor’s algorithm carries strong implications on the security of the RSA encryption scheme because its security relies on integer factorization. A large-enough quantum computer can efficiently break RSA for current instances.

Alternatively, one may recur to elliptic curves, used in cryptographic protocols like ECDSA or ECDH. Moreover, all TLS ciphersuites use a combination of elliptic curve groups, large prime groups, and RSA and DSA signatures. Unfortunately, these algorithms all succumb to Shor’s algorithm. It only takes a few modifications for Shor’s algorithm to solve the discrete logarithm problem on finite groups. This sounds like a catastrophic story where all of our encrypted data and privacy are no longer secure with the advent of a quantum computer, and in some sense this is true.

On one hand, it is a fact that the quantum computers constructed as of 2019 are not large enough to run, for instance, Shor’s algorithm for the RSA key sizes used in standard protocols. For example, a 2018 report shows experiments on the factorization of a 19-bit number using 94 qubits, they also estimate that 147456 qubits are needed for factoring a 768-bit number. Hence, there numbers indicates that we are still far from breaking RSA.

What if we increment RSA key sizes to be resistant to quantum algorithms, just like for symmetric algorithms?

Bernstein et al. estimated that RSA public keys should be as large as 1 terabyte to maintain secure RSA even in the presence of quantum factoring algorithms. So, for public-key algorithms, increasing the size of keys does not help.

A recent investigation by Gidney and Ekerá shows improvements that accelerate the evaluation of quantum factorization. In their report, the cost of factoring 2048-bit integers is estimated to take a few hours using a quantum machine of 20 million qubits, which is far from any current development. Something worth noting is that the number of qubits needed is two orders of magnitude smaller than the estimated numbers given in previous works developed in this decade. Under these estimates, current encryption algorithms will remain secure several more years; however, consider the following not-so-unrealistic situation.

Information currently encrypted with for example, RSA, can be easily decrypted with a quantum computer in the future. Now, suppose that someone records encrypted information and stores them until a quantum computer is able to decrypt ciphertexts. Although this could be as far as 20 years from now, the forward-secrecy principle is violated. A 20-year gap to the future is sometimes difficult to imagine. So, let’s think backwards, what would happen if all you did on the Internet at the end of the 1990s can be revealed 20 years later — today. How does this impact the security of your personal information? What if the ciphertexts were company secrets or business deals? In 1999, most of us were concerned about the effects of the Y2K problem, now we’re facing Y2Q (years to quantum): the advent of quantum computers.

Post-Quantum Cryptography

Although the current capacity of the physical implementation of quantum computers is far from a real threat to secure communications, a transition to use stronger problems to protect information has already started. This wave emerged as post-quantum cryptography (PQC). The core idea of PQC is finding algorithms difficult enough that no quantum (and classical) algorithm can solve them.

A recurrent question is: How does it look like a problem that even a quantum computer can not solve?

These so-called quantum-resistant algorithms rely on different hard mathematical assumptions; some of them as old as RSA, others more recently proposed. For example, McEliece cryptosystem, formulated in the late 70s, relies on the hardness of decoding a linear code (in the sense of coding theory). The practical use of this cryptosystem didn’t become widespread, since with the passing of time, other cryptosystems superseded in efficiency. Fortunately, McEliece cryptosystem remains immune to Shor’s algorithm, gaining it relevance in the post-quantum era.

As of 2017, the NIST started an evaluation process that tracks possible alternatives for next-generation secure algorithms. From a practical perspective, all candidates present different trade-offs in implementation and usage. The time and space requirements are diverse; at this moment, it’s too early to define which will succeed RSA and elliptic curves. An initial round collected 70 algorithms for deploying key encapsulation mechanisms and digital signatures. As of early 2019, 28 of these survive and are currently in the analysis, investigation, and experimentation phase.

Cloudflare’s mission is to help build a better Internet. As a proactive action, our cryptography team is preparing experiments on the deployment of post-quantum algorithms at Cloudflare scale. Watch our blog post for more details.

We live in a completely connected society. A society connected by a variety of devices: laptops, mobile phones, wearables, self-driving or self-flying things. We have standards for a common language that allows these devices to communicate with each other. This is critical for wide-scale deployment – especially in cryptography where the smallest detail has great importance.

One of the most important standards-setting organizations is the National Institute of Standards and Technology (NIST), which is hugely influential in determining which standardized cryptographic systems see worldwide adoption. At the end of 2016, NIST announced it would hold a multi-year open project with the goal of standardizing new post-quantum (PQ) cryptographic algorithms secure against both quantum and classical computers.

Many of our devices have very different requirements and capabilities, so it may not be possible to select a “one-size-fits-all” algorithm during the process. NIST mathematician, Dustin Moody, indicated that institute will likely select more than one algorithm:

“There are several systems in use that could be broken by a quantum computer – public-key encryption and digital signatures, to take two examples – and we will need different solutions for each of those systems.”

Initially, NIST selected 82 candidates for further consideration from all submitted algorithms. At the beginning of 2019, this process entered its second stage. Today, there are 26 algorithms still in contention.

Post-quantum cryptography: what is it really and why do I need it?

In 1994, Peter Shor made a significant discovery in quantum computation. He found an algorithm for integer factorization and computing discrete logarithms, both believed to be hard to solve in classical settings. Since then it has become clear that the ‘hard problems’ on which cryptosystems like RSA and elliptic curve cryptography (ECC) rely – integer factoring and computing discrete logarithms, respectively – are efficiently solvable with quantum computing.

A quantum computer can help to solve some of the problems that are intractable on a classical computer. In theory, they could efficiently solve some fundamental problems in mathematics. This amazing computing power would be highly beneficial, which is why companies are actually trying to build quantum computers. At first, Shor’s algorithm was merely a theoretical result – quantum computers powerful enough to execute it did not exist – but this is quickly changing. In March 2018, Google announced a 72-qubit universal quantum computer. While this is not enough to break say RSA-2048 (still more is needed), many fundamental problems have already been solved.

In anticipation of wide-spread quantum computing, we must start the transition from classical public-key cryptography primitives to post-quantum (PQ) alternatives. It may be that consumers will never get to hold a quantum computer, but a few powerful attackers who will get one can still pose a serious threat. Moreover, under the assumption that current TLS handshakes and ciphertexts are being captured and stored, a future attacker could crack these stored individual session keys and use those results to decrypt the corresponding individual ciphertexts. Even strong security guarantees, like forward secrecy, do not help out much there.

In 2006, the academic research community launched a conference series dedicated to finding alternatives to RSA and ECC. This so-called post-quantum cryptography should run efficiently on a classical computer, but it should also be secure against attacks performed by a quantum computer. As a research field, it has grown substantially in popularity.

Several companies, including Google, Microsoft, Digicert and Thales, are already testing the impact of deploying PQ cryptography. Cloudflare is involved in some of this, but we want to be a company that leads in this direction. The first thing we need to do is understand the real costs of deploying PQ cryptography, and that’s not obvious at all.

What options do we have?

Many submissions to the NIST project are still under study. Some are very new and little understood; others are more mature and already standardized as RFCs. Some have been broken or withdrawn from the process; others are more conservative or illustrate how far classical cryptography would need to be pushed so that a quantum computer could not crack it within a reasonable cost. Some are very slow and big; others are not. But most cryptographic schemes can be categorized into these families: lattice-based, multivariate, hash-based (signatures only), code-based and isogeny-based.

For some algorithms, nevertheless, there is a fear they may be too inconvenient to use with today’s Internet. We must also be able to integrate new cryptographic schemes with existing protocols, such as SSH or TLS. To do that, designers of PQ cryptosystems must consider these characteristics:

Latency caused by encryption and decryption on both ends of the communication channel, assuming a variety of devices from big and fast servers to slow and memory constrained IoT (Internet of Things) devices

Small public keys and signatures to minimize bandwidth

Clear design that allows cryptanalysis and determining weaknesses that could be exploited

Use of existing hardware for fast implementation

The work on post-quantum public key cryptosystems must be done in a full view of organizations, governments, cryptographers, and the public. Emerging ideas must be properly vetted by this community to ensure widespread support.

Helping Build a Better Internet

To better understand the post-quantum world, Cloudflare began experimenting with these algorithms and used them to provide confidentiality in TLS connections.

With Google, we are proposing a wide-scale experiment that combines client- and server-side data collection to evaluate the performance of key-exchange algorithms on actual users’ devices. We hope that this experiment helps choose an algorithm with the best characteristics for the future of the Internet. With Cloudflare’s highly distributed network of access points and Google’s Chrome browser, both companies are in a very good position to perform this experiment.

Our goal is to understand how these algorithms act when used by real clients over real networks, particularly candidate algorithms with significant differences in public-key or ciphertext sizes. Our focus is on how different key sizes affect handshake time in the context of Transport Layer Security (TLS) as used on the web over HTTPS.

Our primary candidates are an NTRU-based construction called HRSS-SXY (by Hülsing – Rijneveld – Schanck – Schwabe, and Tsunekazu Saito – Keita Xagawa – Takashi Yamakawa) and an isogeny-based Supersingular Isogeny Key Encapsulation (SIKE). Details of both algorithms are described in more detail below in section “Dive into post-quantum cryptography”. This table shows a few characteristics for both algorithms. Performance timings were obtained by running the BoringSSL speed test on an Intel Skylake CPU.

KEM

Public Key size (bytes)

Ciphertext (bytes)

Secret size (bytes)

KeyGen (op/sec)

Encaps (op/sec)

Decaps (op/sec)

NIST level

HRSS-SXY

1138

1138

32

3952.3

76034.7

21905.8

1

SIKE/p434

330

346

16

367.1

228.0

209.3

1

Currently the most commonly used key exchange algorithm (according to Cloudflare’s data) is the non-quantum X25519. Its public keys are 32 bytes and BoringSSL can generate 49301.2 key pairs, and is able to perform 19628.6 key agreements every second on my Skylake CPU.

Note that HRSS-SXY shows a significant speed advantage, while SIKE has a size advantage. In our experiment, we will deploy these two algorithms on both the server side using Cloudflare’s infrastructure, and the client side using Chrome Canary; both sides will collect telemetry information about TLS handshakes using these two PQ algorithms to see how they perform in practice.

What do we expect to find?

In 2018, Adam Langley conducted an experiment with the goal of evaluating the likely latency impact of a post-quantum key exchange in TLS. Chrome was augmented with the ability to include a dummy, arbitrarily-sized extension in the TLS ClientHello (fixed number of bytes of random noise). After taking into account the performance and key size offered by different types key-exchange schemes, he concluded that constructs based on structured lattices may be most suitable for future use in TLS.

However, Langley also observed a peculiar phenomenon; client connections measured at 95th percentile had much higher latency than the median. It means that in those cases, isogeny-based systems may be a better choice. In the “Dive into post-quantum cryptography”, we describe the difference between isogeny-based SIKE and lattice-based NTRU cryptosystems.

In our experiment, we want to more thoroughly evaluate and ascribe root causes to these unexpected latency increases. We would particularly like to learn more about the characteristics of those networks: what causes increased latency? how does the performance cost of isogeny-based algorithms impact the TLS handshake? We want to answer key questions, like:

What is a good ratio for speed-to-key size (or how much faster could SIKE get to achieve the client-perceived performance of HRSS)?

How do network middleboxes behave when clients use new PQ algorithms, and which networks have problematic middleboxes?

How do the different properties of client networks affect TLS performance with different PQ key exchanges? Can we identify specific autonomous systems, device configurations, or network configurations that favor one algorithm over another? How is performance affected in the long tail?

Experiment Design

Our experiment will involve both server- and client-side performance statistics collection from real users around the world (all the data is anonymized). Cloudflare is operating the server-side TLS connections. We will enable the CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE + X25519) key-agreement algorithms on all TLS-terminating edge servers.

In this experiment, the ClientHello will contain a CECPQ2 or CECPQ2b public key (but never both). Additionally, Chrome will always include X25519 for servers that do not support post-quantum key exchange. The post-quantum key exchange will only be negotiated in TLS version 1.3 when both sides support it.

Since Cloudflare only measures the server side of the connection, it is impossible to determine the time it takes for a ClientHello sent from Chrome to reach Cloudflare’s edge servers; however, we can measure the time it takes for the TLS ServerHello message containing post-quantum key exchange, to reach the client and for the client to respond.

On the client side, Chrome Canary will operate the TLS connection. Google will enable either CECPQ2 or CECPQ2b in Chrome for the following mix of architecture and OSes:

x86-64: Windows, Linux, macOS, ChromeOS

aarch64: Android

Our high-level expectation is to get similar results as Langley’s original experiment in 2018 — slightly increased latency for the 50th percentile and higher latency for the 95th. Unfortunately, data collected purely from real users’ connections may not suffice for diagnosing the root causes of why some clients experience excessive slowdown. To this end, we will perform follow-up experiments based on per-client information we collect server-side.

Our primary hypothesis is that excessive slowdowns, like those Langley observed, are largely due to in-network events, such as middleboxes or bloated/lossy links. As a first-pass analysis, we will investigate whether the slowed-down clients share common network features, like common ASes, common transit networks, common link types, and so on. To determine this, we will run a traceroute from vantage points close to our servers back toward the clients (not overloading any particular links or hosts) and study whether some client locations are subject to slowdowns for all destinations or just for some.

Dive into post-quantum cryptography

Be warned: the details of PQ cryptography may be quite complicated. In some cases it builds on classical cryptography, and in other cases it is completely different math. It would be rather hard to describe details in a single blog post. Instead, we are giving you an intuition of post-quantum cryptography, rather than provide deep academic-level descriptions. We’re skipping a lot of details for the sake of brevity. Nevertheless, settle in for a bit of an epic journey because we have a lot to cover.

Key encapsulation mechanism

NIST requires that all key-agreement algorithms have a form of key-encapsulation mechanism (KEM). The KEM is a simplified form of public key encryption (PKE). As PKE, it also allows agreement on a secret, but in a slightly different way. The idea is that the session key is an output of the encryption algorithm, conversely to public key encryption schemes where session key is an input to the algorithm. In a KEM, Alice generates a random key and uses the pre-generated public key from Bob to encrypt (encapsulate) it. This results in a ciphertext sent to Bob. Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the random key. The idea was initially introduced by Cramer and Shoup. Experience shows that such constructs are easier to design, analyze, and implement as the scheme is limited to communicating a fixed-size session key. Leonardo Da Vinci said, “Simplicity is the ultimate sophistication,” which is very true in cryptography.

The key exchange (KEX) protocol, like Diffie-Hellman, is yet a different construct: it allows two parties to agree on a shared secret that can be used as a symmetric encryption key. For example, Alice generates a key pair and sends a public key to Bob. Bob does the same and uses his own key pair with Alice’s public key to generate the shared secret. He then sends his public key to Alice who can now generate the same shared secret. What’s worth noticing is that both Alice and Bob perform exactly the same operations.

KEM construction can be converted to KEX. Alice performs key generation and sends the public key to Bob. Bob uses it to encapsulate a symmetric session key and sends it back to Alice. Alice decapsulates the ciphertext received from Bob and gets the symmetric key. This is actually what we do in our experiment to make integration with the TLS protocol less complicated.

NTRU Lattice-based Encryption

We will enable the CECPQ2 implemented by Adam Langley from Google on our servers. He described this implementation in detail here. This key exchange uses the HRSS algorithm, which is based on the NTRU (N-Th Degree TRUncated Polynomial Ring) algorithm. Foregoing too much detail, I am going to explain how NTRU works and give simplified examples, and finally, compare it to HRSS.

NTRU is a cryptosystem based on a polynomial ring. This means that we do not operate on numbers modulo a prime (like in RSA), but on polynomials of degree \( N \) , where the degree of a polynomial is the highest exponent of its variable. For example, \(x^7 + 6x^3 + 11x^2 \) has degree of 7.

One can add polynomials in the ring in the usual way, by simply adding theirs coefficients modulo some integer. In NTRU this integer is called \( q \). Polynomials can also be multiplied, but remember, you are operating in the ring, therefore the result of a multiplication is always a polynomial of degree less than \(N\). It basically means that exponents of the resulting polynomial are added to modulo\(N\).

In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working with a set of numbers less than N, you are working with a set of polynomials with a degree less than N.

To instantiate the NTRU cryptosystem, three domain parameters must be chosen:

\(N\) – degree of the polynomial ring, in NTRU the principal objects are polynomials of degree \(N-1\).

\(p\) – small modulus used during key generation and decryption for reducing message coefficients.

\(q\) – large modulus used during algorithm execution for reducing coefficients of the polynomials.

First, we generate a pair of public and private keys. To do that, two polynomials \(f\) and \(g\) are chosen from the ring in a way that their randomly generated coefficients are much smaller than \(q\). Then key generation computes two inverses of the polynomial: $$ f_p= f^{-1} \bmod{p} \\ f_q= f^{-1} \bmod{q} $$

The last step is to compute $$ pk = p\cdot f_q\cdot g \bmod q $$, which we will use as public key pk. The private key consists of \(f\) and \(f_p\). The \(f_q\) is not part of any key, however it must remain secret.

It might be the case that after choosing \(f\), the inverses modulo \(p\) and \( q \) do not exist. In this case, the algorithm has to start from the beginning and generate another \(f\). That’s unfortunate because calculating the inverse of a polynomial is a costly operation. HRSS brings an improvement to this issue since it ensures that those inverses always exist, making key generation faster than as proposed initially in NTRU.

The encryption of a message \(m\) proceeds as follows. First, the message \(m\) is converted to a ring element \(pt\) (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial \(b\) called blinder. The goal of the blinder is to generate different ciphertexts per encyption. Thus, the ciphetext \(ct\) is obtained as $$ ct = (b\cdot pk + pt ) \bmod q $$ Decryption looks a bit more complicated but it can also be easily understood. Decryption uses both the secret value \(f\) and to recover the plaintext as $$ v = f \cdot ct \bmod q \\ pt = v \cdot f_p \bmod p $$

This diagram demonstrates why and how decryption works.

After obtaining \(pt\), the message \(m\) is recovered by inverting the conversion function.

The underlying hard assumption is that given two polynomials: \(f\) and \(g\) whose coefficients are short compared to the modulus \(q\), it is difficult to distinguish \(pk = \frac{f}{g} \) from a random element in the ring. It means that it’s hard to find \(f\) and \(g\) given only public key pk.

Lattices

NTRU cryptosystem is a grandfather of lattice-based encryption schemes. The idea of using difficult problems for cryptographic purposes was due to Ajtai. His work evolved into a whole area of research with the goal of creating more practical, lattice-based cryptosystems.

What is a lattice and why it can be used for post-quantum crypto?

The picture below visualizes lattice as points in a two-dimensional space. A lattice is defined by the origin \(O\) and base vectors \( \{ b_1 , b_2\} \). Every point on the lattice is represented as a linear combination of the base vectors, for example \(V = -2b_1+b_2\).

There are two classical NP-hard problems in lattice-based cryptography:

Shortest Vector Problem (SVP): Given a lattice, to find the shortest non-zero vector in the lattice. In the graph, the vector \(s\) is the shortest one. The SVP problem is NP-hard only under some assumptions.

Closest Vector Problem (CVP). Given a lattice and a vector \(V\) (not necessarily in the lattice), to find the closest vector to \(V\). For example, the closest vector to \(t\) is \(z\).

In the graph above, it is easy for us to solve SVP and CVP by simple inspection. However, the lattices used in cryptography have higher dimensions, say above 1000, as well as highly non-orthogonal basis vectors. On these instances, the problems get extremely hard to solve. It’s even believed future quantum computers will have it tough.

NTRU vs HRSS

HRSS, which we use in our experiment, is based on NTRU, but a slightly better instantiation. The main improvements are:

Faster key generation algorithm.

NTRU encryption can produce ciphertexts that are impossible to decrypt (true for many lattice-based schemes). But HRSS fixes this problem.

HRSS is a key encapsulation mechanism.

CECPQ2b – Isogeny-based Post-Quantum TLS

Following CECPQ2, we have integrated into BoringSSL another hybrid key exchange mechanism relying on SIKE. It is called CECPQ2b and we will use it in our experimentation in TLS 1.3. SIKE is a key encapsulation method based on Supersingular Isogeny Diffie-Hellman (SIDH). Read more about SIDH in our previous post. The math behind SIDH is related to elliptic curves. A comparison between SIDH and the classical Elliptic Curve Diffie-Hellman (ECDH) is given.

An elliptic curve is a set of points that satisfy a specific mathematical equation. The equation of an elliptic curve may have multiple forms, the standard form is called the Weierstrass equation $$ y^2 = x^3 +ax +b $$ and its shape can look like the red curve.

An interesting fact about elliptic curves is have a group structure. That is, the set of points on the curve have associated a binary operation called point addition. The set of points on the elliptic curve is closed under addition. Thus,adding two points results in another point that is also on the elliptic curve.

If we can add two different points on a curve, then we can also add one point to itself. And if we do it multiple times, then the resulting operations is known as a scalar multiplication and denoted as \(Q = k\cdot P = P+P+\dots+P\) for an integer \(k\).

Multiplication of scalars is commutative. It means that two scalar multiplications can be evaluated in any order \( \color{darkred}{k_a}\cdot\color{darkgreen}{k_b} = \color{darkgreen}{k_b}\cdot\color{darkred}{k_a} \); this an important property that makes ECDH possible.

It turns out that carefully if choosing an elliptic curve “correctly”, scalar multiplication is easy to compute but extremely hard to reverse. Meaning, given two points \(Q\) and \(P\) such that \(Q=k\cdot P\), finding the integer k is a difficult task known as the Elliptic Curve Discrete Logarithm problem (ECDLP). This problem is suitable for cryptographic purposes.

Alice and Bob agree on a secret key as follows. Alice generates a private key \( k_a\). Then, she uses some publicly known point \(P\) and calculates her public key as \( Q_a = k_a\cdot P\). Bob proceeds in similar fashion and gets \(k_b\) and \(Q_b = k_b\cdot P\). To agree on a shared secret, each party multiplies their private key with the public key of the other party. The result of this is the shared secret. Key agreement as described above, works thanks to the fact that scalars can commute: $$ \color{darkgreen}{k_a} \cdot Q_b = \color{darkgreen}{k_a} \cdot \color{darkred}{k_b} \cdot P \iff \color{darkred}{k_b} \cdot \color{darkgreen}{k_a} \cdot P = \color{darkred}{k_b} \cdot Q_a $$

There is a vast theory behind elliptic curves. An introduction to elliptic curve cryptography was posted before and more details can be found in this book. Now, lets describe SIDH and compare with ECDH.

Isogenies on Elliptic Curves

Before explaining the details of SIDH key exchange, I’ll explain the 3 most important concepts, namely: j-invariant, isogeny and its kernel.

Each curve has a number that can be associated to it. Let’s call this number a j-invariant. This number is not unique per curve, meaning many curves have the same value of j-invariant, but it can be viewed as a way to group multiple elliptic curves into disjoint sets. We say that two curves are isomorphic if they are in the same set, called the isomorphism class. The j-invariant is a simple criterion to determine whether two curves are isomorphic. The j-invariant of a curve \(E\) in Weierstrass form \( y^2 = x^3 + ax + b\) is given as $$ j(E) = 1728\frac{4a^3}{4a^3 +27b^2} $$

When it comes to isogeny, think about it as a map between two curves. Each point on some curve \( E \) is mapped by isogeny to the point on isogenous curve \( E’ \). We denote mapping from curve \( E \) to \( E’ \) by isogeny \( \phi \) as:

$$\phi: E \rightarrow E’ $$

It depends on the map if those two curves are isomorphic or not. Isogeny can be visualised as:

There may exist many of those mappings, each curve used in SIDH has small number of isogenies to other curves. Natural question is how do we compute such isogeny. Here is where the kernel of an isogeny comes. The kernel uniquely determines isogeny (up to isomorphism class). Formulas for calculating isogeny from its kernel were initially given by J. Vélu and the idea of calculating them efficiently was extended.

To finish, I will summarize what was said above with a picture.

There are two isomorphism classes on the picture above. Both curves \(E_1\) and \(E_2\) are isomorphic and have j-invariant = 6. As curves \(E_3\) and \(E_4\) have j-invariant=13, they are in a different isomorphism class. There exists an isogeny \(\phi_2\) between curve \(E_3\) and \(E_2\), so they both are isogeneous. Curves \( \phi_1 \) and \( E_2 \) are isomorphic and there is isogeny \( \phi_1 \) between them. Curves \( E_1\) and \(E_4\) are not isomorphic.

For brevity I’m skipping many important details, like details of the finite field, the fact that isogenies must be separable and that the kernel is finite. But curious readers can find a number of academic research papers available on the Internet.

Big picture: similarities with ECDH

Let’s generalize the ECDH algorithm described above, so that we can swap some elements and try to use Supersingular Isogeny Diffie-Hellman.

Note that what actually happens during an ECDH key exchange is:

We have a set of points on elliptic curve, set S

We have another group of integers used for point multiplication, G

We use an element from Z to act on an element from S to get another element from S:

$$ G \cdot S \rightarrow S $$

Now the question is: what is our G and S in an SIDH setting? For SIDH to work, we need a big set of elements and something secret that will act on the elements from that set. This “group action” must also be resistant to attacks performed by quantum computers.

In the SIDH setting, those two sets are defined as:

Set S is a set (graph) of j-invariants, such that all the curves are supersingular: \( S = [j(E_1), j(E_2), j(E_3), …. , j(E_n)]\)

Set G is a set of isogenies acting on elliptic curves and transforming, for example, the elliptic curve \(E_1\) into \(E_n\):

Random walk on supersingular graph

When we talk about Isogeny Based Cryptography, as a topic distinct from Elliptic Curve Cryptography, we usually mean algorithms and protocols that rely fundamentally on the structure of isogeny graphs. An example of such a (small) graph is pictured below.

Each vertex of the graph represents a different j-invariant of a set of supersingular curves. The edges between vertices represent isogenies converting one elliptic curve to another. As you can notice, the graph is strongly connected, meaning every vertex can be reached from every other vertex. In the context of isogeny-based crypto, we call such a graph a supersingular isogeny graph. I’ll skip some technical details about the construction of this graph (look for those here or here), but instead describe ideas about how it can be used.

As the graph is strongly connected, it is possible to walk a whole graph by starting from any vertex, randomly choosing an edge, following it to the next vertex and then start the process again on a new vertex. Such a way of visiting edges of this graph is called a random walk.

The random walk is a key concept that makes isogeny based crypto feasible. When you look closely at the graph, you can notice that each vertex has a small number of edges incident to it, this is why we can compute the isogenies efficiently. But also for any vertex there is only a limited number of isogenies to choose from, which doesn’t look like good base for a cryptographic scheme. The key question is – where does the security of the scheme come from exactly? In order to get it, it is necessary to visit a couple hundred vertices. What it means in practice is that secret isogeny (of large degree) is constructed as a composition of multiple isogenies (of small, prime degree). Which means, the secret isogeny is:

This property and properties of the isogeny graph are what makes some of us believe that scheme has a good chance to be secure. More specifically, there is no efficient way of finding a path that connects \( E_0 \) with \( E_n \), even with quantum computer at hand. The security level of a system depends on value n – the number of steps taken during the walk.

The random walk is a core process used when both generating public keys and computing shared secrets. It starts with party generating random value m (see more below), starting curve \(E_0\)and points P and Q on this curve. Those values are used to compute the kernel of an isogeny \( R_1 \) in the following way:

$$ R_1 = P + m \cdot Q $$

Thanks to formulas given by Vélu we can now use the point \( R_1 \)to compute the isogeny, the party will choose to move from a vertex to another one. After the isogeny \( \phi_{R_1} \) is calculated it is applied to \( E_0 \) which results in a new curve \( E_1 \):

$$ \phi_{R_1}: E_0 \rightarrow E_1 $$

Isogeny is also applied to points P and Q. Once on \( E_1 \) the process is repeated. This process is applied n times, and at the end a party ends up on some curve \( E_n \) which defines isomorphism class, so also j-invariant.

Supersingular Isogeny Diffie-Hellman

The core idea in SIDH is to compose two random walks on an isogeny graph of elliptic curves in such a way that the end node of both ways of composing is the same.

In order to do it, scheme sets public parameters – starting curve \( E_0 \) and 2 pairs of base points on this curve \( (PA,QA) \) , \( (PB,QB) \). Alice generates her random secret keys m, and calculates a secret isogeny \( \phi_q \) by performing a random walk as described above. The walk finishes with 3 values: elliptic curve \( E_a \) she has ended up with and pair of points \( \phi_a(PB) \) and \( \phi_a(QB) \) after pushing through Alice’s secret isogeny. Bob proceeds analogouslywhich results in the triple \( {E_b, \phi_b(PA), \phi_b(QA)} \). The triple forms a public key which is exchanged between parties.

The picture below visualizes the operation. The black dots represent curves grouped in the same isomorphism classes represented by light blue circles. Alice takes the orange path ending up on a curve \( E_a \) in a separate isomorphism class than Bob after taking his dark blue path ending on \( E_b \). SIDH is parametrized in a way that Alice and Bob will always end up in different isomorphism classes.

Upon receipt of triple \( { E_a, \phi_a(PB), \phi_a(QB) } \) from Alice, Bob will use his secret value m to calculate a new kernel – but instead of using point \(PA\) and \(QA\) to calculate an isogeny kernel, he will now use images \( \phi_a(PB) \) and \( \phi_a(QB) \) received from Alice:

$$ R’_1 = \phi_a(PB) + m \cdot \phi_a(QB) $$

Afterwards, he uses \( R’_1 \) to start the walk again resulting in the isogeny \( \phi’_b: E_a \rightarrow E_{ab} \). Allice proceeds analogously resulting in the isogeny \(\phi’_a: E_b \rightarrow E_{ba} \). With isogenies calculated this way, both Alice and Bob will converge in the same isomorphism class. The math math may seem complicated, hopefully the picture below makes it easier to understand.

Bob computes a new isogeny and starts his random walk from \( E_a \) received from Alice. He ends up on some curve \(E_{ba}\). Similarly, Alice calculates a new isogeny, applies it on \( E_b \) received from Bob and her random walk ends on some curve \(E_{ab}\). Curves \(E_{ab}\) and \(E_{ba}\) are not likely to be the same, but construction guarantees that they are isomorphic. As mentioned earlier, isomorphic curves have the same value of j-invariant, hence the shared secret is a value of j-invariant \(j(E_{ab})\).

Coming back to differences between SIDH and ECDH – we can split them into four categories: the elements of the group we are operating on, the cornerstone computation required to agree on a shared secret, the elements representing secret values, and the difficult problem on which the security relies.

In ECDH there is a secret key which is an integer scalar, in case of SIDH it is a secret isogeny, which also is generated from an integer scalar. In the case of ECDH one multiplies a point on a curve by a scalar, in the case of SIDH it is a random walk in an isogeny graph. In the case of ECDH, the public key is a point on a curve, in the case of SIDH, the public part is a curve itself and the image of some points after applying isogeny. The shared secret in the case of ECDH is a point on a curve, in the case of SIDH it is a j-invariant.

SIKE: Supersingular Isogeny Key Encapsulation

SIDH could potentially be used as a drop-in replacement of the ECDH protocol. We have actually implemented a proof-of-concept and added it to our implementation of TLS 1.3 in the tls-tris library and described (together with Mozilla) implementation details in this draft. Nevertheless, there is a problem with SIDH – the keys can be used only once. In 2016, a few researchers came up with an active attack on SIDH which works only when public keys are reused. In the context of TLS, it is not a big problem, because for each session a fresh key pair is generated (ephemeral keys), but it may not be true for other applications.

SIKE is an isogeny key encapsulation which solves this problem. Bob can generate SIKE keys, upload the public part somewhere in the Internet and then anybody can use it whenever he wants to communicate with Bob securely. SIKE reuses SIDH – internally both sides of the connection always perform SIDH key generation, SIDH key agreement and apply some other cryptographic primitives in order to convert SIDH to KEM. SIKE is implemented in a few variants – each variant corresponds to the security levels using 128-, 192- and 256-bit secret keys. Higher security level means longer running time. More details about SIKE can be found here.

SIKE is also one of the candidates in NIST post-quantum “competition“.

I’ve skipped many important details to give a brief description of how isogeny based crypto works. If you’re curious and hungry for details, look at either of these Cloudflare meetups, where Deirdre Connolly talked about isogeny-based cryptography or this talk by Chloe Martindaleduring PQ Crypto School 2017. And if you would like to know more about quantum attacks on this scheme, I highly recommend this work.

Conclusion

Quantum computers that can break meaningful cryptographic parameter settings do not exist, yet. They won’t be built for at least the next few years. Nevertheless, they have already changed the way we look at current cryptographic deployments. There are at least two reasons it’s worth investing in PQ cryptography:

It takes a lot of time to build secure cryptography and we don’t actually know when today’s classical cryptography will be broken. There is a need for a good mathematical base: an initial idea of what may be secure against something that doesn’t exist yet. If you have an idea, you also need good implementation, constant time, resistance to things like time and cacheside-channels, DFA, DPA, EM, and a bunch of other abbreviations indicating side-channel resistance. There is also deployment of, for example, algorithms based on elliptic curves were introduced in ’85, but started to really be used in production only during the last decade, 20 or so years later. Obviously, the implementation must be blazingly fast! Last, but not least, integration: we need time to develop standards to allow integration of PQ cryptography with protocols like TLS.

Even though efficient quantum computers probably won’t exist for another few years, the threat is real. Data encrypted with current cryptographic algorithms can be recorded now with hopes of being broken in the future.

Cloudflare is motivated to help build the Internet of tomorrow with the tools at hand today. Our interest is in cryptographic techniques that can be integrated into existing protocols and widely deployed on the Internet as seamlessly as possible. PQ cryptography, like the rest of cryptography, includes many cryptosystems that can be used for communications in today’s Internet; Alice and Bob need to perform some computation, but they do not need to buy new hardware to do that.

Cloudflare sees great potential in those algorithms and believes that some of them can be used as a safe replacement for classical public-key cryptosystems. Time will tell if we’re justified in this belief!

We live in a completely connected society. A society connected by a variety of devices: laptops, mobile phones, wearables, self-driving or self-flying things. We have standards for a common language that allows these devices to communicate with each other. This is critical for wide-scale deployment – especially in cryptography where the smallest detail has great importance.

One of the most important standards-setting organizations is the National Institute of Standards and Technology (NIST), which is hugely influential in determining which standardized cryptographic systems see worldwide adoption. At the end of 2016, NIST announced it would hold a multi-year open project with the goal of standardizing new post-quantum (PQ) cryptographic algorithms secure against both quantum and classical computers.

Many of our devices have very different requirements and capabilities, so it may not be possible to select a “one-size-fits-all” algorithm during the process. NIST mathematician, Dustin Moody, indicated that institute will likely select more than one algorithm:

“There are several systems in use that could be broken by a quantum computer – public-key encryption and digital signatures, to take two examples – and we will need different solutions for each of those systems.”

Initially, NIST selected 82 candidates for further consideration from all submitted algorithms. At the beginning of 2019, this process entered its second stage. Today, there are 26 algorithms still in contention.

Post-quantum cryptography: what is it really and why do I need it?

In 1994, Peter Shor made a significant discovery in quantum computation. He found an algorithm for integer factorization and computing discrete logarithms, both believed to be hard to solve in classical settings. Since then it has become clear that the ‘hard problems’ on which cryptosystems like RSA and elliptic curve cryptography (ECC) rely – integer factoring and computing discrete logarithms, respectively – are efficiently solvable with quantum computing.

A quantum computer can help to solve some of the problems that are intractable on a classical computer. In theory, they could efficiently solve some fundamental problems in mathematics. This amazing computing power would be highly beneficial, which is why companies are actually trying to build quantum computers. At first, Shor’s algorithm was merely a theoretical result – quantum computers powerful enough to execute it did not exist – but this is quickly changing. In March 2018, Google announced a 72-qubit universal quantum computer. While this is not enough to break say RSA-2048 (still more is needed), many fundamental problems have already been solved.

In anticipation of wide-spread quantum computing, we must start the transition from classical public-key cryptography primitives to post-quantum (PQ) alternatives. It may be that consumers will never get to hold a quantum computer, but a few powerful attackers who will get one can still pose a serious threat. Moreover, under the assumption that current TLS handshakes and ciphertexts are being captured and stored, a future attacker could crack these stored individual session keys and use those results to decrypt the corresponding individual ciphertexts. Even strong security guarantees, like forward secrecy, do not help out much there.

In 2006, the academic research community launched a conference series dedicated to finding alternatives to RSA and ECC. This so-called post-quantum cryptography should run efficiently on a classical computer, but it should also be secure against attacks performed by a quantum computer. As a research field, it has grown substantially in popularity.

Several companies, including Google, Microsoft, Digicert and Thales, are already testing the impact of deploying PQ cryptography. Cloudflare is involved in some of this, but we want to be a company that leads in this direction. The first thing we need to do is understand the real costs of deploying PQ cryptography, and that’s not obvious at all.

What options do we have?

Many submissions to the NIST project are still under study. Some are very new and little understood; others are more mature and already standardized as RFCs. Some have been broken or withdrawn from the process; others are more conservative or illustrate how far classical cryptography would need to be pushed so that a quantum computer could not crack it within a reasonable cost. Some are very slow and big; others are not. But most cryptographic schemes can be categorized into these families: lattice-based, multivariate, hash-based (signatures only), code-based and isogeny-based.

For some algorithms, nevertheless, there is a fear they may be too inconvenient to use with today’s Internet. We must also be able to integrate new cryptographic schemes with existing protocols, such as SSH or TLS. To do that, designers of PQ cryptosystems must consider these characteristics:

Latency caused by encryption and decryption on both ends of the communication channel, assuming a variety of devices from big and fast servers to slow and memory constrained IoT (Internet of Things) devices

Small public keys and signatures to minimize bandwidth

Clear design that allows cryptanalysis and determining weaknesses that could be exploited

Use of existing hardware for fast implementation

The work on post-quantum public key cryptosystems must be done in a full view of organizations, governments, cryptographers, and the public. Emerging ideas must be properly vetted by this community to ensure widespread support.

Helping Build a Better Internet

To better understand the post-quantum world, Cloudflare began experimenting with these algorithms and used them to provide confidentiality in TLS connections.

With Google, we are proposing a wide-scale experiment that combines client- and server-side data collection to evaluate the performance of key-exchange algorithms on actual users’ devices. We hope that this experiment helps choose an algorithm with the best characteristics for the future of the Internet. With Cloudflare’s highly distributed network of access points and Google’s Chrome browser, both companies are in a very good position to perform this experiment.

Our goal is to understand how these algorithms act when used by real clients over real networks, particularly candidate algorithms with significant differences in public-key or ciphertext sizes. Our focus is on how different key sizes affect handshake time in the context of Transport Layer Security (TLS) as used on the web over HTTPS.

Our primary candidates are an NTRU-based construction called HRSS-SXY (by Hülsing – Rijneveld – Schanck – Schwabe, and Tsunekazu Saito – Keita Xagawa – Takashi Yamakawa) and an isogeny-based Supersingular Isogeny Key Encapsulation (SIKE). Details of both algorithms are described in more detail below in section “Dive into post-quantum cryptography”. This table shows a few characteristics for both algorithms. Performance timings were obtained by running the BoringSSL speed test on an Intel Skylake CPU.

KEM

Public Key size (bytes)

Ciphertext (bytes)

Secret size (bytes)

KeyGen (op/sec)

Encaps (op/sec)

Decaps (op/sec)

NIST level

HRSS-SXY

1138

1138

32

3952.3

76034.7

21905.8

1

SIKE/p434

330

346

16

367.1

228.0

209.3

1

Currently the most commonly used key exchange algorithm (according to Cloudflare’s data) is the non-quantum X25519. Its public keys are 32 bytes and BoringSSL can generate 49301.2 key pairs, and is able to perform 19628.6 key agreements every second on my Skylake CPU.

Note that HRSS-SXY shows a significant speed advantage, while SIKE has a size advantage. In our experiment, we will deploy these two algorithms on both the server side using Cloudflare’s infrastructure, and the client side using Chrome Canary; both sides will collect telemetry information about TLS handshakes using these two PQ algorithms to see how they perform in practice.

What do we expect to find?

In 2018, Adam Langley conducted an experiment with the goal of evaluating the likely latency impact of a post-quantum key exchange in TLS. Chrome was augmented with the ability to include a dummy, arbitrarily-sized extension in the TLS ClientHello (fixed number of bytes of random noise). After taking into account the performance and key size offered by different types key-exchange schemes, he concluded that constructs based on structured lattices may be most suitable for future use in TLS.

However, Langley also observed a peculiar phenomenon; client connections measured at 95th percentile had much higher latency than the median. It means that in those cases, isogeny-based systems may be a better choice. In the Dive into post-quantum cryptography, we describe the difference between isogeny-based SIKE and lattice-based NTRU cryptosystems.

In our experiment, we want to more thoroughly evaluate and ascribe root causes to these unexpected latency increases. We would particularly like to learn more about the characteristics of those networks: what causes increased latency? how does the performance cost of isogeny-based algorithms impact the TLS handshake? We want to answer key questions, like:

What is a good ratio for speed-to-key size (or how much faster could SIKE get to achieve the client-perceived performance of HRSS)?

How do network middleboxes behave when clients use new PQ algorithms, and which networks have problematic middleboxes?

How do the different properties of client networks affect TLS performance with different PQ key exchanges? Can we identify specific autonomous systems, device configurations, or network configurations that favor one algorithm over another? How is performance affected in the long tail?

Experiment Design

Our experiment will involve both server- and client-side performance statistics collection from real users around the world (all the data is anonymized). Cloudflare is operating the server-side TLS connections. We will enable the CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE + X25519) key-agreement algorithms on all TLS-terminating edge servers.

In this experiment, the ClientHello will contain a CECPQ2 or CECPQ2b public key (but never both). Additionally, Chrome will always include X25519 for servers that do not support post-quantum key exchange. The post-quantum key exchange will only be negotiated in TLS version 1.3 when both sides support it.

Since Cloudflare only measures the server side of the connection, it is impossible to determine the time it takes for a ClientHello sent from Chrome to reach Cloudflare’s edge servers; however, we can measure the time it takes for the TLS ServerHello message containing post-quantum key exchange, to reach the client and for the client to respond.

On the client side, Chrome Canary will operate the TLS connection. Google will enable either CECPQ2 or CECPQ2b in Chrome for the following mix of architecture and OSes:

x86-64: Windows, Linux, macOS, ChromeOS

aarch64: Android

Our high-level expectation is to get similar results as Langley’s original experiment in 2018 — slightly increased latency for the 50th percentile and higher latency for the 95th. Unfortunately, data collected purely from real users’ connections may not suffice for diagnosing the root causes of why some clients experience excessive slowdown. To this end, we will perform follow-up experiments based on per-client information we collect server-side.

Our primary hypothesis is that excessive slowdowns, like those Langley observed, are largely due to in-network events, such as middleboxes or bloated/lossy links. As a first-pass analysis, we will investigate whether the slowed-down clients share common network features, like common ASes, common transit networks, common link types, and so on. To determine this, we will run a traceroute from vantage points close to our servers back toward the clients (not overloading any particular links or hosts) and study whether some client locations are subject to slowdowns for all destinations or just for some.

Dive into post-quantum cryptography

Be warned: the details of PQ cryptography may be quite complicated. In some cases it builds on classical cryptography, and in other cases it is completely different math. It would be rather hard to describe details in a single blog post. Instead, we are giving you an intuition of post-quantum cryptography, rather than provide deep academic-level descriptions. We’re skipping a lot of details for the sake of brevity. Nevertheless, settle in for a bit of an epic journey because we have a lot to cover.

Key encapsulation mechanism

NIST requires that all key-agreement algorithms have a form of key-encapsulation mechanism (KEM). The KEM is a simplified form of public key encryption (PKE). As PKE, it also allows agreement on a secret, but in a slightly different way. The idea is that the session key is an output of the encryption algorithm, conversely to public key encryption schemes where session key is an input to the algorithm. In a KEM, Alice generates a random key and uses the pre-generated public key from Bob to encrypt (encapsulate) it. This results in a ciphertext sent to Bob. Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the random key. The idea was initially introduced by Cramer and Shoup. Experience shows that such constructs are easier to design, analyze, and implement as the scheme is limited to communicating a fixed-size session key. Leonardo Da Vinci said, “Simplicity is the ultimate sophistication,” which is very true in cryptography.

The key exchange (KEX) protocol, like Diffie-Hellman, is yet a different construct: it allows two parties to agree on a shared secret that can be used as a symmetric encryption key. For example, Alice generates a key pair and sends a public key to Bob. Bob does the same and uses his own key pair with Alice’s public key to generate the shared secret. He then sends his public key to Alice who can now generate the same shared secret. What’s worth noticing is that both Alice and Bob perform exactly the same operations.

KEM construction can be converted to KEX. Alice performs key generation and sends the public key to Bob. Bob uses it to encapsulate a symmetric session key and sends it back to Alice. Alice decapsulates the ciphertext received from Bob and gets the symmetric key. This is actually what we do in our experiment to make integration with the TLS protocol less complicated.

NTRU Lattice-based Encryption

We will enable the CECPQ2 implemented by Adam Langley from Google on our servers. He described this implementation in detail here. This key exchange uses the HRSS algorithm, which is based on the NTRU (N-Th Degree TRUncated Polynomial Ring) algorithm. Foregoing too much detail, I am going to explain how NTRU works and give simplified examples, and finally, compare it to HRSS.

NTRU is a cryptosystem based on a polynomial ring. This means that we do not operate on numbers modulo a prime (like in RSA), but on polynomials of degree \( N \) , where the degree of a polynomial is the highest exponent of its variable. For example, \(x^7 + 6x^3 + 11x^2 \) has degree of 7.

One can add polynomials in the ring in the usual way, by simply adding theirs coefficients modulo some integer. In NTRU this integer is called \( q \). Polynomials can also be multiplied, but remember, you are operating in the ring, therefore the result of a multiplication is always a polynomial of degree less than \(N\). It basically means that exponents of the resulting polynomial are added to modulo\(N\).

In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working with a set of numbers less than N, you are working with a set of polynomials with a degree less than N.

To instantiate the NTRU cryptosystem, three domain parameters must be chosen:

\(N\) – degree of the polynomial ring, in NTRU the principal objects are polynomials of degree \(N-1\).

\(p\) – small modulus used during key generation and decryption for reducing message coefficients.

\(q\) – large modulus used during algorithm execution for reducing coefficients of the polynomials.

First, we generate a pair of public and private keys. To do that, two polynomials \(f\) and \(g\) are chosen from the ring in a way that their randomly generated coefficients are much smaller than \(q\). Then key generation computes two inverses of the polynomial: $$ f_p= f^{-1} \bmod{p} \\ f_q= f^{-1} \bmod{q} $$

The last step is to compute $$ pk = p\cdot f_q\cdot g \bmod q $$, which we will use as public key pk. The private key consists of \(f\) and \(f_p\). The \(f_q\) is not part of any key, however it must remain secret.

It might be the case that after choosing \(f\), the inverses modulo \(p\) and \( q \) do not exist. In this case, the algorithm has to start from the beginning and generate another \(f\). That’s unfortunate because calculating the inverse of a polynomial is a costly operation. HRSS brings an improvement to this issue since it ensures that those inverses always exist, making key generation faster than as proposed initially in NTRU.

The encryption of a message \(m\) proceeds as follows. First, the message \(m\) is converted to a ring element \(pt\) (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial \(b\) called blinder. The goal of the blinder is to generate different ciphertexts per encyption. Thus, the ciphetext \(ct\) is obtained as $$ ct = (b\cdot pk + pt ) \bmod q $$ Decryption looks a bit more complicated but it can also be easily understood. Decryption uses both the secret value \(f\) and to recover the plaintext as $$ v = f \cdot ct \bmod q \\ pt = v \cdot f_p \bmod p $$

This diagram demonstrates why and how decryption works.

After obtaining \(pt\), the message \(m\) is recovered by inverting the conversion function.

The underlying hard assumption is that given two polynomials: \(f\) and \(g\) whose coefficients are short compared to the modulus \(q\), it is difficult to distinguish \(pk = \frac{f}{g} \) from a random element in the ring. It means that it’s hard to find \(f\) and \(g\) given only public key pk.

Lattices

NTRU cryptosystem is a grandfather of lattice-based encryption schemes. The idea of using difficult problems for cryptographic purposes was due to Ajtai. His work evolved into a whole area of research with the goal of creating more practical, lattice-based cryptosystems.

What is a lattice and why it can be used for post-quantum crypto?

The picture below visualizes lattice as points in a two-dimensional space. A lattice is defined by the origin \(O\) and base vectors \( \{ b_1 , b_2\} \). Every point on the lattice is represented as a linear combination of the base vectors, for example \(V = -2b_1+b_2\).

There are two classical NP-hard problems in lattice-based cryptography:

Shortest Vector Problem (SVP): Given a lattice, to find the shortest non-zero vector in the lattice. In the graph, the vector \(s\) is the shortest one. The SVP problem is NP-hard only under some assumptions.

Closest Vector Problem (CVP). Given a lattice and a vector \(V\) (not necessarily in the lattice), to find the closest vector to \(V\). For example, the closest vector to \(t\) is \(z\).

In the graph above, it is easy for us to solve SVP and CVP by simple inspection. However, the lattices used in cryptography have higher dimensions, say above 1000, as well as highly non-orthogonal basis vectors. On these instances, the problems get extremely hard to solve. It’s even believed future quantum computers will have it tough.

NTRU vs HRSS

HRSS, which we use in our experiment, is based on NTRU, but a slightly better instantiation. The main improvements are:

Faster key generation algorithm.

NTRU encryption can produce ciphertexts that are impossible to decrypt (true for many lattice-based schemes). But HRSS fixes this problem.

HRSS is a key encapsulation mechanism.

CECPQ2b – Isogeny-based Post-Quantum TLS

Following CECPQ2, we have integrated into BoringSSL another hybrid key exchange mechanism relying on SIKE. It is called CECPQ2b and we will use it in our experimentation in TLS 1.3. SIKE is a key encapsulation method based on Supersingular Isogeny Diffie-Hellman (SIDH). Read more about SIDH in our previous post. The math behind SIDH is related to elliptic curves. A comparison between SIDH and the classical Elliptic Curve Diffie-Hellman (ECDH) is given.

An elliptic curve is a set of points that satisfy a specific mathematical equation. The equation of an elliptic curve may have multiple forms, the standard form is called the Weierstrass equation $$ y^2 = x^3 +ax +b $$ and its shape can look like the red curve.

An interesting fact about elliptic curves is have a group structure. That is, the set of points on the curve have associated a binary operation called point addition. The set of points on the elliptic curve is closed under addition. Thus,adding two points results in another point that is also on the elliptic curve.

If we can add two different points on a curve, then we can also add one point to itself. And if we do it multiple times, then the resulting operations is known as a scalar multiplication and denoted as \(Q = k\cdot P = P+P+\dots+P\) for an integer \(k\).

Multiplication of scalars is commutative. It means that two scalar multiplications can be evaluated in any order \( \color{darkred}{k_a}\cdot\color{darkgreen}{k_b} = \color{darkgreen}{k_b}\cdot\color{darkred}{k_a} \); this an important property that makes ECDH possible.

It turns out that carefully if choosing an elliptic curve “correctly”, scalar multiplication is easy to compute but extremely hard to reverse. Meaning, given two points \(Q\) and \(P\) such that \(Q=k\cdot P\), finding the integer k is a difficult task known as the Elliptic Curve Discrete Logarithm problem (ECDLP). This problem is suitable for cryptographic purposes.

Alice and Bob agree on a secret key as follows. Alice generates a private key \( k_a\). Then, she uses some publicly known point \(P\) and calculates her public key as \( Q_a = k_a\cdot P\). Bob proceeds in similar fashion and gets \(k_b\) and \(Q_b = k_b\cdot P\). To agree on a shared secret, each party multiplies their private key with the public key of the other party. The result of this is the shared secret. Key agreement as described above, works thanks to the fact that scalars can commute: $$ \color{darkgreen}{k_a} \cdot Q_b = \color{darkgreen}{k_a} \cdot \color{darkred}{k_b} \cdot P \iff \color{darkred}{k_b} \cdot \color{darkgreen}{k_a} \cdot P = \color{darkred}{k_b} \cdot Q_a $$

There is a vast theory behind elliptic curves. An introduction to elliptic curve cryptography was posted before and more details can be found in this book. Now, lets describe SIDH and compare with ECDH.

Isogenies on Elliptic Curves

Before explaining the details of SIDH key exchange, I’ll explain the 3 most important concepts, namely: j-invariant, isogeny and its kernel.

Each curve has a number that can be associated to it. Let’s call this number a j-invariant. This number is not unique per curve, meaning many curves have the same value of j-invariant, but it can be viewed as a way to group multiple elliptic curves into disjoint sets. We say that two curves are isomorphic if they are in the same set, called the isomorphism class. The j-invariant is a simple criterion to determine whether two curves are isomorphic. The j-invariant of a curve \(E\) in Weierstrass form \( y^2 = x^3 + ax + b\) is given as $$ j(E) = 1728\frac{4a^3}{4^3 +27b^2} $$

When it comes to isogeny, think about it as a map between two curves. Each point on some curve \( E \) is mapped by isogeny to the point on isogenous curve \( E’ \). We denote mapping from curve \( E \) to \( E’ \) by isogeny \( \phi \) as:

$$\phi: E \rightarrow E’ $$

It depends on the map if those two curves are isomorphic or not. Isogeny can be visualised as:

There may exist many of those mappings, each curve used in SIDH has small number of isogenies to other curves. Natural question is how do we compute such isogeny. Here is where the kernel of an isogeny comes. The kernel uniquely determines isogeny (up to isomorphism class). Formulas for calculating isogeny from its kernel were initially given by J. Vélu and the idea of calculating them efficiently was extended .

To finish, I will summarize what was said above with a picture.

There are two isomorphism classes on the picture above. Both curves \(E_1\) and \(E_2\) are isomorphic and have j-invariant = 6. As curves \(E_3\) and \(E_4\) have j-invariant=13, they are in a different isomorphism class. There exists an isogeny \(\phi_2\) between curve \(E_3\) and \(E_2\), so they both are isogeneous. Curves \( \phi_1 \) and \( E_2 \) are isomorphic and there is isogeny \( \phi_1 \) between them. Curves \( E_1\) and \(E_4\) are neither isomorphic nor isogeneus.

For brevity I’m skipping many important details, like details of the finite field, the fact that isogenies must be separable and that the kernel is finite. But curious readers can find a number of academic research papers available on the Internet.

Big picture: similarities with ECDH

Let’s generalize the ECDH algorithm described above, so that we can swap some elements and try to use Supersingular Isogeny Diffie-Hellman.

Note that what actually happens during an ECDH key exchange is:

We have a set of points on elliptic curve, set S

We have another group of integers used for point multiplication, G

We use an element from Z to act on an element from S to get another element from S:

$$ G \cdot S \rightarrow S $$

Now the question is: what is our G and S in an SIDH setting? For SIDH to work, we need a big set of elements and something secret that will act on the elements from that set. This “group action” must also be resistant to attacks performed by quantum computers.

In the SIDH setting, those two sets are defined as:

Set S is a set (graph) of j-invariants, such that all the curves are supersingular: \( S = [j(E_1), j(E_2), j(E_3), …. , j(E_n)]\)

Set G is a set of isogenies acting on elliptic curves and transforming, for example, the elliptic curve \(E_1\) into \(E_n\):

Random walk on supersingular graph

When we talk about Isogeny Based Cryptography, as a topic distinct from Elliptic Curve Cryptography, we usually mean algorithms and protocols that rely fundamentally on the structure of isogeny graphs. An example of such a (small) graph is pictured below.

Each vertex of the graph represents a different j-invariant of a set of supersingular curves. The edges between vertices represent isogenies converting one elliptic curve to another. As you can notice, the graph is strongly connected, meaning every vertex can be reached from every other vertex. In the context of isogeny-based crypto, we call such a graph a supersingular isogeny graph. I’ll skip some technical details about the construction of this graph (look for those here or here), but instead describe ideas about how it can be used.

As the graph is strongly connected, it is possible to walk a whole graph by starting from any vertex, randomly choosing an edge, following it to the next vertex and then start the process again on a new vertex. Such a way of visiting edges of this graph is called a random walk.

The random walk is a key concept that makes isogeny based crypto feasible. When you look closely at the graph, you can notice that each vertex has a small number of edges incident to it, this is why we can compute the isogenies efficiently. But also for any vertex there is only a limited number of isogenies to choose from, which doesn’t look like good base for a cryptographic scheme. The key question is – where does the security of the scheme come from exactly? In order to get it, it is necessary to visit a couple hundred vertices. What it means in practice is that secret isogeny (of large degree) is constructed as a composition of multiple isogenies (of small, prime degree). Which means, the secret isogeny is:

This property and properties of the isogeny graph are what makes some of us believe that scheme has a good chance to be secure. More specifically, there is no efficient way of finding a path that connects \( E_0 \) with \( E_n \), even with quantum computer at hand. The security level of a system depends on value n – the number of steps taken during the walk.

The random walk is a core process used when both generating public keys and computing shared secrets. It starts with party generating random value m (see more below), starting curve \(E_0\)and points P and Q on this curve. Those values are used to compute the kernel of an isogeny \( R_1 \) in the following way:

$$ R_1 = P + m \cdot Q $$

Thanks to formulas given by Vélu we can now use the point \( R_1 \)to compute the isogeny, the party will choose to move from a vertex to another one. After the isogeny \( \phi_{R_1} \) is calculated it is applied to \( E_0 \) which results in a new curve \( E_1 \):

$$ \phi_{R_1}: E_0 \rightarrow E_1 $$

Isogeny is also applied to points P and Q. Once on \( E_1 \) the process is repeated. This process is applied n times, and at the end a party ends up on some curve \( E_n \) which defines isomorphism class, so also j-invariant.

Supersingular Isogeny Diffie-Hellman

The core idea in SIDH is to compose two random walks on an isogeny graph of elliptic curves in such a way that the end node of both ways of composing is the same.

In order to do it, scheme sets public parameters – starting curve \( E_0 \) and 2 pairs of base points on this curve \( (PA,QA) \) , \( (PB,QB) \). Alice generates her random secret keys m, and calculates a secret isogeny \( \phi_q \) by performing a random walk as described above. The walk finishes with 3 values: elliptic curve \( E_a \) she has ended up with and pair of points \( \phi_a(PB) \) and \( \phi_a(QB) \) after pushing through Alice’s secret isogeny. Bob proceeds analogouslywhich results in the triple \( {E_b, \phi_b(PA), \phi_b(QA)} \). The triple forms a public key which is exchanged between parties.

The picture below visualizes the operation. The black dots represent curves grouped in the same isomorphism classes represented by light blue circles. Alice takes the orange path ending up on a curve \( E_a \) in a separate isomorphism class than Bob after taking his dark blue path ending on \( E_b \). SIDH is parametrized in a way that Alice and Bob will always end up in different isomorphism classes.

Upon receipt of triple \( { E_a, \phi_a(PB), \phi_a(QB) } \) from Alice, Bob will use his secret value m to calculate a new kernel – but instead of using point \(PA\) and \(QA\) to calculate an isogeny kernel, he will now use images \( \phi_a(PB) \) and \( \phi_a(QB) \) received from Alice:

$$ R’_1 = \phi_a(PB) + m \cdot \phi_a(QB) $$

Afterwards, he uses \( R’_1 \) to start the walk again resulting in the isogeny \( \phi’_b: E_a \rightarrow E_{ab} \). Allice proceeds analogously resulting in the isogeny \(\phi’_a: E_b \rightarrow E_{ba} \). With isogenies calculated this way, both Alice and Bob will converge in the same isomorphism class. The math math may seem complicated, hopefully the picture below makes it easier to understand.

Bob computes a new isogeny and starts his random walk from \( E_a \) received from Alice. He ends up on some curve \(E_{ba}\). Similarly, Alice calculates a new isogeny, applies it on \( E_b \) received from Bob and her random walk ends on some curve \(E_{ab}\). Curves \(E_{ab}\) and \(E_{ba}\) are not likely to be the same, but construction guarantees that they are isomorphic. As mentioned earlier, isomorphic curves have the same value of j-invariant, hence the shared secret is a value of j-invariant \(j(E_{ab})\).

Coming back to differences between SIDH and ECDH – we can split them into four categories: the elements of the group we are operating on, the cornerstone computation required to agree on a shared secret, the elements representing secret values, and the difficult problem on which the security relies.

In ECDH there is a secret key which is an integer scalar, in case of SIDH it is a secret isogeny, which also is generated from an integer scalar. In the case of ECDH one multiplies a point on a curve by a scalar, in the case of SIDH it is a random walk in an isogeny graph. In the case of ECDH, the public key is a point on a curve, in the case of SIDH, the public part is a curve itself and the image of some points after applying isogeny. The shared secret in the case of ECDH is a point on a curve, in the case of SIDH it is a j-invariant.

SIKE: Supersingular Isogeny Key Encapsulation

SIDH could potentially be used as a drop-in replacement of the ECDH protocol. We have actually implemented a proof-of-concept and added it to our implementation of TLS 1.3 in the tls-tris library and described (together with Mozilla) implementation details in this draft. Nevertheless, there is a problem with SIDH – the keys can be used only once. In 2016, a few researchers came up with an active attack on SIDH which works only when public keys are reused. In the context of TLS, it is not a big problem, because for each session a fresh key pair is generated (ephemeral keys), but it may not be true for other applications.

SIKE is an isogeny key encapsulation which solves this problem. Bob can generate SIKE keys, upload the public part somewhere in the Internet and then anybody can use it whenever he wants to communicate with Bob securely. SIKE reuses SIDH – internally both sides of the connection always perform SIDH key generation, SIDH key agreement and apply some other cryptographic primitives in order to convert SIDH to KEM. SIKE is implemented in a few variants – each variant corresponds to the security levels using 128-, 192- and 256-bit secret keys. Higher security level means longer running time. More details about SIKE can be found here.

SIKE is also one of the candidates in NIST post-quantum “competition“.

I’ve skipped many important details to give a brief description of how isogeny based crypto works. If you’re curious and hungry for details, look at either of these Cloudflare meetups, where Deirdre Connolly talked about isogeny-based cryptography or this talk by Chloe Martindaleduring PQ Crypto School 2017. And if you would like to know more about quantum attacks on this scheme, I highly recommend this work.

Conclusion

Quantum computers that can break meaningful cryptographic parameter settings do not exist, yet. They won’t be built for at least the next few years. Nevertheless, they have already changed the way we look at current cryptographic deployments. There are at least two reasons it’s worth investing in PQ cryptography:

It takes a lot of time to build secure cryptography and we don’t actually know when today’s classical cryptography will be broken. There is a need for a good mathematical base: an initial idea of what may be secure against something that doesn’t exist yet. If you have an idea, you also need good implementation, constant time, resistance to things like time and cacheside-channels, DFA, DPA, EM, and a bunch of other abbreviations indicating side-channel resistance. There is also deployment of, for example, algorithms based on elliptic curves were introduced in ’85, but started to really be used in production only during the last decade, 20 or so years later. Obviously, the implementation must be blazingly fast! Last, but not least, integration: we need time to develop standards to allow integration of PQ cryptography with protocols like TLS.

Even though efficient quantum computers probably won’t exist for another few years, the threat is real. Data encrypted with current cryptographic algorithms can be recorded now with hopes of being broken in the future.

Cloudflare is motivated to help build the Internet of tomorrow with the tools at hand today. Our interest is in cryptographic techniques that can be integrated into existing protocols and widely deployed on the Internet as seamlessly as possible. PQ cryptography, like the rest of cryptography, includes many cryptosystems that can be used for communications in today’s Internet; Alice and Bob need to perform some computation, but they do not need to buy new hardware to do that.

Cloudflare sees great potential in those algorithms and believes that some of them can be used as a safe replacement for classical public-key cryptosystems. Time will tell if we’re justified in this belief!

As part of Crypto Week 2019, today we are proud to release the source code of a cryptographic library we’ve been working on: a collection of cryptographic primitives written in Go, called CIRCL. This library includes a set of packages that target cryptographic algorithms for post-quantum (PQ), elliptic curve cryptography, and hash functions for prime groups. Our hope is that it’s useful for a broad audience. Get ready to discover how we made CIRCL unique.

Cryptography in Go

We use Go a lot at Cloudflare. It offers a good balance between ease of use and performance; the learning curve is very light, and after a short time, any programmer can get good at writing fast, lightweight backend services. And thanks to the possibility of implementing performance critical parts in Go assembly, we can try to ‘squeeze the machine’ and get every bit of performance.

Cloudflare’s cryptography team designs and maintains security-critical projects. It’s not a secret that security is hard. That’s why, we are introducing the Cloudflare Interoperable Reusable Cryptographic Library – CIRCL. There are multiple goals behind CIRCL. First, we want to concentrate our efforts to implement cryptographic primitives in a single place. This makes it easier to ensure that proper engineering processes are followed. Second, Cloudflare is an active member of the Internet community – we are trying to improve and propose standards to help make the Internet a better place.

Cloudflare’s mission is to help build a better Internet. For this reason, we want CIRCL helps the cryptographic community to create proof of concepts, like the post-quantum TLS experiments we are doing. Over the years, lots of ideas have been put on the table by cryptographers (for example, homomorphic encryption, multi-party computation, and privacy preserving constructions). Recently, we’ve seen those concepts picked up and exercised in a variety of contexts. CIRCL’s implementations of cryptographic primitives creates a powerful toolbox for developers wishing to use them.

The Go language provides native packages for several well-known cryptographic algorithms, such as key agreement algorithms, hash functions, and digital signatures. There are also packages maintained by the community under golang.org/x/crypto that provide a diverse set of algorithms for supporting authenticated encryption, stream ciphers, key derivation functions, and bilinear pairings. CIRCL doesn’t try to compete with golang.org/x/crypto in any sense. Our goal is to provide a complementary set of implementations that are more aggressively optimized, or may be less commonly used but have a good chance at being very useful in the future.

Unboxing CIRCL

Our cryptography team worked on a fresh proposal to augment the capabilities of Go users with a new set of packages. You can get them by typing:

$ go get github.com/cloudflare/circl

The contents of CIRCL is split across different categories, summarized in this table:

Category

Algorithms

Description

Applications

Post-Quantum Cryptography

SIDH

Isogeny-based cryptography.

SIDH provides key exchange mechanisms using ephemeral keys.

SIKE

SIKE is a key encapsulation mechanism (KEM).

Key agreement protocols.

Key Exchange

X25519, X448

RFC-7748 provides new key exchange mechanisms based on Montgomery elliptic curves.

Our optimizations reduce the burden when moving from P-256 to P-384.

ECDSA and ECDH using Suite B at top secret level.

SIKE, a Post-Quantum Key Encapsulation Mechanism

To better understand the post-quantum world, we started experimenting with post-quantum key exchange schemes and using them for key agreement in TLS 1.3. CIRCL contains the sidh package, an implementation of Supersingular Isogeny-based Diffie-Hellman (SIDH), as well as CCA2-secure Supersingular Isogeny-based Key Encapsulation (SIKE), which is based on SIDH.

CIRCL makes playing with PQ key agreement very easy. Below is an example of the SIKE interface that can be used to establish a shared secret between two parties for use in symmetric encryption. The example uses a key encapsulation mechanism (KEM). For our example in this scheme, Alice generates a random secret key, and then uses Bob’s pre-generated public key to encrypt (encapsulate) it. The resulting ciphertext is sent to Bob. Then, Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the secret key. See more details about SIKE in this Cloudflare blog.

Let’s see how to do this with CIRCL:

// Bob's key pair
prvB := NewPrivateKey(Fp503, KeyVariantSike)
pubB := NewPublicKey(Fp503, KeyVariantSike)
// Generate private key
prvB.Generate(rand.Reader)
// Generate public key
prvB.GeneratePublicKey(pubB)
var publicKeyBytes = make([]array, pubB.Size())
var privateKeyBytes = make([]array, prvB.Size())
pubB.Export(publicKeyBytes)
prvB.Export(privateKeyBytes)
// Encode public key to JSON
// Save privateKeyBytes on disk

Bob uploads the public key to a location accessible by anybody. When Alice wants to establish a shared secret with Bob, she performs encapsulation that results in two parts: a shared secret and the result of the encapsulation, the ciphertext.

// Read JSON to bytes
// Alice's key pair
pubB := NewPublicKey(Fp503, KeyVariantSike)
pubB.Import(publicKeyBytes)
var kem := sike.NewSike503(rand.Reader)
kem.Encapsulate(ciphertext, sharedSecret, pubB)
// send ciphertext to Bob

Bob now receives ciphertext from Alice and decapsulates the shared secret:

var kem := sike.NewSike503(rand.Reader)
kem.Decapsulate(sharedSecret, prvA, pubA, ciphertext)

At this point, both Alice and Bob can derive a symmetric encryption key from the secret generated.

SIKE implementation contains:

Two different field sizes: Fp503 and Fp751. The choice of the field is a trade-off between performance and security.

Code optimized for AMD64 and ARM64 architectures, as well as generic Go code. For AMD64, we detect the micro-architecture and if it’s recent enough (e.g., it supports ADOX/ADCX and BMI2 instruction sets), we use different multiplication techniques to make an execution even faster.

Code implemented in constant time, that is, the execution time doesn’t depend on secret values.

We also took care of low heap-memory footprint, so that the implementation uses a minimal amount of dynamically allocated memory. In the future, we plan to provide multiple implementations of post-quantum schemes. Currently, our focus is on algorithms useful for key exchange in TLS.

SIDH/SIKE are interesting because the key sizes produced by those algorithms are relatively small (comparing with other PQ schemes). Nevertheless, performance is not all that great yet, so we’ll continue looking. We plan to add lattice-based algorithms, such as NTRU-HRSS and Kyber, to CIRCL. We will also add another more experimental algorithm called cSIDH, which we would like to try in other applications. CIRCL doesn’t currently contain any post-quantum signature algorithms, which is also on our to-do list. After our experiment with TLS key exchange completes, we’re going to look at post-quantum PKI. But that’s a topic for a future blog post, so stay tuned.

Last, we must admit that our code is largely based on the implementation from the NIST submission along with the work of former intern Henry De Valence, and we would like to thank both Henry and the SIKE team for their great work.

Elliptic Curve Cryptography

Elliptic curve cryptography brings short keys sizes and faster evaluation of operations when compared to algorithms based on RSA. Elliptic curves were standardized during the early 2000s, and have recently gained popularity as they are a more efficient way for securing communications.

Elliptic curves are used in almost every project at Cloudflare, not only for establishing TLS connections, but also for certificate validation, certificate revocation (OCSP), Privacy Pass, certificate transparency, and AMP Real URL.

The Go language provides native support for NIST-standardized curves, the most popular of which is P-256. In a previous post, Vlad Krasnov described the relevance of optimizing several cryptographic algorithms, including P-256 curve. When working at Cloudflare scale, little issues around performance are significantly magnified. This is one reason why Cloudflare pushes the boundaries of efficiency.

A similar thing happened on the chained validation of certificates. For some certificates, we observed performance issues when validating a chain of certificates. Our team successfully diagnosed this issue: certificates which had signatures from the P-384 curve, which is the curve that corresponds to the 192-bit security level, were taking up 99% of CPU time! It is common for certificates closer to the root of the chain of trust to rely on stronger security assumptions, for example, using larger elliptic curves. Our first-aid reaction comes in the form of an optimized implementation written by Brendan McMillion that reduced the time of performing elliptic curve operations by a factor of 10. The code for P-384 is also available in CIRCL.

The latest developments in elliptic curve cryptography have caused a shift to use elliptic curve models with faster arithmetic operations. The best example is undoubtedly Curve25519; other examples are the Goldilocks and FourQ curves. CIRCL supports all of these curves, allowing instantiation of Diffie-Hellman exchanges and Edwards digital signatures. Although it slightly overlaps the Go native libraries, CIRCL has architecture-dependent optimizations.

Hashing to Groups

Many cryptographic protocols rely on the hardness of solving the Discrete Logarithm Problem (DLP) in special groups, one of which is the integers reduced modulo a large integer. To guarantee that the DLP is hard to solve, the modulus must be a large prime number. Increasing its size boosts on security, but also makes operations more expensive. A better approach is using elliptic curve groups since they provide faster operations.

In some cryptographic protocols, it is common to use a function with the properties of a cryptographic hash function that maps bit strings into elements of the group. This is easy to accomplish when, for example, the group is the set of integers modulo a large prime. However, it is not so clear how to perform this function using elliptic curves. In cryptographic literature, several methods have been proposed using the terms hashing to curves or hashing to point indistinctly.

The main issue is that there is no general method for deterministically finding points on any elliptic curve, the closest available are methods that target special curves and parameters. This is a problem for implementers of cryptographic algorithms, who have a hard time figuring out on a suitable method for hashing to points of an elliptic curve. Compounding that, chances of doing this wrong are high. There are many different methods, elliptic curves, and security considerations to analyze. For example, a vulnerability on WPA3 handshake protocol exploited a non-constant time hashing method resulting in a recovery of keys. Currently, an IETF draft is tracking work in-progress that provides hashing methods unifying requirements with curves and their parameters.

Corresponding to this problem, CIRCL will include implementations of hashing methods for elliptic curves. Our development is accompanying the evolution of the IEFT draft. Therefore, users of CIRCL will have this added value as the methods implement a ready-to-go functionality, covering the needs of some cryptographic protocols.

Update on Bilinear Pairings

Bilinear pairings are sometimes regarded as a tool for cryptanalysis, however pairings can also be used in a constructive way by allowing instantiation of advanced public-key algorithms, for example, identity-based encryption, attribute-based encryption, blind digital signatures, three-party key agreement, among others.

An efficient way to instantiate a bilinear pairing is to use elliptic curves. Note that only a special class of curves can be used, thus so-called pairing-friendly curves have specific properties that enable the efficient evaluation of a pairing.

Some families of pairing-friendly curves were introduced by Barreto-Naehrig (BN), Kachisa-Schaefer-Scott (KSS), and Barreto-Lynn-Scott (BLS). BN256 is a BN curve using a 256-bit prime and is one of the fastest options for implementing a bilinear pairing. The Go native library supports this curve in the package golang.org/x/crypto/bn256. In fact, the BN256 curve is used by Cloudflare’s Geo Key Manager, which allows distributing encrypted keys around the world. At Cloudflare, high-performance is a must and with this motivation, in 2017, we released an optimized implementation of the BN256 package that is 8x faster than the Go’s native package. The success of these optimizations reached several other projects such as the Ethereum protocol and the RandomnessBeacon project.

Recent improvements in solving the DLP over extension fields, GF(pᵐ) for p prime and m>1, impacted the security of pairings, causing recalculation of the parameters used for pairing-friendly curves.

Before these discoveries, the BN256 curve provided a 128-bit security level, but now larger primes are needed to target the same security level. That does not mean that the BN256 curve has been broken, since BN256 gives a security of 100 bits, that is, approximately 2¹⁰⁰ operations are required to cause a real danger, which is still unfeasible with current computing power.

With our CIRCL announcement, we want to announce our plans for research and development to obtain efficient curve(s) to become a stronger successor of BN256. According to the estimation by Barbulescu-Duquesne, a BN curve must use primes of at least 456 bits to match a 128-bit security level. However, the impact on the recalculation of parameters brings back to the main scene BLS and KSS curves as efficient alternatives. To this end a standardization effort at IEFT is in progress with the aim of defining parameters and pairing-friendly curves that match different security levels.

Note that regardless of the curve(s) chosen, there is an unavoidable performance downgrade when moving from BN256 to a stronger curve. Actual timings were presented by Aranha, who described the evolution of the race for high-performance pairing implementations. The purpose of our continuous development of CIRCL is to minimize this impact through fast implementations.

Optimizations

Go itself is a very easy to learn and use for system programming and yet makes it possible to use assembly so that you can stay close “to the metal”. We have blogged about improving performance in Go few times in the past (see these posts about encryption, ciphersuites, and image encoding).

When developing CIRCL, we crafted the code to get the best possible performance from the machine. We leverage the capabilities provided by the architecture and the architecture-specific instructions. This means that in some cases we need to get our hands dirty and rewrite parts of the software in Go assembly, which is not easy, but definitely worth the effort when it comes to performance. We focused on x86-64, as this is our main target, but we also think that it’s worth looking at ARM architecture, and in some cases (like SIDH or P-384), CIRCL has optimized code for this platform.

We also try to ensure that code uses memory efficiently – crafting it in a way that fast allocations on the stack are preferred over expensive heap allocations. In cases where heap allocation is needed, we tried to design the APIs in a way that, they allow pre-allocating memory ahead of time and reuse it for multiple operations.

Security

The CIRCL library is offered as-is, and without a guarantee. Therefore, it is expected that changes in the code, repository, and API occur in the future. We recommend to take caution before using this library in a production application since part of its content is experimental.

As new attacks and vulnerabilities arise over the time, security of software should be treated as a continuous process. In particular, the assessment of cryptographic software is critical, it requires the expertise of several fields, not only computer science. Cryptography engineers must be aware of the latest vulnerabilities and methods of attack in order to defend against them.

The development of CIRCL follows best practices on the secure development. For example, if time execution of the code depends on secret data, the attacker could leverage those irregularities and recover secret keys. In our code, we take care of writing constant-time code and hence prevent timing based attacks.

Developers of cryptographic software must also be aware of optimizations performed by the compiler and/or the processor since these optimizations can lead to insecure binary codes in some cases. All of these issues could be exploited in real attacks aimed at compromising systems and keys. Therefore, software changes must be tracked down through thorough code reviews. Also static analyzers and automated testing tools play an important role on the security of the software.

Summary

CIRCL is envisioned as an effective tool for experimenting with modern cryptographic algorithms yet providing high-performance implementations. Today is marked as the starting point of a continuous machinery of innovation and retribution to the community in the form of a cryptographic library. There are still several other applications such as homomorphic encryption, multi-party computation, and privacy-preserving protocols that we would like to explore.

We are team of cryptography, security, and software engineers working to improve and augment Cloudflare products. Our team keeps the communication channels open for receiving comments, including improvements, and merging contributions. We welcome opinions and contributions! If you would like to get in contact, you should check out our github repository for CIRCL github.com/cloudflare/circl. We want to share our work and hope it makes someone else’s job easier as well.

Finally, special thanks to all the contributors who has either directly or indirectly helped to implement the library – Ko Stoffelen, Brendan McMillion, Henry de Valence, Michael McLoughlin and all the people who invested their time in reviewing our code.

As part of Crypto Week 2019, today we are proud to release the source code of a cryptographic library we’ve been working on: a collection of cryptographic primitives written in Go, called CIRCL. This library includes a set of packages that target cryptographic algorithms for post-quantum (PQ), elliptic curve cryptography, and hash functions for prime groups. Our hope is that it’s useful for a broad audience. Get ready to discover how we made CIRCL unique.

Cryptography in Go

We use Go a lot at Cloudflare. It offers a good balance between ease of use and performance; the learning curve is very light, and after a short time, any programmer can get good at writing fast, lightweight backend services. And thanks to the possibility of implementing performance critical parts in Go assembly, we can try to ‘squeeze the machine’ and get every bit of performance.

Cloudflare’s cryptography team designs and maintains security-critical projects. It’s not a secret that security is hard. That’s why, we are introducing the Cloudflare Interoperable Reusable Cryptographic Library – CIRCL. There are multiple goals behind CIRCL. First, we want to concentrate our efforts to implement cryptographic primitives in a single place. This makes it easier to ensure that proper engineering processes are followed. Second, Cloudflare is an active member of the Internet community – we are trying to improve and propose standards to help make the Internet a better place.

Cloudflare’s mission is to help build a better Internet. For this reason, we want CIRCL helps the cryptographic community to create proof of concepts, like the post-quantum TLS experiments we are doing. Over the years, lots of ideas have been put on the table by cryptographers (for example, homomorphic encryption, multi-party computation, and privacy preserving constructions). Recently, we’ve seen those concepts picked up and exercised in a variety of contexts. CIRCL’s implementations of cryptographic primitives creates a powerful toolbox for developers wishing to use them.

The Go language provides native packages for several well-known cryptographic algorithms, such as key agreement algorithms, hash functions, and digital signatures. There are also packages maintained by the community under golang.org/x/crypto that provide a diverse set of algorithms for supporting authenticated encryption, stream ciphers, key derivation functions, and bilinear pairings. CIRCL doesn’t try to compete with golang.org/x/crypto in any sense. Our goal is to provide a complementary set of implementations that are more aggressively optimized, or may be less commonly used but have a good chance at being very useful in the future.

Unboxing CIRCL

Our cryptography team worked on a fresh proposal to augment the capabilities of Go users with a new set of packages. You can get them by typing:

$ go get github.com/cloudflare/circl

The contents of CIRCL is split across different categories, summarized in this table:

Category

Algorithms

Description

Applications

Post-Quantum Cryptography

SIDH

Isogeny-based cryptography.

SIDH provides key exchange mechanisms using ephemeral keys.

SIKE

SIKE is a key encapsulation mechanism (KEM).

Key agreement protocols.

Key Exchange

X25519, X448

RFC-7748 provides new key exchange mechanisms based on Montgomery elliptic curves.

Our optimizations reduce the burden when moving from P-256 to P-384.

ECDSA and ECDH using Suite B at top secret level.

SIKE, a Post-Quantum Key Encapsulation Method

To better understand the post-quantum world, we started experimenting with post-quantum key exchange schemes and using them for key agreement in TLS 1.3. CIRCL contains the sidh package, an implementation of Supersingular Isogeny-based Diffie-Hellman (SIDH), as well as CCA2-secure Supersingular Isogeny-based Key Encapsulation (SIKE), which is based on SIDH.

CIRCL makes playing with PQ key agreement very easy. Below is an example of the SIKE interface that can be used to establish a shared secret between two parties for use in symmetric encryption. The example uses a key encapsulation mechanism (KEM). For our example in this scheme, Alice generates a random secret key, and then uses Bob’s pre-generated public key to encrypt (encapsulate) it. The resulting ciphertext is sent to Bob. Then, Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the secret key. See more details about SIKE in this Cloudflare blog.

Let’s see how to do this with CIRCL:

// Bob's key pair
prvB := NewPrivateKey(Fp503, KeyVariantSike)
pubB := NewPublicKey(Fp503, KeyVariantSike)
// Generate private key
prvB.Generate(rand.Reader)
// Generate public key
prvB.GeneratePublicKey(pubB)
var publicKeyBytes = make([]array, pubB.Size())
var privateKeyBytes = make([]array, prvB.Size())
pubB.Export(publicKeyBytes)
prvB.Export(privateKeyBytes)
// Encode public key to JSON
// Save privateKeyBytes on disk

Bob uploads the public key to a location accessible by anybody. When Alice wants to establish a shared secret with Bob, she performs encapsulation that results in two parts: a shared secret and the result of the encapsulation, the ciphertext.

// Read JSON to bytes
// Alice's key pair
pubB := NewPublicKey(Fp503, KeyVariantSike)
pubB.Import(publicKeyBytes)
var kem := sike.NewSike503(rand.Reader)
kem.Encapsulate(ciphertext, sharedSecret, pubB)
// send ciphertext to Bob

Bob now receives ciphertext from Alice and decapsulates the shared secret:

var kem := sike.NewSike503(rand.Reader)
kem.Decapsulate(sharedSecret, prvA, pubA, ciphertext)

At this point, both Alice and Bob can derive a symmetric encryption key from the secret generated.

SIKE implementation contains:

Two different field sizes: Fp503 and Fp751. The choice of the field is a trade-off between performance and security.

Code optimized for AMD64 and ARM64 architectures, as well as generic Go code. For AMD64, we detect the micro-architecture and if it’s recent enough (e.g., it supports ADOX/ADCX and BMI2 instruction sets), we use different multiplication techniques to make an execution even faster.

Code implemented in constant time, that is, the execution time doesn’t depend on secret values.

We also took care of low heap-memory footprint, so that the implementation uses a minimal amount of dynamically allocated memory. In the future, we plan to provide multiple implementations of post-quantum schemes. Currently, our focus is on algorithms useful for key exchange in TLS.

SIDH/SIKE are interesting because the key sizes produced by those algorithms are relatively small (comparing with other PQ schemes). Nevertheless, performance is not all that great yet, so we’ll continue looking. We plan to add lattice-based algorithms, such as NTRU-HRSS and Kyber, to CIRCL. We will also add another more experimental algorithm called cSIDH, which we would like to try in other applications. CIRCL doesn’t currently contain any post-quantum signature algorithms, which is also on our to-do list. After our experiment with TLS key exchange completes, we’re going to look at post-quantum PKI. But that’s a topic for a future blog post, so stay tuned.

Last, we must admit that our code is largely based on the implementation from the NIST submission along with the work of former intern Henry De Valence, and we would like to thank both Henry and the SIKE team for their great work.

Elliptic Curve Cryptography

Elliptic curve cryptography brings short keys sizes and faster evaluation of operations when compared to algorithms based on RSA. Elliptic curves were standardized during the early 2000s, and have recently gained popularity as they are a more efficient way for securing communications.

Elliptic curves are used in almost every project at Cloudflare, not only for establishing TLS connections, but also for certificate validation, certificate revocation (OCSP), Privacy Pass, certificate transparency, and AMP Real URL.

The Go language provides native support for NIST-standardized curves, the most popular of which is P-256. In a previous post, Vlad Krasnov described the relevance of optimizing several cryptographic algorithms, including P-256 curve. When working at Cloudflare scale, little issues around performance are significantly magnified. This is one reason why Cloudflare pushes the boundaries of efficiency.

A similar thing happened on the chained validation of certificates. For some certificates, we observed performance issues when validating a chain of certificates. Our team successfully diagnosed this issue: certificates which had signatures from the P-384 curve, which is the curve that corresponds to the 192-bit security level, were taking up 99% of CPU time! It is common for certificates closer to the root of the chain of trust to rely on stronger security assumptions, for example, using larger elliptic curves. Our first-aid reaction comes in the form of an optimized implementation written by Brendan McMillion that reduced the time of performing elliptic curve operations by a factor of 10. The code for P-384 is also available in CIRCL.

The latest developments in elliptic curve cryptography have caused a shift to use elliptic curve models with faster arithmetic operations. The best example is undoubtedly Curve25519; other examples are the Goldilocks and FourQ curves. CIRCL supports all of these curves, allowing instantiation of Diffie-Hellman exchanges and Edwards digital signatures. Although it slightly overlaps the Go native libraries, CIRCL has architecture-dependent optimizations.

Hashing to Groups

Many cryptographic protocols rely on the hardness of solving the Discrete Logarithm Problem (DLP) in special groups, one of which is the integers reduced modulo a large integer. To guarantee that the DLP is hard to solve, the modulus must be a large prime number. Increasing its size boosts on security, but also makes operations more expensive. A better approach is using elliptic curve groups since they provide faster operations.

In some cryptographic protocols, it is common to use a function with the properties of a cryptographic hash function that maps bit strings into elements of the group. This is easy to accomplish when, for example, the group is the set of integers modulo a large prime. However, it is not so clear how to perform this function using elliptic curves. In cryptographic literature, several methods have been proposed using the terms hashing to curves or hashing to point indistinctly.

The main issue is that there is no general method for deterministically finding points on any elliptic curve, the closest available are methods that target special curves and parameters. This is a problem for implementers of cryptographic algorithms, who have a hard time figuring out on a suitable method for hashing to points of an elliptic curve. Compounding that, chances of doing this wrong are high. There are many different methods, elliptic curves, and security considerations to analyze. For example, a vulnerability on WPA3 handshake protocol exploited a non-constant time hashing method resulting in a recovery of keys. Currently, an IETF draft is tracking work in-progress that provides hashing methods unifying requirements with curves and their parameters.

Corresponding to this problem, CIRCL will include implementations of hashing methods for elliptic curves. Our development is accompanying the evolution of the IEFT draft. Therefore, users of CIRCL will have this added value as the methods implement a ready-to-go functionality, covering the needs of some cryptographic protocols.

Update on Bilinear Pairings

Bilinear pairings are sometimes regarded as a tool for cryptanalysis, however pairings can also be used in a constructive way by allowing instantiation of advanced public-key algorithms, for example, identity-based encryption, attribute-based encryption, blind digital signatures, three-party key agreement, among others.

An efficient way to instantiate a bilinear pairing is to use elliptic curves. Note that only a special class of curves can be used, thus so-called pairing-friendly curves have specific properties that enable the efficient evaluation of a pairing.

Some families of pairing-friendly curves were introduced by Barreto-Naehrig (BN), Kachisa-Schaefer-Scott (KSS), and Barreto-Lynn-Scott (BLS). BN256 is a BN curve using a 256-bit prime and is one of the fastest options for implementing a bilinear pairing. The Go native library supports this curve in the package golang.org/x/crypto/bn256. In fact, the BN256 curve is used by Cloudflare’s Geo Key Manager, which allows distributing encrypted keys around the world. At Cloudflare, high-performance is a must and with this motivation, in 2017, we released an optimized implementation of the BN256 package that is 8x faster than the Go’s native package. The success of these optimizations reached several other projects such as the Ethereum protocol and the RandomnessBeacon project.

Recent improvements in solving the DLP over extension fields, GF(pᵐ) for p prime and m>1, impacted the security of pairings, causing recalculation of the parameters used for pairing-friendly curves.

Before these discoveries, the BN256 curve provided a 128-bit security level, but now larger primes are needed to target the same security level. That does not mean that the BN256 curve has been broken, since BN256 gives a security of 100 bits, that is, approximately 2¹⁰⁰ operations are required to cause a real danger, which is still unfeasible with current computing power.

With our CIRCL announcement, we want to announce our plans for research and development to obtain efficient curve(s) to become a stronger successor of BN256. According to the estimation by Barbulescu-Duquesne, a BN curve must use primes of at least 456 bits to match a 128-bit security level. However, the impact on the recalculation of parameters brings back to the main scene BLS and KSS curves as efficient alternatives. To this end a standardization effort at IEFT is in progress with the aim of defining parameters and pairing-friendly curves that match different security levels.

Note that regardless of the curve(s) chosen, there is an unavoidable performance downgrade when moving from BN256 to a stronger curve. Actual timings were presented by Aranha, who described the evolution of the race for high-performance pairing implementations. The purpose of our continuous development of CIRCL is to minimize this impact through fast implementations.

Optimizations

Go itself is a very easy to learn and use for system programming and yet makes it possible to use assembly so that you can stay close “to the metal”. We have blogged about improving performance in Go few times in the past (see these posts about encryption, ciphersuites, and image encoding).

When developing CIRCL, we crafted the code to get the best possible performance from the machine. We leverage the capabilities provided by the architecture and the architecture-specific instructions. This means that in some cases we need to get our hands dirty and rewrite parts of the software in Go assembly, which is not easy, but definitely worth the effort when it comes to performance. We focused on x86-64, as this is our main target, but we also think that it’s worth looking at ARM architecture, and in some cases (like SIDH or P-384), CIRCL has optimized code for this platform.

We also try to ensure that code uses memory efficiently – crafting it in a way that fast allocations on the stack are preferred over expensive heap allocations. In cases where heap allocation is needed, we tried to design the APIs in a way that, they allow pre-allocating memory ahead of time and reuse it for multiple operations.

Security

The CIRCL library is offered as-is, and without a guarantee. Therefore, it is expected that changes in the code, repository, and API occur in the future. We recommend to take caution before using this library in a production application since part of its content is experimental.

As new attacks and vulnerabilities arise over the time, security of software should be treated as a continuous process. In particular, the assessment of cryptographic software is critical, it requires the expertise of several fields, not only computer science. Cryptography engineers must be aware of the latest vulnerabilities and methods of attack in order to defend against them.

The development of CIRCL follows best practices on the secure development. For example, if time execution of the code depends on secret data, the attacker could leverage those irregularities and recover secret keys. In our code, we take care of writing constant-time code and hence prevent timing based attacks.

Developers of cryptographic software must also be aware of optimizations performed by the compiler and/or the processor since these optimizations can lead to insecure binary codes in some cases. All of these issues could be exploited in real attacks aimed at compromising systems and keys. Therefore, software changes must be tracked down through thorough code reviews. Also static analyzers and automated testing tools play an important role on the security of the software.

Summary

CIRCL is envisioned as an effective tool for experimenting with modern cryptographic algorithms yet providing high-performance implementations. Today is marked as the starting point of a continuous machinery of innovation and retribution to the community in the form of a cryptographic library. There are still several other applications such as homomorphic encryption, multi-party computation, and privacy-preserving protocols that we would like to explore.

We are team of cryptography, security, and software engineers working to improve and augment Cloudflare products. Our team keeps the communication channels open for receiving comments, including improvements, and merging contributions. We welcome opinions and contributions! If you would like to get in contact, you should check out our github repository for CIRCL github.com/cloudflare/circl. We want to share our work and hope it makes someone else’s job easier as well.

Finally, special thanks to all the contributors who has either directly or indirectly helped to implement the library – Ko Stoffelen, Brendan McMillion, Henry de Valence, Michael McLoughlin and all the people who invested their time in reviewing our code.

Today, we are excited to announce Cloudflare’s Ethereum Gateway, where you can interact with the Ethereum network without installing any additional software on your computer.

This is another tool in Cloudflare’s Distributed Web Gateway tool set. Currently, Cloudflare lets you host content on the InterPlanetary File System (IPFS) and access it through your own custom domain. Similarly, the new Ethereum Gateway allows access to the Ethereum network, which you can provision through your custom hostname.

This setup makes it possible to add interactive elements to sites powered by Ethereum smart contracts, a decentralized computing platform. And, in conjunction with the IPFS gateway, this allows hosting websites and resources in a decentralized manner, and has the extra bonus of the added speed, security, and reliability provided by the Cloudflare edge network. You can access our Ethereum gateway directly at https://cloudflare-eth.com.

This brief primer on how Ethereum and smart contracts work has examples of the many possibilities of using the Cloudflare Distributed Web Gateway.

Primer on Ethereum

You may have heard of Ethereum as a cryptocurrency. What you may not know is that Ethereum is so much more. Ethereum is a distributed virtual computing network that stores and enforces smart contracts.

So, what is a smart contract?

Good question. Ethereum smart contracts are simply a piece of code stored on the Ethereum blockchain. When the contract is triggered, it runs on the Ethereum Virtual Machine (EVM). The EVM is a distributed virtual machine that runs smart contract code and produces cryptographically verified changes to the state of the Ethereum blockchain as its result.

To illustrate the power of smart contracts, let’s consider a little example.

Anna wants to start a VPN provider but she lacks the capital. To raise funds for her venture she decides to hold an Initial Coin Offering (ICO). Rather than design an ICO contract from scratch Anna bases her contract off of ERC-20. ERC-20 is a template for issuing fungible tokens, perfect for ICOs. Anna sends her ERC-20 compliant contract to the Ethereum network, and starts to sell stock in her new company, VPN Co.

Once she’s sorted out funds, Anna sits down and starts to write a smart contract. Anna’s contract asks customers to send her their public key, along with some Ether (the coin product of Ethereum). She then authorizes the public key to access her VPN service. All without having to hold any secret information. Huzzah!

Next, rather than set up the infrastructure to run a VPN herself, Anna decides to use the blockchain again, but this time as a customer. Cloud Co. sells managed cloud infrastructure using their own smart contract. Anna programs her contract to send the appropriate amount of Ether to Cloud Co.’s contract. Cloud Co. then provisions the servers she needs to host her VPN. By automatically purchasing more infrastructure every time she has a new customer, her VPN company can scale totally autonomously.

Finally, Anna pays dividends to her investors out of the profits, keeping a little for herself.

And there you have it.

A decentralised, autonomous, smart VPN provider.

A smart contract stored on the blockchain has an associated account for storing funds, and the contract is triggered when someone sends Ether to that account. So for our VPN example, the provisioning contract triggers when someone transfers money into the account associated with Anna’s contract.

What distinguishes smart contracts from ordinary code?

The “smart” part of a smart contract is they run autonomously. The “contract” part is the guarantee that the code runs as written.

Because this contract is enforced cryptographically, maintained in the tamper-resistant medium of the blockchain and verified by the consensus of the network, these contracts are more reliable than regular contracts which can provoke dispute.

Ethereum Smart Contracts vs. Traditional Contracts

A regular contract is enforced by the court system, litigated by lawyers. The outcome is uncertain; different courts rule differently and hiring more or better lawyers can swing the odds in your favor.

Smart contract outcomes are predetermined and are nearly incorruptible. However, here be dragons: though the outcome can be predetermined and incorruptible, a poorly written contract might not have the intended behavior, and because contracts are immutable, this is difficult to fix.

How are smart contracts written?

You can write smart contracts in a number of languages, some of which are Turing complete, e.g. Solidity. A Turing complete language lets you write code that can evaluate any computable function. This puts Solidity in the same class of languages as Python and Java. The compiled bytecode is then run on the EVM.

The EVM differs from a standard VM in a number of ways:

The EVM is distributed

Each piece of code is run by numerous nodes. Nodes verify the computation before accepting a block, and therefore ensure that miners who want their blocks accepted must always run the EVM honestly. A block is only considered accepted when more than half of the network accepts it. This is the consensus part of Ethereum.

The EVM is entirely deterministic

This means that the same inputs to a function always produce the same outputs. Because regular VMs have access to file storage and the network, the results of a function call can be non-deterministic. Every EVM has the same start state, thus a given set of inputs always gives the same outputs. This makes the EVM more reliable than a standard VM.

There are two big gotchas that come with this determinism:

EVM bytecode is Turing complete and therefore discerning the outputs without running the computation is not always possible.

Ethereum smart contracts can store state on the blockchain. This means that the output of the function can vary as the blockchain changes. Although, technically this is deterministic in that the blockchain is an input to the function, it may still be impossible to derive the output in advance.

This however means that they suffer from the same problems as any piece of software – bugs. However, unlike normal code where the authors can issue a patch, code stored on the blockchain is immutable. More problematically, even if the author provides a new smart contract, the old one is always still available on the blockchain.

This means that when writing contracts authors must be especially careful to write secure code, and include a kill switch to ensure that if bugs do reside in the code, they can be squashed. If there is no kill switch and there are vulnerabilities in the smart contract that can be exploited, it can potentially lead to the theft of resources from the smart contract or from other individuals. EVM Bytecode includes a special SELFDESTRUCT opcode that deletes a contract, and sends all funds to the specified address for just this purpose.

The need to include a kill switch was brought into sharp focus during the infamous DAO incident. The DAO smart contract acted as a complex decentralized venture capital (VC) fund and held Ether worth $250 million at its peak collected from a group of investors. Hackers exploited vulnerabilities in the smart contract and stole Ether worth $50 million.

Because there is no way to undo transactions in Ethereum, there was a highly controversial “hard fork,” where the majority of the community agreed to accept a block with an “irregular state change” that essentially drained all DAO funds into a special “WithdrawDAO” recovery contract. By convincing enough miners to accept this irregular block as valid, the DAO could return funds.

Not everyone agreed with the change. Those who disagreed rejected the irregular block and formed the Ethereum Classic network, with both branches of the fork growing independently.

Kill switches, however, can cause their own problems. For example, when a contract used as a library flips its kill switch, all contracts relying on this contract can no longer operate as intended, even though the underlying library code is immutable. This caused over 500,000 ETH to become stuck in multi-signature wallets when an attacker triggered the kill switch of an underlying library.

Users of the multi-signature library assumed the immutability of the code meant that the library would always operate as anticipated. But the smart contracts that interact with the blockchain are only deterministic when accounting for the state of the blockchain.

In the wake of the DAO, various tools were created that check smart contracts for bugs or enable bug bounties, for example Securify and The Hydra.

Another way smart contracts avoid bugs is using standardized patterns. For example, ERC-20 defines a standardized interface for producing tokens such as those used in ICOs, and ERC-721 defines a standardized interface for implementing non-fungible tokens. Non-fungible tokens can be used for trading-card games like CryptoKitties. CryptoKitties is a trading-card style game built on the Ethereum blockchain. Players can buy, sell, and breed cats, with each cat being unique.

CryptoKitties is built on a collection of smart contracts that provides an open-source Application Binary Interface (ABI) for interacting with the KittyVerse — the virtual world of the CryptoKitties application. An ABI simply allows you to call functions in a contract and receive any returned data. The KittyBase code may look like this:

Besides defining what a Kitty is, this contract defines two basic functions for transferring and creating kitties. Both are internal and can only be called by contracts that implement KittyBase. The KittyOwnership contract implements both ERC-721 and KittyBase, and implements an external transfer function that calls the internal _transfer function. This code is compiled into bytecode written to the blockchain.

By implementing a standardised interface like ERC-721, smart contracts that aren’t specifically aware of CryptoKitties can still interact with the KittyVerse. The CryptoKitties ABI functions allow users to create distributed apps (dApps), of their own design on top of the KittyVerse, and allow other users to use their dApps. This extensibility helps demonstrate the potential of smart contracts.

How is this so different?

Smart contracts are, by definition, public. Everyone can see the terms and understand where the money goes. This is a radically different approach to providing transparency and accountability. Because all contracts and transactions are public and verified by consensus, trust is distributed between the people, rather than centralized in a few big institutions.

The trust given to institutions is historic in that we trust them because they have previously demonstrated trustworthiness.

The trust placed in consensus-based algorithms is based on the assumption that most people are honest, or more accurately, that no sufficiently large subset of people can collude to produce a malicious outcome. This is the democratisation of trust.

In the case of the DAO attack, a majority of nodes agreed to accept an “irregular” state transition. This effectively undid the damage of the attack and demonstrates how, at least in the world of blockchain, perception is reality. Because most people “believed” (accepted) this irregular block, it became a “real,” valid block. Most people think of the blockchain as immutable, and trust the power of consensus to ensure correctness, however if enough people agree to do something irregular, they don’t have to keep the rules.

So where does Cloudflare fit in?

Accessing the Ethereum network and its attendant benefits directly requires running complex software, including downloading and cryptographically verifying hundreds of gigabytes of data, which apart from producing technical barriers to entry for users, can also exclude people with low-power devices.

To help those users and devices access the Ethereum network, the Cloudflare Ethereum gateway allows any device capable of accessing the web to interact with the Ethereum network in a safe, reliable way.

Through our gateway, not only can you explore the blockchain, but if you give our gateway a signed transaction, we’ll push it to the network to allow miners to add it to their blockchain. This means that you can send Ether and even put new contracts on the blockchain without having to run a node.

“But Jonathan,” I hear you say, “by providing a gateway aren’t you just making Cloudflare a centralizing institution?”

That’s a fair question. Thankfully, Cloudflare won’t be alone in offering these gateways. We’re joining alongside organizations, such as Infura, to expand the constellation of gateways that already exist. We hope that, by providing a fast, reliable service, we can enable people who never previously used smart-contracts to do so, and in so doing bring the benefits they offer to billions of regular Internet users.

“We’re excited that Cloudflare is bringing their infrastructure expertise to the Ethereum ecosystem. Infura has always believed in the importance of standardized, open APIs and compatibility between gateway providers, so we look forward to collaborating with their team to build a better distributed web.” – E.G. Galano, Infura co-founder.

By providing a gateway to the Ethereum network, we help users make the jump from general web-user to cryptocurrency native, and eventually make the distributed web a fundamental part of the Internet.

What can you do with Cloudflare’s Gateway?

Visit cloudflare-eth.com to interact with our example app. But to really explore the Ethereum world, access the RPC API, where you can do anything that can be done on the Ethereum network itself, from examining contracts, to transferring funds.

Our Gateway accepts POST requests containing JSON. For a complete list of calls, visit the Ethereum github page. So, to get the block number of the most recent block, you could run:

We also invite developers to build dApps based on our Ethereum gateway using our API. Our API allows developers to build websites powered by the Ethereum blockchain. Check out developer docs to get started. If you want to read more about how Ethereum works check out this deep dive.

The architecture

Cloudflare is uniquely positioned to host an Ethereum gateway, and we have the utmost faith in the products we offer to customers. This is why the Cloudflare Ethereum gateway runs as a Cloudflare customer and we dogfood our own products to provide a fast and reliable gateway. The domain we run the gateway on (https://cloudflare-eth.com) uses Cloudflare Workers to cache responses for popular queries made to the gateway. Responses for these queries are answered directly from the Cloudflare edge, which can result in a ~6x speed-up.

We also use Load balancing and Argo Tunnel for fast, redundant, and secure content delivery. With Argo Smart Routing enabled, requests and responses to our Ethereum gateway are tunnelled directly from our Ethereum node to the Cloudflare edge using the best possible routing.

Similar to our IPFS gateway, cloudflare-eth.com is an SSL for SaaS provider. This means that anyone can set up the Cloudflare Ethereum gateway as a backend for access to the Ethereum network through their own registered domains. For more details on how to set up your own domain with this functionality, see the Ethereum tab on cloudflare.com/distributed-web-gateway.

With these features, you can use Cloudflare’s Distributed Web Gateway to create a fully decentralized website with an interactive backend that allows interaction with the IPFS and Ethereum networks. For example, you can host your content on IPFS (using something like Pinata to pin the files), and then host the website backend as a smart contract on Ethereum. This architecture does not require a centralized server for hosting files or the actual website. Added to the power, speed, and security provided by Cloudflare’s edge network, your website is delivered to users around the world with unparalleled efficiency.

Embracing a distributed future

At Cloudflare, we support technologies that help distribute trust. By providing a gateway to the Ethereum network, we hope to facilitate the growth of a decentralized future.

We thank the Ethereum Foundation for their support of a new gateway in expanding the distributed web:

“Cloudflare’s Ethereum Gateway increases the options for thin-client applications as well as decentralization of the Ethereum ecosystem, and I can’t think of a better person to do this work than Cloudflare. Allowing access through a user’s custom hostname is a particularly nice touch. Bravo.” – Dr. Virgil Griffith, Head of Special Projects, Ethereum Foundation.

We hope that by allowing anyone to use the gateway as the backend for their domain, we make the Ethereum network more accessible for everyone; with the added speed and security brought by serving this content directly from Cloudflare’s global edge network.

So, go forth and build our vision – the distributed crypto-future!

Today, as part of Crypto Week 2019, we are excited to announce Cloudflare’s Ethereum Gateway, where you can interact with the Ethereum network without installing any additional software on your computer.

This is another tool in Cloudflare’s Distributed Web Gateway tool set. Currently, Cloudflare lets you host content on the InterPlanetary File System (IPFS) and access it through your own custom domain. Similarly, the new Ethereum Gateway allows access to the Ethereum network, which you can provision through your custom hostname.

This setup makes it possible to add interactive elements to sites powered by Ethereum smart contracts, a decentralized computing platform. And, in conjunction with the IPFS gateway, this allows hosting websites and resources in a decentralized manner, and has the extra bonus of the added speed, security, and reliability provided by the Cloudflare edge network. You can access our Ethereum gateway directly at https://cloudflare-eth.com.

This brief primer on how Ethereum and smart contracts work has examples of the many possibilities of using the Cloudflare Distributed Web Gateway.

Primer on Ethereum

You may have heard of Ethereum as a cryptocurrency. What you may not know is that Ethereum is so much more. Ethereum is a distributed virtual computing network that stores and enforces smart contracts.

So, what is a smart contract?

Good question. Ethereum smart contracts are simply a piece of code stored on the Ethereum blockchain. When the contract is triggered, it runs on the Ethereum Virtual Machine (EVM). The EVM is a distributed virtual machine that runs smart contract code and produces cryptographically verified changes to the state of the Ethereum blockchain as its result.

To illustrate the power of smart contracts, let’s consider a little example.

Anna wants to start a VPN provider but she lacks the capital. To raise funds for her venture she decides to hold an Initial Coin Offering (ICO). Rather than design an ICO contract from scratch Anna bases her contract off of ERC-20. ERC-20 is a template for issuing fungible tokens, perfect for ICOs. Anna sends her ERC-20 compliant contract to the Ethereum network, and starts to sell stock in her new company, VPN Co.

Once she’s sorted out funds, Anna sits down and starts to write a smart contract. Anna’s contract asks customers to send her their public key, along with some Ether (the coin product of Ethereum). She then authorizes the public key to access her VPN service. All without having to hold any secret information. Huzzah!

Next, rather than set up the infrastructure to run a VPN herself, Anna decides to use the blockchain again, but this time as a customer. Cloud Co. sells managed cloud infrastructure using their own smart contract. Anna programs her contract to send the appropriate amount of Ether to Cloud Co.’s contract. Cloud Co. then provisions the servers she needs to host her VPN. By automatically purchasing more infrastructure every time she has a new customer, her VPN company can scale totally autonomously.

Finally, Anna pays dividends to her investors out of the profits, keeping a little for herself.

And there you have it.

A decentralised, autonomous, smart VPN provider.

A smart contract stored on the blockchain has an associated account for storing funds, and the contract is triggered when someone sends Ether to that account. So for our VPN example, the provisioning contract triggers when someone transfers money into the account associated with Anna’s contract.

What distinguishes smart contracts from ordinary code?

The “smart” part of a smart contract is they run autonomously. The “contract” part is the guarantee that the code runs as written.

Because this contract is enforced cryptographically, maintained in the tamper-resistant medium of the blockchain and verified by the consensus of the network, these contracts are more reliable than regular contracts which can provoke dispute.

Ethereum Smart Contracts vs. Traditional Contracts

A regular contract is enforced by the court system, litigated by lawyers. The outcome is uncertain; different courts rule differently and hiring more or better lawyers can swing the odds in your favor.

Smart contract outcomes are predetermined and are nearly incorruptible. However, here be dragons: though the outcome can be predetermined and incorruptible, a poorly written contract might not have the intended behavior, and because contracts are immutable, this is difficult to fix.

How are smart contracts written?

You can write smart contracts in a number of languages, some of which are Turing complete, e.g. Solidity. A Turing complete language lets you write code that can evaluate any computable function. This puts Solidity in the same class of languages as Python and Java. The compiled bytecode is then run on the EVM.

The EVM differs from a standard VM in a number of ways:

The EVM is distributed

Each piece of code is run by numerous nodes. Nodes verify the computation before accepting a block, and therefore ensure that miners who want their blocks accepted must always run the EVM honestly. A block is only considered accepted when more than half of the network accepts it. This is the consensus part of Ethereum.

The EVM is entirely deterministic

This means that the same inputs to a function always produce the same outputs. Because regular VMs have access to file storage and the network, the results of a function call can be non-deterministic. Every EVM has the same start state, thus a given set of inputs always gives the same outputs. This makes the EVM more reliable than a standard VM.

There are two big gotchas that come with this determinism:

EVM bytecode is Turing complete and therefore discerning the outputs without running the computation is not always possible.

Ethereum smart contracts can store state on the blockchain. This means that the output of the function can vary as the blockchain changes. Although, technically this is deterministic in that the blockchain is an input to the function, it may still be impossible to derive the output in advance.

This however means that they suffer from the same problems as any piece of software – bugs. However, unlike normal code where the authors can issue a patch, code stored on the blockchain is immutable. More problematically, even if the author provides a new smart contract, the old one is always still available on the blockchain.

This means that when writing contracts authors must be especially careful to write secure code, and include a kill switch to ensure that if bugs do reside in the code, they can be squashed. If there is no kill switch and there are vulnerabilities in the smart contract that can be exploited, it can potentially lead to the theft of resources from the smart contract or from other individuals. EVM Bytecode includes a special SELFDESTRUCT opcode that deletes a contract, and sends all funds to the specified address for just this purpose.

The need to include a kill switch was brought into sharp focus during the infamous DAO incident. The DAO smart contract acted as a complex decentralized venture capital (VC) fund and held Ether worth $250 million at its peak collected from a group of investors. Hackers exploited vulnerabilities in the smart contract and stole Ether worth $50 million.

Because there is no way to undo transactions in Ethereum, there was a highly controversial “hard fork,” where the majority of the community agreed to accept a block with an “irregular state change” that essentially drained all DAO funds into a special “WithdrawDAO” recovery contract. By convincing enough miners to accept this irregular block as valid, the DAO could return funds.

Not everyone agreed with the change. Those who disagreed rejected the irregular block and formed the Ethereum Classic network, with both branches of the fork growing independently.

Kill switches, however, can cause their own problems. For example, when a contract used as a library flips its kill switch, all contracts relying on this contract can no longer operate as intended, even though the underlying library code is immutable. This caused over 500,000 ETH to become stuck in multi-signature wallets when an attacker triggered the kill switch of an underlying library.

Users of the multi-signature library assumed the immutability of the code meant that the library would always operate as anticipated. But the smart contracts that interact with the blockchain are only deterministic when accounting for the state of the blockchain.

In the wake of the DAO, various tools were created that check smart contracts for bugs or enable bug bounties, for example Securify and The Hydra.

Another way smart contracts avoid bugs is using standardized patterns. For example, ERC-20 defines a standardized interface for producing tokens such as those used in ICOs, and ERC-721 defines a standardized interface for implementing non-fungible tokens. Non-fungible tokens can be used for trading-card games like CryptoKitties. CryptoKitties is a trading-card style game built on the Ethereum blockchain. Players can buy, sell, and breed cats, with each cat being unique.

CryptoKitties is built on a collection of smart contracts that provides an open-source Application Binary Interface (ABI) for interacting with the KittyVerse — the virtual world of the CryptoKitties application. An ABI simply allows you to call functions in a contract and receive any returned data. The KittyBase code may look like this:

Besides defining what a Kitty is, this contract defines two basic functions for transferring and creating kitties. Both are internal and can only be called by contracts that implement KittyBase. The KittyOwnership contract implements both ERC-721 and KittyBase, and implements an external transfer function that calls the internal _transfer function. This code is compiled into bytecode written to the blockchain.

By implementing a standardised interface like ERC-721, smart contracts that aren’t specifically aware of CryptoKitties can still interact with the KittyVerse. The CryptoKitties ABI functions allow users to create distributed apps (dApps), of their own design on top of the KittyVerse, and allow other users to use their dApps. This extensibility helps demonstrate the potential of smart contracts.

How is this so different?

Smart contracts are, by definition, public. Everyone can see the terms and understand where the money goes. This is a radically different approach to providing transparency and accountability. Because all contracts and transactions are public and verified by consensus, trust is distributed between the people, rather than centralized in a few big institutions.

The trust given to institutions is historic in that we trust them because they have previously demonstrated trustworthiness.

The trust placed in consensus-based algorithms is based on the assumption that most people are honest, or more accurately, that no sufficiently large subset of people can collude to produce a malicious outcome. This is the democratisation of trust.

In the case of the DAO attack, a majority of nodes agreed to accept an “irregular” state transition. This effectively undid the damage of the attack and demonstrates how, at least in the world of blockchain, perception is reality. Because most people “believed” (accepted) this irregular block, it became a “real,” valid block. Most people think of the blockchain as immutable, and trust the power of consensus to ensure correctness, however if enough people agree to do something irregular, they don’t have to keep the rules.

So where does Cloudflare fit in?

Accessing the Ethereum network and its attendant benefits directly requires running complex software, including downloading and cryptographically verifying hundreds of gigabytes of data, which apart from producing technical barriers to entry for users, can also exclude people with low-power devices.

To help those users and devices access the Ethereum network, the Cloudflare Ethereum gateway allows any device capable of accessing the web to interact with the Ethereum network in a safe, reliable way.

Through our gateway, not only can you explore the blockchain, but if you give our gateway a signed transaction, we’ll push it to the network to allow miners to add it to their blockchain. This means that you can send Ether and even put new contracts on the blockchain without having to run a node.

“But Jonathan,” I hear you say, “by providing a gateway aren’t you just making Cloudflare a centralizing institution?”

That’s a fair question. Thankfully, Cloudflare won’t be alone in offering these gateways. We’re joining alongside organizations, such as Infura, to expand the constellation of gateways that already exist. We hope that, by providing a fast, reliable service, we can enable people who never previously used smart-contracts to do so, and in so doing bring the benefits they offer to billions of regular Internet users.

“We’re excited that Cloudflare is bringing their infrastructure expertise to the Ethereum ecosystem. Infura has always believed in the importance of standardized, open APIs and compatibility between gateway providers, so we look forward to collaborating with their team to build a better distributed web.” – E.G. Galano, Infura co-founder.

By providing a gateway to the Ethereum network, we help users make the jump from general web-user to cryptocurrency native, and eventually make the distributed web a fundamental part of the Internet.

What can you do with Cloudflare’s Gateway?

Visit cloudflare-eth.com to interact with our example app. But to really explore the Ethereum world, access the RPC API, where you can do anything that can be done on the Ethereum network itself, from examining contracts, to transferring funds.

Our Gateway accepts POST requests containing JSON. For a complete list of calls, visit the Ethereum github page. So, to get the block number of the most recent block, you could run:

We also invite developers to build dApps based on our Ethereum gateway using our API. Our API allows developers to build websites powered by the Ethereum blockchain. Check out developer docs to get started. If you want to read more about how Ethereum works check out this deep dive.

The architecture

Cloudflare is uniquely positioned to host an Ethereum gateway, and we have the utmost faith in the products we offer to customers. This is why the Cloudflare Ethereum gateway runs as a Cloudflare customer and we dogfood our own products to provide a fast and reliable gateway. The domain we run the gateway on (https://cloudflare-eth.com) uses Cloudflare Workers to cache responses for popular queries made to the gateway. Responses for these queries are answered directly from the Cloudflare edge, which can result in a ~6x speed-up.

We also use Load balancing and Argo Tunnel for fast, redundant, and secure content delivery. With Argo Smart Routing enabled, requests and responses to our Ethereum gateway are tunnelled directly from our Ethereum node to the Cloudflare edge using the best possible routing.

Similar to our IPFS gateway, cloudflare-eth.com is an SSL for SaaS provider. This means that anyone can set up the Cloudflare Ethereum gateway as a backend for access to the Ethereum network through their own registered domains. For more details on how to set up your own domain with this functionality, see the Ethereum tab on cloudflare.com/distributed-web-gateway.

With these features, you can use Cloudflare’s Distributed Web Gateway to create a fully decentralized website with an interactive backend that allows interaction with the IPFS and Ethereum networks. For example, you can host your content on IPFS (using something like Pinata to pin the files), and then host the website backend as a smart contract on Ethereum. This architecture does not require a centralized server for hosting files or the actual website. Added to the power, speed, and security provided by Cloudflare’s edge network, your website is delivered to users around the world with unparalleled efficiency.

Embracing a distributed future

At Cloudflare, we support technologies that help distribute trust. By providing a gateway to the Ethereum network, we hope to facilitate the growth of a decentralized future.

We thank the Ethereum Foundation for their support of a new gateway in expanding the distributed web:

“Cloudflare’s Ethereum Gateway increases the options for thin-client applications as well as decentralization of the Ethereum ecosystem, and I can’t think of a better person to do this work than Cloudflare. Allowing access through a user’s custom hostname is a particularly nice touch. Bravo.” – Dr. Virgil Griffith, Head of Special Projects, Ethereum Foundation.

We hope that by allowing anyone to use the gateway as the backend for their domain, we make the Ethereum network more accessible for everyone; with the added speed and security brought by serving this content directly from Cloudflare’s global edge network.

So, go forth and build our vision – the distributed crypto-future!

When we launched our InterPlanetary File System (IPFS) gateway last year we were blown away by the positive reception. Countless people gave us valuable suggestions for improvement and made open-source contributions to make serving content through our gateway easy (many captured in our developer docs). Since then, our gateway has grown to regularly handle over a thousand requests per second, and has become the primary access point for several IPFS websites.

We’re committed to helping grow IPFS and have taken what we have learned since our initial release to improve our gateway. So far, we’ve done the following:

Automatic Cache Purge

One of the ways we tried to improve the performance of our gateway when we initially set it up was by setting really high cache TTLs. After all, content on IPFS is largely meant to be static. The complaint we heard though, was that site owners were frustrated at wait times upwards of several hours for changes to their website to propagate.

The way an IPFS gateway knows what content to serve when it receives a request for a given domain is by looking up the value of a TXT record associated with the domain – the DNSLink record. The value of this TXT record is the hash of the entire site, which changes if any one bit of the website changes. So we wrote a Worker script that makes a DNS-over-HTTPS query to 1.1.1.1 and bypasses cache if it sees that the DNSLink record of a domain is different from when the content was originally cached.

Checking DNS gives the illusion of a much lower cache TTL and usually adds less than 5ms to a request, whereas revalidating the cache with a request to the origin could take anywhere from 30ms to 300ms. And as an additional usability bonus, the 1.1.1.1 cache automatically purges when Cloudflare customers change their DNS records. Customers who don’t manage their DNS records with us can purge their cache using this tool.

Beta Testing for Orange-to-Orange

Our gateway was originally based on a feature called SSL for SaaS. This tweaks the way our edge works to allow anyone, Cloudflare customers or not, to CNAME their own domain to a target domain on our network, and have us send traffic we see for their domain to the target domain’s origin. SSL for SaaS keeps valid certificates for these domains in the Cloudflare database (hence the name), and applies the target domain’s configuration to these requests (for example, enforcing Page Rules) before they reach the origin.

The great thing about SSL for SaaS is that it doesn’t require being on the Cloudflare network. New people can start serving their websites through our gateway with their existing DNS provider, instead of migrating everything over. All Cloudflare settings are inherited from the target domain. This is a huge convenience, but also means that the source domain can’t customize their settings even if they do migrate.

This can be improved by an experimental feature called Orange-to-Orange (O2O) from the Cloudflare Edge team. O2O allows one zone on Cloudflare to CNAME to another zone, and apply the settings of both zones in layers. For example, cloudflare-ipfs.com has Always Use HTTPS turned off for various reasons, which means that every site served through our gateway also does. O2O allows site owners to override this setting by enabling Always Use HTTPS just for their website, if they know it’s okay, as well as adding custom Page Rules and Worker scripts to embed all sorts of complicated logic.

If you’d like to try this out on your domain, open a support ticket with this request and we will enable it for you in the coming weeks.

Subdomain-based Gateway

To host an application on IPFS it’s pretty much essential to have a custom domain for your app. We discussed all the reasons for this in our post, End-to-End Integrity with IPFS – essentially saying that because browsers only sandbox websites at the domain-level, serving an app directly from a gateway’s URL is not secure because another (malicious) app could steal its data.

Having a custom domain gives apps a secure place to keep user data, but also makes it possible for whoever controls the DNS for the domain to change a website’s content without warning. To provide both a secure context to apps as well as eternal immutability, Cloudflare set up a subdomain-based gateway at cf-ipfs.com.

cf-ipfs.com doesn’t respond to requests to the root domain, only at subdomains, where it interprets the subdomain as the hash of the content to serve. This means a request to https://<hash>.cf-ipfs.com is the equivalent of going to https://cloudflare-ipfs.com/ipfs/<hash>. The only technicality is that because domain names are case-insensitive, the hash must be re-encoded from Base58 to Base32. Luckily, the standard IPFS client provides a utility for this!

The main downside of the subdomain approach is that for clients without Encrypted SNI support, the hash is leaked to the network as part of the TLS handshake. This can be bad for privacy and enable network-level censorship.

Enabling Session Affinity

Loading a website usually requires fetching more than one asset from a backend server, and more often than not, “more than one” is more like “more than a dozen.” When that website is being loaded over IPFS, it dramatically improves performance when the IPFS node can make one connection and re-use it for all assets.

Behind the curtain, we run several IPFS nodes to reduce the likelihood of an outage and improve throughput. Unfortunately, with the way it was originally setup, each request for a different asset on a website would likely go to a different IPFS node and all those connections would have to be made again.

We fixed this by replacing the original backend load balancer with our own Load Balancing product that supports Session Affinity and automatically directs requests from the same user to the same IPFS node, minimizing redundant network requests.

Connecting with Pinata

And finally, we’ve configured our IPFS nodes to maintain a persistent connection to the nodes run by Pinata, a company that helps people pin content to the IPFS network. Having a persistent connection significantly improves the performance and reliability of requests to our gateway, for content on their network. Pinata has written their own blog post, which you can find here, that describes how to upload a website to IPFS and keep it online with a combination of Cloudflare and Pinata.

As always, we look forward to seeing what the community builds on top of our work, and hearing about how else Cloudflare can improve the Internet.

Trust on the Internet is underpinned by the Public Key Infrastructure (PKI). PKI grants servers the ability to securely serve websites by issuing digital certificates, providing the foundation for encrypted and authentic communication.

Certificates make HTTPS encryption possible by using the public key in the certificate to verify server identity. HTTPS is especially important for websites that transmit sensitive data, such as banking credentials or private messages. Thankfully, modern browsers, such as Google Chrome, flag websites not secured using HTTPS by marking them “Not secure,” allowing users to be more security conscious of the websites they visit.

Now that we know what certificates are used for, let’s talk about where they come from.

Certificate Authorities

Certificate Authorities (CAs) are the institutions responsible for issuing certificates.

When issuing a certificate for any given domain, they use Domain Control Validation (DCV) to verify that the entity requesting a certificate for the domain is the legitimate owner of the domain. With DCV the domain owner:

creates a DNS resource record for a domain;

uploads a document to the web server located at that domain; OR

proves ownership of the domain’s administrative email account.

The DCV process prevents adversaries from obtaining private-key and certificate pairs for domains not owned by the requestor.

Preventing adversaries from acquiring this pair is critical: if an incorrectly issued certificate and private-key pair wind up in an adversary’s hands, they could pose as the victim’s domain and serve sensitive HTTPS traffic. This violates our existing trust of the Internet, and compromises private data on a potentially massive scale.

For example, an adversary that tricks a CA into mis-issuing a certificate for gmail.com could then perform TLS handshakes while pretending to be Google, and exfiltrate cookies and login information to gain access to the victim’s Gmail account. The risks of certificate mis-issuance are clearly severe.

Domain Control Validation

To prevent attacks like this, CAs only issue a certificate after performing DCV. One way of validating domain ownership is through HTTP validation, done by uploading a text file to a specific HTTP endpoint on the webserver they want to secure. Another DCV method is done using email verification, where an email with a validation code link is sent to the administrative contact for the domain.

HTTP Validation

Suppose Alice buys the domain name aliceswonderland.com and wants to get a dedicated certificate for this domain. Alice chooses to use Let’s Encrypt as their certificate authority. First, Alice must generate their own private key and create a certificate signing request (CSR). She sends the CSR to Let’s Encrypt, but the CA won’t issue a certificate for that CSR and private key until they know Alice owns aliceswonderland.com. Alice can then choose to prove that she owns this domain through HTTP validation.

When Let’s Encrypt performs DCV over HTTP, they require Alice to place a randomly named file in the /.well-known/acme-challenge path for her website. The CA must retrieve the text file by sending an HTTP GET request to http://aliceswonderland.com/.well-known/acme-challenge/<random_filename>. An expected value must be present on this endpoint for DCV to succeed.

For HTTP validation, Alice would upload a file to http://aliceswonderland.com/.well-known/acme-challenge/YnV0dHNz

where the body contains:

curl http://aliceswonderland.com/.well-known/acme-challenge/YnV0dHNz
GET /.well-known/acme-challenge/LoqXcYV8...jxAjEuX0
Host: aliceswonderland.com
HTTP/1.1 200 OK
Content-Type: application/octet-stream
YnV0dHNz.TEST_CLIENT_KEY

The CA instructs them to use the Base64 token YnV0dHNz. TEST_CLIENT_KEY in an account-linked key that only the certificate requestor and the CA know. The CA uses this field combination to verify that the certificate requestor actually owns the domain. Afterwards, Alice can get her certificate for her website!

DNS Validation

Another way users can validate domain ownership is to add a DNS TXT record containing a verification string or token from the CA to their domain’s resource records. For example, here’s a domain for an enterprise validating itself towards Google:

$ dig TXT aliceswonderland.com
aliceswonderland.com. 28 IN TXT "google-site-verification=COanvvo4CIfihirYW6C0jGMUt2zogbE_lC6YBsfvV-U"

Here, Alice chooses to create a TXT DNS resource record with a specific token value. A Google CA can verify the presence of this token to validate that Alice actually owns her website.

Types of BGP Hijacking Attacks

Certificate issuance is required for servers to securely communicate with clients. This is why it’s so important that the process responsible for issuing certificates is also secure. Unfortunately, this is not always the case.

Researchers at Princeton University recently discovered that common DCV methods are vulnerable to attacks executed by network-level adversaries. If Border Gateway Protocol (BGP) is the “postal service” of the Internet responsible for delivering data through the most efficient routes, then Autonomous Systems (AS) are individual post office branches that represent an Internet network run by a single organization. Sometimes network-level adversaries advertise false routes over BGP to steal traffic, especially if that traffic contains something important, like a domain’s certificate.

Bamboozling Certificate Authorities with BGP highlights five types of attacks that can be orchestrated during the DCV process to obtain a certificate for a domain the adversary does not own. After implementing these attacks, the authors were able to (ethically) obtain certificates for domains they did not own from the top five CAs: Let’s Encrypt, GoDaddy, Comodo, Symantec, and GlobalSign. But how did they do it?

Attacking the Domain Control Validation Process

There are two main approaches to attacking the DCV process with BGP hijacking:

Sub-Prefix Attack

Equally-Specific-Prefix Attack

These attacks create a vulnerability when an adversary sends a certificate signing request for a victim’s domain to a CA. When the CA verifies the network resources using an HTTP GET request (as discussed earlier), the adversary then uses BGP attacks to hijack traffic to the victim’s domain in a way that the CA’s request is rerouted to the adversary and not the domain owner. To understand how these attacks are conducted, we first need to do a little bit of math.

Every device on the Internet uses an IP (Internet Protocol) address as a numerical identifier. IPv4 addresses contain 32 bits and follow a slash notation to indicate the size of the prefix. So, in the network address 123.1.2.0/24, “/24” refers to how many bits the network contains. This means that there are 8 bits left that contain the host addresses, for a total of 256 host addresses. The smaller the prefix number, the more host addresses remain in the network. With this knowledge, let’s jump into the attacks!

Attack one: Sub-Prefix Attack

When BGP announces a route, the router always prefers to follow the more specific route. So if 123.0.0.0/8 and 123.1.2.0/24 are advertised, the router will use the latter as it is the more specific prefix. This becomes a problem when an adversary makes a BGP announcement to a specific IP address while using the victim’s domain IP address. Let’s say the IP address for our victim, leagueofentropy.com, is 123.0.0.0/8. If an adversary announces the prefix 123.1.2.0/24, then they will capture the victim’s traffic, launching a sub-prefix hijack attack.

For example, in an attack during April 2018, routes were announced with the more specific /24 vs. the existing /23. In the diagram below, /23 is Texas and /24 is the more specific Austin, Texas. The new (but nefarious) routes overrode the existing routes for portions of the Internet. The attacker then ran a nefarious DNS server on the normal IP addresses with DNS records pointing at some new nefarious web server instead of the existing server. This attracted the traffic destined for the victim’s domain within the area the nefarious routes were being propagated. The reason this attack was successful was because a more specific prefix is always preferred by the receiving routers.

Attack two: Equally-Specific-Prefix Attack

In the last attack, the adversary was able to hijack traffic by offering a more specific announcement, but what if the victim’s prefix is /24 and a sub-prefix attack is not viable? In this case, an attacker would launch an equally-specific-prefix hijack, where the attacker announces the same prefix as the victim. This means that the AS chooses the preferred route between the victim and the adversary’s announcements based on properties like path length. This attack only ever intercepts a portion of the traffic.

There are more advanced attacks that are covered in more depth in the paper. They are fundamentally similar attacks but are more stealthy.

Once an attacker has successfully obtained a bogus certificate for a domain that they do not own, they can perform a convincing attack where they pose as the victim’s domain and are able to decrypt and intercept the victim’s TLS traffic. The ability to decrypt the TLS traffic allows the adversary to completely Monster-in-the-Middle (MITM) encrypted TLS traffic and reroute Internet traffic destined for the victim’s domain to the adversary. To increase the stealthiness of the attack, the adversary will continue to forward traffic through the victim’s domain to perform the attack in an undetected manner.

DNS Spoofing

Another way an adversary can gain control of a domain is by spoofing DNS traffic by using a source IP address that belongs to a DNS nameserver. Because anyone can modify their packets’ outbound IP addresses, an adversary can fake the IP address of any DNS nameserver involved in resolving the victim’s domain, and impersonate a nameserver when responding to a CA.

This attack is more sophisticated than simply spamming a CA with falsified DNS responses. Because each DNS query has its own randomized query identifiers and source port, a fake DNS response must match the DNS query’s identifiers to be convincing. Because these query identifiers are random, making a spoofed response with the correct identifiers is extremely difficult.

Adversaries can fragment User Datagram Protocol (UDP) DNS packets so that identifying DNS response information (like the random DNS query identifier) is delivered in one packet, while the actual answer section follows in another packet. This way, the adversary spoofs the DNS response to a legitimate DNS query.

Say an adversary wants to get a mis-issued certificate for victim.com by forcing packet fragmentation and spoofing DNS validation. The adversary sends a DNS nameserver for victim.com a DNS packet with a small Maximum Transmission Unit, or maximum byte size. This gets the nameserver to start fragmenting DNS responses. When the CA sends a DNS query to a nameserver for victim.com asking for victim.com’s TXT records, the nameserver will fragment the response into the two packets described above: the first contains the query ID and source port, which the adversary cannot spoof, and the second one contains the answer section, which the adversary can spoof. The adversary can continually send a spoofed answer to the CA throughout the DNS validation process, in the hopes of sliding their spoofed answer in before the CA receives the real answer from the nameserver.

In doing so, the answer section of a DNS response (the important part!) can be falsified, and an adversary can trick a CA into mis-issuing a certificate.

Solution

At first glance, one could think a Certificate Transparency log could expose a mis-issued certificate and allow a CA to quickly revoke it. CT logs, however, can take up to 24 hours to include newly issued certificates, and certificate revocation can be inconsistently followed among different browsers. We need a solution that allows CAs to proactively prevent this attacks, not retroactively address them.

We’re excited to announce that Cloudflare provides CAs a free API to leverage our global network to perform DCV from multiple vantage points around the world. This API bolsters the DCV process against BGP hijacking and off-path DNS attacks.

Given that Cloudflare runs 175+ datacenters around the world, we are in a unique position to perform DCV from multiple vantage points. Each datacenter has a unique path to DNS nameservers or HTTP endpoints, which means that successful hijacking of a BGP route can only affect a subset of DCV requests, further hampering BGP hijacks. And since we use RPKI, we actually sign and verify BGP routes.

This DCV checker additionally protects CAs against off-path, DNS spoofing attacks. An additional feature that we built into the service that helps protect against off-path attackers is DNS query source IP randomization. By making the source IP unpredictable to the attacker, it becomes more challenging to spoof the second fragment of the forged DNS response to the DCV validation agent.

By comparing multiple DCV results collected over multiple paths, our DCV API makes it virtually impossible for an adversary to mislead a CA into thinking they own a domain when they actually don’t. CAs can use our tool to ensure that they only issue certificates to rightful domain owners.

Our multipath DCV checker consists of two services:

DCV agents responsible for performing DCV out of a specific datacenter, and

a DCV orchestrator that handles multipath DCV requests from CAs and dispatches them to a subset of DCV agents.

When a CA wants to ensure that DCV occurred without being intercepted, it can send a request to our API specifying the type of DCV to perform and its parameters.

The DCV orchestrator then forwards each request to a random subset of over 20 DCV agents in different datacenters. Each DCV agent performs the DCV request and forwards the result to the DCV orchestrator, which aggregates what each agent observed and returns it to the CA.

This approach can also be generalized to performing multipath queries over DNS records, like Certificate Authority Authorization (CAA) records. CAA records authorize CAs to issue certificates for a domain, so spoofing them to trick unauthorized CAs into issuing certificates is another attack vector that multipath observation prevents.

As we were developing our multipath checker, we were in contact with the Princeton research group that introduced the proof-of-concept (PoC) of certificate mis-issuance through BGP hijacking attacks. Prateek Mittal, coauthor of the Bamboozling Certificate Authorities with BGP paper, wrote:

“Our analysis shows that domain validation from multiple vantage points significantly mitigates the impact of localized BGP attacks. We recommend that all certificate authorities adopt this approach to enhance web security. A particularly attractive feature of Cloudflare’s implementation of this defense is that Cloudflare has access to a vast number of vantage points on the Internet, which significantly enhances the robustness of domain control validation.”

Our DCV checker follows our belief that trust on the Internet must be distributed, and vetted through third-party analysis (like that provided by Cloudflare) to ensure consistency and security. This tool joins our pre-existing Certificate Transparency monitor as a set of services CAs are welcome to use in improving the accountability of certificate issuance.

An Opportunity to Dogfood

Building our multipath DCV checker also allowed us to dogfood multiple Cloudflare products.

The DCV orchestrator as a simple fetcher and aggregator was a fantastic candidate for Cloudflare Workers. We implemented the orchestrator in TypeScript using this post as a guide, and created a typed, reliable orchestrator service that was easy to deploy and iterate on. Hooray that we don’t have to maintain our own dcv-orchestrator server!

We use Argo Tunnel to allow Cloudflare Workers to contact DCV agents. Argo Tunnel allows us to easily and securely expose our DCV agents to the Workers environment. Since Cloudflare has approximately 175 datacenters running DCV agents, we expose many services through Argo Tunnel, and have had the opportunity to load test Argo Tunnel as a power user with a wide variety of origins. Argo Tunnel readily handled this influx of new origins!

Getting Access to the Multipath DCV Checker

If you and/or your organization are interested in trying our DCV checker, email [email protected] and let us know! We’d love to hear more about how multipath querying and validation bolsters the security of your certificate issuance.

As a new class of BGP and IP spoofing attacks threaten to undermine PKI fundamentals, it’s important that website owners advocate for multipath validation when they are issued certificates. We encourage all CAs to use multipath validation, whether it is Cloudflare’s or their own. Jacob Hoffman-Andrews, Tech Lead, Let’s Encrypt, wrote:

“BGP hijacking is one of the big challenges the web PKI still needs to solve, and we think multipath validation can be part of the solution. We’re testing out our own implementation and we encourage other CAs to pursue multipath as well”

Hopefully in the future, website owners will look at multipath validation support when selecting a CA.

To kick-off Crypto Week 2019, we are really excited to announce a new solution to a long-standing problem in cryptography. To get a better understanding of the technical side behind this problem, please refer to the next post for a deeper dive.

Everything from cryptography to big money lottery to quantum mechanics requires some form of randomness. But what exactly does it mean for a number to be randomly generated and where does the randomness come from?

Generating randomness dates back three thousand years, when the ancients rolled “the bones” to determine their fate. Think of lotteries– seems simple, right? Everyone buys their tickets, chooses six numbers, and waits for an official to draw them randomly from a basket. Sounds like a foolproof solution. And then in 1980, the host of the Pennsylvania lottery drawing was busted for using weighted balls to choose the winning number. This lesson, along with the need of other complex systems for generating random numbers spurred the creation of random number generators.

Just like a lottery game selects random numbers unpredictably, a random number generator is a device or software responsible for generating sequences of numbers in an unpredictable manner. As the need for randomness has increased, so has the need for constant generation of substantially large, unpredictable numbers. This is why organizations developed publicly available randomness beacons — servers generating completely unpredictable 512-bit strings (about 155-digit numbers) at regular intervals.

Now, you might think using a randomness beacon for random generation processes, such as those needed for lottery selection, would make the process resilient against adversarial manipulation, but that’s not the case. Single-source randomness has been exploited to generate biased results.

Today, randomness beacons generate numbers for lotteries and election audits — both affect the lives and fortunes of millions of people. Unfortunately, exploitation of the single point of origin of these beacons have created dishonest results that benefited one corrupt insider. To thwart exploitation efforts, Cloudflare and other randomness-beacon providers have joined forces to bring users a quorum of decentralized randomness beacons. After all, eight independent globally distributed beacons can be much more trustworthy than one!

We’re happy to introduce you to ….

THE LEAGUE …. OF …. ENTROPY !!!!!!

What is a randomness beacon?

A randomness beacon is a public service that provides unpredictable random numbers at regular intervals.

For every randomness generation round, drand provides the following properties, as specified in the research paper:

Availability – The distributed randomness generation completes successfully with high probability.

Unpredictability – No party learns anything about the random output of the current round, except with negligible probability, until a sufficient number of drand nodes reveals their contributions in the randomness generation protocol.

Unbiasability – The random output represents an unbiased, uniformly random value, except with negligible probability.

Verifiability – The random output is third-party verifiable against the collective public key computed during drand’s setup. This serves as the unforgettable attestation that the documented set of drand nodes ran the protocol to produce the one-and-only random output, except with negligible probability.

Entropy measures the unpredictable nature of a number. For randomness, the more entropy the better, so naturally it’s where we got our name, the League of Entropy.

Our founding members are contributing their individual high-entropy sources to provide a more random and unpredictable beacon to generate publicly verifiable random values every sixty seconds. The fact that the drand beacon is decentralized and built using appropriate, provably-secure cryptographic primitives, increases our confidence that it possesses all the aforementioned properties.

This global network of servers generating randomness ensures that even if a few servers are offline, the beacon continues to produce new numbers by using the remaining online servers. Even if one or two of the servers or their entropy sources were to be compromised, the rest will still ensure that the jointly-produced entropy is fully unpredictable and unbiasable.

Who exactly is running this beacon? Currently, The League of Entropy is a consortium of global organizations and individual contributors, including: Cloudflare, Protocol Labs researcher Nicolas Gailly, University of Chile, École polytechnique fédérale de Lausanne (EPFL), Kudelski Security, and EPFL researchers, Philipp Jovanovic and Ludovic Barman.

Meet the League of Entropy

Cloudflare’s LavaRand: LavaRand sources her high entropy from Cloudflare’s wall of lava lamps at our San Francisco Headquarters. The unpredictable flow of “lava” inside the lamps is used as an input to a camera feed into a CSPRNG (Cryptographically Secure PseudoRandom Number Generator) that generates the random value.

EPFL’s URand: URand’s power comes from the local randomness generator present on every computer at /dev/urandom. The randomness input is collected from inputs such as keyboard presses, mouse clicks, network traffic, etc. URand bundles these random inputs to produce a continuous stream of randomness.

UChile’s Seismic Girl: Seismic Girl extracts super verifiable randomness from five sources queried every minute. These sources include: seismic measurements of shakes and earthquakes in Chile; a stream from a local radio station; a selection of Twitter posts; data from the Ethereum blockchain; and their own off-the-shelf RNG card.

Kudelski Security’s ChaChaRand: ChaChaRand uses a CRNG (Cryptographic Random Number Generator) based on the ChaCha20 stream cipher.

Protocol Labs’ InterplanetaryRand: InterplanetaryRand uses the power of entropy to ensure protocol safety across space and time by using environmental noise and the Linux PRNG, supplemented by CPU-sourced randomness (RdRand).

Together, our heroes are committed to #savetheinternet by combining their randomness to form a globally distributed and cryptographically verifiable randomness beacon.

Public versus Private Randomness

Different types of randomness are needed for different types of applications.

The trick to generating secure cryptographic keys is to use large, privately-generated random numbers that no one else can predict. With randomness beacons publicly generating and announcing random numbers, users should NOT be using the output of a randomness beacon for their secret keys, as these numbers are accessible by anyone. If an attacker can guess the random number that a user’s private cryptographic key was derived from, they can crack their system and decrypt confidential information. This simply means that random numbers generated by a public beacon are not safe to use for encryption keys: not because there’s anything wrong with the randomness, but simply because the randomness is public.

Clients using the drand beacon can request private randomness from some or all of the drand nodes if they would like to generate a random value that will not be publicly announced. For more information on how to do this, check the developer docs .

On the other hand, public randomness is often employed by users requiring a randomness value that is not supposed to be secret but whose generation must be transparent, fair, and unbiased. This is perfect for many purposes such as games, lotteries, and election auditing, where the auditor and the public require transparency into when and how and how fairly the random value was generated. The League of Entropy provides public randomness that any user can retrieve from leagueofentropy.com. Users will be able to view the 512-bit string value that is generated every 60 seconds. Why 60 seconds? No particular reason. Theoretically, the randomness generation can go as fast as the hardware allows, but it’s not necessary for most use cases. Values generated every 60 seconds give users 1440 random values in one 24-hour period.

*FRIENDLY REMINDER: THIS RANDOMNESS IS PUBLIC. DO NOT USE IT FOR PRIVATE CRYPTOGRAPHIC KEYS*

Why does public randomness matter?

Election auditing

In the US, most elections are followed by an audit to verify they were unbiased and conducted fairly. Robust auditing systems increase voter confidence by improving election officials’ ability to respond effectively to allegations of fraud, and to detect bugs in the system.

Currently, most election ballots and precincts are randomly chosen by election officials. This approach is potentially vulnerable to bias by a corrupt insider who might select certain precincts to present a preferred outcome. Even in a situation where every voter district was tampered with, by using a robust, distributed, and most importantly, unpredictable and unbiasable beacon, election auditors can trust that a small sample of districts are enough to audit, as long as an attacker cannot predict district selection.

In Chile, election poll workers are randomly selected from a pool of eligible voters. The University of Chile’s Random UChile project has been working on a prototype that uses their randomness beacon for this process. Alejandro Hevia, leader of Random UChile, believes that for election auditing, public randomness is important for transparency and distributed randomness gives people the ability to trust the unlikeliness that multiple contributors to the beacon colluded, as opposed to trusting a single entity.

Lotteries

From 2005 to 2014, the information security director for the Multi-State Lottery Association, Eddie Tipton, rigged a random number generator and won the lottery six times!

Tipton could predict the winning numbers by skipping the standard random seeding process. He was able to insert into the function of the random number generator code that checked the date, day of the week, and time. If these three variables did not align, the random number generator used radioactive material and a Geiger counter to generate a random seed. If the variables aligned as surreptitiously programmed, which usually only happened once a year, then it would generate the seed using a 7-variable formula fed into a Mersenne Twister, a pseudo random-number generator.

Tipton knew these 7 variables. He knew the small pool of numbers that might be the seed. This knowledge allowed him to predict the results of the Mersenne Twister. This is a scam which a distributed randomness beacon can make substantially more difficult, if not impossible.

Rob Sand, the former Iowa Assistant Attorney General and current Iowa State Auditor who prosecuted the Tipton cases, is also an advocate for improved controls. He said:

“There is no excuse for an industry that rakes in $80 billion in annual revenue not to use the most sophisticated, truly random means available to ensure integrity.”

Distributed ledger platforms

In many cryptocurrencies and blockchain-based distributed computing platforms, such as Ethereum, there is often a need for random selection at the application layer. One solution to prevent bias for such a random selection is to use a distributed randomness beacon like drand to generate the random value. Justin Drake, researcher at the Ethereum Foundation, believes “randomness from a drand-type federation could be a particularly good match for real-time decentralized applications on Ethereum such as live gaming and gambling”. This is due to the possibility to deliver ultra-low latency randomness applicable for a broad range of application where public randomness is required.

Let’s get you on drand!

To learn more about the League of Entropy and how to use the distributed randomness beacon, visit leagueofentropy.com. The website periodically displays the randomness generated by the network, and you can even see previously generated values. Go ahead, try it out!

How to join the league:

Want to join the league?? We’re not exclusive!

If you are an organization or an individual who is interested in contributing to the drand beacon, check out the developer docs for more information regarding the requirements for setting up a server and joining the existing group. drand is currently in its beta release phase and an approval request must be sent to [email protected] in order to be approved as a contributing server.

Looking into the future

It only makes sense that the Internet of the future will demand unpredictable randomness beacons. The League of Entropy is out there now, creating the basis for future systems to leverage trustworthy public randomness. Our goal is to increase user trust and provide a one-stop shop for all your public entropy needs. Come, join us!

The collective thoughts of the interwebz

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.