# Securing infrastructure at scale with Cloudflare Access

Post Syndicated from Jeremy Bernick original https://blog.cloudflare.com/access-wildcard-subdomain/

I rarely have to deal with the hassle of using a corporate VPN and I hope it remains this way. As a new member of the Cloudflare team, that seems possible. Coworkers who joined a few years ago did not have that same luck. They had to use a VPN to get any work done. What changed?

Cloudflare released Access, and now we’re able to do our work without ever needing a VPN again. Access is a way to control access to your internal applications and infrastructure. Today, we’re releasing a new feature to help you replace your VPN by deploying Access at an even greater scale.

### Access in an instant

Access replaces a corporate VPN by evaluating every request made to a resource secured behind Access. Administrators can make web applications, remote desktops, and physical servers available at dedicated URLs, configured as DNS records in Cloudflare. These tools are protected via access policies, set by the account owner, so that only authenticated users can access those resources. These end users are able to be authenticated over both HTTPS and SSH requests. They’re prompted to login with their SSO credentials and Access redirects them to the application or server.

For your team, Access makes your internal web applications and servers in your infrastructure feel as seamless to reach as your SaaS tools. Originally we built Access to replace our own corporate VPN. In practice, this became the fastest way to control who can reach different pieces of our own infrastructure. However, administrators configuring Access were required to create a discrete policy per each application/hostname. Now, administrators don’t have to create a dedicated policy for each new resource secured by Access; one policy will cover each URL protected.

When Access launched, the product’s primary use case was to secure internal web applications. Creating unique rules for each was tedious, but manageable. Access has since become a centralized way to secure infrastructure in many environments. Now that companies are using Access to secure hundreds of resources, that method of building policies no longer fits.

Starting today, Access users can build policies using a wildcard subdomain to replace the typical bottleneck that occurs when replacing dozens or even hundreds of bespoke rules within a single policy. With a wildcard, the same ruleset will now automatically apply to any subdomain your team generates that is gated by Access.

### How can teams deploy at scale with wildcard subdomains?

Administrators can secure their infrastructure with a wildcard policy in the Cloudflare dashboard. With Access enabled, Cloudflare adds identity-based evaluation to that traffic.

In the Access dashboard, you can now build a rule to secure any subdomain of the site you added to Cloudflare. Create a new policy and enter a wildcard tag (“*”) into the subdomain field. You can then configure rules, at a granular level, using your identity provider to control who can reach any subdomain of that apex domain.

This new policy will propagate to all 180 of Cloudflare’s data centers in seconds and any new subdomains created will be protected.

### How are teams using it?

Since releasing this feature in a closed beta, we’ve seen teams use it to gate access to their infrastructure in several new ways. Many teams use Access to secure dev and staging environments of sites that are being developed before they hit production. Whether for QA or collaboration with partner agencies, Access helps make it possible to share sites quickly with a layer of authentication. With wildcard subdomains, teams are deploying dozens of versions of new sites at new URLs without needing to touch the Access dashboard.

For example, an administrator can create a policy for “*.example.com” and then developers can deploy iterations of sites at “dev-1.example.com” and “dev-2.example.com” and both inherit the global Access policy.

The feature is also helping teams lock down their entire hybrid, on-premise, or public cloud infrastructure with the Access SSH feature. Teams can assign dynamic subdomains to their entire fleet of servers, regardless of environment, and developers and engineers can reach them over an SSH connection without a VPN. Administrators can now bring infrastructure online, in an entirely new environment, without additional or custom security rules.

### What about creating DNS records?

Cloudflare Access requires users to associate a resource with a domain or subdomain. While the wildcard policy will cover all subdomains, teams will still need to connect their servers to the Cloudflare network and generate DNS records for those services.

Argo Tunnel can reduce that burden significantly. Argo Tunnel lets you expose a server to the Internet without opening any inbound ports. The service runs a lightweight daemon on your server that initiates outbound tunnels to the Cloudflare network.

Instead of managing DNS, network, and firewall complexity, Argo Tunnel helps administrators serve traffic from their origin through Cloudflare with a single command. That single command will generate the DNS record in Cloudflare automatically, allowing you to focus your time on building and managing your infrastructure.

### What’s next?

More teams are adopting a hybrid or multi-cloud model for deploying their infrastructure. In the past, these teams were left with just two options for securing those resources: peering a VPN with each provider or relying on custom IAM flows with each environment. In the end, both of these solutions were not only quite costly but also equally unmanageable.

While infrastructure benefits from becoming distributed, security is something that is best when controlled in a single place. Access can consolidate how a team controls who can reach their entire fleet of servers and services.

# Get Cloudflare insights in your preferred analytics provider

Post Syndicated from Simon Steiner original https://blog.cloudflare.com/cloudflare-partners-with-analytics-providers/

Today, we’re excited to announce our partnerships with Chronicle Security, Datadog, Elastic, Looker, Splunk, and Sumo Logic to make it easy for our customers to analyze Cloudflare logs and metrics using their analytics provider of choice. In a joint effort, we have developed pre-built dashboards that are available as a Cloudflare App in each partner’s platform. These dashboards help customers better understand events and trends from their websites and applications on our network.

#### Cloudflare insights in the tools you’re already using

Data analytics is a frequent theme in conversations with Cloudflare customers. Our customers want to understand how Cloudflare speeds up their websites and saves them bandwidth, ranks their fastest and slowest pages, and be alerted if they are under attack. While providing insights is a core tenet of Cloudflare’s offering, the data analytics market has matured and many of our customers have started using third-party providers to analyze data—including Cloudflare logs and metrics. By aggregating data from multiple applications, infrastructure, and cloud platforms in one dedicated analytics platform, customers can create a single pane of glass and benefit from better end-to-end visibility over their entire stack.

While these analytics platforms provide great benefits in terms of functionality and flexibility, they can take significant time to configure: from ingesting logs, to specifying data models that make data searchable, all the way to building dashboards to get the right insights out of the raw data. We see this as an opportunity to partner with the companies our customers are already using to offer a better and more integrated solution.

#### Providing flexibility through easy-to-use integrations

To address these complexities of aggregating, managing, and displaying data, we have developed a number of product features and partnerships to make it easier to get insights out of Cloudflare logs and metrics. In February we announced Logpush, which allows customers to automatically push Cloudflare logs to Google Cloud Storage and Amazon S3. Both of these cloud storage solutions are supported by the major analytics providers as a source for collecting logs, making it possible to get Cloudflare logs into an analytics platform with just a few clicks. With today’s announcement of Cloudflare’s Analytics Partnerships, we’re releasing a Cloudflare App—a set of pre-built and fully customizable dashboards—in each partner’s app store or integrations catalogue to make the experience even more seamless.

By using these dashboards, customers can immediately analyze events and trends of their websites and applications without first needing to wade through individual log files and build custom searches. The dashboards feature all 55+ fields available in Cloudflare logs and include 90+ panels with information about the performance, security, and reliability of customers’ websites and applications.

# Get Cloudflare insights in your preferred analytics provider

Post Syndicated from Simon Steiner original https://blog.cloudflare.com/cloudflare-partners-with-analytics-providers/

Today, we’re excited to announce our partnerships with Chronicle Security, Datadog, Elastic, Looker, Splunk, and Sumo Logic to make it easy for our customers to analyze Cloudflare logs and metrics using their analytics provider of choice. In a joint effort, we have developed pre-built dashboards that are available as a Cloudflare App in each partner’s platform. These dashboards help customers better understand events and trends from their websites and applications on our network.

### Cloudflare insights in the tools you’re already using

Data analytics is a frequent theme in conversations with Cloudflare customers. Our customers want to understand how Cloudflare speeds up their websites and saves them bandwidth, ranks their fastest and slowest pages, and be alerted if they are under attack. While providing insights is a core tenet of Cloudflare’s offering, the data analytics market has matured and many of our customers have started using third-party providers to analyze data—including Cloudflare logs and metrics. By aggregating data from multiple applications, infrastructure, and cloud platforms in one dedicated analytics platform, customers can create a single pane of glass and benefit from better end-to-end visibility over their entire stack.

While these analytics platforms provide great benefits in terms of functionality and flexibility, they can take significant time to configure: from ingesting logs, to specifying data models that make data searchable, all the way to building dashboards to get the right insights out of the raw data. We see this as an opportunity to partner with the companies our customers are already using to offer a better and more integrated solution.

### Providing flexibility through easy-to-use integrations

To address these complexities of aggregating, managing, and displaying data, we have developed a number of product features and partnerships to make it easier to get insights out of Cloudflare logs and metrics. In February we announced Logpush, which allows customers to automatically push Cloudflare logs to Google Cloud Storage and Amazon S3. Both of these cloud storage solutions are supported by the major analytics providers as a source for collecting logs, making it possible to get Cloudflare logs into an analytics platform with just a few clicks. With today’s announcement of Cloudflare’s Analytics Partnerships, we’re releasing a Cloudflare App—a set of pre-built and fully customizable dashboards—in each partner’s app store or integrations catalogue to make the experience even more seamless.

By using these dashboards, customers can immediately analyze events and trends of their websites and applications without first needing to wade through individual log files and build custom searches. The dashboards feature all 55+ fields available in Cloudflare logs and include 90+ panels with information about the performance, security, and reliability of customers’ websites and applications.

# Introducing time.cloudflare.com

Post Syndicated from Guest Author original https://blog.cloudflare.com/secure-time/

This is a guest post by Aanchal Malhotra, a Graduate Research Assistant at Boston University and former Cloudflare intern on the Cryptography team.

Cloudflare has always been a leader in deploying secure versions of insecure Internet protocols and making them available for free for anyone to use. In 2014, we launched one of the world’s first free, secure HTTPS service (Universal SSL) to go along with our existing free HTTP plan. When we launched the 1.1.1.1 DNS resolver, we also supported the new secure versions of DNS (DNS over HTTPS and DNS over TLS). Today, we are doing the same thing for the Network Time Protocol (NTP), the dominant protocol for obtaining time over the Internet.

This announcement is personal for me. I’ve spent the last four years identifying and fixing vulnerabilities in time protocols. Today I’m proud to help introduce a service that would have made my life from 2015 through 2019 a whole lot harder: time.cloudflare.com, a free time service that supports both NTP and the emerging Network Time Security (NTS) protocol for securing NTP. Now, anyone can get time securely from all our datacenters in 180 cities around the world.

You can use time.cloudflare.com as the source of time for all your devices today with NTP, while NTS clients are still under development. NTPsec includes experimental support for NTS. If you’d like to get updates about NTS client development, email us asking to join at [email protected]. To use NTS to secure time synchronization, reach out to your vendors and inquire about NTS support.

### A small tale of “time” first

Back in 2015, as a fresh graduate student interested in Internet security, I came across this mostly esoteric Internet protocol called the Network Time Protocol (NTP). NTP was designed to synchronize time between computer systems communicating over unreliable and variable-latency network paths. I was actually studying Internet routing security, in particular attacks against the Resource Public Key Infrastructure (RPKI), and kept hitting a dead end because of a cache-flushing issue. As a last-ditch effort I decided to roll back the time on my computer manually, and the attack worked.

I had discovered the importance of time to computer security. Most cryptography uses timestamps to limit certificate and signature validity periods. When connecting to a website, knowledge of the correct time ensures that the certificate you see is current and is not compromised by an attacker. When looking at logs, time synchronization makes sure that events on different machines can be correlated accurately. Certificates and logging infrastructure can break with minutes, hours or months of time difference. Other applications like caching and Bitcoin are sensitive to even very small differences in time on the order of seconds.

Two factor authentication using rolling numbers also rely on accurate clocks. This then creates the need for computer clocks to have access to reasonably accurate time that is securely delivered. NTP is the most commonly used protocol for time synchronization on the Internet. If an attacker can leverage vulnerabilities in NTP to manipulate time on computer clocks, they can undermine the security guarantees provided by these systems.

Motivated by the severity of the issue, I decided to look deeper into NTP and its security. Since the need for synchronizing time across networks was visible early on, NTP is a very old protocol. The first standardized version of NTP dates back to 1985, while the latest NTP version 4 was completed in 2010 (see RFC5905).

In its most common mode, NTP works by having a client send a query packet out to an NTP server that then responds with its clock time. The client then computes an estimate of the difference between its clock and the remote clock and attempts to compensate for network delay in this. NTP client queries multiple servers and implements algorithms to select the best estimate, and rejects clearly wrong answers.

Surprisingly enough, research on NTP and its security was not very active at the time. Before this, in late 2013 and early 2014, high-profile Distributed Denial of Service (DDoS) attacks were carried out by amplifying traffic from NTP servers; attackers able to spoof a victim’s IP address were able to funnel copious amounts of traffic overwhelming the targeted domains. This caught the attention of some researchers. However, these attacks did not exploit flaws in the fundamental protocol design. The attackers simply used NTP as a boring bandwidth multiplier. Cloudflare wrote extensively about these attacks and you can read about it here, here, and here.

I found several flaws in the core NTP protocol design and its implementation that can be exploited by network attackers to launch much more devastating attacks by shifting time or denying service to NTP clients. What is even more concerning was that these attackers do not need to be a Monster-In-The-Middle (MITM), where an attacker can modify traffic between the client and the server, to mount these attacks. A set of recent papers authored by one of us showed that an off-path attacker present anywhere on the network can shift time or deny service to NTP clients. One of the ways this is done is by abusing IP fragmentation.

Fragmentation is a feature of the IP layer where a large packet is chopped into several smaller fragments so that they can pass through the networks that do not support large packets. Basically, any random network element on the path between the client and the server can send a special “ICMP fragmentation needed” packet to the server telling them to fragment the packet to say X bytes. Since the server is not expected to know the IP addresses of all the network elements on its path, this packet can be sent from any source IP.

In our attack, the attacker exploits this feature to make the NTP server fragment its NTP response packet for the victim NTP client. The attacker then spoofs carefully crafted overlapping response fragments from off-path that contain the attacker’s timestamp values. By further exploiting the reassembly policies for overlapping fragments the attacker fools the client into assembling a packet with legitimate fragments and the attacker’s insertions. This evades the authenticity checks that rely on values in the original parts of the packet.

### NTP’s past and future

At the time of NTP’s creation back in 1985, there were two main design goals for the service provided by NTP. First, they wanted it to be robust enough to handle networking errors and equipment failures. So it was designed as a service where client can gather timing samples from multiple peers over multiple communication paths and then average them to get more accurate measurement.

The second goal was load distribution. While every client would like to talk to time servers which are directly attached to high precision time-keeping devices like atomic clocks, GPS, etc, and thus have more accurate time, the capacity of those devices is only so much. So, to reduce protocol load on the network, the service was designed in a hierarchical manner. At the top of the hierarchy are servers connected to non-NTP time sources, that distribute time to other servers, that further distribute time to even more servers. Most computers connect to either these second or third level servers.

The original specification (RFC 958) also states the “non-goals” of the protocol, namely peer authentication and data integrity. Security wasn’t considered critical in the relatively small and trusting early Internet, and the protocols and applications that rely on time for security didn’t exist then. Securing NTP came second to improving the protocol and implementation.

As the Internet has grown, more and more core Internet protocols have been secured through cryptography to protect against abuse: TLS, DNSSEC, RPKI are all steps toward ensuring the security of all communications on the Internet. These protocols use “time” to provide security guarantees. Since security of Internet hinges on the security of NTP, it becomes even more important to secure NTP.

This research perspicuously showed the need for securing NTP. As a result, there was more work in the standards body for Internet Protocols, the Internet Engineering Task Force (IETF) towards cryptographically authenticating NTP. At the time, even though NTPv4 supported both symmetric and asymmetric cryptographic authentication, it was rarely used in practice due to limitations of both approaches.

NTPv4’s symmetric approach to securing synchronization doesn’t scale as the symmetric key must be pre-shared and configured manually: imagine if every client on earth needed a special secret key with the servers they wanted to get time from, the organizations that run those servers would have to do a great deal of work managing keys. This makes this solution quite cumbersome for public servers that must accept queries from arbitrary clients. For context, NIST operates important public time servers and distributes symmetric keys only to users that register, once per year, via US mail or facsimile; the US Naval Office does something similar.

The first attempt to solve the problem of key distribution was the Autokey protocol, described in RFC 5906. Many public NTP servers do not support Autokey (e.g., the NIST and USNO time servers, and many servers in pool.ntp.org). The protocol is badly broken as any network attacker can trivially retrieve the secret key shared between the client and server. The authentication mechanisms are non-standard and quite idiosyncratic.

The future of the Internet is a secure Internet, which means an authenticated and encrypted Internet. But until now NTP remains mostly insecure, despite continuing protocol development. In the meantime more and more services depended on it.

### Fixing the problem

Following the release of our paper, there was a lot more enthusiasm in the NTP community at standards body for Internet Protocols, the Internet Engineering Task Force (IETF) and outside for improving the state of NTP security. As a short-term fix, the ntpd reference implementation software was patched for several vulnerabilities that we found. And for a long-term solution, the community realized the dire need for a secure, authenticated time synchronization protocol based on public-key cryptography, which enables encryption and authentication without requiring the sharing of key material beforehand. Today we have a Network Time Security (NTS) draft at the IETF, thanks to the work of dozens of dedicated individuals at the NTP working group.

In a nutshell, the NTS protocol is divided into two-phases. The first phase is the NTS key exchange that establishes the necessary key material between the NTP client and the server. This phase uses the Transport Layer Security (TLS) handshake and relies on the same public key infrastructure as the web. Once the keys are exchanged, the TLS channel is closed and the protocol enters the second phase. In this phase the results of that TLS handshake are used to authenticate NTP time synchronization packets via extension fields. The interested reader can find more information in the Internet draft.

### Cloudflare’s new service

Today, Cloudflare announces its free time service to anyone on the Internet. We intend to solve the limitations with the existing public time services, in particular by increasing availability, robustness and security.

We use our global network to provide an advantage in latency and accuracy. Our 180 locations around the world all use anycast to automatically route your packets to our closest server. All of our servers are synchronized with stratum 1 time service providers, and then offer NTP to the general public, similar to how other public NTP providers function. The biggest source of inaccuracy for time synchronization protocols is the network asymmetry, leading to a difference in travel times between the client and server and back from the server to the client. However, our servers’ proximity to users means there will be less jitter — a measurement of variance in latency on the network — and possible asymmetry in packet paths. We also hope that in regions with a dearth of NTP servers our service significantly improves the capacity and quality of the NTP ecosystem.

Cloudflare servers obtain authenticated time by using a shared symmetric key with our stratum 1 upstream servers. These upstream servers are geographically spread and ensure that our servers have accurate time in our datacenters. But this approach to securing time doesn’t scale. We had to exchange emails individually with the organizations that run stratum 1 servers, as well as negotiate permission to use them. While this is a solution for us, it isn’t a solution for everyone on the Internet.

As a secure time service provider Cloudflare is proud to announce that we are among the first to offer a free and secure public time service based on Network Time Security. We have implemented the latest NTS IETF draft. As this draft progresses through the Internet standards process we are committed to keeping our service current.

Most NTP implementations are currently working on NTS support, and we expect that the next few months will see broader introduction as well as advancement of the current draft protocol to an RFC. Currently we have interoperability with NTPsec who have implemented draft 18 of NTS. We hope that our service will spur faster adoption of this important improvement to Internet security. Because this is a new service with no backwards compatibility requirements, we are requiring the use of TLS v1.3 with it to promote adoption of the most secure version of TLS.

### Use it

If you have an NTS client, point it at time.cloudflare.com:1234. Otherwise point your NTP client at time.cloudflare.com. More details on configuration are available in the developer docs.

### Conclusion

From our Roughtime service to Universal SSL Cloudflare has played a role in expanding the availability and use of secure protocols. Now with our free public time service we provide a trustworthy, widely available alternative to another insecure legacy protocol. It’s all a part of our mission to help make a faster, reliable, and more secure Internet for everyone.

Thanks to the many other engineers who worked on this project, including Watson Ladd, Gabbi Fisher, and Dina Kozlov

# Introducing time.cloudflare.com

Post Syndicated from Guest Author original https://blog.cloudflare.com/secure-time/

This is a guest post by Aanchal Malhotra, a Graduate Research Assistant at Boston University and former Cloudflare intern on the Cryptography team.

Cloudflare has always been a leader in deploying secure versions of insecure Internet protocols and making them available for free for anyone to use. In 2014, we launched one of the world’s first free, secure HTTPS service (Universal SSL) to go along with our existing free HTTP plan. When we launched the 1.1.1.1 DNS resolver, we also supported the new secure versions of DNS (DNS over HTTPS and DNS over TLS). Today, as part of Crypto Week 2019, we are doing the same thing for the Network Time Protocol (NTP), the dominant protocol for obtaining time over the Internet.

This announcement is personal for me. I’ve spent the last four years identifying and fixing vulnerabilities in time protocols. Today I’m proud to help introduce a service that would have made my life from 2015 through 2019 a whole lot harder: time.cloudflare.com, a free time service that supports both NTP and the emerging Network Time Security (NTS) protocol for securing NTP. Now, anyone can get time securely from all our datacenters in 180 cities around the world.

You can use time.cloudflare.com as the source of time for all your devices today with NTP, while NTS clients are still under development. NTPsec includes experimental support for NTS. If you’d like to get updates about NTS client development, email us asking to join at [email protected]. To use NTS to secure time synchronization, reach out to your vendors and inquire about NTS support.

### A small tale of “time” first

Back in 2015, as a fresh graduate student interested in Internet security, I came across this mostly esoteric Internet protocol called the Network Time Protocol (NTP). NTP was designed to synchronize time between computer systems communicating over unreliable and variable-latency network paths. I was actually studying Internet routing security, in particular attacks against the Resource Public Key Infrastructure (RPKI), and kept hitting a dead end because of a cache-flushing issue. As a last-ditch effort I decided to roll back the time on my computer manually, and the attack worked.

I had discovered the importance of time to computer security. Most cryptography uses timestamps to limit certificate and signature validity periods. When connecting to a website, knowledge of the correct time ensures that the certificate you see is current and is not compromised by an attacker. When looking at logs, time synchronization makes sure that events on different machines can be correlated accurately. Certificates and logging infrastructure can break with minutes, hours or months of time difference. Other applications like caching and Bitcoin are sensitive to even very small differences in time on the order of seconds.

Two factor authentication using rolling numbers also rely on accurate clocks. This then creates the need for computer clocks to have access to reasonably accurate time that is securely delivered. NTP is the most commonly used protocol for time synchronization on the Internet. If an attacker can leverage vulnerabilities in NTP to manipulate time on computer clocks, they can undermine the security guarantees provided by these systems.

Motivated by the severity of the issue, I decided to look deeper into NTP and its security. Since the need for synchronizing time across networks was visible early on, NTP is a very old protocol. The first standardized version of NTP dates back to 1985, while the latest NTP version 4 was completed in 2010 (see RFC5905).

In its most common mode, NTP works by having a client send a query packet out to an NTP server that then responds with its clock time. The client then computes an estimate of the difference between its clock and the remote clock and attempts to compensate for network delay in this. NTP client queries multiple servers and implements algorithms to select the best estimate, and rejects clearly wrong answers.

Surprisingly enough, research on NTP and its security was not very active at the time. Before this, in late 2013 and early 2014, high-profile Distributed Denial of Service (DDoS) attacks were carried out by amplifying traffic from NTP servers; attackers able to spoof a victim’s IP address were able to funnel copious amounts of traffic overwhelming the targeted domains. This caught the attention of some researchers. However, these attacks did not exploit flaws in the fundamental protocol design. The attackers simply used NTP as a boring bandwidth multiplier. Cloudflare wrote extensively about these attacks and you can read about it here, here, and here.

I found several flaws in the core NTP protocol design and its implementation that can be exploited by network attackers to launch much more devastating attacks by shifting time or denying service to NTP clients. What is even more concerning was that these attackers do not need to be a Monster-In-The-Middle (MITM), where an attacker can modify traffic between the client and the server, to mount these attacks. A set of recent papers authored by one of us showed that an off-path attacker present anywhere on the network can shift time or deny service to NTP clients. One of the ways this is done is by abusing IP fragmentation.

Fragmentation is a feature of the IP layer where a large packet is chopped into several smaller fragments so that they can pass through the networks that do not support large packets. Basically, any random network element on the path between the client and the server can send a special “ICMP fragmentation needed” packet to the server telling them to fragment the packet to say X bytes. Since the server is not expected to know the IP addresses of all the network elements on its path, this packet can be sent from any source IP.

In our attack, the attacker exploits this feature to make the NTP server fragment its NTP response packet for the victim NTP client. The attacker then spoofs carefully crafted overlapping response fragments from off-path that contain the attacker’s timestamp values. By further exploiting the reassembly policies for overlapping fragments the attacker fools the client into assembling a packet with legitimate fragments and the attacker’s insertions. This evades the authenticity checks that rely on values in the original parts of the packet.

### NTP’s past and future

At the time of NTP’s creation back in 1985, there were two main design goals for the service provided by NTP. First, they wanted it to be robust enough to handle networking errors and equipment failures. So it was designed as a service where client can gather timing samples from multiple peers over multiple communication paths and then average them to get more accurate measurement.

The second goal was load distribution. While every client would like to talk to time servers which are directly attached to high precision time-keeping devices like atomic clocks, GPS, etc, and thus have more accurate time, the capacity of those devices is only so much. So, to reduce protocol load on the network, the service was designed in a hierarchical manner. At the top of the hierarchy are servers connected to non-NTP time sources, that distribute time to other servers, that further distribute time to even more servers. Most computers connect to either these second or third level servers.

The original specification (RFC 958) also states the “non-goals” of the protocol, namely peer authentication and data integrity. Security wasn’t considered critical in the relatively small and trusting early Internet, and the protocols and applications that rely on time for security didn’t exist then. Securing NTP came second to improving the protocol and implementation.

As the Internet has grown, more and more core Internet protocols have been secured through cryptography to protect against abuse: TLS, DNSSEC, RPKI are all steps toward ensuring the security of all communications on the Internet. These protocols use “time” to provide security guarantees. Since security of Internet hinges on the security of NTP, it becomes even more important to secure NTP.

This research perspicuously showed the need for securing NTP. As a result, there was more work in the standards body for Internet Protocols, the Internet Engineering Task Force (IETF) towards cryptographically authenticating NTP. At the time, even though NTPv4 supported both symmetric and asymmetric cryptographic authentication, it was rarely used in practice due to limitations of both approaches.

NTPv4’s symmetric approach to securing synchronization doesn’t scale as the symmetric key must be pre-shared and configured manually: imagine if every client on earth needed a special secret key with the servers they wanted to get time from, the organizations that run those servers would have to do a great deal of work managing keys. This makes this solution quite cumbersome for public servers that must accept queries from arbitrary clients. For context, NIST operates important public time servers and distributes symmetric keys only to users that register, once per year, via US mail or facsimile; the US Naval Office does something similar.

The first attempt to solve the problem of key distribution was the Autokey protocol, described in RFC 5906. Many public NTP servers do not support Autokey (e.g., the NIST and USNO time servers, and many servers in pool.ntp.org). The protocol is badly broken as any network attacker can trivially retrieve the secret key shared between the client and server. The authentication mechanisms are non-standard and quite idiosyncratic.

The future of the Internet is a secure Internet, which means an authenticated and encrypted Internet. But until now NTP remains mostly insecure, despite continuing protocol development. In the meantime more and more services depended on it.

### Fixing the problem

Following the release of our paper, there was a lot more enthusiasm in the NTP community at standards body for Internet Protocols, the Internet Engineering Task Force (IETF) and outside for improving the state of NTP security. As a short-term fix, the ntpd reference implementation software was patched for several vulnerabilities that we found. And for a long-term solution, the community realized the dire need for a secure, authenticated time synchronization protocol based on public-key cryptography, which enables encryption and authentication without requiring the sharing of key material beforehand. Today we have a Network Time Security (NTS) draft at the IETF, thanks to the work of dozens of dedicated individuals at the NTP working group.

In a nutshell, the NTS protocol is divided into two-phases. The first phase is the NTS key exchange that establishes the necessary key material between the NTP client and the server. This phase uses the Transport Layer Security (TLS) handshake and relies on the same public key infrastructure as the web. Once the keys are exchanged, the TLS channel is closed and the protocol enters the second phase. In this phase the results of that TLS handshake are used to authenticate NTP time synchronization packets via extension fields. The interested reader can find more information in the Internet draft.

### Cloudflare’s new service

Today, Cloudflare announces its free time service to anyone on the Internet. We intend to solve the limitations with the existing public time services, in particular by increasing availability, robustness and security.

We use our global network to provide an advantage in latency and accuracy. Our 180 locations around the world all use anycast to automatically route your packets to our closest server. All of our servers are synchronized with stratum 1 time service providers, and then offer NTP to the general public, similar to how other public NTP providers function. The biggest source of inaccuracy for time synchronization protocols is the network asymmetry, leading to a difference in travel times between the client and server and back from the server to the client. However, our servers’ proximity to users means there will be less jitter — a measurement of variance in latency on the network — and possible asymmetry in packet paths. We also hope that in regions with a dearth of NTP servers our service significantly improves the capacity and quality of the NTP ecosystem.

Cloudflare servers obtain authenticated time by using a shared symmetric key with our stratum 1 upstream servers. These upstream servers are geographically spread and ensure that our servers have accurate time in our datacenters. But this approach to securing time doesn’t scale. We had to exchange emails individually with the organizations that run stratum 1 servers, as well as negotiate permission to use them. While this is a solution for us, it isn’t a solution for everyone on the Internet.

As a secure time service provider Cloudflare is proud to announce that we are among the first to offer a free and secure public time service based on Network Time Security. We have implemented the latest NTS IETF draft. As this draft progresses through the Internet standards process we are committed to keeping our service current.

Most NTP implementations are currently working on NTS support, and we expect that the next few months will see broader introduction as well as advancement of the current draft protocol to an RFC. Currently we have interoperability with NTPsec who have implemented draft 18 of NTS. We hope that our service will spur faster adoption of this important improvement to Internet security. Because this is a new service with no backwards compatibility requirements, we are requiring the use of TLS v1.3 with it to promote adoption of the most secure version of TLS.

### Use it

If you have an NTS client, point it at time.cloudflare.com:1234. Otherwise point your NTP client at time.cloudflare.com. More details on configuration are available in the developer docs.

### Conclusion

From our Roughtime service to Universal SSL Cloudflare has played a role in expanding the availability and use of secure protocols. Now with our free public time service we provide a trustworthy, widely available alternative to another insecure legacy protocol. It’s all a part of our mission to help make a faster, reliable, and more secure Internet for everyone.

Thanks to the many other engineers who worked on this project, including Watson Ladd, Gabbi Fisher, and Dina Kozlov

# The Quantum Menace

Post Syndicated from Armando Faz-Hernández original https://blog.cloudflare.com/the-quantum-menace/

Over the last few decades, the word ‘quantum’ has become increasingly popular. It is common to find articles, reports, and many people interested in quantum mechanics and the new capabilities and improvements it brings to the scientific community. This topic not only concerns physics, since the development of quantum mechanics impacts on several other fields such as chemistry, economics, artificial intelligence, operations research, and undoubtedly, cryptography.

This post begins a trio of blogs describing the impact of quantum computing on cryptography, and how to use stronger algorithms resistant to the power of quantum computing.

• This post introduces quantum computing and describes the main aspects of this new computing model and its devastating impact on security standards; it summarizes some approaches to securing information using quantum-resistant algorithms.
• Due to the relevance of this matter, we present our experiments on a large-scale deployment of quantum-resistant algorithms.
• Our third post introduces CIRCL, open-source Go library featuring optimized implementations of quantum-resistant algorithms and elliptic curve-based primitives.

All of this is part of Cloudflare’s Crypto Week 2019, now fasten your seatbelt and get ready to make a quantum leap.

### What is Quantum Computing?

Back in 1981, Richard Feynman raised the question about what kind of computers can be used to simulate physics. However, some physical phenomena, such as quantum mechanics, cannot be simulated using a classical computer. Then, he conjectured the existence of a computer model that behaves under quantum mechanics rules, which opened a field of research now called quantum computing. To understand the basics of quantum computing, it is necessary to recall how classical computers work, and from that shine a spotlight on the differences between these computational models.

In 1936, Alan Turing and Emil Post independently described models that gave rise to the foundation of the computing model known as the Post-Turing machine, which describes how computers work and allowed further determination of limits for solving problems.

In this model, the units of information are bits, which store one of two possible values, usually denoted by 0 and 1. A computing machine contains a set of bits and performs operations that modify the values of the bits, also known as the machine’s state. Thus, a machine with N bits can be in one of 2ᴺ possible states. With this in mind, the Post-Turing computing model can be abstractly described as a machine of states, in which running a program is translated as machine transitions along the set of states.

A paper David Deutsch published in 1985 describes a computing model that extends the capabilities of a Turing machine based on the theory of quantum mechanics. This computing model introduces several advantages over the Turing model for processing large volumes of information. It also presents unique properties that deviate from the way we understand classical computing. Most of these properties come from the nature of quantum mechanics. We’re going to dive into these details before approaching the concept of quantum computing.

### Superposition

One of the most exciting properties of quantum computing that provides an advantage over the classical computing model is superposition. In physics, superposition is the ability to produce valid states from the addition or superposition of several other states that are part of a system.

Applying these concepts to computing information, it means that there is a system in which it is possible to generate a machine state that represents a (weighted) sum of the states 0 and 1; in this case, the term weighted means that the state can keep track of “the quantity of” 0 and 1 present in the state. In the classical computation model, one bit can only store either the state of 0 or 1, not both; even using two bits, they cannot represent the weighted sum of these states. Hence, to make a distinction from the basic states, quantum computing uses the concept of a quantum bit (qubit) — a unit of information to denote the superposition of two states. This is a cornerstone concept of quantum computing as it provides a way of tracking more than a single state per unit of information, making it a powerful tool for processing information.

So, a qubit represents the sum of two parts: the 0 or 1 state plus the amount each 0/1 state contributes to produce the state of the qubit.

In mathematical notation, qubit $$| \Psi \rangle$$ is an explicit sum indicating that a qubit represents the superposition of the states 0 and 1. This is the Dirac notation used to describe the value of a qubit $$| \Psi \rangle = A | 0 \rangle +B | 1 \rangle$$, where, A and B are complex numbers known as the amplitude of the states 0 and 1, respectively. The value of the basic states is represented by qubits as $$| 0 \rangle = 1 | 0 \rangle + 0 | 1 \rangle$$  and $$| 1 \rangle = 0 | 0 \rangle + 1 | 1 \rangle$$, respectively. The right side of the term contains the abbreviated notation for these special states.

### Measurement

In a classical computer, the values 0 and 1 are implemented as digital signals. Measuring the current of the signal automatically reveals the status of a bit. This means that at any moment the value of the bit can be observed or measured.

The state of a qubit is maintained in a physically closed system, meaning that the properties of the system, such as superposition, require no interaction with the environment; otherwise any interaction, like performing a measurement, can cause interference on the state of a qubit.

Measuring a qubit is a probabilistic experiment. The result is a bit of information that depends on the state of the qubit. The bit, obtained by measuring $$| \Psi \rangle = A | 0 \rangle +B | 1 \rangle$$, will be equal to 0 with probability $$|A|^2$$,  and equal to 1 with probability $$|B|^2$$, where $$|x|$$ represents the absolute value of $$x$$.

From Statistics, we know that the sum of probabilities of all possible events is always equal to 1, so it must hold that $$|A|^2 +|B|^2 =1$$. This last equation motivates to represent qubits as the points of a circle of radius one, and more generally, as the points on the surface of a sphere of radius one, which is known as the Bloch Sphere.

Let’s break it down: If you measure a qubit you also destroy the superposition of the qubit, resulting in a superposition state collapse, where it assumes one of the basics states, providing your final result.

Another way to think about superposition and measurement is through the coin tossing experiment.

Toss a coin in the air and you give people a random choice between two options: heads or tails. Now, don’t focus on the randomness of the experiment, instead note that while the coin is rotating in the air, participants are uncertain which side will face up when the coin lands. Conversely, once the coin stops with a random side facing up, participants are 100% certain of the status.

How does it relate? Qubits are similar to the participants. When a qubit is in a superposition of states, it is tracking the probability of heads or tails, which is the participants’ uncertainty quotient while the coin is in the air. However, once you start to measure the qubit to retrieve its value, the superposition vanishes, and a classical bit value sticks: heads or tails. Measurement is that moment when the coin is static with only one side facing up.

A fair coin is a coin that is not biased. Each side (assume 0=heads and 1=tails) of a fair coin has the same probability of sticking after a measurement is performed. The qubit $$\tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle$$ describes the probabilities of tossing a fair coin. Note that squaring either of the amplitudes results in ½, indicating that there is a 50% chance either heads or tails sticks.

It would be interesting to be able to charge a fair coin at will while it is in the air. Although this is the magic of a professional illusionist, this task, in fact, can be achieved by performing operations over qubits. So, get ready to become the next quantum magician!

### Quantum Gates

A logic gate represents a Boolean function operating over a set of inputs (on the left) and producing an output (on the right). A logic circuit is a set of connected logic gates, a convenient way to represent bit operations.

Other gates are AND, OR, XOR, and NAND, and more. A set of gates is universal if it can generate other gates. For example, NOR and NAND gates are universal since any circuit can be constructed using only these gates.

Quantum computing also admits a description using circuits. Quantum gates operate over qubits, modifying the superposition of the states. For example, there is a quantum gate analogous to the NOT gate, the X gate.

The X quantum gate interchanges the amplitudes of the states of the input qubit.

The Z quantum gate flips the sign’s amplitude of state 1:

Another quantum gate is the Hadamard gate, which generates an equiprobable superposition of the basic states.

Using our coin tossing analogy, the Hadamard gate has the action of tossing a fair coin to the air. In quantum circuits, a triangle represents measuring a qubit, and the resulting bit is indicated by a double-wire.

Other gates, such as the CNOT gate, Pauli’s gates, Toffoli gate, Deutsch gate, are slightly more advanced. Quirk, the open-source playground, is a fun sandbox where you can construct quantum circuits using all of these gates.

### Reversibility

An operation is reversible if there exists another operation that rolls back the output state to the initial state. For instance, a NOT gate is reversible since applying a second NOT gate recovers the initial input.

In contrast, AND, OR, NAND gates are not reversible. This means that some classical computations cannot be reversed by a classic circuit that uses only the output bits. However, if you insert additional bits of information, the operation can be reversed.

Quantum computing mainly focuses on reversible computations, because there’s always a way to construct a reversible circuit to perform an irreversible computation. The reversible version of a circuit could require the use of ancillary qubits as auxiliary (but not temporary) variables.

Due to the nature of composed systems, it could be possible that these ancillas (extra qubits) correlate to qubits of the main computation. This correlation makes it infeasible to reuse ancillas since any modification could have the side-effect on the operation of a reversible circuit. This is like memory assigned to a process by the operating system: the process cannot use memory from other processes or it could cause memory corruption, and processes cannot release their assigned memory to other processes. You could use garbage collection mechanisms for ancillas, but performing reversible computations increases your qubit budget.

### Composed Systems

In quantum mechanics, a single qubit can be described as a single closed system: a system that has no interaction with the environment nor other qubits. Letting qubits interact with others leads to a composed system where more states are represented. The state of a 2-qubit composite system is denoted as $$A_0|00\rangle+A_1|01\rangle+A_2|10\rangle+A_3|11\rangle$$, where, $$A_i$$ values correspond to the amplitudes of the four basic states 00, 01, 10, and 11. This qubit $$\tfrac{1}{2}|00\rangle+\tfrac{1}{2}|01\rangle+\tfrac{1}{2}|10\rangle+\tfrac{1}{2}|11\rangle$$ represents the superposition of these basic states, both having the same probability obtained after measuring the two qubits.

In the classical case, the state of N bits represents only one of 2ᴺ possible states, whereas a composed state of N qubits represents all the 2ᴺ states but in superposition. This is one big difference between these computing models as it carries two important properties: entanglement and quantum parallelism.

### Entanglement

According to the theory behind quantum mechanics, some composed states can be described through the description of its constituents. However, there are composed states where no description is possible, known as entangled states.

The entanglement phenomenon was pointed out by Einstein, Podolsky, and Rosen in the so-called EPR paradox. Suppose there is a composed system of two entangled qubits, in which by performing a measurement in one qubit causes interference in the measurement of the second. This interference occurs even when qubits are separated by a long distance, which means that some information transfer happens faster than the speed of light. This is how quantum entanglement conflicts with the theory of relativity, where information cannot travel faster than the speed of light. The EPR paradox motivated further investigation for deriving new interpretations about quantum mechanics and aiming to resolve the paradox.

Quantum entanglement can help to transfer information at a distance by following a communication protocol. The following protocol examples rely on the fact that Alice and Bob separately possess one of two entangled qubits:

• The superdense coding protocol allows Alice to communicate a 2-bit message $$m_0,m_1$$ to Bob using a quantum communication channel, for example, using fiber optics to transmit photons. All Alice has to do is operate on her qubit according to the value of the message and send the resulting qubit to Bob. Once Bob receives the qubit, he measures both qubits, noting that the collapsed 2-bit state corresponds to Alice’s message.

• The quantum teleportation protocol allows Alice to transmit a qubit to Bob without using a quantum communication channel. Alice measures the qubit to send Bob and her entangled qubit resulting in two bits. Alice sends these bits to Bob, who operates on his entangled qubit according to the bits received and notes that the result state matches the original state of Alice’s qubit.

### Quantum Parallelism

Composed systems of qubits allow representation of more information per composed state. Note that operating on a composed state of N qubits is equivalent to operating over a set of 2ᴺ states in superposition. This procedure is quantum parallelism. In this setting, operating over a large volume of information gives the intuition of performing operations in parallel, like in the parallel computing paradigm; one big caveat is that superposition is not equivalent to parallelism.

Remember that a composed state is a superposition of several states so, a computation that takes a composed state of inputs will result in a composed state of outputs. The main divergence between classical and quantum parallelism is that quantum parallelism can obtain only one of the processed outputs. Observe that a measurement in the output of a composed state causes that the qubits collapse to only one of the outputs, making it unattainable to calculate all computed values.

Although quantum parallelism does not match precisely with the traditional notion of parallel computing, you can still leverage this computational power to get related information.

Deutsch-Jozsa Problem: Assume $$F$$ is a function that takes as input N bits, outputs one bit, and is either constant (always outputs the same value for all inputs) or balanced (outputs 0 for half of the inputs and 1 for the other half). The problem is to determine if $$F$$ is constant or balanced.

The quantum algorithm that solves the Deutsch-Jozsa problem uses quantum parallelism. First, N qubits are initialized in a superposition of 2ᴺ states. Then, in a single shot, it evaluates $$F$$ for all of these states.

The result of applying $$F$$ appears (in the exponent) of the amplitude of the all-zero state. Note that only when $$F$$ is constant is this amplitude, either +1 or -1. If the result of measuring the N qubit is an all-zeros bitstring, then there is a 100% certainty that $$F$$ is constant. Any other result indicates that $$F$$ is balanced.

A deterministic classical algorithm solves this problem using $$2^{N-1}+1$$ evaluations of $$F$$ in the worst case. Meanwhile, the quantum algorithm requires only one evaluation. The Deutsch-Jozsa problem exemplifies the exponential advantage of a quantum algorithm over classical algorithms.

## Quantum Computers

The theory of quantum computing is supported by investigations in the field of quantum mechanics. However, constructing a quantum machine requires a physical system that allows representing qubits and manipulation of states in a reliable and precise way.

The DiVincenzo Criteria require that a physical implementation of a quantum computer must:

1. Be scalable and have well-defined qubits.
2. Be able to initialize qubits to a state.
3. Have long decoherence times to apply quantum error-correcting codes. Decoherence of a qubit happens when the qubit interacts with the environment, for example, when a measurement is performed.
4. Use a universal set of quantum gates.
5. Be able to measure single qubits without modifying others.

Quantum computer physical implementations face huge engineering obstacles to satisfy these requirements. The most important challenge is to guarantee low error rates during computation and measurement. Lowering these rates require techniques for error correction, which add a significant number of qubits specialized on this task. For this reason, the number of qubits of a quantum computer should not be regarded as for classical systems. In a classical computer, the bits of a computer are all effective for performing a calculation, whereas the number of qubits is the sum of the effective qubits (those used to make calculations) plus the ancillas (used for reversible computations) plus the error correction qubits.

Current implementations of quantum computers partially satisfy the DiVincenzo criteria. Quantum adiabatic computers fit in this category since they do not operate using quantum gates. For this reason, they are not considered to be universal quantum computers.

A recurrent problem in optimization is to find the global minimum of an objective function. For example, a route-traffic control system can be modeled as a function that reduces the cost of routing to a minimum. Simulated annealing is a heuristic procedure that provides a good solution to these types of problems. Simulated annealing finds the solution state by slowly introducing changes (the adiabatic process) on the variables that govern the system.

Quantum annealing is the analogous quantum version of simulated annealing. A qubit is initialized into a superposition of states representing all possible solutions to the problem. Here is used the Hamiltonian operator, which is the sum of vectors of potential and kinetic energies of the system. Hence, the objective function is encoded using this operator describing the evolution of the system in correspondence with time. Then, if the system is allowed to evolve very slowly, it will eventually land on a final state representing the optimal value of the objective function.

Currently, there exist adiabatic computers in the market, such as the D-Wave and IBM Q systems, featuring hundreds of qubits; however, their capabilities are somewhat limited to some problems that can be modeled as optimization problems. The limits of adiabatic computers were studied by van Dam et al, showing that despite solving local searching problems and even some instances of the max-SAT problem, there exists harder searching problems this computing model cannot efficiently solve.

### Nuclear Magnetic Resonance

Nuclear Magnetic Resonance (NMR) is a physical phenomena that can be used to represent qubits. The spin of atomic nucleus of molecules is perturbed by an oscillating magnetic field. A 2001 report describes successful implementation of Shor’s algorithm in a 7-qubit NMR quantum computer. An iconic result since this computer was able to factor the number 15.

### Superconducting Quantum Computers

One way to physically construct qubits is based on superconductors, materials that conduct electric current with zero resistance when exposed to temperatures close to absolute zero.

The Josephson effect, in which current flows across the junction of two superconductors separated by a non-superconducting material, is used to physically implement a superposition of states.

When a magnetic flux is applied to this junction, the current flows continuously in one direction. But, depending on the quantity of magnetic flux applied, the current can also flow in the opposite direction. There exists a quantum superposition of currents going both clockwise and counterclockwise leading to a physical implementation of a qubit called flux qubit. The complete device is known as the Superconducting Quantum Interference Device (SQUID) and can be easily coupled scaling the number of qubits. Thus, SQUIDs are like the transistors of a quantum computer.

Examples of superconducting computers are:

• D-wave’s adiabatic computers process quantum annealing for solving diverse optimization problems.
• Google’s 72-qubit computer was recently announced and also several engineering issues such as achieving lower temperatures.
• IBM’s IBM-Q Tokyo, a 20-qubit adiabatic computer, and IBM Q Experience, a cloud-based system for exploring quantum circuits.

IBM Q System

## The Imminent Threat of Quantum Algorithms

The quantum zoo website tracks problems that can be solved using quantum algorithms. As of mid-2018, more than 60 problems appear on this list, targeting diverse applications in the area of number theory, approximation, simulation, and searching. As terrific as it sounds, some easily-solvable problems by quantum computing are surrounding the security of information.

### Grover’s Algorithm

Tales of a quantum detective (fragment). A couple of detectives have the mission of finding one culprit in a group of suspects that always respond to this question honestly: “are you guilty?”.
The detective C follows a classic interrogative method and interviews every person one at a time, until finding the first one that confesses.
The detective Q proceeds in a different way, First gather all suspects in a completely dark room, and after that, the detective Q asks them — are you guilty? — A steady sound comes from the room saying “No!” while at the same time, a single voice mixed in the air responds “Yes!.” Since everybody is submerged in darkness, the detective cannot see the culprit. However, detective Q knows that, as long as the interrogation advances, the culprit will feel desperate and start to speak louder and louder, and so, he continues asking the same question. Suddenly, detective Q turns on the lights, enters into the room, and captures the culprit. How did he do it?

The task of the detective can be modeled as a searching problem. Given a Boolean function $$f$$ that takes N bits and produces one bit, to find the unique input $$x$$ such that $$f(x)=1$$.

A classical algorithm (detective C) finds $$x$$ using $$2^N-1$$ function evaluations in the worst case. However, the quantum algorithm devised by Grover, corresponding to detective Q, searches quadratically faster using around $$2^{N/2}$$ function evaluations.

The key intuition of Grover’s algorithm is increasing the amplitude of the state that represents the solution while maintaining the other states in a lower amplitude. In this way, a system of N qubits, which is a superposition of 2ᴺ possible inputs, can be continuously updated using this intuition until the solution state has an amplitude closer to 1. Hence, after updating the qubits many times, there will be a high probability to measure the solution state.

Initially, a superposition of 2ᴺ states (horizontal axis) is set, each state has an amplitude (vertical axis) close to 0. The qubits are updated so that the amplitude of the solution state increases more than the amplitude of other states. By repeating the update step, the amplitude of the solution state gets closer to 1, which boosts the probability of collapsing to the solution state after measuring.

Grover’s Algorithm (pseudo-code):

1. Prepare an N qubit $$|x\rangle$$ as a uniform superposition of 2ᴺ states.
2. Update the qubits by performing this core operation. $$|x\rangle \mapsto (-1)^{f(x)} |x\rangle$$ The result of $$f(x)$$ only flips the amplitude of the searched state.
3. Negate the N qubit over the average of the amplitudes.
4. Repeat Step 2 and 3 for $$(\tfrac{\pi}{4}) 2^{ N/2}$$ times.
5. Measure the qubit and return the bits obtained.

Alternatively, the second step can be better understood as a conditional statement:

IF f(x) = 1 THEN
Negate the amplitude of the solution state.
ELSE
/* nothing */
ENDIF


Grover’s algorithm considers function $$f$$ a black box, so with slight modifications, the algorithm can also be used to find collisions on the function. This implies that Grover’s algorithm can find a collision using an asymptotically less number of operations than using a brute-force algorithm.

The power of Grover’s algorithm can be turned against cryptographic hash functions. For instance, a quantum computer running Grover’s algorithm could find a collision on SHA256 performing only 2¹²⁸ evaluations of a reversible circuit of SHA256. The natural protection for hash functions is to increase the output size to double. More generally, most of symmetric key encryption algorithms will survive to the power of Grover’s algorithm by doubling the size of keys.

The scenario for public-key algorithms is devastating in face of Peter Shor’s algorithm.

### Shor’s Algorithm

Multiplying integers is an easy task to accomplish, however, finding the factors that compose an integer is difficult. The integer factorization problem is to decompose a given integer number into its prime factors. For example, 42 has three factors 2, 3, and 7 since $$2\times 3\times 7 = 42$$. As the numbers get bigger, integer factorization becomes more difficult to solve, and the hardest instances of integer factorization are when the factors are only two different large primes. Thus, given an integer number $$N$$, to find primes $$p$$ and $$q$$ such that $$N = p \times q$$, is known as integer splitting.

Factoring integers is like cutting wood, and the specific task of splitting integers is analogous to using an axe for splitting the log in two parts. There exist many different tools (algorithms) for accomplishing each task.

For integer factorization, trial division, the Rho method, the elliptic curve method are common algorithms. Fermat’s method, the quadratic- and rational-sieve, leads to the (general) number field sieve (NFS) algorithm for integer splitting. The latter relies on finding a congruence of squares, that is, splitting $$N$$ as a product of squares such that $$N = x^2 – y^2 = (x+y)\times(x-y)$$ The complexity of NFS is mainly attached to the number of pairs $$(x, y)$$ that must be examined before getting a pair that factors $$N$$. The NFS algorithm has subexponential complexity on the size of $$N$$, meaning that the time required for splitting an integer increases significantly as the size of $$N$$ grows. For large integers, the problem becomes intractable for classical computers.

### The Axe of Thor Shor

The many different guesses of the NFS algorithm are analogous to hitting the log using a dulled axe; after subexponential many tries, the log is cut by half. However, using a sharper axe allows you to split the log faster. This sharpened axe is the quantum algorithm proposed by Shor in 1994.

Let $$x$$ be an integer less than $$N$$ and of the order $$k$$. Then, if $$k$$ is even, there exists an integer $$q$$ so $$qN$$ can be factored as follows.

This approach has some issues. For example, the factorization could correspond to $$q$$ not $$N$$ and the order of $$x$$ is unknown, and here is where Shor’s algorithm enters the picture, finding the order of $$x$$.

The internals of Shor’s algorithm rely on encoding the order $$k$$ into a periodic function, so that its period can be obtained using the quantum version of the Fourier transform (QFT). The order of $$x$$ can be found using a polynomial number quantum evaluations of Shor’s algorithm. Therefore, splitting integers using this quantum approach has polynomial complexity on the size of $$N$$.

Shor’s algorithm carries strong implications on the security of the RSA encryption scheme because its security relies on integer factorization. A large-enough quantum computer can efficiently break RSA for current instances.

Alternatively, one may recur to elliptic curves, used in cryptographic protocols like ECDSA or ECDH. Moreover, all TLS ciphersuites use a combination of elliptic curve groups, large prime groups, and RSA and DSA signatures. Unfortunately, these algorithms all succumb to Shor’s algorithm. It only takes a few modifications for Shor’s algorithm to solve the discrete logarithm problem on finite groups. This sounds like a catastrophic story where all of our encrypted data and privacy are no longer secure with the advent of a quantum computer, and in some sense this is true.

On one hand, it is a fact that the quantum computers constructed as of 2019 are not large enough to run, for instance, Shor’s algorithm for the RSA key sizes used in standard protocols. For example, a 2018 report shows experiments on the factorization of a 19-bit number using 94 qubits, they also estimate that 147456 qubits are needed for factoring a 768-bit number. Hence, there numbers indicates that we are still far from breaking RSA.

What if we increment RSA key sizes to be resistant to quantum algorithms, just like for symmetric algorithms?

Bernstein et al. estimated that RSA public keys should be as large as 1 terabyte to maintain secure RSA even in the presence of quantum factoring algorithms. So, for public-key algorithms, increasing the size of keys does not help.

A recent investigation by Gidney and Ekerá shows improvements that accelerate the evaluation of quantum factorization. In their report, the cost of factoring 2048-bit integers is estimated to take a few hours using a quantum machine of 20 million qubits, which is far from any current development. Something worth noting is that the number of qubits needed is two orders of magnitude smaller than the estimated numbers given in previous works developed in this decade. Under these estimates, current encryption algorithms will remain secure several more years; however, consider the following not-so-unrealistic situation.

Information currently encrypted with for example, RSA, can be easily decrypted with a quantum computer in the future. Now, suppose that someone records encrypted information and stores them until a quantum computer is able to decrypt ciphertexts. Although this could be as far as 20 years from now, the forward-secrecy principle is violated. A 20-year gap to the future is sometimes difficult to imagine. So, let’s think backwards, what would happen if all you did on the Internet at the end of the 1990s can be revealed 20 years later — today. How does this impact the security of your personal information? What if the ciphertexts were company secrets or business deals? In 1999, most of us were concerned about the effects of the Y2K problem, now we’re facing Y2Q (years to quantum): the advent of quantum computers.

### Post-Quantum Cryptography

Although the current capacity of the physical implementation of quantum computers is far from a real threat to secure communications, a transition to use stronger problems to protect information has already started. This wave emerged as post-quantum cryptography (PQC). The core idea of PQC is finding algorithms difficult enough that no quantum (and classical) algorithm can solve them.

A recurrent question is: How does it look like a problem that even a quantum computer can not solve?

These so-called quantum-resistant algorithms rely on different hard mathematical assumptions; some of them as old as RSA, others more recently proposed. For example, McEliece cryptosystem, formulated in the late 70s, relies on the hardness of decoding a linear code (in the sense of coding theory). The practical use of this cryptosystem didn’t become widespread, since with the passing of time, other cryptosystems superseded in efficiency. Fortunately, McEliece cryptosystem remains immune to Shor’s algorithm, gaining it relevance in the post-quantum era.

Post-quantum cryptography presents alternatives:

As of 2017, the NIST started an evaluation process that tracks possible alternatives for next-generation secure algorithms. From a practical perspective, all candidates present different trade-offs in implementation and usage. The time and space requirements are diverse; at this moment, it’s too early to define which will succeed RSA and elliptic curves. An initial round collected 70 algorithms for deploying key encapsulation mechanisms and digital signatures. As of early 2019, 28 of these survive and are currently in the analysis, investigation, and experimentation phase.

Cloudflare’s mission is to help build a better Internet. As a proactive action, our cryptography team is preparing experiments on the deployment of post-quantum algorithms at Cloudflare scale. Watch our blog post for more details.

# The Quantum Menace

Post Syndicated from Armando Faz-Hernández original https://blog.cloudflare.com/the-quantum-menace/

Over the last few decades, the word ‘quantum’ has become increasingly popular. It is common to find articles, reports, and many people interested in quantum mechanics and the new capabilities and improvements it brings to the scientific community. This topic not only concerns physics, since the development of quantum mechanics impacts on several other fields such as chemistry, economics, artificial intelligence, operations research, and undoubtedly, cryptography.

This post begins a trio of blogs describing the impact of quantum computing on cryptography, and how to use stronger algorithms resistant to the power of quantum computing.

• This post introduces quantum computing and describes the main aspects of this new computing model and its devastating impact on security standards; it summarizes some approaches to securing information using quantum-resistant algorithms.
• Due to the relevance of this matter, we present our experiments on a large-scale deployment of quantum-resistant algorithms.
• Our third post introduces CIRCL, open-source Go library featuring optimized implementations of quantum-resistant algorithms and elliptic curve-based primitives.

All of this is part of Cloudflare’s Crypto Week 2019, now fasten your seatbelt and get ready to make a quantum leap.

### What is Quantum Computing?

Back in 1981, Richard Feynman raised the question about what kind of computers can be used to simulate physics. Although some physical systems can be simulated in a classical computer, the amount of resources used by such a computer can grow exponentially. Then, he conjectured the existence of a computer model that behaves under quantum mechanics rules, which opened a field of research now called quantum computing. To understand the basics of quantum computing, it is necessary to recall how classical computers work, and from that shine a spotlight on the differences between these computational models.

In 1936, Alan Turing and Emil Post independently described models that gave rise to the foundation of the computing model known as the Post-Turing machine, which describes how computers work and allowed further determination of limits for solving problems.

In this model, the units of information are bits, which store one of two possible values, usually denoted by 0 and 1. A computing machine contains a set of bits and performs operations that modify the values of the bits, also known as the machine’s state. Thus, a machine with N bits can be in one of 2ᴺ possible states. With this in mind, the Post-Turing computing model can be abstractly described as a machine of states, in which running a program is translated as machine transitions along the set of states.

A paper David Deutsch published in 1985 describes a computing model that extends the capabilities of a Turing machine based on the theory of quantum mechanics. This computing model introduces several advantages over the Turing model for processing large volumes of information. It also presents unique properties that deviate from the way we understand classical computing. Most of these properties come from the nature of quantum mechanics. We’re going to dive into these details before approaching the concept of quantum computing.

### Superposition

One of the most exciting properties of quantum computing that provides an advantage over the classical computing model is superposition. In physics, superposition is the ability to produce valid states from the addition or superposition of several other states that are part of a system.

Applying these concepts to computing information, it means that there is a system in which it is possible to generate a machine state that represents a (weighted) sum of the states 0 and 1; in this case, the term weighted means that the state can keep track of “the quantity of” 0 and 1 present in the state. In the classical computation model, one bit can only store either the state of 0 or 1, not both; even using two bits, they cannot represent the weighted sum of these states. Hence, to make a distinction from the basic states, quantum computing uses the concept of a quantum bit (qubit) — a unit of information to denote the superposition of two states. This is a cornerstone concept of quantum computing as it provides a way of tracking more than a single state per unit of information, making it a powerful tool for processing information.

So, a qubit represents the sum of two parts: the 0 or 1 state plus the amount each 0/1 state contributes to produce the state of the qubit.

In mathematical notation, qubit $$| \Psi \rangle$$ is an explicit sum indicating that a qubit represents the superposition of the states 0 and 1. This is the Dirac notation used to describe the value of a qubit $$| \Psi \rangle = A | 0 \rangle +B | 1 \rangle$$, where, A and B are complex numbers known as the amplitude of the states 0 and 1, respectively. The value of the basic states is represented by qubits as $$| 0 \rangle = 1 | 0 \rangle + 0 | 1 \rangle$$  and $$| 1 \rangle = 0 | 0 \rangle + 1 | 1 \rangle$$, respectively. The right side of the term contains the abbreviated notation for these special states.

### Measurement

In a classical computer, the values 0 and 1 are implemented as digital signals. Measuring the current of the signal automatically reveals the status of a bit. This means that at any moment the value of the bit can be observed or measured.

The state of a qubit is maintained in a physically closed system, meaning that the properties of the system, such as superposition, require no interaction with the environment; otherwise any interaction, like performing a measurement, can cause interference on the state of a qubit.

Measuring a qubit is a probabilistic experiment. The result is a bit of information that depends on the state of the qubit. The bit, obtained by measuring $$| \Psi \rangle = A | 0 \rangle +B | 1 \rangle$$, will be equal to 0 with probability $$|A|^2$$,  and equal to 1 with probability $$|B|^2$$, where $$|x|$$ represents the absolute value of $$x$$.

From Statistics, we know that the sum of probabilities of all possible events is always equal to 1, so it must hold that $$|A|^2 +|B|^2 =1$$. This last equation motivates to represent qubits as the points of a circle of radius one, and more generally, as the points on the surface of a sphere of radius one, which is known as the Bloch Sphere.

Let’s break it down: If you measure a qubit you also destroy the superposition of the qubit, resulting in a superposition state collapse, where it assumes one of the basics states, providing your final result.

Another way to think about superposition and measurement is through the coin tossing experiment.

Toss a coin in the air and you give people a random choice between two options: heads or tails. Now, don’t focus on the randomness of the experiment, instead note that while the coin is rotating in the air, participants are uncertain which side will face up when the coin lands. Conversely, once the coin stops with a random side facing up, participants are 100% certain of the status.

How does it relate? Qubits are similar to the participants. When a qubit is in a superposition of states, it is tracking the probability of heads or tails, which is the participants’ uncertainty quotient while the coin is in the air. However, once you start to measure the qubit to retrieve its value, the superposition vanishes, and a classical bit value sticks: heads or tails. Measurement is that moment when the coin is static with only one side facing up.

A fair coin is a coin that is not biased. Each side (assume 0=heads and 1=tails) of a fair coin has the same probability of sticking after a measurement is performed. The qubit $$\tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle$$ describes the probabilities of tossing a fair coin. Note that squaring either of the amplitudes results in ½, indicating that there is a 50% chance either heads or tails sticks.

It would be interesting to be able to charge a fair coin at will while it is in the air. Although this is the magic of a professional illusionist, this task, in fact, can be achieved by performing operations over qubits. So, get ready to become the next quantum magician!

### Quantum Gates

A logic gate represents a Boolean function operating over a set of inputs (on the left) and producing an output (on the right). A logic circuit is a set of connected logic gates, a convenient way to represent bit operations.

Other gates are AND, OR, XOR, and NAND, and more. A set of gates is universal if it can generate other gates. For example, NOR and NAND gates are universal since any circuit can be constructed using only these gates.

Quantum computing also admits a description using circuits. Quantum gates operate over qubits, modifying the superposition of the states. For example, there is a quantum gate analogous to the NOT gate, the X gate.

The X quantum gate interchanges the amplitudes of the states of the input qubit.

The Z quantum gate flips the sign’s amplitude of state 1:

Another quantum gate is the Hadamard gate, which generates an equiprobable superposition of the basic states.

Using our coin tossing analogy, the Hadamard gate has the action of tossing a fair coin to the air. In quantum circuits, a triangle represents measuring a qubit, and the resulting bit is indicated by a double-wire.

Other gates, such as the CNOT gate, Pauli’s gates, Toffoli gate, Deutsch gate, are slightly more advanced. Quirk, the open-source playground, is a fun sandbox where you can construct quantum circuits using all of these gates.

### Reversibility

An operation is reversible if there exists another operation that rolls back the output state to the initial state. For instance, a NOT gate is reversible since applying a second NOT gate recovers the initial input.

In contrast, AND, OR, NAND gates are not reversible. This means that some classical computations cannot be reversed by a classic circuit that uses only the output bits. However, if you insert additional bits of information, the operation can be reversed.

Quantum computing mainly focuses on reversible computations, because there’s always a way to construct a reversible circuit to perform an irreversible computation. The reversible version of a circuit could require the use of ancillary qubits as auxiliary (but not temporary) variables.

Due to the nature of composed systems, it could be possible that these ancillas (extra qubits) correlate to qubits of the main computation. This correlation makes it infeasible to reuse ancillas since any modification could have the side-effect on the operation of a reversible circuit. This is like memory assigned to a process by the operating system: the process cannot use memory from other processes or it could cause memory corruption, and processes cannot release their assigned memory to other processes. You could use garbage collection mechanisms for ancillas, but performing reversible computations increases your qubit budget.

### Composed Systems

In quantum mechanics, a single qubit can be described as a single closed system: a system that has no interaction with the environment nor other qubits. Letting qubits interact with others leads to a composed system where more states are represented. The state of a 2-qubit composite system is denoted as $$A_0|00\rangle+A_1|01\rangle+A_2|10\rangle+A_3|11\rangle$$, where, $$A_i$$ values correspond to the amplitudes of the four basic states 00, 01, 10, and 11. This qubit $$\tfrac{1}{2}|00\rangle+\tfrac{1}{2}|01\rangle+\tfrac{1}{2}|10\rangle+\tfrac{1}{2}|11\rangle$$ represents the superposition of these basic states, both having the same probability obtained after measuring the two qubits.

In the classical case, the state of N bits represents only one of 2ᴺ possible states, whereas a composed state of N qubits represents all the 2ᴺ states but in superposition. This is one big difference between these computing models as it carries two important properties: entanglement and quantum parallelism.

### Entanglement

According to the theory behind quantum mechanics, some composed states can be described through the description of its constituents. However, there are composed states where no description is possible, known as entangled states.

The entanglement phenomenon was pointed out by Einstein, Podolsky, and Rosen in the so-called EPR paradox. Suppose there is a composed system of two entangled qubits, in which by performing a measurement in one qubit causes interference in the measurement of the second. This interference occurs even when qubits are separated by a long distance, which means that some information transfer happens faster than the speed of light. This is how quantum entanglement conflicts with the theory of relativity, where information cannot travel faster than the speed of light. The EPR paradox motivated further investigation for deriving new interpretations about quantum mechanics and aiming to resolve the paradox.

Quantum entanglement can help to transfer information at a distance by following a communication protocol. The following protocol examples rely on the fact that Alice and Bob separately possess one of two entangled qubits:

• The superdense coding protocol allows Alice to communicate a 2-bit message $$m_0,m_1$$ to Bob using a quantum communication channel, for example, using fiber optics to transmit photons. All Alice has to do is operate on her qubit according to the value of the message and send the resulting qubit to Bob. Once Bob receives the qubit, he measures both qubits, noting that the collapsed 2-bit state corresponds to Alice’s message.

• The quantum teleportation protocol allows Alice to transmit a qubit to Bob without using a quantum communication channel. Alice measures the qubit to send Bob and her entangled qubit resulting in two bits. Alice sends these bits to Bob, who operates on his entangled qubit according to the bits received and notes that the result state matches the original state of Alice’s qubit.

### Quantum Parallelism

Composed systems of qubits allow representation of more information per composed state. Note that operating on a composed state of N qubits is equivalent to operating over a set of 2ᴺ states in superposition. This procedure is quantum parallelism. In this setting, operating over a large volume of information gives the intuition of performing operations in parallel, like in the parallel computing paradigm; one big caveat is that superposition is not equivalent to parallelism.

Remember that a composed state is a superposition of several states so, a computation that takes a composed state of inputs will result in a composed state of outputs. The main divergence between classical and quantum parallelism is that quantum parallelism can obtain only one of the processed outputs. Observe that a measurement in the output of a composed state causes that the qubits collapse to only one of the outputs, making it unattainable to calculate all computed values.

Although quantum parallelism does not match precisely with the traditional notion of parallel computing, you can still leverage this computational power to get related information.

Deutsch-Jozsa Problem: Assume $$F$$ is a function that takes as input N bits, outputs one bit, and is either constant (always outputs the same value for all inputs) or balanced (outputs 0 for half of the inputs and 1 for the other half). The problem is to determine if $$F$$ is constant or balanced.

The quantum algorithm that solves the Deutsch-Jozsa problem uses quantum parallelism. First, N qubits are initialized in a superposition of 2ᴺ states. Then, in a single shot, it evaluates $$F$$ for all of these states.

The result of applying $$F$$ appears (in the exponent) of the amplitude of the all-zero state. Note that only when $$F$$ is constant is this amplitude, either +1 or -1. If the result of measuring the N qubit is an all-zeros bitstring, then there is a 100% certainty that $$F$$ is constant. Any other result indicates that $$F$$ is balanced.

A deterministic classical algorithm solves this problem using $$2^{N-1}+1$$ evaluations of $$F$$ in the worst case. Meanwhile, the quantum algorithm requires only one evaluation. The Deutsch-Jozsa problem exemplifies the exponential advantage of a quantum algorithm over classical algorithms.

## Quantum Computers

The theory of quantum computing is supported by investigations in the field of quantum mechanics. However, constructing a quantum machine requires a physical system that allows representing qubits and manipulation of states in a reliable and precise way.

The DiVincenzo Criteria require that a physical implementation of a quantum computer must:

1. Be scalable and have well-defined qubits.
2. Be able to initialize qubits to a state.
3. Have long decoherence times to apply quantum error-correcting codes. Decoherence of a qubit happens when the qubit interacts with the environment, for example, when a measurement is performed.
4. Use a universal set of quantum gates.
5. Be able to measure single qubits without modifying others.

Quantum computer physical implementations face huge engineering obstacles to satisfy these requirements. The most important challenge is to guarantee low error rates during computation and measurement. Lowering these rates require techniques for error correction, which add a significant number of qubits specialized on this task. For this reason, the number of qubits of a quantum computer should not be regarded as for classical systems. In a classical computer, the bits of a computer are all effective for performing a calculation, whereas the number of qubits is the sum of the effective qubits (those used to make calculations) plus the ancillas (used for reversible computations) plus the error correction qubits.

Current implementations of quantum computers partially satisfy the DiVincenzo criteria. Quantum adiabatic computers fit in this category since they do not operate using quantum gates. For this reason, they are not considered to be universal quantum computers.

A recurrent problem in optimization is to find the global minimum of an objective function. For example, a route-traffic control system can be modeled as a function that reduces the cost of routing to a minimum. Simulated annealing is a heuristic procedure that provides a good solution to these types of problems. Simulated annealing finds the solution state by slowly introducing changes (the adiabatic process) on the variables that govern the system.

Quantum annealing is the analogous quantum version of simulated annealing. A qubit is initialized into a superposition of states representing all possible solutions to the problem. Here is used the Hamiltonian operator, which is the sum of vectors of potential and kinetic energies of the system. Hence, the objective function is encoded using this operator describing the evolution of the system in correspondence with time. Then, if the system is allowed to evolve very slowly, it will eventually land on a final state representing the optimal value of the objective function.

Currently, there exist adiabatic computers in the market, such as the D-Wave and IBM Q systems, featuring hundreds of qubits; however, their capabilities are somewhat limited to some problems that can be modeled as optimization problems. The limits of adiabatic computers were studied by van Dam et al, showing that despite solving local searching problems and even some instances of the max-SAT problem, there exists harder searching problems this computing model cannot efficiently solve.

### Nuclear Magnetic Resonance

Nuclear Magnetic Resonance (NMR) is a physical phenomena that can be used to represent qubits. The spin of atomic nucleus of molecules is perturbed by an oscillating magnetic field. A 2001 report describes successful implementation of Shor’s algorithm in a 7-qubit NMR quantum computer. An iconic result since this computer was able to factor the number 15.

### Superconducting Quantum Computers

One way to physically construct qubits is based on superconductors, materials that conduct electric current with zero resistance when exposed to temperatures close to absolute zero.

The Josephson effect, in which current flows across the junction of two superconductors separated by a non-superconducting material, is used to physically implement a superposition of states.

When a magnetic flux is applied to this junction, the current flows continuously in one direction. But, depending on the quantity of magnetic flux applied, the current can also flow in the opposite direction. There exists a quantum superposition of currents going both clockwise and counterclockwise leading to a physical implementation of a qubit called flux qubit. The complete device is known as the Superconducting Quantum Interference Device (SQUID) and can be easily coupled scaling the number of qubits. Thus, SQUIDs are like the transistors of a quantum computer.

Examples of superconducting computers are:

• D-wave’s adiabatic computers process quantum annealing for solving diverse optimization problems.
• Google’s 72-qubit computer was recently announced and also several engineering issues such as achieving lower temperatures.
• IBM’s IBM-Q Tokyo, a 20-qubit adiabatic computer, and IBM Q Experience, a cloud-based system for exploring quantum circuits.

IBM Q System

## The Imminent Threat of Quantum Algorithms

The quantum zoo website tracks problems that can be solved using quantum algorithms. As of mid-2018, more than 60 problems appear on this list, targeting diverse applications in the area of number theory, approximation, simulation, and searching. As terrific as it sounds, some easily-solvable problems by quantum computing are surrounding the security of information.

### Grover’s Algorithm

Tales of a quantum detective (fragment). A couple of detectives have the mission of finding one culprit in a group of suspects that always respond to this question honestly: “are you guilty?”.
The detective C follows a classic interrogative method and interviews every person one at a time, until finding the first one that confesses.
The detective Q proceeds in a different way, First gather all suspects in a completely dark room, and after that, the detective Q asks them — are you guilty? — A steady sound comes from the room saying “No!” while at the same time, a single voice mixed in the air responds “Yes!.” Since everybody is submerged in darkness, the detective cannot see the culprit. However, detective Q knows that, as long as the interrogation advances, the culprit will feel desperate and start to speak louder and louder, and so, he continues asking the same question. Suddenly, detective Q turns on the lights, enters into the room, and captures the culprit. How did he do it?

The task of the detective can be modeled as a searching problem. Given a Boolean function $$f$$ that takes N bits and produces one bit, to find the unique input $$x$$ such that $$f(x)=1$$.

A classical algorithm (detective C) finds $$x$$ using $$2^N-1$$ function evaluations in the worst case. However, the quantum algorithm devised by Grover, corresponding to detective Q, searches quadratically faster using around $$2^{N/2}$$ function evaluations.

The key intuition of Grover’s algorithm is increasing the amplitude of the state that represents the solution while maintaining the other states in a lower amplitude. In this way, a system of N qubits, which is a superposition of 2ᴺ possible inputs, can be continuously updated using this intuition until the solution state has an amplitude closer to 1. Hence, after updating the qubits many times, there will be a high probability to measure the solution state.

Initially, a superposition of 2ᴺ states (horizontal axis) is set, each state has an amplitude (vertical axis) close to 0. The qubits are updated so that the amplitude of the solution state increases more than the amplitude of other states. By repeating the update step, the amplitude of the solution state gets closer to 1, which boosts the probability of collapsing to the solution state after measuring.

Grover’s Algorithm (pseudo-code):

1. Prepare an N qubit $$|x\rangle$$ as a uniform superposition of 2ᴺ states.
2. Update the qubits by performing this core operation. $$|x\rangle \mapsto (-1)^{f(x)} |x\rangle$$ The result of $$f(x)$$ only flips the amplitude of the searched state.
3. Negate the N qubit over the average of the amplitudes.
4. Repeat Step 2 and 3 for $$(\tfrac{\pi}{4}) 2^{ N/2}$$ times.
5. Measure the qubit and return the bits obtained.

Alternatively, the second step can be better understood as a conditional statement:

IF f(x) = 1 THEN
Negate the amplitude of the solution state.
ELSE
/* nothing */
ENDIF


Grover’s algorithm considers function $$f$$ a black box, so with slight modifications, the algorithm can also be used to find collisions on the function. This implies that Grover’s algorithm can find a collision using an asymptotically less number of operations than using a brute-force algorithm.

The power of Grover’s algorithm can be turned against cryptographic hash functions. For instance, a quantum computer running Grover’s algorithm could find a collision on SHA256 performing only 2¹²⁸ evaluations of a reversible circuit of SHA256. The natural protection for hash functions is to increase the output size to double. More generally, most of symmetric key encryption algorithms will survive to the power of Grover’s algorithm by doubling the size of keys.

The scenario for public-key algorithms is devastating in face of Peter Shor’s algorithm.

### Shor’s Algorithm

Multiplying integers is an easy task to accomplish, however, finding the factors that compose an integer is difficult. The integer factorization problem is to decompose a given integer number into its prime factors. For example, 42 has three factors 2, 3, and 7 since $$2\times 3\times 7 = 42$$. As the numbers get bigger, integer factorization becomes more difficult to solve, and the hardest instances of integer factorization are when the factors are only two different large primes. Thus, given an integer number $$N$$, to find primes $$p$$ and $$q$$ such that $$N = p \times q$$, is known as integer splitting.

Factoring integers is like cutting wood, and the specific task of splitting integers is analogous to using an axe for splitting the log in two parts. There exist many different tools (algorithms) for accomplishing each task.

For integer factorization, trial division, the Rho method, the elliptic curve method are common algorithms. Fermat’s method, the quadratic- and rational-sieve, leads to the (general) number field sieve (NFS) algorithm for integer splitting. The latter relies on finding a congruence of squares, that is, splitting $$N$$ as a product of squares such that $$N = x^2 – y^2 = (x+y)\times(x-y)$$ The complexity of NFS is mainly attached to the number of pairs $$(x, y)$$ that must be examined before getting a pair that factors $$N$$. The NFS algorithm has subexponential complexity on the size of $$N$$, meaning that the time required for splitting an integer increases significantly as the size of $$N$$ grows. For large integers, the problem becomes intractable for classical computers.

### The Axe of Thor Shor

The many different guesses of the NFS algorithm are analogous to hitting the log using a dulled axe; after subexponential many tries, the log is cut by half. However, using a sharper axe allows you to split the log faster. This sharpened axe is the quantum algorithm proposed by Shor in 1994.

Let $$x$$ be an integer less than $$N$$ and of the order $$k$$. Then, if $$k$$ is even, there exists an integer $$q$$ so $$qN$$ can be factored as follows.

This approach has some issues. For example, the factorization could correspond to $$q$$ not $$N$$ and the order of $$x$$ is unknown, and here is where Shor’s algorithm enters the picture, finding the order of $$x$$.

The internals of Shor’s algorithm rely on encoding the order $$k$$ into a periodic function, so that its period can be obtained using the quantum version of the Fourier transform (QFT). The order of $$x$$ can be found using a polynomial number quantum evaluations of Shor’s algorithm. Therefore, splitting integers using this quantum approach has polynomial complexity on the size of $$N$$.

Shor’s algorithm carries strong implications on the security of the RSA encryption scheme because its security relies on integer factorization. A large-enough quantum computer can efficiently break RSA for current instances.

Alternatively, one may recur to elliptic curves, used in cryptographic protocols like ECDSA or ECDH. Moreover, all TLS ciphersuites use a combination of elliptic curve groups, large prime groups, and RSA and DSA signatures. Unfortunately, these algorithms all succumb to Shor’s algorithm. It only takes a few modifications for Shor’s algorithm to solve the discrete logarithm problem on finite groups. This sounds like a catastrophic story where all of our encrypted data and privacy are no longer secure with the advent of a quantum computer, and in some sense this is true.

On one hand, it is a fact that the quantum computers constructed as of 2019 are not large enough to run, for instance, Shor’s algorithm for the RSA key sizes used in standard protocols. For example, a 2018 report shows experiments on the factorization of a 19-bit number using 94 qubits, they also estimate that 147456 qubits are needed for factoring a 768-bit number. Hence, there numbers indicates that we are still far from breaking RSA.

What if we increment RSA key sizes to be resistant to quantum algorithms, just like for symmetric algorithms?

Bernstein et al. estimated that RSA public keys should be as large as 1 terabyte to maintain secure RSA even in the presence of quantum factoring algorithms. So, for public-key algorithms, increasing the size of keys does not help.

A recent investigation by Gidney and Ekerá shows improvements that accelerate the evaluation of quantum factorization. In their report, the cost of factoring 2048-bit integers is estimated to take a few hours using a quantum machine of 20 million qubits, which is far from any current development. Something worth noting is that the number of qubits needed is two orders of magnitude smaller than the estimated numbers given in previous works developed in this decade. Under these estimates, current encryption algorithms will remain secure several more years; however, consider the following not-so-unrealistic situation.

Information currently encrypted with for example, RSA, can be easily decrypted with a quantum computer in the future. Now, suppose that someone records encrypted information and stores them until a quantum computer is able to decrypt ciphertexts. Although this could be as far as 20 years from now, the forward-secrecy principle is violated. A 20-year gap to the future is sometimes difficult to imagine. So, let’s think backwards, what would happen if all you did on the Internet at the end of the 1990s can be revealed 20 years later — today. How does this impact the security of your personal information? What if the ciphertexts were company secrets or business deals? In 1999, most of us were concerned about the effects of the Y2K problem, now we’re facing Y2Q (years to quantum): the advent of quantum computers.

### Post-Quantum Cryptography

Although the current capacity of the physical implementation of quantum computers is far from a real threat to secure communications, a transition to use stronger problems to protect information has already started. This wave emerged as post-quantum cryptography (PQC). The core idea of PQC is finding algorithms difficult enough that no quantum (and classical) algorithm can solve them.

A recurrent question is: How does it look like a problem that even a quantum computer can not solve?

These so-called quantum-resistant algorithms rely on different hard mathematical assumptions; some of them as old as RSA, others more recently proposed. For example, McEliece cryptosystem, formulated in the late 70s, relies on the hardness of decoding a linear code (in the sense of coding theory). The practical use of this cryptosystem didn’t become widespread, since with the passing of time, other cryptosystems superseded in efficiency. Fortunately, McEliece cryptosystem remains immune to Shor’s algorithm, gaining it relevance in the post-quantum era.

Post-quantum cryptography presents alternatives:

As of 2017, the NIST started an evaluation process that tracks possible alternatives for next-generation secure algorithms. From a practical perspective, all candidates present different trade-offs in implementation and usage. The time and space requirements are diverse; at this moment, it’s too early to define which will succeed RSA and elliptic curves. An initial round collected 70 algorithms for deploying key encapsulation mechanisms and digital signatures. As of early 2019, 28 of these survive and are currently in the analysis, investigation, and experimentation phase.

Cloudflare’s mission is to help build a better Internet. As a proactive action, our cryptography team is preparing experiments on the deployment of post-quantum algorithms at Cloudflare scale. Watch our blog post for more details.

# Towards Post-Quantum Cryptography in TLS

Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/towards-post-quantum-cryptography-in-tls/

We live in a completely connected society. A society connected by a variety of devices: laptops, mobile phones, wearables, self-driving or self-flying things. We have standards for a common language that allows these devices to communicate with each other. This is critical for wide-scale deployment – especially in cryptography where the smallest detail has great importance.

One of the most important standards-setting organizations is the National Institute of Standards and Technology (NIST), which is hugely influential in determining which standardized cryptographic systems see worldwide adoption. At the end of 2016, NIST announced it would hold a multi-year open project with the goal of standardizing new post-quantum (PQ) cryptographic algorithms secure against both quantum and classical computers.

Many of our devices have very different requirements and capabilities, so it may not be possible to select a “one-size-fits-all” algorithm during the process. NIST mathematician, Dustin Moody, indicated that institute will likely select more than one algorithm:

“There are several systems in use that could be broken by a quantum computer – public-key encryption and digital signatures, to take two examples – and we will need different solutions for each of those systems.”

Initially, NIST selected 82 candidates for further consideration from all submitted algorithms. At the beginning of 2019, this process entered its second stage. Today, there are 26 algorithms still in contention.

### Post-quantum cryptography: what is it really and why do I need it?

In 1994, Peter Shor made a significant discovery in quantum computation. He found an algorithm for integer factorization and computing discrete logarithms, both believed to be hard to solve in classical settings. Since then it has become clear that the ‘hard problems’ on which cryptosystems like RSA and elliptic curve cryptography (ECC) rely – integer factoring and computing discrete logarithms, respectively – are efficiently solvable with quantum computing.

A quantum computer can help to solve some of the problems that are intractable on a classical computer. In theory, they could efficiently solve some fundamental problems in mathematics. This amazing computing power would be highly beneficial, which is why companies are actually trying to build quantum computers. At first, Shor’s algorithm was merely a theoretical result – quantum computers powerful enough to execute it did not exist – but this is quickly changing. In March 2018, Google announced a 72-qubit universal quantum computer. While this is not enough to break say RSA-2048 (still more is needed), many fundamental problems have already been solved.

In anticipation of wide-spread quantum computing, we must start the transition from classical public-key cryptography primitives to post-quantum (PQ) alternatives. It may be that consumers will never get to hold a quantum computer, but a few powerful attackers who will get one can still pose a serious threat. Moreover, under the assumption that current TLS handshakes and ciphertexts are being captured and stored, a future attacker could crack these stored individual session keys and use those results to decrypt the corresponding individual ciphertexts. Even strong security guarantees, like forward secrecy, do not help out much there.

In 2006, the academic research community launched a conference series dedicated to finding alternatives to RSA and ECC. This so-called post-quantum cryptography should run efficiently on a classical computer, but it should also be secure against attacks performed by a quantum computer. As a research field, it has grown substantially in popularity.

Several companies, including Google, Microsoft, Digicert and Thales, are already testing the impact of deploying PQ cryptography. Cloudflare is involved in some of this, but we want to be a company that leads in this direction. The first thing we need to do is understand the real costs of deploying PQ cryptography, and that’s not obvious at all.

### What options do we have?

Many submissions to the NIST project are still under study. Some are very new and little understood; others are more mature and already standardized as RFCs. Some have been broken or withdrawn from the process; others are more conservative or illustrate how far classical cryptography would need to be pushed so that a quantum computer could not crack it within a reasonable cost. Some are very slow and big; others are not. But most cryptographic schemes can be categorized into these families: lattice-based, multivariate, hash-based (signatures only), code-based and isogeny-based.

For some algorithms, nevertheless, there is a fear they may be too inconvenient to use with today’s Internet. We must also be able to integrate new cryptographic schemes with existing protocols, such as SSH or TLS. To do that, designers of PQ cryptosystems must consider these characteristics:

• Latency caused by encryption and decryption on both ends of the communication channel, assuming a variety of devices from big and fast servers to slow and memory constrained IoT (Internet of Things) devices
• Small public keys and signatures to minimize bandwidth
• Clear design that allows cryptanalysis and determining weaknesses that could be exploited
• Use of existing hardware for fast implementation

The work on post-quantum public key cryptosystems must be done in a full view of organizations, governments, cryptographers, and the public. Emerging ideas must be properly vetted by this community to ensure widespread support.

### Helping Build a Better Internet

To better understand the post-quantum world, Cloudflare began experimenting with these algorithms and used them to provide confidentiality in TLS connections.

With Google, we are proposing a wide-scale experiment that combines client- and server-side data collection to evaluate the performance of key-exchange algorithms on actual users’ devices. We hope that this experiment helps choose an algorithm with the best characteristics for the future of the Internet. With Cloudflare’s highly distributed network of access points and Google’s Chrome browser, both companies are in a very good position to perform this experiment.

Our goal is to understand how these algorithms act when used by real clients over real networks, particularly candidate algorithms with significant differences in public-key or ciphertext sizes. Our focus is on how different key sizes affect handshake time in the context of Transport Layer Security (TLS) as used on the web over HTTPS.

Our primary candidates are an NTRU-based construction called HRSS-SXY (by Hülsing – Rijneveld – Schanck – Schwabe, and Tsunekazu Saito – Keita Xagawa – Takashi Yamakawa) and an isogeny-based Supersingular Isogeny Key Encapsulation (SIKE). Details of both algorithms are described in more detail below in section “Dive into post-quantum cryptography”. This table shows a few characteristics for both algorithms. Performance timings were obtained by running the BoringSSL speed test on an Intel Skylake CPU.

KEMPublic Key size (bytes)Ciphertext (bytes)Secret size (bytes)KeyGen (op/sec)Encaps (op/sec)Decaps (op/sec)NIST level
SIKE/p43433034616367.1228.0209.31

Currently the most commonly used key exchange algorithm (according to Cloudflare’s data) is the non-quantum X25519. Its public keys are 32 bytes and BoringSSL can generate 49301.2 key pairs, and is able to perform 19628.6 key agreements every second on my Skylake CPU.

Note that HRSS-SXY shows a significant speed advantage, while SIKE has a size advantage. In our experiment, we will deploy these two algorithms on both the server side using Cloudflare’s infrastructure, and the client side using Chrome Canary; both sides will collect telemetry information about TLS handshakes using these two PQ algorithms to see how they perform in practice.

### What do we expect to find?

In 2018, Adam Langley conducted an experiment with the goal of evaluating the likely latency impact of a post-quantum key exchange in TLS. Chrome was augmented with the ability to include a dummy, arbitrarily-sized extension in the TLS ClientHello (fixed number of bytes of random noise). After taking into account the performance and key size offered by different types key-exchange schemes, he concluded that constructs based on structured lattices may be most suitable for future use in TLS.

However, Langley also observed a peculiar phenomenon; client connections measured at 95th percentile had much higher latency than the median. It means that in those cases, isogeny-based systems may be a better choice. In the “Dive into post-quantum cryptography”, we describe the difference between isogeny-based SIKE and lattice-based NTRU cryptosystems.

In our experiment, we want to more thoroughly evaluate and ascribe root causes to these unexpected latency increases. We would particularly like to learn more about the characteristics of those networks: what causes increased latency? how does the performance cost of isogeny-based algorithms impact the TLS handshake? We want to answer key questions, like:

• What is a good ratio for speed-to-key size (or how much faster could SIKE get to achieve the client-perceived performance of HRSS)?
• How do network middleboxes behave when clients use new PQ algorithms, and which networks have problematic middleboxes?
• How do the different properties of client networks affect TLS performance with different PQ key exchanges? Can we identify specific autonomous systems, device configurations, or network configurations that favor one algorithm over another? How is performance affected in the long tail?

### Experiment Design

Our experiment will involve both server- and client-side performance statistics collection from real users around the world (all the data is anonymized). Cloudflare is operating the server-side TLS connections. We will enable the CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE + X25519) key-agreement algorithms on all TLS-terminating edge servers.

In this experiment, the ClientHello will contain a CECPQ2 or CECPQ2b public key (but never both). Additionally, Chrome will always include X25519 for servers that do not support post-quantum key exchange. The post-quantum key exchange will only be negotiated in TLS version 1.3 when both sides support it.

Since Cloudflare only measures the server side of the connection, it is impossible to determine the time it takes for a ClientHello sent from Chrome to reach Cloudflare’s edge servers; however, we can measure the time it takes for the TLS ServerHello message containing post-quantum key exchange, to reach the client and for the client to respond.

On the client side, Chrome Canary will operate the TLS connection. Google will enable either CECPQ2 or CECPQ2b in Chrome for the following mix of architecture and OSes:

• x86-64: Windows, Linux, macOS, ChromeOS
• aarch64: Android

Our high-level expectation is to get similar results as Langley’s original experiment in 2018 — slightly increased latency for the 50th percentile and higher latency for the 95th. Unfortunately, data collected purely from real users’ connections may not suffice for diagnosing the root causes of why some clients experience excessive slowdown. To this end, we will perform follow-up experiments based on per-client information we collect server-side.

Our primary hypothesis is that excessive slowdowns, like those Langley observed, are largely due to in-network events, such as middleboxes or bloated/lossy links. As a first-pass analysis, we will investigate whether the slowed-down clients share common network features, like common ASes, common transit networks, common link types, and so on. To determine this, we will run a traceroute from vantage points close to our servers back toward the clients (not overloading any particular links or hosts) and study whether some client locations are subject to slowdowns for all destinations or just for some.

### Dive into post-quantum cryptography

Be warned: the details of PQ cryptography may be quite complicated. In some cases it builds on classical cryptography, and in other cases it is completely different math. It would be rather hard to describe details in a single blog post. Instead, we are giving you an intuition of post-quantum cryptography, rather than provide deep academic-level descriptions. We’re skipping a lot of details for the sake of brevity. Nevertheless, settle in for a bit of an epic journey because we have a lot to cover.

### Key encapsulation mechanism

NIST requires that all key-agreement algorithms have a form of key-encapsulation mechanism (KEM). The KEM is a simplified form of public key encryption (PKE). As PKE, it also allows agreement on a secret, but in a slightly different way. The idea is that the session key is an output of the encryption algorithm, conversely to public key encryption schemes where session key is an input to the algorithm. In a KEM, Alice generates a random key and uses the pre-generated public key from Bob to encrypt (encapsulate) it. This results in a ciphertext sent to Bob. Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the random key. The idea was initially introduced by Cramer and Shoup. Experience shows that such constructs are easier to design, analyze, and implement as the scheme is limited to communicating a fixed-size session key. Leonardo Da Vinci said, “Simplicity is the ultimate sophistication,” which is very true in cryptography.

The key exchange (KEX) protocol, like Diffie-Hellman, is yet a different construct: it allows two parties to agree on a shared secret that can be used as a symmetric encryption key. For example, Alice generates a key pair and sends a public key to Bob. Bob does the same and uses his own key pair with Alice’s public key to generate the shared secret. He then sends his public key to Alice who can now generate the same shared secret. What’s worth noticing is that both Alice and Bob perform exactly the same operations.

KEM construction can be converted to KEX. Alice performs key generation and sends the public key to Bob. Bob uses it to encapsulate a symmetric session key and sends it back to Alice. Alice decapsulates the ciphertext received from Bob and gets the symmetric key. This is actually what we do in our experiment to make integration with the TLS protocol less complicated.

### NTRU Lattice-based Encryption

We will enable the CECPQ2 implemented by Adam Langley from Google on our servers. He described this implementation in detail here. This key exchange uses the HRSS algorithm, which is based on the NTRU (N-Th Degree TRUncated Polynomial Ring) algorithm. Foregoing too much detail, I am going to explain how NTRU works and give simplified examples, and finally, compare it to HRSS.

NTRU is a cryptosystem based on a polynomial ring. This means that we do not operate on numbers modulo a prime (like in RSA), but on polynomials of degree $$N$$ , where the degree of a polynomial is the highest exponent of its variable. For example, $$x^7 + 6x^3 + 11x^2$$ has degree of 7.

One can add polynomials in the ring in the usual way, by simply adding theirs coefficients modulo some integer. In NTRU this integer is called $$q$$. Polynomials can also be multiplied, but remember, you are operating in the ring, therefore the result of a multiplication is always a polynomial of degree less than $$N$$. It basically means that exponents of the resulting polynomial are added to modulo $$N$$.

In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working with a set of numbers less than N, you are working with a set of polynomials with a degree less than N.

To instantiate the NTRU cryptosystem, three domain parameters must be chosen:

• $$N$$ – degree of the polynomial ring, in NTRU the principal objects are polynomials of degree $$N-1$$.
• $$p$$ – small modulus used during key generation and decryption for reducing message coefficients.
• $$q$$ – large modulus used during algorithm execution for reducing coefficients of the polynomials.

First, we generate a pair of public and private keys. To do that, two polynomials $$f$$ and $$g$$ are chosen from the ring in a way that their randomly generated coefficients are much smaller than $$q$$. Then key generation computes two inverses of the polynomial: $$f_p= f^{-1} \bmod{p} \\ f_q= f^{-1} \bmod{q}$$

The last step is to compute $$pk = p\cdot f_q\cdot g \bmod q$$, which we will use as public key pk. The private key consists of $$f$$ and $$f_p$$. The $$f_q$$ is not part of any key, however it must remain secret.

It might be the case that after choosing $$f$$, the inverses modulo $$p$$ and $$q$$ do not exist. In this case, the algorithm has to start from the beginning and generate another $$f$$. That’s unfortunate because calculating the inverse of a polynomial is a costly operation. HRSS brings an improvement to this issue since it ensures that those inverses always exist, making key generation faster than as proposed initially in NTRU.

The encryption of a message $$m$$ proceeds as follows. First, the message $$m$$ is converted to a ring element $$pt$$ (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial $$b$$ called blinder. The goal of the blinder is to generate different ciphertexts per encyption. Thus, the ciphetext $$ct$$ is obtained as $$ct = (b\cdot pk + pt ) \bmod q$$ Decryption looks a bit more complicated but it can also be easily understood. Decryption uses both the secret value $$f$$ and to recover the plaintext as $$v = f \cdot ct \bmod q \\ pt = v \cdot f_p \bmod p$$

This diagram demonstrates why and how decryption works.

After obtaining $$pt$$, the message $$m$$ is recovered by inverting the conversion function.

The underlying hard assumption is that given two polynomials: $$f$$ and $$g$$ whose coefficients are short compared to the modulus $$q$$, it is difficult to distinguish $$pk = \frac{f}{g}$$ from a random element in the ring. It means that it’s hard to find $$f$$ and $$g$$ given only public key pk.

### Lattices

NTRU cryptosystem is a grandfather of lattice-based encryption schemes. The idea of using  difficult problems for cryptographic purposes was due to Ajtai. His work evolved into a whole area of research with the goal of creating more practical, lattice-based cryptosystems.

### What is a lattice and why it can be used for post-quantum crypto?

The picture below visualizes lattice as points in a two-dimensional space. A lattice is defined by the origin $$O$$ and base vectors $$\{ b_1 , b_2\}$$. Every point on the lattice is represented as a linear combination of the base vectors, for example  $$V = -2b_1+b_2$$.

There are two classical NP-hard problems in lattice-based cryptography:

1. Shortest Vector Problem (SVP): Given a lattice, to find the shortest non-zero vector in the lattice. In the graph, the vector $$s$$ is the shortest one. The SVP problem is NP-hard only under some assumptions.
2. Closest Vector Problem (CVP). Given a lattice and a vector $$V$$ (not necessarily in the lattice), to find the closest vector to $$V$$. For example, the closest vector to $$t$$ is $$z$$.

In the graph above, it is easy for us to solve SVP and CVP by simple inspection. However, the lattices used in cryptography have higher dimensions, say above 1000, as well as highly non-orthogonal basis vectors. On these instances, the problems get extremely hard to solve. It’s even believed future quantum computers will have it tough.

HRSS, which we use in our experiment, is based on NTRU, but a slightly better instantiation. The main improvements are:

• Faster key generation algorithm.
• NTRU encryption can produce ciphertexts that are impossible to decrypt (true for many lattice-based schemes). But HRSS fixes this problem.
• HRSS is a key encapsulation mechanism.

### CECPQ2b – Isogeny-based Post-Quantum TLS

Following CECPQ2, we have integrated into BoringSSL another hybrid key exchange mechanism relying on SIKE. It is called CECPQ2b and we will use it in our experimentation in TLS 1.3. SIKE is a key encapsulation method based on Supersingular Isogeny Diffie-Hellman (SIDH). Read more about SIDH in our previous post. The math behind SIDH is related to elliptic curves. A comparison between SIDH and the classical Elliptic Curve Diffie-Hellman (ECDH) is given.

An elliptic curve is a set of points that satisfy a specific mathematical equation. The equation of an elliptic curve may have multiple forms, the standard form is called the Weierstrass equation $$y^2 = x^3 +ax +b$$ and its shape can look like the red curve.

An interesting fact about elliptic curves is have a group structure. That is, the set of points on the curve have associated a binary operation called point addition. The set of points on the elliptic curve is closed under addition. Thus, adding two points results in another point that is also on the elliptic curve.

If we can add two different points on a curve, then we can also add one point to itself. And if we do it multiple times, then the resulting operations is known as a scalar multiplication and denoted as  $$Q = k\cdot P = P+P+\dots+P$$ for an integer $$k$$.

Multiplication of scalars is commutative. It means that two scalar multiplications can be evaluated in any order $$\color{darkred}{k_a}\cdot\color{darkgreen}{k_b} = \color{darkgreen}{k_b}\cdot\color{darkred}{k_a}$$; this an important property that makes ECDH possible.

It turns out that carefully if choosing an elliptic curve “correctly”, scalar multiplication is easy to compute but extremely hard to reverse. Meaning, given two points $$Q$$ and $$P$$ such that $$Q=k\cdot P$$, finding the integer k is a difficult task known as the Elliptic Curve Discrete Logarithm problem (ECDLP). This problem is suitable for cryptographic purposes.

Alice and Bob agree on a secret key as follows. Alice generates a private key $$k_a$$. Then, she uses some publicly known point $$P$$ and calculates her public key as $$Q_a = k_a\cdot P$$. Bob proceeds in similar fashion and gets $$k_b$$ and $$Q_b = k_b\cdot P$$. To agree on a shared secret, each party multiplies their private key with the public key of the other party. The result of this is the shared secret. Key agreement as described above, works thanks to the fact that scalars can commute:
$$\color{darkgreen}{k_a} \cdot Q_b = \color{darkgreen}{k_a} \cdot \color{darkred}{k_b} \cdot P \iff \color{darkred}{k_b} \cdot \color{darkgreen}{k_a} \cdot P = \color{darkred}{k_b} \cdot Q_a$$

There is a vast theory behind elliptic curves. An introduction to elliptic curve cryptography was posted before and more details can be found in this book. Now, lets describe SIDH and compare with ECDH.

### Isogenies on Elliptic Curves

Before explaining the details of SIDH key exchange, I’ll explain the 3 most important concepts, namely: j-invariant, isogeny and its kernel.

Each curve has a number that can be associated to it. Let’s call this number a j-invariant. This number is not unique per curve, meaning many curves have the same value of j-invariant, but it can be viewed as a way to group multiple elliptic curves into disjoint sets. We say that two curves are isomorphic if they are in the same set, called the isomorphism class. The j-invariant is a simple criterion to determine whether two curves are isomorphic. The j-invariant of a curve $$E$$ in Weierstrass form $$y^2 = x^3 + ax + b$$ is given as $$j(E) = 1728\frac{4a^3}{4a^3 +27b^2}$$

When it comes to isogeny, think about it as a map between two curves. Each point on some curve $$E$$ is mapped by isogeny to the point on isogenous curve $$E’$$. We denote mapping from curve $$E$$ to $$E’$$ by isogeny $$\phi$$ as:

$$\phi: E \rightarrow E’$$

It depends on the map if those two curves are isomorphic or not. Isogeny can be visualised as:

There may exist many of those mappings, each curve used in SIDH has small number of isogenies to other curves. Natural question is how do we compute such isogeny. Here is where the kernel of an isogeny comes. The kernel uniquely determines isogeny (up to isomorphism class). Formulas for calculating isogeny from its kernel were initially given by J. Vélu and the idea of calculating them efficiently was extended.

To finish, I will summarize what was said above with a picture.

There are two isomorphism classes on the picture above. Both curves $$E_1$$ and $$E_2$$ are isomorphic and have  j-invariant = 6. As curves $$E_3$$ and $$E_4$$ have j-invariant=13, they are in a different isomorphism class. There exists an isogeny $$\phi_2$$ between curve $$E_3$$ and $$E_2$$, so they both are isogeneous. Curves $$\phi_1$$ and $$E_2$$ are isomorphic and there is isogeny $$\phi_1$$ between them. Curves $$E_1$$ and $$E_4$$ are not isomorphic.

For brevity I’m skipping many important details, like details of the finite field, the fact that isogenies must be separable and that the kernel is finite. But curious readers can find a number of academic research papers available on the Internet.

### Big picture: similarities with ECDH

Let’s generalize the ECDH algorithm described above, so that we can swap some elements and try to use Supersingular Isogeny Diffie-Hellman.

Note that what actually happens during an ECDH key exchange is:

• We have a set of points on elliptic curve, set S
• We have another group of integers used for point multiplication, G
• We use an element from Z to act on an element from S to get another element from S:

$$G \cdot S \rightarrow S$$

Now the question is: what is our G and S in an SIDH setting? For SIDH to work, we need a big set of elements and something secret that will act on the elements from that set. This “group action” must also be resistant to attacks performed by quantum computers.

In the SIDH setting, those two sets are defined as:

• Set S is a set (graph) of j-invariants, such that all the curves are supersingular: $$S = [j(E_1), j(E_2), j(E_3), …. , j(E_n)]$$
• Set G is a set of isogenies acting on elliptic curves and transforming, for example, the elliptic curve $$E_1$$ into $$E_n$$:

### Random walk on supersingular graph

When we talk about Isogeny Based Cryptography, as a topic distinct from Elliptic Curve Cryptography, we usually mean algorithms and protocols that rely fundamentally on the structure of isogeny graphs. An example of such a (small) graph is pictured below.

Each vertex of the graph represents a different j-invariant of a set of supersingular curves. The edges between vertices represent isogenies converting one elliptic curve to another. As you can notice, the graph is strongly connected, meaning every vertex can be reached from every other vertex. In the context of isogeny-based crypto, we call such a graph a supersingular isogeny graph. I’ll skip some technical details about the construction of this graph (look for those here or here), but instead describe ideas about how it can be used.

As the graph is strongly connected, it is possible to walk a whole graph by starting from any vertex, randomly choosing an edge, following it to the next vertex and then start the process again on a new vertex. Such a way of visiting edges of this graph is called a random walk.

The random walk is a key concept that makes isogeny based crypto feasible. When you look closely at the graph, you can notice that each vertex has a small number of edges incident to it, this is why we can compute the isogenies efficiently. But also for any vertex there is only a limited number of isogenies to choose from, which doesn’t look like good base for a cryptographic scheme. The key question is – where does the security of the scheme come from exactly? In order to get it, it is necessary to visit a couple hundred vertices. What it means in practice is that secret isogeny (of large degree) is constructed as a composition of multiple isogenies (of small, prime degree).  Which means, the secret isogeny is:

This property and properties of the isogeny graph are what makes some of us believe that scheme has a good chance to be secure. More specifically, there is no efficient way of finding a path that connects $$E_0$$ with $$E_n$$, even with quantum computer at hand. The security level of a system depends on value n – the number of steps taken during the walk.

The random walk is a core process used when both generating public keys and computing shared secrets. It starts with party generating random value m (see more below), starting curve $$E_0$$ and points P and Q on this curve. Those values are used to compute the kernel of an isogeny $$R_1$$ in the following way:

$$R_1 = P + m \cdot Q$$

Thanks to formulas given by Vélu we can now use the point $$R_1$$ to compute the isogeny, the party will choose to move from a vertex to another one. After the isogeny $$\phi_{R_1}$$ is calculated it is applied to $$E_0$$  which results in a new curve $$E_1$$:

$$\phi_{R_1}: E_0 \rightarrow E_1$$

Isogeny is also applied to points P and Q. Once on $$E_1$$ the process is repeated. This process is applied n times, and at the end a party ends up on some curve $$E_n$$ which defines isomorphism class, so also j-invariant.

### Supersingular Isogeny Diffie-Hellman

The core idea in SIDH is to compose two random walks on an isogeny graph of elliptic curves in such a way that the end node of both ways of composing is the same.

In order to do it, scheme sets public parameters – starting curve $$E_0$$ and 2 pairs of base points on this curve $$(PA,QA)$$ , $$(PB,QB)$$. Alice generates her random secret keys m, and calculates a secret isogeny $$\phi_q$$ by performing a random walk as described above. The walk finishes with 3 values: elliptic curve $$E_a$$ she has ended up with and pair of points $$\phi_a(PB)$$ and $$\phi_a(QB)$$ after pushing through Alice’s secret isogeny. Bob proceeds analogously which results in the triple $${E_b, \phi_b(PA), \phi_b(QA)}$$. The triple forms a public key which is exchanged between parties.

The picture below visualizes the operation. The black dots represent curves grouped in the same isomorphism classes represented by light blue circles. Alice takes the orange path ending up on a curve $$E_a$$ in a separate isomorphism class than Bob after taking his dark blue path ending on $$E_b$$. SIDH is parametrized in a way that Alice and Bob will always end up in different isomorphism classes.

Upon receipt of triple $${ E_a, \phi_a(PB), \phi_a(QB) }$$  from Alice, Bob will use his secret value m to calculate a new kernel – but instead of using point $$PA$$ and $$QA$$ to calculate an isogeny kernel, he will now use images $$\phi_a(PB)$$ and $$\phi_a(QB)$$ received from Alice:

$$R’_1 = \phi_a(PB) + m \cdot \phi_a(QB)$$

Afterwards, he uses $$R’_1$$ to start the walk again resulting in the isogeny $$\phi’_b: E_a \rightarrow E_{ab}$$. Allice proceeds analogously resulting in the isogeny $$\phi’_a: E_b \rightarrow E_{ba}$$. With isogenies calculated this way, both Alice and Bob will converge in the same isomorphism class. The math math may seem complicated, hopefully the picture below makes it easier to understand.

Bob computes a new isogeny and starts his random walk from $$E_a$$ received from Alice. He ends up on some curve $$E_{ba}$$. Similarly, Alice calculates a new isogeny, applies it on $$E_b$$ received from Bob and her random walk ends on some curve $$E_{ab}$$. Curves $$E_{ab}$$ and $$E_{ba}$$ are not likely to be the same, but construction guarantees that they are isomorphic. As mentioned earlier, isomorphic curves have the same value of j-invariant,  hence the shared secret is a value of j-invariant $$j(E_{ab})$$.

Coming back to differences between SIDH and ECDH – we can split them into four categories: the elements of the group we are operating on, the cornerstone computation required to agree on a shared secret, the elements representing secret values, and the difficult problem on which the security relies.

In ECDH there is a secret key which is an integer scalar, in case of SIDH it is a secret isogeny, which also is generated from an integer scalar. In the case of ECDH one multiplies a point on a curve by a scalar, in the case of SIDH it is a random walk in an isogeny graph. In the case of ECDH, the public key is a point on a curve, in the case of SIDH, the public part is a curve itself and the image of some points after applying isogeny. The shared secret in the case of ECDH is a point on a curve, in the case of SIDH it is a j-invariant.

### SIKE: Supersingular Isogeny Key Encapsulation

SIDH could potentially be used as a drop-in replacement of the ECDH protocol. We have actually implemented a proof-of-concept and added it to our implementation of TLS 1.3 in the tls-tris library and described (together with Mozilla) implementation details in this draft. Nevertheless, there is a problem with SIDH – the keys can be used only once. In 2016, a few researchers came up with an active attack on SIDH which works only when public keys are reused. In the context of TLS, it is not a big problem, because for each session a fresh key pair is generated (ephemeral keys), but it may not be true for other applications.

SIKE is an isogeny key encapsulation which solves this problem. Bob can generate SIKE keys, upload the public part somewhere in the Internet and then anybody can use it whenever he wants to communicate with Bob securely. SIKE reuses SIDH – internally both sides of the connection always perform SIDH key generation, SIDH key agreement and apply some other cryptographic primitives in order to convert SIDH to KEM. SIKE is implemented in a few variants – each variant corresponds to the security levels using 128-, 192- and 256-bit secret keys. Higher security level means longer running time. More details about SIKE can be found here.

SIKE is also one of the candidates in NIST post-quantum “competition“.

I’ve skipped many important details to give a brief description of how isogeny based crypto works. If you’re curious and hungry for details, look at either of these Cloudflare meetups, where Deirdre Connolly talked about isogeny-based cryptography or this talk by Chloe Martindale during PQ Crypto School 2017. And if you would like to know more about quantum attacks on this scheme, I highly recommend this work.

## Conclusion

Quantum computers that can break meaningful cryptographic parameter settings do not exist, yet. They won’t be built for at least the next few years. Nevertheless, they have already changed the way we look at current cryptographic deployments. There are at least two reasons it’s worth investing in PQ cryptography:

• It takes a lot of time to build secure cryptography and we don’t actually know when today’s classical cryptography will be broken. There is a need for a good mathematical base: an initial idea of what may be secure against something that doesn’t exist yet. If you have an idea, you also need good implementation, constant time, resistance to things like time and cache side-channels, DFA, DPA, EM, and a bunch of other abbreviations indicating side-channel resistance. There is also deployment of, for example, algorithms based on elliptic curves were introduced in ’85, but started to really be used in production only during the last decade, 20 or so years later. Obviously, the implementation must be blazingly fast! Last, but not least, integration: we need time to develop standards to allow integration of PQ cryptography with protocols like TLS.
• Even though efficient quantum computers probably won’t exist for another few years, the threat is real. Data encrypted with current cryptographic algorithms can be recorded now with hopes of being broken in the future.

Cloudflare is motivated to help build the Internet of tomorrow with the tools at hand today. Our interest is in cryptographic techniques that can be integrated into existing protocols and widely deployed on the Internet as seamlessly as possible. PQ cryptography, like the rest of cryptography, includes many cryptosystems that can be used for communications in today’s Internet; Alice and Bob need to perform some computation, but they do not need to buy new hardware to do that.

Cloudflare sees great potential in those algorithms and believes that some of them can be used as a safe replacement for classical public-key cryptosystems. Time will tell if we’re justified in this belief!

# Towards Post-Quantum Cryptography in TLS

Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/towards-post-quantum-cryptography-in-tls/

We live in a completely connected society. A society connected by a variety of devices: laptops, mobile phones, wearables, self-driving or self-flying things. We have standards for a common language that allows these devices to communicate with each other. This is critical for wide-scale deployment – especially in cryptography where the smallest detail has great importance.

One of the most important standards-setting organizations is the National Institute of Standards and Technology (NIST), which is hugely influential in determining which standardized cryptographic systems see worldwide adoption. At the end of 2016, NIST announced it would hold a multi-year open project with the goal of standardizing new post-quantum (PQ) cryptographic algorithms secure against both quantum and classical computers.

Many of our devices have very different requirements and capabilities, so it may not be possible to select a “one-size-fits-all” algorithm during the process. NIST mathematician, Dustin Moody, indicated that institute will likely select more than one algorithm:

“There are several systems in use that could be broken by a quantum computer – public-key encryption and digital signatures, to take two examples – and we will need different solutions for each of those systems.”

Initially, NIST selected 82 candidates for further consideration from all submitted algorithms. At the beginning of 2019, this process entered its second stage. Today, there are 26 algorithms still in contention.

### Post-quantum cryptography: what is it really and why do I need it?

In 1994, Peter Shor made a significant discovery in quantum computation. He found an algorithm for integer factorization and computing discrete logarithms, both believed to be hard to solve in classical settings. Since then it has become clear that the ‘hard problems’ on which cryptosystems like RSA and elliptic curve cryptography (ECC) rely – integer factoring and computing discrete logarithms, respectively – are efficiently solvable with quantum computing.

A quantum computer can help to solve some of the problems that are intractable on a classical computer. In theory, they could efficiently solve some fundamental problems in mathematics. This amazing computing power would be highly beneficial, which is why companies are actually trying to build quantum computers. At first, Shor’s algorithm was merely a theoretical result – quantum computers powerful enough to execute it did not exist – but this is quickly changing. In March 2018, Google announced a 72-qubit universal quantum computer. While this is not enough to break say RSA-2048 (still more is needed), many fundamental problems have already been solved.

In anticipation of wide-spread quantum computing, we must start the transition from classical public-key cryptography primitives to post-quantum (PQ) alternatives. It may be that consumers will never get to hold a quantum computer, but a few powerful attackers who will get one can still pose a serious threat. Moreover, under the assumption that current TLS handshakes and ciphertexts are being captured and stored, a future attacker could crack these stored individual session keys and use those results to decrypt the corresponding individual ciphertexts. Even strong security guarantees, like forward secrecy, do not help out much there.

In 2006, the academic research community launched a conference series dedicated to finding alternatives to RSA and ECC. This so-called post-quantum cryptography should run efficiently on a classical computer, but it should also be secure against attacks performed by a quantum computer. As a research field, it has grown substantially in popularity.

Several companies, including Google, Microsoft, Digicert and Thales, are already testing the impact of deploying PQ cryptography. Cloudflare is involved in some of this, but we want to be a company that leads in this direction. The first thing we need to do is understand the real costs of deploying PQ cryptography, and that’s not obvious at all.

### What options do we have?

Many submissions to the NIST project are still under study. Some are very new and little understood; others are more mature and already standardized as RFCs. Some have been broken or withdrawn from the process; others are more conservative or illustrate how far classical cryptography would need to be pushed so that a quantum computer could not crack it within a reasonable cost. Some are very slow and big; others are not. But most cryptographic schemes can be categorized into these families: lattice-based, multivariate, hash-based (signatures only), code-based and isogeny-based.

For some algorithms, nevertheless, there is a fear they may be too inconvenient to use with today’s Internet. We must also be able to integrate new cryptographic schemes with existing protocols, such as SSH or TLS. To do that, designers of PQ cryptosystems must consider these characteristics:

• Latency caused by encryption and decryption on both ends of the communication channel, assuming a variety of devices from big and fast servers to slow and memory constrained IoT (Internet of Things) devices
• Small public keys and signatures to minimize bandwidth
• Clear design that allows cryptanalysis and determining weaknesses that could be exploited
• Use of existing hardware for fast implementation

The work on post-quantum public key cryptosystems must be done in a full view of organizations, governments, cryptographers, and the public. Emerging ideas must be properly vetted by this community to ensure widespread support.

### Helping Build a Better Internet

To better understand the post-quantum world, Cloudflare began experimenting with these algorithms and used them to provide confidentiality in TLS connections.

With Google, we are proposing a wide-scale experiment that combines client- and server-side data collection to evaluate the performance of key-exchange algorithms on actual users’ devices. We hope that this experiment helps choose an algorithm with the best characteristics for the future of the Internet. With Cloudflare’s highly distributed network of access points and Google’s Chrome browser, both companies are in a very good position to perform this experiment.

Our goal is to understand how these algorithms act when used by real clients over real networks, particularly candidate algorithms with significant differences in public-key or ciphertext sizes. Our focus is on how different key sizes affect handshake time in the context of Transport Layer Security (TLS) as used on the web over HTTPS.

Our primary candidates are an NTRU-based construction called HRSS-SXY (by Hülsing – Rijneveld – Schanck – Schwabe, and Tsunekazu Saito – Keita Xagawa – Takashi Yamakawa) and an isogeny-based Supersingular Isogeny Key Encapsulation (SIKE). Details of both algorithms are described in more detail below in section “Dive into post-quantum cryptography”. This table shows a few characteristics for both algorithms. Performance timings were obtained by running the BoringSSL speed test on an Intel Skylake CPU.

KEMPublic Key size (bytes)Ciphertext (bytes)Secret size (bytes)KeyGen (op/sec)Encaps (op/sec)Decaps (op/sec)NIST level
SIKE/p43433034616367.1228.0209.31

Currently the most commonly used key exchange algorithm (according to Cloudflare’s data) is the non-quantum X25519. Its public keys are 32 bytes and BoringSSL can generate 49301.2 key pairs, and is able to perform 19628.6 key agreements every second on my Skylake CPU.

Note that HRSS-SXY shows a significant speed advantage, while SIKE has a size advantage. In our experiment, we will deploy these two algorithms on both the server side using Cloudflare’s infrastructure, and the client side using Chrome Canary; both sides will collect telemetry information about TLS handshakes using these two PQ algorithms to see how they perform in practice.

### What do we expect to find?

In 2018, Adam Langley conducted an experiment with the goal of evaluating the likely latency impact of a post-quantum key exchange in TLS. Chrome was augmented with the ability to include a dummy, arbitrarily-sized extension in the TLS ClientHello (fixed number of bytes of random noise). After taking into account the performance and key size offered by different types key-exchange schemes, he concluded that constructs based on structured lattices may be most suitable for future use in TLS.

However, Langley also observed a peculiar phenomenon; client connections measured at 95th percentile had much higher latency than the median. It means that in those cases, isogeny-based systems may be a better choice. In the Dive into post-quantum cryptography, we describe the difference between isogeny-based SIKE and lattice-based NTRU cryptosystems.

In our experiment, we want to more thoroughly evaluate and ascribe root causes to these unexpected latency increases. We would particularly like to learn more about the characteristics of those networks: what causes increased latency? how does the performance cost of isogeny-based algorithms impact the TLS handshake? We want to answer key questions, like:

• What is a good ratio for speed-to-key size (or how much faster could SIKE get to achieve the client-perceived performance of HRSS)?
• How do network middleboxes behave when clients use new PQ algorithms, and which networks have problematic middleboxes?
• How do the different properties of client networks affect TLS performance with different PQ key exchanges? Can we identify specific autonomous systems, device configurations, or network configurations that favor one algorithm over another? How is performance affected in the long tail?

### Experiment Design

Our experiment will involve both server- and client-side performance statistics collection from real users around the world (all the data is anonymized). Cloudflare is operating the server-side TLS connections. We will enable the CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE + X25519) key-agreement algorithms on all TLS-terminating edge servers.

In this experiment, the ClientHello will contain a CECPQ2 or CECPQ2b public key (but never both). Additionally, Chrome will always include X25519 for servers that do not support post-quantum key exchange. The post-quantum key exchange will only be negotiated in TLS version 1.3 when both sides support it.

Since Cloudflare only measures the server side of the connection, it is impossible to determine the time it takes for a ClientHello sent from Chrome to reach Cloudflare’s edge servers; however, we can measure the time it takes for the TLS ServerHello message containing post-quantum key exchange, to reach the client and for the client to respond.

On the client side, Chrome Canary will operate the TLS connection. Google will enable either CECPQ2 or CECPQ2b in Chrome for the following mix of architecture and OSes:

• x86-64: Windows, Linux, macOS, ChromeOS
• aarch64: Android

Our high-level expectation is to get similar results as Langley’s original experiment in 2018 — slightly increased latency for the 50th percentile and higher latency for the 95th. Unfortunately, data collected purely from real users’ connections may not suffice for diagnosing the root causes of why some clients experience excessive slowdown. To this end, we will perform follow-up experiments based on per-client information we collect server-side.

Our primary hypothesis is that excessive slowdowns, like those Langley observed, are largely due to in-network events, such as middleboxes or bloated/lossy links. As a first-pass analysis, we will investigate whether the slowed-down clients share common network features, like common ASes, common transit networks, common link types, and so on. To determine this, we will run a traceroute from vantage points close to our servers back toward the clients (not overloading any particular links or hosts) and study whether some client locations are subject to slowdowns for all destinations or just for some.

### Dive into post-quantum cryptography

Be warned: the details of PQ cryptography may be quite complicated. In some cases it builds on classical cryptography, and in other cases it is completely different math. It would be rather hard to describe details in a single blog post. Instead, we are giving you an intuition of post-quantum cryptography, rather than provide deep academic-level descriptions. We’re skipping a lot of details for the sake of brevity. Nevertheless, settle in for a bit of an epic journey because we have a lot to cover.

### Key encapsulation mechanism

NIST requires that all key-agreement algorithms have a form of key-encapsulation mechanism (KEM). The KEM is a simplified form of public key encryption (PKE). As PKE, it also allows agreement on a secret, but in a slightly different way. The idea is that the session key is an output of the encryption algorithm, conversely to public key encryption schemes where session key is an input to the algorithm. In a KEM, Alice generates a random key and uses the pre-generated public key from Bob to encrypt (encapsulate) it. This results in a ciphertext sent to Bob. Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the random key. The idea was initially introduced by Cramer and Shoup. Experience shows that such constructs are easier to design, analyze, and implement as the scheme is limited to communicating a fixed-size session key. Leonardo Da Vinci said, “Simplicity is the ultimate sophistication,” which is very true in cryptography.

The key exchange (KEX) protocol, like Diffie-Hellman, is yet a different construct: it allows two parties to agree on a shared secret that can be used as a symmetric encryption key. For example, Alice generates a key pair and sends a public key to Bob. Bob does the same and uses his own key pair with Alice’s public key to generate the shared secret. He then sends his public key to Alice who can now generate the same shared secret. What’s worth noticing is that both Alice and Bob perform exactly the same operations.

KEM construction can be converted to KEX. Alice performs key generation and sends the public key to Bob. Bob uses it to encapsulate a symmetric session key and sends it back to Alice. Alice decapsulates the ciphertext received from Bob and gets the symmetric key. This is actually what we do in our experiment to make integration with the TLS protocol less complicated.

### NTRU Lattice-based Encryption

We will enable the CECPQ2 implemented by Adam Langley from Google on our servers. He described this implementation in detail here. This key exchange uses the HRSS algorithm, which is based on the NTRU (N-Th Degree TRUncated Polynomial Ring) algorithm. Foregoing too much detail, I am going to explain how NTRU works and give simplified examples, and finally, compare it to HRSS.

NTRU is a cryptosystem based on a polynomial ring. This means that we do not operate on numbers modulo a prime (like in RSA), but on polynomials of degree $$N$$ , where the degree of a polynomial is the highest exponent of its variable. For example, $$x^7 + 6x^3 + 11x^2$$ has degree of 7.

One can add polynomials in the ring in the usual way, by simply adding theirs coefficients modulo some integer. In NTRU this integer is called $$q$$. Polynomials can also be multiplied, but remember, you are operating in the ring, therefore the result of a multiplication is always a polynomial of degree less than $$N$$. It basically means that exponents of the resulting polynomial are added to modulo $$N$$.

In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working with a set of numbers less than N, you are working with a set of polynomials with a degree less than N.

To instantiate the NTRU cryptosystem, three domain parameters must be chosen:

• $$N$$ – degree of the polynomial ring, in NTRU the principal objects are polynomials of degree $$N-1$$.
• $$p$$ – small modulus used during key generation and decryption for reducing message coefficients.
• $$q$$ – large modulus used during algorithm execution for reducing coefficients of the polynomials.

First, we generate a pair of public and private keys. To do that, two polynomials $$f$$ and $$g$$ are chosen from the ring in a way that their randomly generated coefficients are much smaller than $$q$$. Then key generation computes two inverses of the polynomial: $$f_p= f^{-1} \bmod{p} \\ f_q= f^{-1} \bmod{q}$$

The last step is to compute $$pk = p\cdot f_q\cdot g \bmod q$$, which we will use as public key pk. The private key consists of $$f$$ and $$f_p$$. The $$f_q$$ is not part of any key, however it must remain secret.

It might be the case that after choosing $$f$$, the inverses modulo $$p$$ and $$q$$ do not exist. In this case, the algorithm has to start from the beginning and generate another $$f$$. That’s unfortunate because calculating the inverse of a polynomial is a costly operation. HRSS brings an improvement to this issue since it ensures that those inverses always exist, making key generation faster than as proposed initially in NTRU.

The encryption of a message $$m$$ proceeds as follows. First, the message $$m$$ is converted to a ring element $$pt$$ (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial $$b$$ called blinder. The goal of the blinder is to generate different ciphertexts per encyption. Thus, the ciphetext $$ct$$ is obtained as $$ct = (b\cdot pk + pt ) \bmod q$$ Decryption looks a bit more complicated but it can also be easily understood. Decryption uses both the secret value $$f$$ and to recover the plaintext as $$v = f \cdot ct \bmod q \\ pt = v \cdot f_p \bmod p$$

This diagram demonstrates why and how decryption works.

After obtaining $$pt$$, the message $$m$$ is recovered by inverting the conversion function.

The underlying hard assumption is that given two polynomials: $$f$$ and $$g$$ whose coefficients are short compared to the modulus $$q$$, it is difficult to distinguish $$pk = \frac{f}{g}$$ from a random element in the ring. It means that it’s hard to find $$f$$ and $$g$$ given only public key pk.

### Lattices

NTRU cryptosystem is a grandfather of lattice-based encryption schemes. The idea of using  difficult problems for cryptographic purposes was due to Ajtai. His work evolved into a whole area of research with the goal of creating more practical, lattice-based cryptosystems.

### What is a lattice and why it can be used for post-quantum crypto?

The picture below visualizes lattice as points in a two-dimensional space. A lattice is defined by the origin $$O$$ and base vectors $$\{ b_1 , b_2\}$$. Every point on the lattice is represented as a linear combination of the base vectors, for example  $$V = -2b_1+b_2$$.

There are two classical NP-hard problems in lattice-based cryptography:

1. Shortest Vector Problem (SVP): Given a lattice, to find the shortest non-zero vector in the lattice. In the graph, the vector $$s$$ is the shortest one. The SVP problem is NP-hard only under some assumptions.
2. Closest Vector Problem (CVP). Given a lattice and a vector $$V$$ (not necessarily in the lattice), to find the closest vector to $$V$$. For example, the closest vector to $$t$$ is $$z$$.

In the graph above, it is easy for us to solve SVP and CVP by simple inspection. However, the lattices used in cryptography have higher dimensions, say above 1000, as well as highly non-orthogonal basis vectors. On these instances, the problems get extremely hard to solve. It’s even believed future quantum computers will have it tough.

HRSS, which we use in our experiment, is based on NTRU, but a slightly better instantiation. The main improvements are:

• Faster key generation algorithm.
• NTRU encryption can produce ciphertexts that are impossible to decrypt (true for many lattice-based schemes). But HRSS fixes this problem.
• HRSS is a key encapsulation mechanism.

### CECPQ2b – Isogeny-based Post-Quantum TLS

Following CECPQ2, we have integrated into BoringSSL another hybrid key exchange mechanism relying on SIKE. It is called CECPQ2b and we will use it in our experimentation in TLS 1.3. SIKE is a key encapsulation method based on Supersingular Isogeny Diffie-Hellman (SIDH). Read more about SIDH in our previous post. The math behind SIDH is related to elliptic curves. A comparison between SIDH and the classical Elliptic Curve Diffie-Hellman (ECDH) is given.

An elliptic curve is a set of points that satisfy a specific mathematical equation. The equation of an elliptic curve may have multiple forms, the standard form is called the Weierstrass equation $$y^2 = x^3 +ax +b$$ and its shape can look like the red curve.

An interesting fact about elliptic curves is have a group structure. That is, the set of points on the curve have associated a binary operation called point addition. The set of points on the elliptic curve is closed under addition. Thus, adding two points results in another point that is also on the elliptic curve.

If we can add two different points on a curve, then we can also add one point to itself. And if we do it multiple times, then the resulting operations is known as a scalar multiplication and denoted as  $$Q = k\cdot P = P+P+\dots+P$$ for an integer $$k$$.

Multiplication of scalars is commutative. It means that two scalar multiplications can be evaluated in any order $$\color{darkred}{k_a}\cdot\color{darkgreen}{k_b} = \color{darkgreen}{k_b}\cdot\color{darkred}{k_a}$$; this an important property that makes ECDH possible.

It turns out that carefully if choosing an elliptic curve “correctly”, scalar multiplication is easy to compute but extremely hard to reverse. Meaning, given two points $$Q$$ and $$P$$ such that $$Q=k\cdot P$$, finding the integer k is a difficult task known as the Elliptic Curve Discrete Logarithm problem (ECDLP). This problem is suitable for cryptographic purposes.

Alice and Bob agree on a secret key as follows. Alice generates a private key $$k_a$$. Then, she uses some publicly known point $$P$$ and calculates her public key as $$Q_a = k_a\cdot P$$. Bob proceeds in similar fashion and gets $$k_b$$ and $$Q_b = k_b\cdot P$$. To agree on a shared secret, each party multiplies their private key with the public key of the other party. The result of this is the shared secret. Key agreement as described above, works thanks to the fact that scalars can commute:
$$\color{darkgreen}{k_a} \cdot Q_b = \color{darkgreen}{k_a} \cdot \color{darkred}{k_b} \cdot P \iff \color{darkred}{k_b} \cdot \color{darkgreen}{k_a} \cdot P = \color{darkred}{k_b} \cdot Q_a$$

There is a vast theory behind elliptic curves. An introduction to elliptic curve cryptography was posted before and more details can be found in this book. Now, lets describe SIDH and compare with ECDH.

### Isogenies on Elliptic Curves

Before explaining the details of SIDH key exchange, I’ll explain the 3 most important concepts, namely: j-invariant, isogeny and its kernel.

Each curve has a number that can be associated to it. Let’s call this number a j-invariant. This number is not unique per curve, meaning many curves have the same value of j-invariant, but it can be viewed as a way to group multiple elliptic curves into disjoint sets. We say that two curves are isomorphic if they are in the same set, called the isomorphism class. The j-invariant is a simple criterion to determine whether two curves are isomorphic. The j-invariant of a curve $$E$$ in Weierstrass form $$y^2 = x^3 + ax + b$$ is given as $$j(E) = 1728\frac{4a^3}{4^3 +27b^2}$$

When it comes to isogeny, think about it as a map between two curves. Each point on some curve $$E$$ is mapped by isogeny to the point on isogenous curve $$E’$$. We denote mapping from curve $$E$$ to $$E’$$ by isogeny $$\phi$$ as:

$$\phi: E \rightarrow E’$$

It depends on the map if those two curves are isomorphic or not. Isogeny can be visualised as:

There may exist many of those mappings, each curve used in SIDH has small number of isogenies to other curves. Natural question is how do we compute such isogeny. Here is where the kernel of an isogeny comes. The kernel uniquely determines isogeny (up to isomorphism class). Formulas for calculating isogeny from its kernel were initially given by J. Vélu and the idea of calculating them efficiently was extended .

To finish, I will summarize what was said above with a picture.

There are two isomorphism classes on the picture above. Both curves $$E_1$$ and $$E_2$$ are isomorphic and have  j-invariant = 6. As curves $$E_3$$ and $$E_4$$ have j-invariant=13, they are in a different isomorphism class. There exists an isogeny $$\phi_2$$ between curve $$E_3$$ and $$E_2$$, so they both are isogeneous. Curves $$\phi_1$$ and $$E_2$$ are isomorphic and there is isogeny $$\phi_1$$ between them. Curves $$E_1$$ and $$E_4$$ are neither isomorphic nor isogeneus.

For brevity I’m skipping many important details, like details of the finite field, the fact that isogenies must be separable and that the kernel is finite. But curious readers can find a number of academic research papers available on the Internet.

### Big picture: similarities with ECDH

Let’s generalize the ECDH algorithm described above, so that we can swap some elements and try to use Supersingular Isogeny Diffie-Hellman.

Note that what actually happens during an ECDH key exchange is:

• We have a set of points on elliptic curve, set S
• We have another group of integers used for point multiplication, G
• We use an element from Z to act on an element from S to get another element from S:

$$G \cdot S \rightarrow S$$

Now the question is: what is our G and S in an SIDH setting? For SIDH to work, we need a big set of elements and something secret that will act on the elements from that set. This “group action” must also be resistant to attacks performed by quantum computers.

In the SIDH setting, those two sets are defined as:

• Set S is a set (graph) of j-invariants, such that all the curves are supersingular: $$S = [j(E_1), j(E_2), j(E_3), …. , j(E_n)]$$
• Set G is a set of isogenies acting on elliptic curves and transforming, for example, the elliptic curve $$E_1$$ into $$E_n$$:

### Random walk on supersingular graph

When we talk about Isogeny Based Cryptography, as a topic distinct from Elliptic Curve Cryptography, we usually mean algorithms and protocols that rely fundamentally on the structure of isogeny graphs. An example of such a (small) graph is pictured below.

Each vertex of the graph represents a different j-invariant of a set of supersingular curves. The edges between vertices represent isogenies converting one elliptic curve to another. As you can notice, the graph is strongly connected, meaning every vertex can be reached from every other vertex. In the context of isogeny-based crypto, we call such a graph a supersingular isogeny graph. I’ll skip some technical details about the construction of this graph (look for those here or here), but instead describe ideas about how it can be used.

As the graph is strongly connected, it is possible to walk a whole graph by starting from any vertex, randomly choosing an edge, following it to the next vertex and then start the process again on a new vertex. Such a way of visiting edges of this graph is called a random walk.

The random walk is a key concept that makes isogeny based crypto feasible. When you look closely at the graph, you can notice that each vertex has a small number of edges incident to it, this is why we can compute the isogenies efficiently. But also for any vertex there is only a limited number of isogenies to choose from, which doesn’t look like good base for a cryptographic scheme. The key question is – where does the security of the scheme come from exactly? In order to get it, it is necessary to visit a couple hundred vertices. What it means in practice is that secret isogeny (of large degree) is constructed as a composition of multiple isogenies (of small, prime degree).  Which means, the secret isogeny is:

This property and properties of the isogeny graph are what makes some of us believe that scheme has a good chance to be secure. More specifically, there is no efficient way of finding a path that connects $$E_0$$ with $$E_n$$, even with quantum computer at hand. The security level of a system depends on value n – the number of steps taken during the walk.

The random walk is a core process used when both generating public keys and computing shared secrets. It starts with party generating random value m (see more below), starting curve $$E_0$$ and points P and Q on this curve. Those values are used to compute the kernel of an isogeny $$R_1$$ in the following way:

$$R_1 = P + m \cdot Q$$

Thanks to formulas given by Vélu we can now use the point $$R_1$$ to compute the isogeny, the party will choose to move from a vertex to another one. After the isogeny $$\phi_{R_1}$$ is calculated it is applied to $$E_0$$  which results in a new curve $$E_1$$:

$$\phi_{R_1}: E_0 \rightarrow E_1$$

Isogeny is also applied to points P and Q. Once on $$E_1$$ the process is repeated. This process is applied n times, and at the end a party ends up on some curve $$E_n$$ which defines isomorphism class, so also j-invariant.

### Supersingular Isogeny Diffie-Hellman

The core idea in SIDH is to compose two random walks on an isogeny graph of elliptic curves in such a way that the end node of both ways of composing is the same.

In order to do it, scheme sets public parameters – starting curve $$E_0$$ and 2 pairs of base points on this curve $$(PA,QA)$$ , $$(PB,QB)$$. Alice generates her random secret keys m, and calculates a secret isogeny $$\phi_q$$ by performing a random walk as described above. The walk finishes with 3 values: elliptic curve $$E_a$$ she has ended up with and pair of points $$\phi_a(PB)$$ and $$\phi_a(QB)$$ after pushing through Alice’s secret isogeny. Bob proceeds analogously which results in the triple $${E_b, \phi_b(PA), \phi_b(QA)}$$. The triple forms a public key which is exchanged between parties.

The picture below visualizes the operation. The black dots represent curves grouped in the same isomorphism classes represented by light blue circles. Alice takes the orange path ending up on a curve $$E_a$$ in a separate isomorphism class than Bob after taking his dark blue path ending on $$E_b$$. SIDH is parametrized in a way that Alice and Bob will always end up in different isomorphism classes.

Upon receipt of triple $${ E_a, \phi_a(PB), \phi_a(QB) }$$  from Alice, Bob will use his secret value m to calculate a new kernel – but instead of using point $$PA$$ and $$QA$$ to calculate an isogeny kernel, he will now use images $$\phi_a(PB)$$ and $$\phi_a(QB)$$ received from Alice:

$$R’_1 = \phi_a(PB) + m \cdot \phi_a(QB)$$

Afterwards, he uses $$R’_1$$ to start the walk again resulting in the isogeny $$\phi’_b: E_a \rightarrow E_{ab}$$. Allice proceeds analogously resulting in the isogeny $$\phi’_a: E_b \rightarrow E_{ba}$$. With isogenies calculated this way, both Alice and Bob will converge in the same isomorphism class. The math math may seem complicated, hopefully the picture below makes it easier to understand.

Bob computes a new isogeny and starts his random walk from $$E_a$$ received from Alice. He ends up on some curve $$E_{ba}$$. Similarly, Alice calculates a new isogeny, applies it on $$E_b$$ received from Bob and her random walk ends on some curve $$E_{ab}$$. Curves $$E_{ab}$$ and $$E_{ba}$$ are not likely to be the same, but construction guarantees that they are isomorphic. As mentioned earlier, isomorphic curves have the same value of j-invariant,  hence the shared secret is a value of j-invariant $$j(E_{ab})$$.

Coming back to differences between SIDH and ECDH – we can split them into four categories: the elements of the group we are operating on, the cornerstone computation required to agree on a shared secret, the elements representing secret values, and the difficult problem on which the security relies.

In ECDH there is a secret key which is an integer scalar, in case of SIDH it is a secret isogeny, which also is generated from an integer scalar. In the case of ECDH one multiplies a point on a curve by a scalar, in the case of SIDH it is a random walk in an isogeny graph. In the case of ECDH, the public key is a point on a curve, in the case of SIDH, the public part is a curve itself and the image of some points after applying isogeny. The shared secret in the case of ECDH is a point on a curve, in the case of SIDH it is a j-invariant.

### SIKE: Supersingular Isogeny Key Encapsulation

SIDH could potentially be used as a drop-in replacement of the ECDH protocol. We have actually implemented a proof-of-concept and added it to our implementation of TLS 1.3 in the tls-tris library and described (together with Mozilla) implementation details in this draft. Nevertheless, there is a problem with SIDH – the keys can be used only once. In 2016, a few researchers came up with an active attack on SIDH which works only when public keys are reused. In the context of TLS, it is not a big problem, because for each session a fresh key pair is generated (ephemeral keys), but it may not be true for other applications.

SIKE is an isogeny key encapsulation which solves this problem. Bob can generate SIKE keys, upload the public part somewhere in the Internet and then anybody can use it whenever he wants to communicate with Bob securely. SIKE reuses SIDH – internally both sides of the connection always perform SIDH key generation, SIDH key agreement and apply some other cryptographic primitives in order to convert SIDH to KEM. SIKE is implemented in a few variants – each variant corresponds to the security levels using 128-, 192- and 256-bit secret keys. Higher security level means longer running time. More details about SIKE can be found here.

SIKE is also one of the candidates in NIST post-quantum “competition“.

I’ve skipped many important details to give a brief description of how isogeny based crypto works. If you’re curious and hungry for details, look at either of these Cloudflare meetups, where Deirdre Connolly talked about isogeny-based cryptography or this talk by Chloe Martindale during PQ Crypto School 2017. And if you would like to know more about quantum attacks on this scheme, I highly recommend this work.

## Conclusion

Quantum computers that can break meaningful cryptographic parameter settings do not exist, yet. They won’t be built for at least the next few years. Nevertheless, they have already changed the way we look at current cryptographic deployments. There are at least two reasons it’s worth investing in PQ cryptography:

• It takes a lot of time to build secure cryptography and we don’t actually know when today’s classical cryptography will be broken. There is a need for a good mathematical base: an initial idea of what may be secure against something that doesn’t exist yet. If you have an idea, you also need good implementation, constant time, resistance to things like time and cache side-channels, DFA, DPA, EM, and a bunch of other abbreviations indicating side-channel resistance. There is also deployment of, for example, algorithms based on elliptic curves were introduced in ’85, but started to really be used in production only during the last decade, 20 or so years later. Obviously, the implementation must be blazingly fast! Last, but not least, integration: we need time to develop standards to allow integration of PQ cryptography with protocols like TLS.
• Even though efficient quantum computers probably won’t exist for another few years, the threat is real. Data encrypted with current cryptographic algorithms can be recorded now with hopes of being broken in the future.

Cloudflare is motivated to help build the Internet of tomorrow with the tools at hand today. Our interest is in cryptographic techniques that can be integrated into existing protocols and widely deployed on the Internet as seamlessly as possible. PQ cryptography, like the rest of cryptography, includes many cryptosystems that can be used for communications in today’s Internet; Alice and Bob need to perform some computation, but they do not need to buy new hardware to do that.

Cloudflare sees great potential in those algorithms and believes that some of them can be used as a safe replacement for classical public-key cryptosystems. Time will tell if we’re justified in this belief!

# Introducing CIRCL: An Advanced Cryptographic Library

Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/introducing-circl/

As part of Crypto Week 2019, today we are proud to release the source code of a cryptographic library we’ve been working on: a collection of cryptographic primitives written in Go, called CIRCL. This library includes a set of packages that target cryptographic algorithms for post-quantum (PQ), elliptic curve cryptography, and hash functions for prime groups. Our hope is that it’s useful for a broad audience. Get ready to discover how we made CIRCL unique.

### Cryptography in Go

We use Go a lot at Cloudflare. It offers a good balance between ease of use and performance; the learning curve is very light, and after a short time, any programmer can get good at writing fast, lightweight backend services. And thanks to the possibility of implementing performance critical parts in Go assembly, we can try to ‘squeeze the machine’ and get every bit of performance.

Cloudflare’s cryptography team designs and maintains security-critical projects. It’s not a secret that security is hard. That’s why, we are introducing the Cloudflare Interoperable Reusable Cryptographic Library – CIRCL. There are multiple goals behind CIRCL. First, we want to concentrate our efforts to implement cryptographic primitives in a single place. This makes it easier to ensure that proper engineering processes are followed. Second, Cloudflare is an active member of the Internet community – we are trying to improve and propose standards to help make the Internet a better place.

Cloudflare’s mission is to help build a better Internet. For this reason, we want CIRCL helps the cryptographic community to create proof of concepts, like the post-quantum TLS experiments we are doing. Over the years, lots of ideas have been put on the table by cryptographers (for example, homomorphic encryption, multi-party computation, and privacy preserving constructions). Recently, we’ve seen those concepts picked up and exercised in a variety of contexts. CIRCL’s implementations of cryptographic primitives creates a powerful toolbox for developers wishing to use them.

The Go language provides native packages for several well-known cryptographic algorithms, such as key agreement algorithms, hash functions, and digital signatures. There are also packages maintained by the community under golang.org/x/crypto that provide a diverse set of algorithms for supporting authenticated encryption, stream ciphers, key derivation functions, and bilinear pairings. CIRCL doesn’t try to compete with golang.org/x/crypto in any sense. Our goal is to provide a complementary set of implementations that are more aggressively optimized, or may be less commonly used but have a good chance at being very useful in the future.

### Unboxing CIRCL

Our cryptography team worked on a fresh proposal to augment the capabilities of Go users with a new set of packages.  You can get them by typing:

$go get github.com/cloudflare/circl The contents of CIRCL is split across different categories, summarized in this table: CategoryAlgorithmsDescriptionApplications Post-Quantum CryptographySIDHIsogeny-based cryptography.SIDH provides key exchange mechanisms using ephemeral keys. SIKESIKE is a key encapsulation mechanism (KEM).Key agreement protocols. Key ExchangeX25519, X448RFC-7748 provides new key exchange mechanisms based on Montgomery elliptic curves.TLS 1.3. Secure Shell. FourQOne of the fastest elliptic curves at 128-bit security level.Experimental for key agreement and digital signatures. Digital SignaturesEd25519RFC-8032 provides new digital signature algorithms based on twisted Edwards curves.Digital certificates and authentication methods. Hash to Elliptic Curve GroupsSeveral algorithms: Elligator2, Ristretto, SWU, Icart.Protocols based on elliptic curves require hash functions that map bit strings to points on an elliptic curve.Useful in protocols such as Privacy Pass. OPAQUE. PAKE. Verifiable random functions. OptimizationCurve P-384Our optimizations reduce the burden when moving from P-256 to P-384.ECDSA and ECDH using Suite B at top secret level. ### SIKE, a Post-Quantum Key Encapsulation Mechanism To better understand the post-quantum world, we started experimenting with post-quantum key exchange schemes and using them for key agreement in TLS 1.3. CIRCL contains the sidh package, an implementation of Supersingular Isogeny-based Diffie-Hellman (SIDH), as well as CCA2-secure Supersingular Isogeny-based Key Encapsulation (SIKE), which is based on SIDH. CIRCL makes playing with PQ key agreement very easy. Below is an example of the SIKE interface that can be used to establish a shared secret between two parties for use in symmetric encryption. The example uses a key encapsulation mechanism (KEM). For our example in this scheme, Alice generates a random secret key, and then uses Bob’s pre-generated public key to encrypt (encapsulate) it. The resulting ciphertext is sent to Bob. Then, Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the secret key. See more details about SIKE in this Cloudflare blog. Let’s see how to do this with CIRCL: // Bob's key pair prvB := NewPrivateKey(Fp503, KeyVariantSike) pubB := NewPublicKey(Fp503, KeyVariantSike) // Generate private key prvB.Generate(rand.Reader) // Generate public key prvB.GeneratePublicKey(pubB) var publicKeyBytes = make([]array, pubB.Size()) var privateKeyBytes = make([]array, prvB.Size()) pubB.Export(publicKeyBytes) prvB.Export(privateKeyBytes) // Encode public key to JSON // Save privateKeyBytes on disk  Bob uploads the public key to a location accessible by anybody. When Alice wants to establish a shared secret with Bob, she performs encapsulation that results in two parts: a shared secret and the result of the encapsulation, the ciphertext. // Read JSON to bytes // Alice's key pair pubB := NewPublicKey(Fp503, KeyVariantSike) pubB.Import(publicKeyBytes) var kem := sike.NewSike503(rand.Reader) kem.Encapsulate(ciphertext, sharedSecret, pubB) // send ciphertext to Bob  Bob now receives ciphertext from Alice and decapsulates the shared secret: var kem := sike.NewSike503(rand.Reader) kem.Decapsulate(sharedSecret, prvA, pubA, ciphertext)  At this point, both Alice and Bob can derive a symmetric encryption key from the secret generated. SIKE implementation contains: • Two different field sizes: Fp503 and Fp751. The choice of the field is a trade-off between performance and security. • Code optimized for AMD64 and ARM64 architectures, as well as generic Go code. For AMD64, we detect the micro-architecture and if it’s recent enough (e.g., it supports ADOX/ADCX and BMI2 instruction sets), we use different multiplication techniques to make an execution even faster. • Code implemented in constant time, that is, the execution time doesn’t depend on secret values. We also took care of low heap-memory footprint, so that the implementation uses a minimal amount of dynamically allocated memory. In the future, we plan to provide multiple implementations of post-quantum schemes. Currently, our focus is on algorithms useful for key exchange in TLS. SIDH/SIKE are interesting because the key sizes produced by those algorithms are relatively small (comparing with other PQ schemes). Nevertheless, performance is not all that great yet, so we’ll continue looking. We plan to add lattice-based algorithms, such as NTRU-HRSS and Kyber, to CIRCL. We will also add another more experimental algorithm called cSIDH, which we would like to try in other applications. CIRCL doesn’t currently contain any post-quantum signature algorithms, which is also on our to-do list. After our experiment with TLS key exchange completes, we’re going to look at post-quantum PKI. But that’s a topic for a future blog post, so stay tuned. Last, we must admit that our code is largely based on the implementation from the NIST submission along with the work of former intern Henry De Valence, and we would like to thank both Henry and the SIKE team for their great work. ### Elliptic Curve Cryptography Elliptic curve cryptography brings short keys sizes and faster evaluation of operations when compared to algorithms based on RSA. Elliptic curves were standardized during the early 2000s, and have recently gained popularity as they are a more efficient way for securing communications. Elliptic curves are used in almost every project at Cloudflare, not only for establishing TLS connections, but also for certificate validation, certificate revocation (OCSP), Privacy Pass, certificate transparency, and AMP Real URL. The Go language provides native support for NIST-standardized curves, the most popular of which is P-256. In a previous post, Vlad Krasnov described the relevance of optimizing several cryptographic algorithms, including P-256 curve. When working at Cloudflare scale, little issues around performance are significantly magnified. This is one reason why Cloudflare pushes the boundaries of efficiency. A similar thing happened on the chained validation of certificates. For some certificates, we observed performance issues when validating a chain of certificates. Our team successfully diagnosed this issue: certificates which had signatures from the P-384 curve, which is the curve that corresponds to the 192-bit security level, were taking up 99% of CPU time! It is common for certificates closer to the root of the chain of trust to rely on stronger security assumptions, for example, using larger elliptic curves. Our first-aid reaction comes in the form of an optimized implementation written by Brendan McMillion that reduced the time of performing elliptic curve operations by a factor of 10. The code for P-384 is also available in CIRCL. The latest developments in elliptic curve cryptography have caused a shift to use elliptic curve models with faster arithmetic operations. The best example is undoubtedly Curve25519; other examples are the Goldilocks and FourQ curves. CIRCL supports all of these curves, allowing instantiation of Diffie-Hellman exchanges and Edwards digital signatures. Although it slightly overlaps the Go native libraries, CIRCL has architecture-dependent optimizations. ### Hashing to Groups Many cryptographic protocols rely on the hardness of solving the Discrete Logarithm Problem (DLP) in special groups, one of which is the integers reduced modulo a large integer. To guarantee that the DLP is hard to solve, the modulus must be a large prime number. Increasing its size boosts on security, but also makes operations more expensive. A better approach is using elliptic curve groups since they provide faster operations. In some cryptographic protocols, it is common to use a function with the properties of a cryptographic hash function that maps bit strings into elements of the group. This is easy to accomplish when, for example, the group is the set of integers modulo a large prime. However, it is not so clear how to perform this function using elliptic curves. In cryptographic literature, several methods have been proposed using the terms hashing to curves or hashing to point indistinctly. The main issue is that there is no general method for deterministically finding points on any elliptic curve, the closest available are methods that target special curves and parameters. This is a problem for implementers of cryptographic algorithms, who have a hard time figuring out on a suitable method for hashing to points of an elliptic curve. Compounding that, chances of doing this wrong are high. There are many different methods, elliptic curves, and security considerations to analyze. For example, a vulnerability on WPA3 handshake protocol exploited a non-constant time hashing method resulting in a recovery of keys. Currently, an IETF draft is tracking work in-progress that provides hashing methods unifying requirements with curves and their parameters. Corresponding to this problem, CIRCL will include implementations of hashing methods for elliptic curves. Our development is accompanying the evolution of the IEFT draft. Therefore, users of CIRCL will have this added value as the methods implement a ready-to-go functionality, covering the needs of some cryptographic protocols. ### Update on Bilinear Pairings Bilinear pairings are sometimes regarded as a tool for cryptanalysis, however pairings can also be used in a constructive way by allowing instantiation of advanced public-key algorithms, for example, identity-based encryption, attribute-based encryption, blind digital signatures, three-party key agreement, among others. An efficient way to instantiate a bilinear pairing is to use elliptic curves. Note that only a special class of curves can be used, thus so-called pairing-friendly curves have specific properties that enable the efficient evaluation of a pairing. Some families of pairing-friendly curves were introduced by Barreto-Naehrig (BN), Kachisa-Schaefer-Scott (KSS), and Barreto-Lynn-Scott (BLS). BN256 is a BN curve using a 256-bit prime and is one of the fastest options for implementing a bilinear pairing. The Go native library supports this curve in the package golang.org/x/crypto/bn256. In fact, the BN256 curve is used by Cloudflare’s Geo Key Manager, which allows distributing encrypted keys around the world. At Cloudflare, high-performance is a must and with this motivation, in 2017, we released an optimized implementation of the BN256 package that is 8x faster than the Go’s native package. The success of these optimizations reached several other projects such as the Ethereum protocol and the Randomness Beacon project. Recent improvements in solving the DLP over extension fields, GF(pᵐ) for p prime and m>1, impacted the security of pairings, causing recalculation of the parameters used for pairing-friendly curves. Before these discoveries, the BN256 curve provided a 128-bit security level, but now larger primes are needed to target the same security level. That does not mean that the BN256 curve has been broken, since BN256 gives a security of 100 bits, that is, approximately 2¹⁰⁰ operations are required to cause a real danger, which is still unfeasible with current computing power. With our CIRCL announcement, we want to announce our plans for research and development to obtain efficient curve(s) to become a stronger successor of BN256. According to the estimation by Barbulescu-Duquesne, a BN curve must use primes of at least 456 bits to match a 128-bit security level. However, the impact on the recalculation of parameters brings back to the main scene BLS and KSS curves as efficient alternatives. To this end a standardization effort at IEFT is in progress with the aim of defining parameters and pairing-friendly curves that match different security levels. Note that regardless of the curve(s) chosen, there is an unavoidable performance downgrade when moving from BN256 to a stronger curve. Actual timings were presented by Aranha, who described the evolution of the race for high-performance pairing implementations. The purpose of our continuous development of CIRCL is to minimize this impact through fast implementations. ### Optimizations Go itself is a very easy to learn and use for system programming and yet makes it possible to use assembly so that you can stay close “to the metal”. We have blogged about improving performance in Go few times in the past (see these posts about encryption, ciphersuites, and image encoding). When developing CIRCL, we crafted the code to get the best possible performance from the machine. We leverage the capabilities provided by the architecture and the architecture-specific instructions. This means that in some cases we need to get our hands dirty and rewrite parts of the software in Go assembly, which is not easy, but definitely worth the effort when it comes to performance. We focused on x86-64, as this is our main target, but we also think that it’s worth looking at ARM architecture, and in some cases (like SIDH or P-384), CIRCL has optimized code for this platform. We also try to ensure that code uses memory efficiently – crafting it in a way that fast allocations on the stack are preferred over expensive heap allocations. In cases where heap allocation is needed, we tried to design the APIs in a way that, they allow pre-allocating memory ahead of time and reuse it for multiple operations. ### Security The CIRCL library is offered as-is, and without a guarantee. Therefore, it is expected that changes in the code, repository, and API occur in the future. We recommend to take caution before using this library in a production application since part of its content is experimental. As new attacks and vulnerabilities arise over the time, security of software should be treated as a continuous process. In particular, the assessment of cryptographic software is critical, it requires the expertise of several fields, not only computer science. Cryptography engineers must be aware of the latest vulnerabilities and methods of attack in order to defend against them. The development of CIRCL follows best practices on the secure development. For example, if time execution of the code depends on secret data, the attacker could leverage those irregularities and recover secret keys. In our code, we take care of writing constant-time code and hence prevent timing based attacks. Developers of cryptographic software must also be aware of optimizations performed by the compiler and/or the processor since these optimizations can lead to insecure binary codes in some cases. All of these issues could be exploited in real attacks aimed at compromising systems and keys. Therefore, software changes must be tracked down through thorough code reviews. Also static analyzers and automated testing tools play an important role on the security of the software. ### Summary CIRCL is envisioned as an effective tool for experimenting with modern cryptographic algorithms yet providing high-performance implementations. Today is marked as the starting point of a continuous machinery of innovation and retribution to the community in the form of a cryptographic library. There are still several other applications such as homomorphic encryption, multi-party computation, and privacy-preserving protocols that we would like to explore. We are team of cryptography, security, and software engineers working to improve and augment Cloudflare products. Our team keeps the communication channels open for receiving comments, including improvements, and merging contributions. We welcome opinions and contributions! If you would like to get in contact, you should check out our github repository for CIRCL github.com/cloudflare/circl. We want to share our work and hope it makes someone else’s job easier as well. Finally, special thanks to all the contributors who has either directly or indirectly helped to implement the library – Ko Stoffelen, Brendan McMillion, Henry de Valence, Michael McLoughlin and all the people who invested their time in reviewing our code. # Introducing CIRCL: An Advanced Cryptographic Library Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/introducing-circl/ As part of Crypto Week 2019, today we are proud to release the source code of a cryptographic library we’ve been working on: a collection of cryptographic primitives written in Go, called CIRCL. This library includes a set of packages that target cryptographic algorithms for post-quantum (PQ), elliptic curve cryptography, and hash functions for prime groups. Our hope is that it’s useful for a broad audience. Get ready to discover how we made CIRCL unique. ### Cryptography in Go We use Go a lot at Cloudflare. It offers a good balance between ease of use and performance; the learning curve is very light, and after a short time, any programmer can get good at writing fast, lightweight backend services. And thanks to the possibility of implementing performance critical parts in Go assembly, we can try to ‘squeeze the machine’ and get every bit of performance. Cloudflare’s cryptography team designs and maintains security-critical projects. It’s not a secret that security is hard. That’s why, we are introducing the Cloudflare Interoperable Reusable Cryptographic Library – CIRCL. There are multiple goals behind CIRCL. First, we want to concentrate our efforts to implement cryptographic primitives in a single place. This makes it easier to ensure that proper engineering processes are followed. Second, Cloudflare is an active member of the Internet community – we are trying to improve and propose standards to help make the Internet a better place. Cloudflare’s mission is to help build a better Internet. For this reason, we want CIRCL helps the cryptographic community to create proof of concepts, like the post-quantum TLS experiments we are doing. Over the years, lots of ideas have been put on the table by cryptographers (for example, homomorphic encryption, multi-party computation, and privacy preserving constructions). Recently, we’ve seen those concepts picked up and exercised in a variety of contexts. CIRCL’s implementations of cryptographic primitives creates a powerful toolbox for developers wishing to use them. The Go language provides native packages for several well-known cryptographic algorithms, such as key agreement algorithms, hash functions, and digital signatures. There are also packages maintained by the community under golang.org/x/crypto that provide a diverse set of algorithms for supporting authenticated encryption, stream ciphers, key derivation functions, and bilinear pairings. CIRCL doesn’t try to compete with golang.org/x/crypto in any sense. Our goal is to provide a complementary set of implementations that are more aggressively optimized, or may be less commonly used but have a good chance at being very useful in the future. ### Unboxing CIRCL Our cryptography team worked on a fresh proposal to augment the capabilities of Go users with a new set of packages. You can get them by typing: $ go get github.com/cloudflare/circl

The contents of CIRCL is split across different categories, summarized in this table:

CategoryAlgorithmsDescriptionApplications
Post-Quantum CryptographySIDHIsogeny-based cryptography.SIDH provides key exchange mechanisms using ephemeral keys.
SIKESIKE is a key encapsulation mechanism (KEM).Key agreement protocols.
Key ExchangeX25519, X448RFC-7748 provides new key exchange mechanisms based on Montgomery elliptic curves.TLS 1.3. Secure Shell.
FourQOne of the fastest elliptic curves at 128-bit security level.Experimental for key agreement and digital signatures.
Digital SignaturesEd25519RFC-8032 provides new digital signature algorithms based on twisted Edwards curves.Digital certificates and authentication methods.
Hash to Elliptic Curve GroupsSeveral algorithms: Elligator2, Ristretto, SWU, Icart.Protocols based on elliptic curves require hash functions that map bit strings to points on an elliptic curve.Useful in protocols such as Privacy Pass. OPAQUE.
PAKE.
Verifiable random functions.
OptimizationCurve P-384Our optimizations reduce the burden when moving from P-256 to P-384.ECDSA and ECDH using Suite B at top secret level.

### SIKE, a Post-Quantum Key Encapsulation Method

To better understand the post-quantum world, we started experimenting with post-quantum key exchange schemes and using them for key agreement in TLS 1.3. CIRCL contains the sidh package, an implementation of Supersingular Isogeny-based Diffie-Hellman (SIDH), as well as CCA2-secure Supersingular Isogeny-based Key Encapsulation (SIKE), which is based on SIDH.

CIRCL makes playing with PQ key agreement very easy. Below is an example of the SIKE interface that can be used to establish a shared secret between two parties for use in symmetric encryption. The example uses a key encapsulation mechanism (KEM). For our example in this scheme, Alice generates a random secret key, and then uses Bob’s pre-generated public key to encrypt (encapsulate) it. The resulting ciphertext is sent to Bob. Then, Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the secret key. See more details about SIKE in this Cloudflare blog.

Let’s see how to do this with CIRCL:

// Bob's key pair
prvB := NewPrivateKey(Fp503, KeyVariantSike)
pubB := NewPublicKey(Fp503, KeyVariantSike)

// Generate private key
// Generate public key
prvB.GeneratePublicKey(pubB)

var publicKeyBytes = make([]array, pubB.Size())
var privateKeyBytes = make([]array, prvB.Size())

pubB.Export(publicKeyBytes)
prvB.Export(privateKeyBytes)

// Encode public key to JSON
// Save privateKeyBytes on disk


Bob uploads the public key to a location accessible by anybody. When Alice wants to establish a shared secret with Bob, she performs encapsulation that results in two parts: a shared secret and the result of the encapsulation, the ciphertext.

// Read JSON to bytes

// Alice's key pair
pubB := NewPublicKey(Fp503, KeyVariantSike)
pubB.Import(publicKeyBytes)

kem.Encapsulate(ciphertext, sharedSecret, pubB)

// send ciphertext to Bob


Bob now receives ciphertext from Alice and decapsulates the shared secret:

var kem := sike.NewSike503(rand.Reader)
kem.Decapsulate(sharedSecret, prvA, pubA, ciphertext)


At this point, both Alice and Bob can derive a symmetric encryption key from the secret generated.

SIKE implementation contains:

• Two different field sizes: Fp503 and Fp751. The choice of the field is a trade-off between performance and security.
• Code optimized for AMD64 and ARM64 architectures, as well as generic Go code. For AMD64, we detect the micro-architecture and if it’s recent enough (e.g., it supports ADOX/ADCX and BMI2 instruction sets), we use different multiplication techniques to make an execution even faster.
• Code implemented in constant time, that is, the execution time doesn’t depend on secret values.

We also took care of low heap-memory footprint, so that the implementation uses a minimal amount of dynamically allocated memory. In the future, we plan to provide multiple implementations of post-quantum schemes. Currently, our focus is on algorithms useful for key exchange in TLS.

SIDH/SIKE are interesting because the key sizes produced by those algorithms are relatively small (comparing with other PQ schemes). Nevertheless, performance is not all that great yet, so we’ll continue looking. We plan to add lattice-based algorithms, such as NTRU-HRSS and Kyber, to CIRCL. We will also add another more experimental algorithm called cSIDH, which we would like to try in other applications. CIRCL doesn’t currently contain any post-quantum signature algorithms, which is also on our to-do list. After our experiment with TLS key exchange completes, we’re going to look at post-quantum PKI. But that’s a topic for a future blog post, so stay tuned.

Last, we must admit that our code is largely based on the implementation from the NIST submission along with the work of former intern Henry De Valence, and we would like to thank both Henry and the SIKE team for their great work.

### Elliptic Curve Cryptography

Elliptic curve cryptography brings short keys sizes and faster evaluation of operations when compared to algorithms based on RSA. Elliptic curves were standardized during the early 2000s, and have recently gained popularity as they are a more efficient way for securing communications.

Elliptic curves are used in almost every project at Cloudflare, not only for establishing TLS connections, but also for certificate validation, certificate revocation (OCSP), Privacy Pass, certificate transparency, and AMP Real URL.

The Go language provides native support for NIST-standardized curves, the most popular of which is P-256. In a previous post, Vlad Krasnov described the relevance of optimizing several cryptographic algorithms, including P-256 curve. When working at Cloudflare scale, little issues around performance are significantly magnified. This is one reason why Cloudflare pushes the boundaries of efficiency.

A similar thing happened on the chained validation of certificates. For some certificates, we observed performance issues when validating a chain of certificates. Our team successfully diagnosed this issue: certificates which had signatures from the P-384 curve, which is the curve that corresponds to the 192-bit security level, were taking up 99% of CPU time! It is common for certificates closer to the root of the chain of trust to rely on stronger security assumptions, for example, using larger elliptic curves. Our first-aid reaction comes in the form of an optimized implementation written by Brendan McMillion that reduced the time of performing elliptic curve operations by a factor of 10. The code for P-384 is also available in CIRCL.

The latest developments in elliptic curve cryptography have caused a shift to use elliptic curve models with faster arithmetic operations. The best example is undoubtedly Curve25519; other examples are the Goldilocks and FourQ curves. CIRCL supports all of these curves, allowing instantiation of Diffie-Hellman exchanges and Edwards digital signatures. Although it slightly overlaps the Go native libraries, CIRCL has architecture-dependent optimizations.

### Hashing to Groups

Many cryptographic protocols rely on the hardness of solving the Discrete Logarithm Problem (DLP) in special groups, one of which is the integers reduced modulo a large integer. To guarantee that the DLP is hard to solve, the modulus must be a large prime number. Increasing its size boosts on security, but also makes operations more expensive. A better approach is using elliptic curve groups since they provide faster operations.

In some cryptographic protocols, it is common to use a function with the properties of a cryptographic hash function that maps bit strings into elements of the group. This is easy to accomplish when, for example, the group is the set of integers modulo a large prime. However, it is not so clear how to perform this function using elliptic curves. In cryptographic literature, several methods have been proposed using the terms hashing to curves or hashing to point indistinctly.

The main issue is that there is no general method for deterministically finding points on any elliptic curve, the closest available are methods that target special curves and parameters. This is a problem for implementers of cryptographic algorithms, who have a hard time figuring out on a suitable method for hashing to points of an elliptic curve. Compounding that, chances of doing this wrong are high. There are many different methods, elliptic curves, and security considerations to analyze. For example, a vulnerability on WPA3 handshake protocol exploited a non-constant time hashing method resulting in a recovery of keys. Currently, an IETF draft is tracking work in-progress that provides hashing methods unifying requirements with curves and their parameters.

Corresponding to this problem, CIRCL will include implementations of hashing methods for elliptic curves. Our development is accompanying the evolution of the IEFT draft. Therefore, users of CIRCL will have this added value as the methods implement a ready-to-go functionality, covering the needs of some cryptographic protocols.

### Update on Bilinear Pairings

Bilinear pairings are sometimes regarded as a tool for cryptanalysis, however pairings can also be used in a constructive way by allowing instantiation of advanced public-key algorithms, for example, identity-based encryption, attribute-based encryption, blind digital signatures, three-party key agreement, among others.

An efficient way to instantiate a bilinear pairing is to use elliptic curves. Note that only a special class of curves can be used, thus so-called pairing-friendly curves have specific properties that enable the efficient evaluation of a pairing.

Some families of pairing-friendly curves were introduced by Barreto-Naehrig (BN), Kachisa-Schaefer-Scott (KSS), and Barreto-Lynn-Scott (BLS). BN256 is a BN curve using a 256-bit prime and is one of the fastest options for implementing a bilinear pairing. The Go native library supports this curve in the package golang.org/x/crypto/bn256. In fact, the BN256 curve is used by Cloudflare’s Geo Key Manager, which allows distributing encrypted keys around the world. At Cloudflare, high-performance is a must and with this motivation, in 2017, we released an optimized implementation of the BN256 package that is 8x faster than the Go’s native package. The success of these optimizations reached several other projects such as the Ethereum protocol and the Randomness Beacon project.

Recent improvements in solving the DLP over extension fields, GF(pᵐ) for p prime and m>1, impacted the security of pairings, causing recalculation of the parameters used for pairing-friendly curves.

Before these discoveries, the BN256 curve provided a 128-bit security level, but now larger primes are needed to target the same security level. That does not mean that the BN256 curve has been broken, since BN256 gives a security of 100 bits, that is, approximately 2¹⁰⁰ operations are required to cause a real danger, which is still unfeasible with current computing power.

With our CIRCL announcement, we want to announce our plans for research and development to obtain efficient curve(s) to become a stronger successor of BN256. According to the estimation by Barbulescu-Duquesne, a BN curve must use primes of at least 456 bits to match a 128-bit security level. However, the impact on the recalculation of parameters brings back to the main scene BLS and KSS curves as efficient alternatives. To this end a standardization effort at IEFT is in progress with the aim of defining parameters and pairing-friendly curves that match different security levels.

Note that regardless of the curve(s) chosen, there is an unavoidable performance downgrade when moving from BN256 to a stronger curve. Actual timings were presented by Aranha, who described the evolution of the race for high-performance pairing implementations. The purpose of our continuous development of CIRCL is to minimize this impact through fast implementations.

### Optimizations

Go itself is a very easy to learn and use for system programming and yet makes it possible to use assembly so that you can stay close “to the metal”. We have blogged about improving performance in Go few times in the past (see these posts about encryption, ciphersuites, and image encoding).

When developing CIRCL, we crafted the code to get the best possible performance from the machine. We leverage the capabilities provided by the architecture and the architecture-specific instructions. This means that in some cases we need to get our hands dirty and rewrite parts of the software in Go assembly, which is not easy, but definitely worth the effort when it comes to performance. We focused on x86-64, as this is our main target, but we also think that it’s worth looking at ARM architecture, and in some cases (like SIDH or P-384), CIRCL has optimized code for this platform.

We also try to ensure that code uses memory efficiently – crafting it in a way that fast allocations on the stack are preferred over expensive heap allocations. In cases where heap allocation is needed, we tried to design the APIs in a way that, they allow pre-allocating memory ahead of time and reuse it for multiple operations.

### Security

The CIRCL library is offered as-is, and without a guarantee. Therefore, it is expected that changes in the code, repository, and API occur in the future. We recommend to take caution before using this library in a production application since part of its content is experimental.

As new attacks and vulnerabilities arise over the time, security of software should be treated as a continuous process. In particular, the assessment of cryptographic software is critical, it requires the expertise of several fields, not only computer science. Cryptography engineers must be aware of the latest vulnerabilities and methods of attack in order to defend against them.

The development of CIRCL follows best practices on the secure development. For example, if time execution of the code depends on secret data, the attacker could leverage those irregularities and recover secret keys. In our code, we take care of writing constant-time code and hence prevent timing based attacks.

Developers of cryptographic software must also be aware of optimizations performed by the compiler and/or the processor since these optimizations can lead to insecure binary codes in some cases. All of these issues could be exploited in real attacks aimed at compromising systems and keys. Therefore, software changes must be tracked down through thorough code reviews. Also static analyzers and automated testing tools play an important role on the security of the software.

### Summary

CIRCL is envisioned as an effective tool for experimenting with modern cryptographic algorithms yet providing high-performance implementations. Today is marked as the starting point of a continuous machinery of innovation and retribution to the community in the form of a cryptographic library. There are still several other applications such as homomorphic encryption, multi-party computation, and privacy-preserving protocols that we would like to explore.

We are team of cryptography, security, and software engineers working to improve and augment Cloudflare products. Our team keeps the communication channels open for receiving comments, including improvements, and merging contributions. We welcome opinions and contributions! If you would like to get in contact, you should check out our github repository for CIRCL github.com/cloudflare/circl. We want to share our work and hope it makes someone else’s job easier as well.

Finally, special thanks to all the contributors who has either directly or indirectly helped to implement the library – Ko Stoffelen, Brendan McMillion, Henry de Valence, Michael McLoughlin and all the people who invested their time in reviewing our code.

# Welcome to Crypto Week 2019

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/welcome-to-crypto-week-2019/

The Internet is an extraordinarily complex and evolving ecosystem. Its constituent protocols range from the ancient and archaic (hello FTP) to the modern and sleek (meet WireGuard), with a fair bit of everything in between. This evolution is ongoing, and as one of the most connected networks on the Internet, Cloudflare has a duty to be a good steward of this ecosystem. We take this responsibility to heart: Cloudflare’s mission is to help build a better Internet. In this spirit, we are very proud to announce Crypto Week 2019.

Every day this week we’ll announce a new project or service that uses modern cryptography to build a more secure, trustworthy Internet. Everything we release this week will be free and immediately useful. This blog is a fun exploration of the themes of the week.

• Monday: Coming Soon
• Tuesday: Coming Soon
• Wednesday: Coming Soon
• Thursday: Coming Soon
• Friday: Coming Soon

### The Internet of the Future

Many pieces of the Internet in use today were designed in a different era with different assumptions. The Internet’s success is based on strong foundations that support constant reassessment and improvement. Sometimes these improvements require deploying new protocols.

Performing an upgrade on a system as large and decentralized as the Internet can’t be done by decree;

• There are too many economic, cultural, political, and technological factors at play.
• Changes must be compatible with existing systems and protocols to even be considered for adoption.
• To gain traction, new protocols must provide tangible improvements for users. Nobody wants to install an update that doesn’t improve their experience!

The last time the Internet had a complete reboot and upgrade was during TCP/IP flag day in 1983. Back then, the Internet (called ARPANET) had fewer than ten thousand hosts! To have an Internet-wide flag day today to switch over to a core new protocol is inconceivable; the scale and diversity of the components involved is way too massive. Too much would break. It’s challenging enough to deprecate outmoded functionality. In some ways, the open Internet is a victim of its own success. The bigger a system grows and the longer it stays the same, the harder it is to change. The Internet is like a massive barge: it takes forever to steer in a different direction and it’s carrying a lot of garbage.

As you would expect, many of the warts of the early Internet still remain. Both academic security researchers and real-life adversaries are still finding and exploiting vulnerabilities in the system. Many vulnerabilities are due to the fact that most of the protocols in use on the Internet have a weak notion of trust inherited from the early days. With 50 hosts online, it’s relatively easy to trust everyone, but in a world-scale system, that trust breaks down in fascinating ways. The primary tool to scale trust is cryptography, which helps provide some measure of accountability, though it has its own complexities.

In an ideal world, the Internet would provide a trustworthy substrate for human communication and commerce. Some people naïvely assume that this is the natural direction the evolution of the Internet will follow. However, constant improvement is not a given. It’s possible that the Internet of the future will actually be worse than the Internet today: less open, less secure, less private, less trustworthy. There are strong incentives to weaken the Internet on a fundamental level by Governments, by businesses such as ISPs, and even by the financial institutions entrusted with our personal data.

In a system with as many stakeholders as the Internet, real change requires principled commitment from all invested parties. At Cloudflare, we believe everyone is entitled to an Internet built on a solid foundation of trust. Crypto Week is our way of helping nudge the Internet’s evolution in a more trust-oriented direction. Each announcement this week helps bring the Internet of the future to the present in a tangible way.

Before we explore the Internet of the future, let’s explore some of the previous and ongoing attempts to upgrade the Internet’s fundamental protocols.

#### Routing Security

As we highlighted in last year’s Crypto Week one of the weak links on the Internet is routing. Not all networks are directly connected.

To send data from one place to another, you might have to rely on intermediary networks to pass your data along. A packet sent from one host to another may have to be passed through up to a dozen of these intermediary networks. No single network knows the full path the data will have to take to get to its destination, it only knows which network to pass it to next.  The protocol that determines how packets are routed is called the Border Gateway Protocol (BGP.) Generally speaking, networks use BGP to announce to each other which addresses they know how to route packets for and (dependent on a set of complex rules) these networks share what they learn with their neighbors.

Unfortunately, BGP is completely insecure:

• Any network can announce any set of addresses to any other network, even addresses they don’t control. This leads to a phenomenon called BGP hijacking, where networks are tricked into sending data to the wrong network.
• A BGP hijack is most often caused by accidental misconfiguration, but can also be the result of malice on the network operator’s part.
• During a BGP hijack, a network inappropriately announces a set of addresses to other networks, which results in packets destined for the announced addresses to be routed through the illegitimate network.

#### Understanding the risk

If the packets represent unencrypted data, this can be a big problem as it allows the hijacker to read or even change the data:

#### Mitigating the risk

The Resource Public Key Infrastructure (RPKI) system helps bring some trust to BGP by enabling networks to utilize cryptography to digitally sign network routes with certificates, making BGP hijacking much more difficult.

• This enables participants of the network to gain assurances about the authenticity of route advertisements. Certificate Transparency (CT) is a tool that enables additional trust for certificate-based systems. Cloudflare operates the Cirrus CT log to support RPKI.

Since we announced our support of RPKI last year, routing security has made big strides. More routes are signed, more networks validate RPKI, and the software ecosystem has matured, but this work is not complete. Most networks are still vulnerable to BGP hijacking. For example, Pakistan knocked YouTube offline with a BGP hijack back in 2008, and could likely do the same today. Adoption here is driven less by providing a benefit to users, but rather by reducing systemic risk, which is not the strongest motivating factor for adopting a complex new technology. Full routing security on the Internet could take decades.

### DNS Security

The Domain Name System (DNS) is the phone book of the Internet. Or, for anyone under 25 who doesn’t remember phone books, it’s the system that takes hostnames (like cloudflare.com or facebook.com) and returns the Internet address where that host can be found. For example, as of this publication, www.cloudflare.com is 104.17.209.9 and 104.17.210.9 (IPv4) and 2606:4700::c629:d7a2, 2606:4700::c629:d6a2 (IPv6). Like BGP, DNS is completely insecure. Queries and responses sent unencrypted over the Internet are modifiable by anyone on the path.

There are many ongoing attempts to add security to DNS, such as:

• DNSSEC that adds a chain of digital signatures to DNS responses
• DoT/DoH that wraps DNS queries in the TLS encryption protocol (more on that later)

Both technologies are slowly gaining adoption, but have a long way to go.

Just like RPKI, securing DNS comes with a performance cost, making it less attractive to users. However,

### The Web

Transport Layer Security (TLS) is a cryptographic protocol that gives two parties the ability to communicate over an encrypted and authenticated channel. TLS protects communications from eavesdroppers even in the event of a BGP hijack. TLS is what puts the “S” in HTTPS. TLS protects web browsing against multiple types of network adversaries.

The adoption of TLS on the web is partially driven by the fact that:

• It’s easy and free for websites to get an authentication certificate (via Let’s Encrypt, Universal SSL, etc.)
• Browsers make TLS adoption appealing to website operators by only supporting new web features such as HTTP/2 over HTTPS.

This has led to the rapid adoption of HTTPS over the last five years.

To further that adoption, TLS recently got an upgrade in TLS 1.3, making it faster and more secure (a combination we love). It’s taking over the Internet!

Despite this fantastic progress in the adoption of security for routing, DNS, and the web, there are still gaps in the trust model of the Internet. There are other things needed to help build the Internet of the future. To find and identify these gaps, we lean on research experts.

### Research Farm to Table

Cryptographic security on the Internet is a hot topic and there have been many flaws and issues recently pointed out in academic journals. Researchers often study the vulnerabilities of the past and ask:

• What other critical components of the Internet have the same flaws?
• What underlying assumptions can subvert trust in these existing systems?

The answers to these questions help us decide what to tackle next. Some recent research  topics we’ve learned about include:

• Quantum Computing
• Attacks on Time Synchronization
• DNS attacks affecting Certificate issuance
• Scaling distributed trust

Cloudflare keeps abreast of these developments and we do what we can to bring these new ideas to the Internet at large. In this respect, we’re truly standing on the shoulders of giants.

### Future-proofing Internet Cryptography

The new protocols we are currently deploying (RPKI, DNSSEC, DoT/DoH, TLS 1.3) use relatively modern cryptographic algorithms published in the 1970s and 1980s.

• The security of these algorithms is based on hard mathematical problems in the field of number theory, such as factoring and the elliptic curve discrete logarithm problem.
• If you can solve the hard problem, you can crack the code. Using a bigger key makes the problem harder, making it more difficult to break, but also slows performance.

Modern Internet protocols typically pick keys large enough to make it infeasible to break with classical computers, but no larger. The sweet spot is around 128-bits of security; meaning a computer has to do approximately 2¹²⁸ operations to break it.

Arjen Lenstra and others created a useful measure of security levels by comparing the amount of energy it takes to break a key to the amount of water you can boil using that much energy. You can think of this as the electric bill you’d get if you run a computer long enough to crack the key.

• 35-bit security is “Teaspoon security” — It takes about the same amount of energy to break a 35-bit key as it does to boil a teaspoon of water (pretty easy).

• 65 bits gets you up to “Pool security” – The energy needed to boil the average amount of water in a swimming pool.

• 105 bits is “Sea Security” – The energy needed to boil the Mediterranean Sea.

• 114-bits is “Global Security” –  The energy needed to boil all water on Earth.

• 128-bit security is safely beyond that of Global Security – Anything larger is overkill.
• 256-bit security corresponds to “Universal Security” – The estimated mass-energy of the observable universe. So, if you ever hear someone suggest 256-bit AES, you know they mean business.

### Post-Quantum of Solace

As far as we know, the algorithms we use for cryptography are functionally uncrackable with all known algorithms that classical computers can run. Quantum computers change this calculus. Instead of transistors and bits, a quantum computer uses the effects of quantum mechanics to perform calculations that just aren’t possible with classical computers. As you can imagine, quantum computers are very difficult to build. However, despite large-scale quantum computers not existing quite yet, computer scientists have already developed algorithms that can only run efficiently on quantum computers. Surprisingly, it turns out that with a sufficiently powerful quantum computer, most of the hard mathematical problems we rely on for Internet security become easy!

Although there are still quantum-skeptics out there, some experts estimate that within 15-30 years these large quantum computers will exist, which poses a risk to every security protocol online. Progress is moving quickly; every few months a more powerful quantum computer is announced.

Luckily, there are cryptography algorithms that rely on different hard math problems that seem to be resistant to attack from quantum computers. These math problems form the basis of so-called quantum-resistant (or post-quantum) cryptography algorithms that can run on classical computers. These algorithms can be used as substitutes for most of our current quantum-vulnerable algorithms.

• Some quantum-resistant algorithms (such as McEliece and Lamport Signatures) were invented decades ago, but there’s a reason they aren’t in common use: they lack some of the nice properties of the algorithms we’re currently using, such as key size and efficiency.
• Some quantum-resistant algorithms require much larger keys to provide 128-bit security
• Some are very CPU intensive,
• And some just haven’t been studied enough to know if they’re secure.

It is possible to swap our current set of quantum-vulnerable algorithms with new quantum-resistant algorithms, but it’s a daunting engineering task. With widely deployed protocols, it is hard to make the transition from something fast and small to something slower, bigger or more complicated without providing concrete user benefits. When exploring new quantum-resistant algorithms, minimizing user impact is of utmost importance to encourage adoption. This is a big deal, because almost all the protocols we use to protect the Internet are vulnerable to quantum computers.

Cryptography-breaking quantum computing is still in the distant future, but we must start the transition to ensure that today’s secure communications are safe from tomorrow’s quantum-powered onlookers; however, that’s not the most timely problem with the Internet. We haven’t addressed that…yet.

### Attacking time

Just like DNS, BGP, and HTTP, the Network Time Protocol (NTP) is fundamental to how the Internet works. And like these other protocols, it is completely insecure.

• Last year, Cloudflare introduced Roughtime as a mechanism for computers to access the current time from a trusted server in an authenticated way.
• Roughtime is powerful because it provides a way to distribute trust among multiple time servers so that if one server attempts to lie about the time, it will be caught.

However, Roughtime is not exactly a secure drop-in replacement for NTP.

• Roughtime lacks the complex mechanisms of NTP that allow it to compensate for network latency and yet maintain precise time, especially if the time servers are remote. This leads to imprecise time.
• Roughtime also involves expensive cryptography that can further reduce precision. This lack of precision makes Roughtime useful for browsers and other systems that need coarse time to validate certificates (most certificates are valid for 3 months or more), but some systems (such as those used for financial trading) require precision to the millisecond or below.

With Roughtime we supported the time protocol of the future, but there are things we can do to help improve the health of security online today.

Some academic researchers, including Aanchal Malhotra of Boston University, have demonstrated a variety of attacks against NTP, including BGP hijacking and off-path User Datagram Protocol (UDP) attacks.

• Some of these attacks can be avoided by connecting to an NTP server that is close to you on the Internet.
• However, to bring cryptographic trust to time while maintaining precision, we need something in between NTP and Roughtime.
• To solve this, it’s natural to turn to the same system of trust that enabled us to patch HTTP and DNS: Web PKI.

### Attacking the Web PKI

The Web PKI is similar to the RPKI, but is more widely visible since it relates to websites rather than routing tables.

• If you’ve ever clicked the lock icon on your browser’s address bar, you’ve interacted with it.
• The PKI relies on a set of trusted organizations called Certificate Authorities (CAs) to issue certificates to websites and web services.
• Websites use these certificates to authenticate themselves to clients as part of the TLS protocol in HTTPS.

While we were all patting ourselves on the back for moving the web to HTTPS, some researchers managed to find and exploit a weakness in the system: the process for getting HTTPS certificates.

Certificate Authorities (CAs) use a process called domain control validation (DCV) to ensure that they only issue certificates to websites owners who legitimately request them.

• Some CAs do this validation manually, which is secure, but can’t scale to the total number of websites deployed today.
• More progressive CAs have automated this validation process, but rely on insecure methods (HTTP and DNS) to validate domain ownership.

Without ubiquitous cryptography in place (DNSSEC may never reach 100% deployment), there is no completely secure way to bootstrap this system. So, let’s look at how to distribute trust using other methods.

One tool at our disposal is the distributed nature of the Cloudflare network.

Cloudflare is global. We have locations all over the world connected to dozens of networks. That means we have different vantage points, resulting in different ways to traverse networks. This diversity can prove an advantage when dealing with BGP hijacking, since an attacker would have to hijack multiple routes from multiple locations to affect all the traffic between Cloudflare and other distributed parts of the Internet. The natural diversity of the network raises the cost of the attacks.

A distributed set of connections to the Internet and using them as a quorum is a mighty paradigm to distribute trust, with or without cryptography.

### Distributed Trust

This idea of distributing the source of trust is powerful. Last year we announced the Distributed Web Gateway that

• Enables users to access content on the InterPlanetary File System (IPFS), a network structured to reduce the trust placed in any single party.
• Even if a participant of the network is compromised, it can’t be used to distribute compromised content because the network is content-addressed.
• However, using content-based addressing is not the only way to distribute trust between multiple independent parties.

Another way to distribute trust is to literally split authority between multiple independent parties. We’ve explored this topic before. In the context of Internet services, this means ensuring that no single server can authenticate itself to a client on its own. For example,

• In HTTPS the server’s private key is the lynchpin of its security. Compromising the owner of the private key (by hook or by crook) gives an attacker the ability to impersonate (spoof) that service. This single point of failure puts services at risk. You can mitigate this risk by distributing the authority to authenticate the service between multiple independently-operated services.

The Internet barge is old and slow, and we’ve only been able to improve it through the meticulous process of patching it piece by piece. Another option is to build new secure systems on top of this insecure foundation. IPFS is doing this, and IPFS is not alone in its design. There has been more research into secure systems with decentralized trust in the last ten years than ever before.

The result is radical new protocols and designs that use exotic new algorithms. These protocols do not supplant those at the core of the Internet (like TCP/IP), but instead, they sit on top of the existing Internet infrastructure, enabling new applications, much like HTTP did for the web.

### Gaining Traction

Some of the most innovative technical projects were considered failures because they couldn’t attract users. New technology has to bring tangible benefits to users to sustain it: useful functionality, content, and a decent user experience. Distributed projects, such as IPFS and others, are gaining popularity, but have not found mass adoption. This is a chicken-and-egg problem. New protocols have a high barrier to entryusers have to install new softwareand because of the small audience, there is less incentive to create compelling content. Decentralization and distributed trust are nice security features to have, but they are not products. Users still need to get some benefit out of using the platform.

An example of a system to break this cycle is the web. In 1992 the web was hardly a cornucopia of awesomeness. What helped drive the dominance of the web was its users.

• The growth of the user base meant more incentive for people to build services, and the availability of more services attracted more users. It was a virtuous cycle.
• It’s hard for a platform to gain momentum, but once the cycle starts, a flywheel effect kicks in to help the platform grow.

The Distributed Web Gateway project Cloudflare launched last year in Crypto Week is our way of exploring what happens if we try to kickstart that flywheel. By providing a secure, reliable, and fast interface from the classic web with its two billion users to the content on the distributed web, we give the fledgling ecosystem an audience.

• If the advantages provided by building on the distributed web are appealing to users, then the larger audience will help these services grow in popularity.
• This is somewhat reminiscent of how IPv6 gained adoption. It started as a niche technology only accessible using IPv4-to-IPv6 translation services.
• IPv6 adoption has now grown so much that it is becoming a requirement for new services. For example, Apple is requiring that all apps work in IPv6-only contexts.

Eventually, as user-side implementations of distributed web technologies improve, people may move to using the distributed web natively rather than through an HTTP gateway. Or they may not! By leveraging Cloudflare’s global network to give users access to new technologies based on distributed trust, we give these technologies a better chance at gaining adoption.

### Happy Crypto Week

At Cloudflare, we always support new technologies that help make the Internet better. Part of helping make a better Internet is scaling the systems of trust that underpin web browsing and protect them from attack. We provide the tools to create better systems of assurance with fewer points of vulnerability. We work with academic researchers of security to get a vision of the future and engineer away vulnerabilities before they can become widespread. It’s a constant journey.

Cloudflare knows that none of this is possible without the work of researchers. From award-winning researcher publishing papers in top journals to the blog posts of clever hobbyists, dedicated and curious people are moving the state of knowledge of the world forward. However, the push to publish new and novel research sometimes holds researchers back from committing enough time and resources to fully realize their ideas. Great research can be powerful on its own, but it can have an even broader impact when combined with practical applications. We relish the opportunity to stand on the shoulders of these giants and use our engineering know-how and global reach to expand on their work to help build a better Internet.

# A free Argo Tunnel for your next project

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/a-free-argo-tunnel-for-your-next-project/

Argo Tunnel lets you expose a server to the Internet without opening any ports. The service runs a lightweight process on your server that creates outbound tunnels to the Cloudflare network. Instead of managing DNS, network, and firewall complexity, Argo Tunnel helps administrators serve traffic from their origin through Cloudflare with a single command.

We built Argo Tunnel to remove the burden of securing and connecting servers to the Internet. This new model makes it easier to run a service in multi-cloud and hybrid deployments by replacing manual and error-prone work with a process that adds intelligence to the last-mile between Cloudflare and your origins or clusters. However, the service was previously only available to users with Cloudflare accounts. We want to make Argo Tunnel more accessible for any project.

Starting today, any user, even those without a Cloudflare account, can try this new method of connecting their server to the Internet. Argo Tunnel can now be used in a free model that will create a new URL, known only to you, that will proxy traffic to your server. We’re excited to make connecting a server to the Internet more accessible for everyone.

### What is Argo Tunnel?

Argo Tunnel replaces legacy models of connecting a server to the Internet with a secure, persistent connection to Cloudflare. Since Cloudflare first launched in 2009, customers have added their site to our platform by changing their name servers at their domain’s registrar to ones managed by Cloudflare. Administrators then create a DNS record in our dashboard that points visitors to their domain to their origin server.

When requests are made for those domains, the queries hit our data centers first. We’re able to use that position to block malicious traffic like DDoS attacks. However, if attackers discovered that origin IP, they could bypass Cloudflare’s security features and attack the server directly. Adding additional protections against that risk introduced more hassle and configuration.

One year ago, Cloudflare launched Argo Tunnel to solve those problems. Argo Tunnel connects your origin server to the Cloudflare network by running a lightweight daemon on your machine that only makes outbound calls. The process generates DNS records in the dashboard for you, removing the need to manually configure records and origin IP addresses.

Most importantly, Argo Tunnel helps shield your origin by simplifying the firewall rules you need to configure. Argo Tunnel makes outbound calls to the Cloudflare network and proxies requests back to your server. You can then disable all ingress to the machine and ensure that Cloudflare’s security features always stand between your server and the rest of the Internet. In addition to secure, we made it fast. The connection uses our Argo Smart Routing technology to find the most performant path from your visitors to your origin.

### How can I use the free version?

Argo Tunnel is now available to all users without a Cloudflare account. All that is needed is the Cloudflare daemon, cloudflared, running on your machine. With a single command, cloudflared will generate a random subdomain of “trycloudflare.com” and begin proxying traffic to your server.

1. Install cloudflared on your web server or laptop; instructions are available here. If you have an older copy, you’ll first need to update your version to the latest (2019.6.0)
2. Launch a web server.
3. Run the terminal command below to start a free tunnel. cloudflared will begin proxying requests to your localhost server; no additional flags needed.

$cloudflared tunnel The command above will proxy traffic to port 8080 by default, but you can specify a different port with the –url flag $ cloudflared tunnel --url localhost:7000

cloudflared will generate a random subdomain when connecting to the Cloudflare network and print it in the terminal for you to use. This will make whatever server you are running on your local machine accessible to the world through a public URL only you know. The output will resemble the following:

### How can I use it?

• Run a web server on your laptop to share a project with collaborates on different networks
• Test mobile browser compatibility for a new site
• Perform speed tests from different regions

### Why is it free?

We want more users to experience the speed and security improvements of Argo Tunnel (and Argo Smart Routing). We hope you’ll feel the same way about those benefits after testing it with the free version and that you’ll start using it for your production sites.

We also don’t guarantee any SLA or up-time of the free service – we plan to test new Argo Tunnel features and improvements on these free tunnels. This provides us with a group of connections to test before we deploy to production customers. Free tunnels are meant to be used for testing and development, not for deploying a production website.

### What’s next?

You can read our guide here to start using the free version of Argo Tunnel. Got feedback? Please send it here.

# Just Write Code: Improving Developer Experience for Cloudflare Workers

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/just-write-code-improving-developer-experience-for-cloudflare-workers/

We’re excited to announce that starting today, Cloudflare Workers® gets a CLI, new and improved docs, multiple scripts for everyone, the ability to run applications on workers.dev without bringing your own domain, and a free tier to make experimentation easier than ever. We are building the serverless platform of the future, and want you to build your application on it, today. In this post, we’ll elaborate on what a serverless platform of the future looks like, how it changes today’s paradigms, and our commitment to making building on it a great experience.

Three years ago, I was interviewing with Cloudflare for a Solutions Engineering role. As a part of an interview assignment, I had to set up an origin behind Cloudflare on my own  domain. I spent my weekend, frustrated and lost in configurations, trying to figure out how to set up an EC2 instance, connect to it over IPv6, and install NGINX on Ubuntu 16.4 just so I could end up with a static site with a picture of my cat on it. I have a computer science degree, and spent my career up until that point as a software engineer — building this simple app was a horrible experience. A weekend spent coding, without worrying about servers, would have yielded a much richer application.

And this is just one rung in the ladder — the first one. While the primitives have moved up the stack, the fact is, developing an application, putting it on the Internet, and growing it from MVP to a scalable, performant product all still remain distinct steps in the development process.

This is what “serverless” has promised to solve. Abstract away the servers at all stages of the process, and allow developers to do what they do best: develop, without having to worry about infrastructure.

And yet, with many serverless offerings today, the first thing they do is the thing that they promised you they wouldn’t — they make you think about servers. “What region would you like?” (the first question that comes to my mind: why are you forcing me to think about which customers I care more about: East Coast, or West Coast? Why can’t you solve this for me?). Or: “how much memory do you think you’ll need?” (again: why are you making this my problem?! You figure it out!).

We don’t think it should work like this.

I often think back to that problem I was facing myself three years ago, and that I know developers all around the world face every day. Developers should be able to just focus on the code. Someone else should deal with everything else from setting up infrastructure through making that infrastructure fast and scalable. While we’ve made some architectural decisions in building Workers that enable us to do this better than anyone else, today isn’t about expounding on them (though if you’d like to read more, here’s a great blog post detailing some of them). What today is about is really honing Workers in on the needs of developers.

We want Workers to bring the dream of serverless to life —  of letting developers only worry about bugs in their code. Today marks the start of a sustained push that Cloudflare is making towards building a great developer experience with Workers. We have some exciting things to announce today — but this is just the beginning.

### Wrangler: the official Workers CLI

Wrangler, originally open sourced as the Rust CLI for Workers, has graduated into being the official Workers CLI, supporting all your Workers deployment needs.

Get started now by installing Wrangler

npm install -g @cloudflare/wrangler


Generate your first project from our template gallery

wrangler generate <name> <template> --type=["webpack", "javascript", "rust"]

wrangler publish

A few of the other goodies we’re excited for you to use Wrangler for:

• Compile Rust, C, and C++ to WebAssembly
• Create single or multi-file JavaScript applications
• Install NPM dependencies (we take care of webpack for you)
• Add KV namespaces and bindings
• Get started with pre-made templates

### New and Improved Docs

We’ve updated our docs (and used Wrangler to do so!) to make it easier than ever for you to get started and deploy your first application with Workers.

Check out our new tutorials:

### Multiscript for All

You asked, we listened. When we introduced Workers, we wanted to keep things as simple as possible. As a developer, you want to break up your code into logical components. Rather than having a single monolithic script, we want to allow you to deploy your code in a way that makes sense to you.

### no-domain-required.workers.dev

Writing software is a creative process: a new project means creating something out of nothing. You may not entirely know what exactly it’s going to be yet, let alone what to name it.

We are changing the way you get started on Workers, by allowing you to deploy to a-subdomain-of-your-choice.workers.dev.

You may have heard about this announcement back in February, and we’re excited to deliver. For those of you who pre-registered, your subdomains will be waiting for you upon signing up and clicking into Workers.

### A Free Tier to Experiment

-H "X-Auth-Email: $CLOUDFLARE_AUTH_EMAIL" \ -d '[ {"key": "built_by", value: "kyle, alex, charlie, andrew, and brett"}, {"key": "reviewed_by", value: "joaquin"}, {"key": "approved_by", value: "steve"} ]'  Let’s walk through an example use case: you want to off-load your website translation to Workers. Since you’re reading translation keys frequently and only occasionally updating them, this application works well with the eventual consistency model of Workers KV. In this example, we hook into Crowdin, a popular platform to manage translation data. This Worker responds to a /translate endpoint, downloads all your translation keys, and bulk writes them to Workers KV so you can read it later on our edge: addEventListener("fetch", event => { if (event.request.url.pathname === "/translate") { event.respondWith(uploadTranslations()) } }) async function uploadTranslations() { // Ask crowdin for all of our translations. var response = await fetch( "https://api.crowdin.com/api/project" + "/:ci_project_id/download/all.zip?key=:ci_secret_key") // If crowdin is responding, parse the response into // a single json with all of our translations. if (response.ok) { var translations = await zipToJson(response) return await bulkWrite(translations) } // Return the errored response from crowdin. return response } async function bulkWrite(keyValuePairs) { return fetch( "https://api.cloudflare.com/client/v4/accounts" + "/:cf_account_id/storage/kv/namespaces/:cf_namespace_id/bulk", { method: "PUT", headers: { "Content-Type": "application/json", "X-Auth-Key": ":cf_auth_key", "X-Auth-Email": ":cf_email" }, body: JSON.stringify(keyValuePairs) } ) } async function zipToJson(response) { // ... omitted for brevity ... // (eg. https://stuk.github.io/jszip) return [ {key: "hello.EN", value: "Hello World"}, {key: "hello.ES", value: "Hola Mundo"} ] }  Now, when you want to translate a page, all you have to do is read from Workers KV: async function translate(keys, lang) { // You bind your translations namespace to the TRANSLATIONS variable. return Promise.all(keys.map(key => TRANSLATIONS.get(key + "." + lang))) }  #### 2. Expiring Keys By default, key-value pairs stored in Workers KV last forever. However, sometimes you want your data to auto-delete after a certain amount of time. That’s why we’re introducing the expiration and expirationTtloptions for write operations. // Key expires 60 seconds from now. NAMESPACE.put("myKey", "myValue", {expirationTtl: 60}) // Key expires if the UNIX epoch is in the past. NAMESPACE.put("myKey", "myValue", {expiration: 1247788800})  # You can also set keys to expire from the Cloudflare API. curl "https://api.cloudflare.com/client/v4/accounts/ \$ACCOUNT_ID/storage/kv/namespaces/$NAMESPACE_ID/ \ values/$KEY?expiration_ttl=$EXPIRATION_IN_SECONDS" -X PUT \ -H "X-Auth-Key:$CLOUDFLARE_AUTH_KEY" \
-H "X-Auth-Email: $CLOUDFLARE_AUTH_EMAIL" \ -d "$VALUE"


Let’s say you want to block users that have been flagged as inappropriate from your website, but only for a week. With an expiring key, you can set the expire time and not have to worry about deleting it later.

In this example, we assume users and IP addresses are one of the same. If your application has authentication, you could use access tokens as the key identifier.

addEventListener("fetch", event => {
var url = new URL(event.request.url)
// An internal API that blocks a new user IP.
// (eg. example.com/block/1.2.3.4)
if (url.pathname.startsWith("/block")) {
var ip = url.pathname.split("/").pop()
event.respondWith(blockIp(ip))
} else {
// Other requests check if the IP is blocked.
event.respondWith(handleRequest(event.request))
}
})

async function blockIp(ip) {
// Values are allowed to be empty in KV,
// we don't need to store any extra information anyway.
await BLOCKED.put(ip, "", {expirationTtl: 60*60*24*7})
return new Response("ok")
}

async function handleRequest(request) {
if (ip) {
var blocked = await BLOCKED.get(ip)
// If we detect an IP and its blocked, respond with a 403 error.
if (blocked) {
return new Response({status: 403, statusText: "You are blocked!"})
}
}
// Otherwise, passthrough the original request.
return fetch(request)
}


#### 3. Larger Values

We’ve increased our size limit on values from 64 kB to 2 MB. This is quite useful if you need to store buffer-based or file data in Workers KV.

Consider this scenario: you want to let your users upload their favorite GIF to their profile without having to store these GIFs as binaries in your database or managing another cloud storage bucket.

Workers KV is a great fit for this use case! You can create a Workers KV namespace for your users’ GIFs that is fast and reliable wherever your customers are located.

addEventListener("fetch", event => {
var url = event.request.url
var arg = request.url.split("/").pop()
// User sends a URI encoded link to the GIF they wish to upload.
// Profile contains link to view the GIF.
} else if (url.pathname.startsWith("/api/view_gif")) {
event.respondWith(getGif(arg))
}
})

// Fetch the GIF from the Internet.
var gif = await fetch(decodeURIComponent(url))
var buffer = await gif.arrayBuffer()
// Upload the GIF as a buffer to Workers KV.
await GIFS.put(user.name, buffer)
return gif
}

var gif = await GIFS.get(username, "arrayBuffer")
// If the user has set one, respond with the GIF.
if (gif) {
return new Response(gif, {headers: {"Content-Type": "image/gif"}})
} else {
return new Response({status: 404, statusText: "User has no GIF!"})
}
}


Lastly, we want to thank all of our beta customers. It was your valuable feedback that led us to develop these changes to Workers KV. Make sure to stay in touch with us, we’re always looking ahead for what’s next and we love hearing from you!

### Pricing

We’re also ready to announce our GA pricing. If you’re one of our Enterprise customers, your pricing obviously remains unchanged.

• $0.50 / GB of data stored, 1 GB included •$0.50 / million reads, 10 million included
• 5 / million write, list, and delete operations, 1 million included During the beta period, we learned customers don’t want to just read values at our edge, they want to write values from our edge too. Since there is high demand for these edge operations, which are more costly, we have started charging non-read operations per month. ### Limits As mentioned earlier, we increased our value size limit from 64 kB to 2 MB. We’ve also removed our cap on the number of keys per namespace — it’s now unlimited. Here are our GA limits: • Up to 20 namespaces per account, each with unlimited keys • Keys of up to 512 bytes and values of up to 2 MB • Unlimited writes per second for different keys • One write per second for the same key • Unlimited reads per second per key ### Try it out now! Now open to all customers, you can start using Workers KV today from your Cloudflare dashboard under the Workers tab. You can also look at our updated documentation. We’re really excited to see what you all can build with Workers KV! # One more thing… new Speed Page Post Syndicated from Andrew Galloni original https://blog.cloudflare.com/new-speed-page/ Congratulations on making it through Speed Week. In the last week, Cloudflare has: described how our global network speeds up the Internet, launched a HTTP/2 prioritisation model that will improve web experiences on all browsers, launched an image resizing service which will deliver the optimal image to every device, optimized live video delivery, detailed how to stream progressive images so that they render twice as fast – using the flexibility of our new HTTP/2 prioritisation model and finally, prototyped a new over-the-wire format for JavaScript that could improve application start-up performance especially on mobile devices. As a bonus, we’re also rolling out one more new feature: “TCP Turbo” automatically chooses the TCP settings to further accelerate your website. As a company, we want to help every one of our customers improve web experiences. The growth of Cloudflare, along with the increase in features, has often made simple questions difficult to answer: • How fast is my website? • How should I be thinking about performance features? • How much faster would the site be if I were to enable a particular feature? This post will describe the exciting changes we have made to the Speed Page on the Cloudflare dashboard to give our customers a much clearer understanding of how their websites are performing and how they can be made even faster. The new Speed Page consists of : • A visual comparison of your website loading on Cloudflare, with caching enabled, compared to connecting directly to the origin. • The measured improvement expected if any performance feature is enabled. • A report describing how fast your website is on desktop and mobile. We want to simplify the complexity of making web experiences fast and give our customers control. Take a look – We hope you like it. ### Why do fast web experiences matter? Customer experience : No one likes slow service. Imagine if you go to a restaurant and the service is slow, especially when you arrive; you are not likely to go back or recommend it to your friends. It turns out the web works in the same way and Internet customers are even more demanding. As many as 79% of customers who are “dissatisfied” with a website’s performance are less likely to buy from that site again. Engagement and Revenue : There are many studies explaining how speed affects customer engagement, bounce rates and revenue. Reputation : There is also brand reputation to consider as customers associate an online experience to the brand. A study found that for 66% of the sample website performance influences their impression of the company. Diversity : Mobile traffic has grown to be larger than its desktop counterpart over the last few years. Mobile customers’ expectations have becoming increasingly demanding and expect seamless Internet access regardless of location. Mobile provides a new set of challenges that includes the diversity of device specifications. When testing, be aware that the average mobile device is significantly less capable than the top-of-the-range models. For example, there can be orders-of-magnitude disparity in the time different mobile devices take to run JavaScript. Another challenge is the variance in mobile performance, as customers move from a strong, high quality office network to mobile networks of different speeds (3G/5G), and quality within the same browsing session. ### New Speed Page There is compelling evidence that a faster web experience is important for anyone online. Most of the major studies involve the largest tech companies, who have whole teams dedicated to measuring and improving web experiences for their own services. At Cloudflare we are on a mission to help build a better and faster Internet for everyone – not just the selected few. Delivering fast web experiences is not a simple matter. That much is clear. To know what to send and when requires a deep understanding of every layer of the stack, from TCP tuning, protocol level prioritisation, content delivery formats through to the intricate mechanics of browser rendering. You will also need a global network that strives to be within 10 ms of every Internet user. The intrinsic value of such a network, should be clear to everyone. Cloudflare has this network, but it also offers many additional performance features. With the Speed Page redesign, we are emphasizing the performance benefits of using Cloudflare and the additional improvements possible from our features. The de facto standard for measuring website performance has been WebPageTest. Having its creator in-house at Cloudflare encouraged us to use it as the basis for website performance measurement. So, what is the easiest way to understand how a web page loads? A list of statistics do not paint a full picture of actual user experience. One of the cool features of WebPageTest is that it can generate a filmstrip of screen snapshots taken during a web page load, enabling us to quantify how a page loads, visually. This view makes it significantly easier to determine how long the page is blank for, and how long it takes for the most important content to render. Being able to look at the results in this way, provides the ability to empathise with the user. ### How fast on Cloudflare ? After moving your website to Cloudflare, you may have asked: How fast did this decision make my website? Well, now we provide the answer: As well as the increase in speed, we provide filmstrips of before and after, so that it is easy to compare and understand how differently a user will experience the website. If our tests are unable to reach your origin and you are already setup on Cloudflare, we will test with development mode enabled, which disables caching and minification. Site performance statistics How can we measure the user experience of a website? Traditionally, page load was the important metric. Page load is a technical measurement used by browser vendors that has no bearing on the presentation or usability of a page. The metric reports on how long it takes not only to load the important content but also all of the 3rd party content (social network widgets, advertising, tracking scripts etc.). A user may very well not see anything until after all the page content has loaded, or they may be able to interact with a page immediately, while content continues to load. A user will not decide whether a page is fast by a single measure or moment. A user will perceive how fast a website is from a combination of factors: • when they see any response • when they see the content they expect • when they can interact with the page • when they can perform the task they intended Experience has shown that if you focus on one measure, it will likely be to the detriment of the others. Importance of Visual response If an impatient user navigates to your site and sees no content for several seconds or no valuable content, they are likely to get frustrated and leave. The paint timing spec defines a set of paint metrics, when content appears on a page, to measure the key moments in how a user perceives performance. First Contentful Paint (FCP) is the time when the browser first renders any DOM content. First Meaningful Paint (FMP) is the point in time when the page’s “primary” content appears on the screen. This metric should relate to what the user has come to the site to see and is designed as the point in time when the largest visible layout change happens. Speed Index attempts to quantify the value of the filmstrip rather than using a single paint timing. The speed index measures the rate at which content is displayed – essentially the area above the curve. In the chart below from our progressive image feature you can see reaching 80% happens much earlier for the parallelized (red) load rather than the regular (blue). Importance of interactivity The same impatient user is now happy that the content they want to see has appeared. They will still become frustrated if they are unable to interact with the site. Time to Interactive is the time it takes for content to be rendered and the page is ready to receive input from the user. Technically this is defined as when the browser’s main processing thread has been idle for several seconds after first meaningful paint. The Speed Tab displays these key metrics for mobile and desktop. ### How much faster on Cloudflare ? The Cloudflare Dashboard provides a list of performance features which can, admittedly, be both confusing and daunting. What would be the benefit of turning on Rocket Loader and on which performance metrics will it have the most impact ? If you upgrade to Pro what will be the value of the enhanced HTTP/2 prioritisation ? The optimization section answers these questions. Tests are run with each performance feature turned on and off. The values for the tests for the appropriate performance metrics are displayed, along with the improvement. You can enable or upgrade the feature from this view. Here are a few examples : If Rocket Loader were enabled for this website, the render-blocking JavaScript would be deferred causing first paint time to drop from 1.25s to 0.81s – an improvement of 32% on desktop. Image heavy sites do not perform well on slow mobile connections. If you enable Mirage, your customers on 3G connections would see meaningful content 1s sooner – an improvement of 29.4%. So how about our new features? We tested the enhanced HTTP/2 prioritisation feature on an Edge browser on desktop and saw meaningful content display 2s sooner – an improvement of 64%. This is a more interesting result taken from the blog example used to illustrate the progressive image streaming. At first glance the improvement of 29% in speed index is good. The filmstrip comparison shows a more significant difference. In this case the page with no images shown is already 43% visually complete for both scenarios after 1.5s. At 2.5s the difference is 77% compared to 50%. This is a great example of how metrics do not tell the full story. They cannot completely replace viewing the page loading flow and understanding what is important for your site. ### How to try This is our first iteration of the new Speed Page and we are eager to get your feedback. We will be rolling this out to beta customers who are interested in seeing how their sites perform. To be added to the queue for activation of the new Speed Page please click on the banner on the overview page, or click on the banner on the existing Speed Page. # Faster script loading with BinaryAST? Post Syndicated from Ingvar Stepanyan original https://blog.cloudflare.com/binary-ast/ ### JavaScript Cold starts The performance of applications on the web platform is becoming increasingly bottlenecked by the startup (load) time. Large amounts of JavaScript code are required to create rich web experiences that we’ve become used to. When we look at the total size of JavaScript requested on mobile devices from HTTPArchive, we see that an average page loads 350KB of JavaScript, while 10% of pages go over the 1MB threshold. The rise of more complex applications can push these numbers even higher. While caching helps, popular websites regularly release new code, which makes cold start (first load) times particularly important. With browsers moving to separate caches for different domains to prevent cross-site leaks, the importance of cold starts is growing even for popular subresources served from CDNs, as they can no longer be safely shared. Usually, when talking about the cold start performance, the primary factor considered is a raw download speed. However, on modern interactive pages one of the other big contributors to cold starts is JavaScript parsing time. This might seem surprising at first, but makes sense – before starting to execute the code, the engine has to first parse the fetched JavaScript, make sure it doesn’t contain any syntax errors and then compile it to the initial bytecode. As networks become faster, parsing and compilation of JavaScript could become the dominant factor. The device capability (CPU or memory performance) is the most important factor in the variance of JavaScript parsing times and correspondingly the time to application start. A 1MB JavaScript file will take an order of a 100 ms to parse on a modern desktop or high-end mobile device but can take over a second on an average phone (Moto G4). A more detailed post on the overall cost of parsing, compiling and execution of JavaScript shows how the JavaScript boot time can vary on different mobile devices. For example, in the case of news.google.com, it can range from 4s on a Pixel 2 to 28s on a low-end device. While engines continuously improve raw parsing performance, with V8 in particular doubling it over the past year, as well as moving more things off the main thread, parsers still have to do lots of potentially unnecessary work that consumes memory, battery and might delay the processing of the useful resources. ### The “BinaryAST” Proposal This is where BinaryAST comes in. BinaryAST is a new over-the-wire format for JavaScript proposed and actively developed by Mozilla that aims to speed up parsing while keeping the semantics of the original JavaScript intact. It does so by using an efficient binary representation for code and data structures, as well as by storing and providing extra information to guide the parser ahead of time. The name comes from the fact that the format stores the JavaScript source as an AST encoded into a binary file. The specification lives at tc39.github.io/proposal-binary-ast and is being worked on by engineers from Mozilla, Facebook, Bloomberg and Cloudflare. “Making sure that web applications start quickly is one of the most important, but also one of the most challenging parts of web development. We know that BinaryAST can radically reduce startup time, but we need to collect real-world data to demonstrate its impact. Cloudflare’s work on enabling use of BinaryAST with Cloudflare Workers is an important step towards gathering this data at scale.” Till Schneidereit, Senior Engineering Manager, Developer Technologies Mozilla ### Parsing JavaScript For regular JavaScript code to execute in a browser the source is parsed into an intermediate representation known as an AST that describes the syntactic structure of the code. This representation can then be compiled into a byte code or a native machine code for execution. A simple example of adding two numbers can be represented in an AST as: Parsing JavaScript is not an easy task; no matter which optimisations you apply, it still requires reading the entire text file char by char, while tracking extra context for syntactic analysis. The goal of the BinaryAST is to reduce the complexity and the amount of work the browser parser has to do overall by providing an additional information and context by the time and place where the parser needs it. To execute JavaScript delivered as BinaryAST the only steps required are: Another benefit of BinaryAST is that it makes possible to only parse the critical code necessary for start-up, completely skipping over the unused bits. This can dramatically improve the initial loading time. This post will now describe some of the challenges of parsing JavaScript in more detail, explain how the proposed format addressed them, and how we made it possible to run its encoder in Workers. ### Hoisting JavaScript relies on hoisting for all declarations – variables, functions, classes. Hoisting is a property of the language that allows you to declare items after the point they’re syntactically used. Let’s take the following example: function f() { return g(); } function g() { return 42; } Here, when the parser is looking at the body of f, it doesn’t know yet what g is referring to – it could be an already existing global function or something declared further in the same file – so it can’t finalise parsing of the original function and start the actual compilation. BinaryAST fixes this by storing all the scope information and making it available upfront before the actual expressions. As shown by the difference between the initial AST and the enhanced AST in a JSON representation: ### Lazy parsing One common technique used by modern engines to improve parsing times is lazy parsing. It utilises the fact that lots of websites include more JavaScript than they actually need, especially for the start-up. Working around this involves a set of heuristics that try to guess when any given function body in the code can be safely skipped by the parser initially and delayed for later. A common example of such heuristic is immediately running the full parser for any function that is wrapped into parentheses: (function(... Such prefix usually indicates that a following function is going to be an IIFE (immediately-invoked function expression), and so the parser can assume that it will be compiled and executed ASAP, and wouldn’t benefit from being skipped over and delayed for later. (function() { … })(); These heuristics significantly improve the performance of the initial parsing and cold starts, but they’re not completely reliable or trivial to implement. One of the reasons is the same as in the previous section – even with lazy parsing, you still need to read the contents, analyse them and store an additional scope information for the declarations. Another reason is that the JavaScript specification requires reporting any syntax errors immediately during load time, and not when the code is actually executed. A class of these errors, called early errors, is checking for mistakes like usage of the reserved words in invalid contexts, strict mode violations, variable name clashes and more. All of these checks require not only lexing JavaScript source, but also tracking extra state even during the lazy parsing. Having to do such extra work means you need to be careful about marking functions as lazy too eagerly, especially if they actually end up being executed during the page load. Otherwise you’re making cold start costs even worse, as now every function that is erroneously marked as lazy, needs to be parsed twice – once by the lazy parser and then again by the full one. Because BinaryAST is meant to be an output format of other tools such as Babel, TypeScript and bundlers such as Webpack, the browser parser can rely on the JavaScript being already analysed and verified by the initial parser. This allows it to skip function bodies completely, making lazy parsing essentially free. It reduces the cost of a completely unused code – while including it is still a problem in terms of the network bandwidth (don’t do this!), at least it’s not affecting parsing times anymore. These benefits apply equally to the code that is used later in the page lifecycle (for example, invoked in response to user actions), but is not required during the startup. Last but not least important benefit of such approach is that BinaryAST encodes lazy annotations as part of the format, giving tools and developers direct and full control over the heuristics. For example, a tool targeting the Web platform or a framework CLI can use its domain-specific knowledge to mark some event handlers as lazy or eager depending on the context and the event type. ### Avoiding ambiguity in parsing Using a text format for a programming language is great for readability and debugging, but it’s not the most efficient representation for parsing and execution. For example, parsing low-level types like numbers, booleans and even strings from text requires extra analysis and computation, which is unnecessary when you can just store and read them as native binary-encoded values in the first place and read directly on the other side. Another problem is an ambiguity in the grammar itself. It was already an issue in the ES5 world, but could usually be resolved with some extra bookkeeping based on the previously seen tokens. However, in ES6+ there are productions that can be ambiguous all the way through until they’re parsed completely. For example, a token sequence like: (a, {b: c, d}, [e = 1])... can start either a parenthesized comma expression with nested object and array literals and an assignment: (a, {b: c, d}, [e = 1]); // it was an expression or a parameter list of an arrow expression function with nested object and array patterns and a default value: (a, {b: c, d}, [e = 1]) => … // it was a parameter list Both representations are perfectly valid, but have completely different semantics, and you can’t know which one you’re dealing with until you see the final token. To work around this, parsers usually have to either backtrack, which can easily get exponentially slow, or to parse contents into intermediate node types that are capable of holding both expressions and patterns, with following conversion. The latter approach preserves linear performance, but makes the implementation more complicated and requires preserving more state. In the BinaryAST format this issue doesn’t exist in the first place because the parser sees the type of each node before it even starts parsing its contents. ### Cloudflare Implementation Currently, the format is still in flux, but the very first version of the client-side implementation was released under a flag in Firefox Nightly several months ago. Keep in mind this is only an initial unoptimised prototype, and there are already several experiments changing the format to provide improvements to both size and parsing performance. On the producer side, the reference implementation lives at github.com/binast/binjs-ref. Our goal was to take this reference implementation and consider how we would deploy it at Cloudflare scale. If you dig into the codebase, you will notice that it currently consists of two parts. One is the encoder itself, which is responsible for taking a parsed AST, annotating it with scope and other relevant information, and writing out the result in one of the currently supported formats. This part is written in Rust and is fully native. Another part is what produces that initial AST – the parser. Interestingly, unlike the encoder, it’s implemented in JavaScript. Unfortunately, there is currently no battle-tested native JavaScript parser with an open API, let alone implemented in Rust. There have been a few attempts, but, given the complexity of JavaScript grammar, it’s better to wait a bit and make sure they’re well-tested before incorporating it into the production encoder. On the other hand, over the last few years the JavaScript ecosystem grew to extensively rely on developer tools implemented in JavaScript itself. In particular, this gave a push to rigorous parser development and testing. There are several JavaScript parser implementations that have been proven to work on thousands of real-world projects. With that in mind, it makes sense that the BinaryAST implementation chose to use one of them – in particular, Shift – and integrated it with the Rust encoder, instead of attempting to use a native parser. ### Connecting Rust and JavaScript Integration is where things get interesting. Rust is a native language that can compile to an executable binary, but JavaScript requires a separate engine to be executed. To connect them, we need some way to transfer data between the two without sharing the memory. Initially, the reference implementation generated JavaScript code with an embedded input on the fly, passed it to Node.js and then read the output when the process had finished. That code contained a call to the Shift parser with an inlined input string and produced the AST back in a JSON format. This doesn’t scale well when parsing lots of JavaScript files, so the first thing we did is transformed the Node.js side into a long-living daemon. Now Rust could spawn a required Node.js process just once and keep passing inputs into it and getting responses back as individual messages. ### Running in the cloud While the Node.js solution worked fairly well after these optimisations, shipping both a Node.js instance and a native bundle to production requires some effort. It’s also potentially risky and requires manual sandboxing of both processes to make sure we don’t accidentally start executing malicious code. On the other hand, the only thing we needed from Node.js is the ability to run the JavaScript parser code. And we already have an isolated JavaScript engine running in the cloud – Cloudflare Workers! By additionally compiling the native Rust encoder to Wasm (which is quite easy with the native toolchain and wasm-bindgen), we can even run both parts of the code in the same process, making cold starts and communication much faster than in a previous model. ### Optimising data transfer The next logical step is to reduce the overhead of data transfer. JSON worked fine for communication between separate processes, but with a single process we should be able to retrieve the required bits directly from the JavaScript-based AST. To attempt this, first of all, we needed to move away from the direct JSON usage to something more generic that would allow us to support various import formats. The Rust ecosystem already has an amazing serialisation framework for that – Serde. Aside from allowing us to be more flexible in regard to the inputs, rewriting to Serde helped an existing native use case too. Now, instead of parsing JSON into an intermediate representation and then walking through it, all the native typed AST structures can be deserialized directly from the stdout pipe of the Node.js process in a streaming manner. This significantly improved both the CPU usage and memory pressure. But there is one more thing we can do: instead of serializing and deserializing from an intermediate format (let alone, a text format like JSON), we should be able to operate [almost] directly on JavaScript values, saving memory and repetitive work. How is this possible? wasm-bindgen provides a type called JsValue that stores a handle to an arbitrary value on the JavaScript side. This handle internally contains an index into a predefined array. Each time a JavaScript value is passed to the Rust side as a result of a function call or a property access, it’s stored in this array and an index is sent to Rust. The next time Rust wants to do something with that value, it passes the index back and the JavaScript side retrieves the original value from the array and performs the required operation. By reusing this mechanism, we could implement a Serde deserializer that requests only the required values from the JS side and immediately converts them to their native representation. It’s now open-sourced under https://github.com/cloudflare/serde-wasm-bindgen. At first, we got a much worse performance out of this due to the overhead of more frequent calls between 1) Wasm and JavaScript – SpiderMonkey has improved these recently, but other engines still lag behind and 2) JavaScript and C++, which also can’t be optimised well in most engines. The JavaScript <-> C++ overhead comes from the usage of TextEncoder to pass strings between JavaScript and Wasm in wasm-bindgen, and, indeed, it showed up as the highest in the benchmark profiles. This wasn’t surprising – after all, strings can appear not only in the value payloads, but also in property names, which have to be serialized and sent between JavaScript and Wasm over and over when using a generic JSON-like structure. Luckily, because our deserializer doesn’t have to be compatible with JSON anymore, we can use our knowledge of Rust types and cache all the serialized property names as JavaScript value handles just once, and then keep reusing them for further property accesses. This, combined with some changes to wasm-bindgen which we have upstreamed, allows our deserializer to be up to 3.5x faster in benchmarks than the original Serde support in wasm-bindgen, while saving ~33% off the resulting code size. Note that for string-heavy data structures it might still be slower than the current JSON-based integration, but situation is expected to improve over time when reference types proposal lands natively in Wasm. After implementing and integrating this deserializer, we used the wasm-pack plugin for Webpack to build a Worker with both Rust and JavaScript parts combined and shipped it to some test zones. ### Show me the numbers Keep in mind that this proposal is in very early stages, and current benchmarks and demos are not representative of the final outcome (which should improve numbers much further). As mentioned earlier, BinaryAST can mark functions that should be parsed lazily ahead of time. By using different levels of lazification in the encoder (https://github.com/binast/binjs-ref/blob/b72aff7dac7c692a604e91f166028af957cdcda5/crates/binjs_es6/src/lazy.rs#L43) and running tests against some popular JavaScript libraries, we found following speed-ups. #### Level 0 (no functions are lazified) With lazy parsing disabled in both parsers we got a raw parsing speed improvement of between 3 and 10%. NameSource size (kb)JavaScript Parse time (average ms)BinaryAST parse time (average ms)Diff (%) React200.4030.385-4.56 D3 (v5)24011.17810.525-6.018 Angular1806.9856.331-9.822 Babel78021.25520.599-3.135 Backbone320.7750.699-10.312 wabtjs172064.83659.556-8.489 Fuzzball (1.2)723.1652.768-13.383 #### Level 3 (functions up to 3 levels deep are lazified) But with the lazification set to skip nested functions of up to 3 levels we see much more dramatic improvements in parsing time between 90 and 97%. As mentioned earlier in the post, BinaryAST makes lazy parsing essentially free by completely skipping over the marked functions. NameSource size (kb)Parse time (average ms)BinaryAST parse time (average ms)Diff (%) React200.4070.032-92.138 D3 (v5)24011.6230.224-98.073 Angular1807.0930.680-90.413 Babel78021.1000.895-95.758 Backbone320.8980.045-94.989 wabtjs172059.8021.601-97.323 Fuzzball (1.2)722.9370.089-96.970 All the numbers are from manual tests on a Linux x64 Intel i7 with 16Gb of ram. While these synthetic benchmarks are impressive, they are not representative of real-world scenarios. Normally you will use at least some of the loaded JavaScript during the startup. To check this scenario, we decided to test some realistic pages and demos on desktop and mobile Firefox and found speed-ups in page loads too. For a sample application (https://github.com/cloudflare/binjs-demo, https://serve-binjs.that-test.site/) which weighed in at around 1.2 MB of JavaScript we got the following numbers for initial script execution: DeviceJavaScriptBinaryAST Desktop338ms314ms Mobile (HTC One M8)2019ms1455ms Here is a video that will give you an idea of the improvement as seen by a user on mobile Firefox (in this case showing the entire page startup time): Next step is to start gathering data on real-world websites, while improving the underlying format. ### How do I test BinaryAST on my website? We’ve open-sourced our Worker so that it could be installed on any Cloudflare zone: https://github.com/binast/binjs-ref/tree/cf-wasm. One thing to be currently wary of is that, even though the result gets stored in the cache, the initial encoding is still an expensive process, and might easily hit CPU limits on any non-trivial JavaScript files and fall back to the unencoded variant. We are working to improve this situation by releasing BinaryAST encoder as a separate feature with more relaxed limits in the following few days. Meanwhile, if you want to play with BinaryAST on larger real-world scripts, an alternative option is to use a static binjs_encode tool from https://github.com/binast/binjs-ref to pre-encode JavaScript files ahead of time. Then, you can use a Worker from https://github.com/cloudflare/binast-cf-worker to serve the resulting BinaryAST assets when supported and requested by the browser. On the client side, you’ll currently need to download Firefox Nightly, go to about:config and enable unrestricted BinaryAST support via the following options: Now, when opening a website with either of the Workers installed, Firefox will get BinaryAST instead of JavaScript automatically. ### Summary The amount of JavaScript in modern apps is presenting performance challenges for all consumers. Engine vendors are experimenting with different ways to improve the situation – some are focusing on raw decoding performance, some on parallelizing operations to reduce overall latency, some are researching new optimised formats for data representation, and some are inventing and improving protocols for the network delivery. No matter which one it is, we all have a shared goal of making the Web better and faster. On Cloudflare’s side, we’re always excited about collaborating with all the vendors and combining various approaches to make that goal closer with every step. # Live video just got more live: Introducing Concurrent Streaming Acceleration Post Syndicated from Jon Levine original https://blog.cloudflare.com/introducing-concurrent-streaming-acceleration/ Today we’re excited to introduce Concurrent Streaming Acceleration, a new technique for reducing the end-to-end latency of live video on the web when using Stream Delivery. Let’s dig into live-streaming latency, why it’s important, and what folks have done to improve it. ### How “live” is “live” video? Live streaming makes up an increasing share of video on the web. Whether it’s a TV broadcast, a live game show, or an online classroom, users expect video to arrive quickly and smoothly. And the promise of “live” is that the user is seeing events as they happen. But just how close to “real-time” is “live” Internet video? Delivering live video on the Internet is still hard and adds lots of latency: 1. The content source records video and sends it to an encoding server; 2. The origin server transforms this video into a format like DASH, HLS or CMAF that can be delivered to millions of devices efficiently; 3. A CDN is typically used to deliver encoded video across the globe 4. Client players decode the video and render it on the screen And all of this is under a time constraint — the whole process need to happen in a few seconds, or video experiences will suffer. We call the total delay between when the video was shot, and when it can be viewed on an end-user’s device, as “end-to-end latency” (think of it as the time from the camera lens to your phone’s screen). ### Traditional segmented delivery Video formats like DASH, HLS, and CMAF work by splitting video into small files, called “segments”. A typical segment duration is 6 seconds. If a client player needs to wait for a whole 6s segment to be encoded, sent through a CDN, and then decoded, it can be a long wait! It takes even longer if you want the client to build up a buffer of segments to protect against any interruptions in delivery. A typical player buffer for HLS is 3 segments: When you consider encoding delays, it’s easy to see why live streaming latency on the Internet has typically been about 20-30 seconds. We can do better. ### Reduced latency with chunked transfer encoding A natural way to solve this problem is to enable client players to start playing the chunks while they’re downloading, or even while they’re still being created. Making this possible requires a clever bit of cooperation to encode and deliver the files in a particular way, known as “chunked encoding.” This involves splitting up segments into smaller, bite-sized pieces, or “chunks”. Chunked encoding can typically bring live latency down to 5 or 10 seconds. Confusingly, the word “chunk” is overloaded to mean two different things: 1. CMAF or HLS chunks, which are small pieces of a segment (typically 1s) that are aligned on key frames 2. HTTP chunks, which are just a way of delivering any file over the web HTTP chunks are important because web clients have limited ability to process streams of data. Most clients can only work with data once they’ve received the full HTTP response, or at least a complete HTTP chunk. By using HTTP chunked transfer encoding, we enable video players to start parsing and decoding video sooner. CMAF chunks are important so that decoders can actually play the bits that are in the HTTP chunks. Without encoding video in a careful way, decoders would have random bits of a video file that can’t be played. ### CDNs can introduce additional buffering Chunked encoding with HLS and CMAF is growing in use across the web today. Part of what makes this technique great is that HTTP chunked encoding is widely supported by CDNs – it’s been part of the HTTP spec for 20 years. CDN support is critical because it allows low-latency live video to scale up and reach audiences of thousands or millions of concurrent viewers – something that’s currently very difficult to do with other, non-HTTP based protocols. Unfortunately, even if you enable chunking to optimise delivery, your CDN may be working against you by buffering the entire segment. To understand why consider what happens when many people request a live segment at the same time: If the file is already in cache, great! CDNs do a great job at delivering cached files to huge audiences. But what happens when the segment isn’t in cache yet? Remember – this is the typical request pattern for live video! Typically, CDNs are able to “stream on cache miss” from the origin. That looks something like this: But again – what happens when multiple people request the file at once? CDNs typically need to pull the entire file into cache before serving additional viewers: This behavior is understandable. CDN data centers consist of many servers. To avoid overloading origins, these servers typically coordinate amongst themselves using a “cache lock” (mutex) that allows only one server to request a particular file from origin at a given time. A side effect of this is that while a file is being pulled into cache, it can’t be served to any user other than the first one that requested it. Unfortunately, this cache lock also defeats the purpose of using chunked encoding! To recap thus far: • Chunked encoding splits up video segments into smaller pieces • This can reduce end-to-end latency by allowing chunks to be fetched and decoded by players, even while segments are being produced at the origin server • Some CDNs neutralize the benefits of chunked encoding by buffering entire files inside the CDN before they can be delivered to clients ### Cloudflare’s solution: Concurrent Streaming Acceleration As you may have guessed, we think we can do better. Put simply, we now have the ability to deliver un-cached files to multiple clients simultaneously while we pull the file once from the origin server. This sounds like a simple change, but there’s a lot of subtlety to do this safely. Under the hood, we’ve made deep changes to our caching infrastructure to remove the cache lock and enable multiple clients to be able to safely read from a single file while it’s still being written. The best part is – all of Cloudflare now works this way! There’s no need to opt-in, or even make a config change to get the benefit. We rolled this feature out a couple months ago and have been really pleased with the results so far. We measure success by the “cache lock wait time,” i.e. how long a request must wait for other requests – a direct component of Time To First Byte. One OTT customer saw this metric drop from 1.5s at P99 to nearly 0, as expected: This directly translates into a 1.5-second improvement in end-to-end latency. Live video just got more live! ### Conclusion New techniques like chunked encoding have revolutionized live delivery, enabling publishers to deliver low-latency live video at scale. Concurrent Streaming Acceleration helps you unlock the power of this technique at your CDN, potentially shaving precious seconds of end-to-end latency. If you’re interested in using Cloudflare for live video delivery, contact our enterprise sales team. And if you’re interested in working on projects like this and helping us improve live video delivery for the entire Internet, join our engineering team! # Announcing Cloudflare Image Resizing: Simplifying Optimal Image Delivery In the past three years, the amount of image data on the median mobile webpage has doubled. Growing images translate directly to users hitting data transfer caps, experiencing slower websites, and even leaving if a website doesn’t load in a reasonable amount of time. The crime is many of these images are so slow because they are larger than they need to be, sending data over the wire which has absolutely no (positive) impact on the user’s experience. To provide a concrete example, let’s consider this photo of Cloudflare’s Lava Lamp Wall: On the left you see the photo, scaled to 300 pixels wide. On the right you see the same image delivered in its original high resolution, scaled in a desktop web browser. They both look exactly the same, yet the image on the right takes more than twenty times more data to load. Even for the best and most conscientious developers resizing every image to handle every possible device geometry consumes valuable time, and it’s exceptionally easy to forget to do this resizing altogether. Today we are launching a new product, Image Resizing, to fix this problem once and for all. ### Announcing Image Resizing With Image Resizing, Cloudflare adds another important product to its suite of available image optimizations. This product allows customers to perform a rich set of the key actions on images. • Resize – The source image will be resized to the specified height and width. This action allows multiple different sized variants to be created for each specific use. • Crop – The source image will be resized to a new size that does not maintain the original aspect ratio and a portion of the image will be removed. This can be especially helpful for headshots and product images where different formats must be achieved by keeping only a portion of the image. • Compress – The source image will have its file size reduced by applying lossy compression. This should be used when slight quality reduction is an acceptable trade for file size reduction. • Convert to WebP – When the users browser supports it, the source image will be converted to WebP. Delivering a WebP image takes advantage of the modern, highly optimized image format. By using a combination of these actions, customers store a single high quality image on their server, and Image Resizing can be leveraged to create specialized variants for each specific use case. Without any additional effort, each variant will also automatically benefit from Cloudflare’s global caching. ### Examples #### Ecommerce Thumbnails Ecommerce sites typically store a high-quality image of each product. From that image, they need to create different variants depending on how that product will be displayed. One example is creating thumbnails for a catalog view. Using Image Resizing, if the high quality image is located here: https://example.com/images/shoe123.jpg This is how to display a 75×75 pixel thumbnail using Image Resizing: <img src="/cdn-cgi/image/width=75,height=75/images/shoe123.jpg"> #### Responsive Images When tailoring a site to work on various device types and sizes, it’s important to always use correctly sized images. This can be difficult when images are intended to fill a particular percentage of the screen. To solve this problem, <img srcset sizes> can be used. Without Image Resizing, multiple versions of the same image would need to be created and stored. In this example, a single high quality copy of hero.jpg is stored, and Image Resizing is used to resize for each particular size as needed. <img width="100%" srcset=" /cdn-cgi/image/fit=contain,width=320/assets/hero.jpg 320w, /cdn-cgi/image/fit=contain,width=640/assets/hero.jpg 640w, /cdn-cgi/image/fit=contain,width=960/assets/hero.jpg 960w, /cdn-cgi/image/fit=contain,width=1280/assets/hero.jpg 1280w, /cdn-cgi/image/fit=contain,width=2560/assets/hero.jpg 2560w, " src="/cdn-cgi/image/width=960/assets/hero.jpg">  #### Enforce Maximum Size Without Changing URLs Image Resizing is also available from within a Cloudflare Worker. Workers allow you to write code which runs close to your users all around the world. For example, you might wish to add Image Resizing to your images while keeping the same URLs. Your users and client would be able to use the same image URLs as always, but the images will be transparently modified in whatever way you need. You can install a Worker on a route which matches your image URLs, and resize any images larger than a limit: addEventListener('fetch', event => { event.respondWith(handleRequest(event.request)) }) async function handleRequest(request) { return fetch(request, { cf: { image: { width: 800, height: 800, fit: 'scale-down' } }); }  As a Worker is just code, it is also easy to run this worker only on URLs with image extensions, or even to only resize images being delivered to mobile clients. ### Cloudflare and Images Cloudflare has a long history building tools to accelerate images. Our caching has always helped reduce latency by storing a copy of images closer to the user. Polish automates options for both lossless and lossy image compression to remove unnecessary bytes from images. Mirage accelerates image delivery based on device type. We are continuing to invest in all of these tools, as they all serve a unique role in improving the image experience on the web. Image Resizing is different because it is the first image product at Cloudflare to give developers full control over how their images would be served. You should choose Image Resizing if you are comfortable defining the sizes you wish your images to be served at in advance or within a Cloudflare Worker. ### Next Steps and Simple Pricing Image Resizing is available today for Business and Enterprise Customers. To enable it, login to the Cloudflare Dashboard and navigate to the Speed Tab. There you’ll find the section for Image Resizing which you can enable with one click. This product is included in the Business and Enterprise plans at no additional cost with generous usage limits. Business Customers have 100k requests per month limit and will be charged10 for each additional 100k requests per month.  Enterprise Customers have a 10M request per month limit with discounted tiers for higher usage.  Requests are defined as a hit on a URI that contains Image Resizing or a call to Image Resizing from a Worker.

Now that you’ve enabled Image Resizing, it’s time to resize your first image.

1. Using your existing site, store an image here: https://yoursite.com/images/yourimage.jpg
2. Use this URL to resize that image:
https://yoursite.com/cdn-cgi/image/width=100,height=100,quality=75/images/yourimage.jpg
3. Experiment with changing width=, height=, and quality=.

The instructions above use the Default URL Format for Image Resizing.  For details on options, uses cases, and compatibility, refer to our Developer Documentation.