# Internship Experience: Cryptography Engineer

Post Syndicated from Watson Ladd original https://blog.cloudflare.com/internship-experience-cryptography-engineer/

Back in the summer of 2017 I was an intern at Cloudflare. During the scholastic year I was a graduate student working on automorphic forms and computational Langlands at Berkeley: a part of number theory with deep connections to representation theory, aimed at uncovering some of the deepest facts about number fields. I had also gotten involved in Internet standardization and security research, but much more on the applied side.

While I had published papers in computer security and had coded for my dissertation, building and deploying new protocols to production systems was going to be new. Going from the academic environment of little day to day supervision to the industrial one of more direction; from greenfield code that would only ever be run by one person to large projects that had to be understandable by a team; from goals measured in years or even decades, to goals measured in days, weeks, or quarters; these transitions would present some challenges.

Cloudflare at that stage was a very different company from what it is now. Entire products and offices simply did not exist. Argo, now a mainstay of our offering for sophisticated companies, was slowly emerging. Access, which has been helping safeguard employees working from home these past weeks, was then experiencing teething issues. Workers was being extensively developed for launch that autumn. Quicksilver was still in the slow stages of replacing KyotoTycoon. Lisbon wasn’t on the map, and Austin was very new.

### Day 1

My first job was to get my laptop working. Quickly I discovered that despite the promise of using either Mac or Linux, only Mac was supported as a local development environment. Most Linux users would take a good part of a month to tweak all the settings and get the local development environment up. I didn’t have months. After three days, I broke down and got a Mac.

Needless to say I asked for some help. Like a drowning man in quicksand, I managed to attract three engineers to this near insoluble problem of the edge dev stack, and after days of hacking on it, fixing problems that had long been ignored, we got it working well enough to test a few things. That development environment is now gone and replaced with one built Kubernetes VMs, and works much better that way. When things work on your machine, you can now send everyone your machine.

### Speeding up

With setup complete enough, it was on to the problem we needed to solve. Our goal was to implement a set of three interrelated Internet drafts, one defining secondary certificates, one defining external authentication with TLS certificates, and a third permitting servers to advertise the websites they could serve.

External authentication is a TLS feature that permits a server or a client on an already opened connection to prove its possession of the private key of another certificate. This proof of possession is tied to the TLS connection, avoiding attacks on bearer tokens caused by the lack of this binding.

Secondary certificates is an HTTP/2 feature enabling clients and servers to send certificates together with proof that they actually know the private key. This feature has many applications such as certificate-based authentication, but also enables us to prove that we are permitted to serve the websites we claim to serve.

The last draft was the HTTP/2 ORIGIN frame. The ORIGIN frame enables a website to advertise other sites that it could serve, permitting more connection reuse than allowed under the traditional rules. Connection reuse is an important part of browser performance as it avoids much of the setup of a connection.

These drafts solved an important problem for Cloudflare. Many resources such as JavaScript, CSS, and images hosted by one website can be used by others. Because Cloudflare proxies so many different websites, our servers have often cached these resources as well. Browsers though, do not know that these different websites are made faster by Cloudflare, and as a result they repeat all the steps to request the subresources again. This takes unnecessary time since there is an established and usable perfectly good connection already. If the browser could know this, it could use the connection again.

We could only solve this problem by getting browsers and the broader community of TLS implementers on board. Some of these drafts such as external authentication and secondary certificates had a broader set of motivations, such as getting certificate based authentication to work with HTTP/2 and TLS 1.3. All of these needs had to be addressed in the drafts, even if we were only implementing a subset of the uses.

Successful standards cover the use cases that are needed while being simple enough to implement and achieve interoperability. Implementation experience is essential to achieving this success: a standard with no implementations fails to incorporate hard won lessons. Computers are hard.

### Prototype

My first goal was to set up a simple prototype to test the much more complex production implementation, as well as to share outside of Cloudflare so that others could have confidence in their implementations. But these drafts that had to be implemented in the prototype were incremental improvements to an already massive stack of TLS and HTTP standards.

I decided it would be easiest to build on top of an already existing implementation of TLS and HTTP. I picked the Go standard library as my base: it’s simple, readable, and in a language I was already familiar with. There was already a basic demo showcasing support in Firefox for the ORIGIN frame, and it would be up to me to extend it.

Using that as my starting point I was able in 3 weeks to set up a demonstration server and a client. This showed good progress, and that nothing in the specification was blocking implementation. But without integrating it into our servers for further experimentation so that we might  discover rare issues that could be showstoppers. This was a bitter lesson learned from TLS 1.3, where it took months to track down a single brand of printer that was incompatible with the standard, and forced a change.

### From Prototype to Production

We also wanted to understand the benefits with some real world data, to convince others that this approach was worthwhile. Our position as a provider to many websites globally gives us diverse, real world data on performance that we use to make our products better, and perhaps more important, to learn lessons that help everyone make the Internet better. As a result we had to implement this in production: the experimental framework for TLS 1.3 development had been removed and we didn’t have an environment for experimentation.

At the time everything at Cloudflare was based on variants of NGINX. We had extended it with modules to implement features like Keyless and customized certificate handling to meet our needs, but much of the business logic was and is carried out in Lua via OpenResty.

Lua has many virtues, but at the time both the TLS termination and the core business logic lived in the same repo despite being different processes at runtime. This made it very difficult to understand what code was running when, and changes to basic libraries could create problems for both. The build system for this creation had the significant disadvantage of building the same targets with different settings. Lua also is a very dynamic language, but unlike the dynamic languages I was used to, there was no way to interact with the system as it was running on requests.

The first step was implementing the ORIGIN frame. In implementing this, we had to figure out which sites hosted the subresources used by the page we were serving. Luckily, we already had this logic to enable server push support driven by Link headers. Building on this let me quickly get ORIGIN working.

This work wasn’t the only thing I was up to as an intern. I was also participating in weekly team meetings, attending our engineering presentations, and getting a sense of what life was like at Cloudflare. We had an excursion for interns to the Computer History Museum in Mountain View and Moffett Field, where we saw the base museum.

The next challenge was getting the CERTIFICATE frame to work. This was a much deeper problem. NGINX processes a request in phases, and some of the phases, like the header processing phase, do not permit network I/O without locking up the event loop. Since we are parsing the headers to determine what to send, the frame is created in the header processing phase. But finding a certificate and telling Keyless to sign it required network I/O.

The standard solution to this problem is to have Lua execute a timer callback, in which network I/O is possible. But this context doesn’t have any data from the request: some serious refactoring was needed to create a way to get the keyless module to function outside the context of a request.

Once the signature was created, the battle was half over. Formatting the CERTIFICATE frame was simple, but it had to be stuck into the connection associated with the request that had demanded it be created. And there was no reason to expect the request was still alive, and no way to know what state it was in when the request was handled by the Keyless module.

To handle this issue I made a shared btree indexed by a number containing space for the data to be passed back and forth. This enabled the request to record that it was ready to send the CERTIFICATE frame and Keyless to record that it was ready with a frame to send. Whichever of these happened second would do the work to enqueue the frame to send out.

This was not an easy solution: the Keyless module had been written years before and largely unmodified. It fundamentally assumed it could access data from the request, and changing this assumption opened the door to difficult to diagnose bugs. It integrates into BoringSSL callbacks through some pretty tricky mechanisms.

However, I was able to test it using the client from the prototype and it worked. Unfortunately when I pushed the commit in which it worked upstream, the CI system could not find the git repo where the client prototype was due to a setting I forgot to change. The CI system unfortunately didn’t associate this failure with the branch, but attempted to check it out whenever it checked out any other branch people were working on. Murphy ensured my accomplishment had happened on a Friday afternoon Pacific time, and the team that manages the SSL server was then exclusively in London…

Monday morning the issue was quickly fixed, and whatever tempers had frayed were smoothed over when we discovered the deficiency in the CI system that had enabled a single branch to break every build. It’s always tricky to work in a global team. Later Alessandro flew to San Francisco for a number of projects with the team here and we worked side by side trying to get a demonstration working on a test site. Unfortunately there was some difficulty tracking down a bug that prevented it working in production. We had run out of time, and my internship was over.  Alessandro flew back to London, and I flew to Idaho to see the eclipse.

### The End

Ultimately we weren’t able to integrate this feature into the software at our edge: the risks of such intrusive changes for a very experimental feature outweighed the benefits. With not much prospect of support by clients, it would be difficult to get the real savings in performance promised. There also were nontechnical issues in standardization that have made this approach more difficult to implement: any form of traffic direction that doesn’t obey DNS creates issues for network debugging, and there were concerns about the impact of certificate misissuance.

While the project was less successful than I hoped it would be, I learned a lot of important skills: collaborating on large software projects, working with git, and communicating with other implementers about issues we found.  I also got a taste of what it would be like to be on the Research team at Cloudflare and turning research from idea into practical reality and this ultimately confirmed my choice to go into industrial research.

I’ve now returned to Cloudflare full-time, working on extensions for TLS as well as time synchronization. These drafts have continued to progress through the standardization process, and we’ve contributed some of the code I wrote as a starting point for other implementers to use.  If we knew all our projects would work out, they wouldn’t be ambitious enough to be research worth doing.

If this sort of research experience appeals to you, we’re hiring.

# TLS 1.2 to become the minimum for all AWS FIPS endpoints

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/tls-1-2-to-become-the-minimum-for-all-aws-fips-endpoints/

To improve security for data in transit, AWS will update all of our AWS Federal Information Processing Standard (FIPS) endpoints to a minimum Transport Layer Security (TLS) version TLS 1.2 over the next year. This update will deprecate the ability to use TLS 1.0 and TLS 1.1 on all FIPS endpoints across all AWS Regions by March 31, 2021. No other AWS endpoints are affected by this change.

As outlined in the AWS Shared Responsibility Model, security and compliance is a shared responsibility between AWS and our customers. When a customer makes a connection from their client application to an AWS service endpoint, the client provides its TLS minimum and TLS maximum version. The AWS service endpoint selects the maximum version offered.

## What should customers do to prepare for this update?

Customers should confirm that their client applications support TLS 1.2 by verifying it is encapsulated between the clients’ minimum and the maximum TLS versions. We encourage customers to be proactive with security standards in order to avoid any impact to availability and to protect the integrity of their data in transit. Also, we recommend configuration changes should be tested in a staging environment, before introduction into production workloads.

## When will these changes happen?

To minimize the impact to our customers who use TLS 1.0 and TLS 1.1, AWS is rolling out changes on a service-by-service basis between now and the end of March 2021. For each service, after a 30-day period during which no connections are detected, AWS will deploy a configuration change to remove support for them. After March 31, 2021, AWS may update the endpoint configuration to remove TLS 1.0 and 1.1, even if we detect customer connections. Additional reminders will be provided before these updates are final.

## What are AWS FIPS endpoints?

All AWS services offer Transport Layer Security (TLS) 1.2 encrypted endpoints that can be used for all API calls. Some AWS services also offer FIPS 140-2 endpoints for customers that require use of FIPS validated cryptographic libraries.

## What is Transport Layer Security (TLS)?

Transport Layer Security (TLS) is a cryptographic protocol designed to provide secure communication across a computer network. API calls to AWS services are secured using TLS.

## Is there more assistance available to help verify or update client applications?

Customers using an AWS Software Development Kit (AWS SDK) can find information about how to properly configure their client’s minimum and maximum TLS versions on the following topics in the AWS SDKs:

Or see Tools to Build on AWS, and browse by programming language to find the relevant SDK.

Additionally, AWS IQ enables customers to find, securely collaborate with, and pay AWS Certified third-party experts for on-demand project work. Visit the AWS IQ page for information about how to submit a request, get responses from experts, and choose the expert with the right skills and experience. Log into your console and select Get Started with AWS IQ to start a request.

The AWS Technical Support tiers cover development and production issues for AWS products and services, along with other key stack components. AWS Support does not include code development for client applications.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Sincerely,
Amazon Web Services

### Janelle Hopper

Janelle Hopper is a Senior Technical Program Manager in AWS Security with over 15 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve AWS’ security posture.

# How to improve LDAP security in AWS Directory Service with client-side LDAPS

You can now better protect your organization’s identity data by encrypting Lightweight Directory Access Protocol (LDAP) communications between AWS Directory Service products (AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, and AD Connector) and self-managed Active Directory. Client-side secure LDAP (LDAPS) support enables applications that integrate with AWS Directory Service, such as Amazon WorkSpaces and AWS Single Sign-On, to connect to AD using Secure Sockets Layer/Transport Layer Security (SSL/TLS).

Note: In 2017, AWS Directory Service released server-side LDAPS support in AWS Managed Microsoft AD. This update adds client-side LDAPS support to both AWS Managed Microsoft AD and AD Connector.

In this post, I’ll step through configuring client-side LDAPS to enable encrypted communications between Amazon WorkSpaces and an Amazon Elastic Compute Cloud (Amazon EC2)-based self-managed AD.

## Solution architecture

When you have completed the steps outlined in this post, your solution will look like Figure 1:

Figure 1: Solution architecture

To build the solution, you will follow a three step process:

1. Prepare all prerequisites, including the setup of certificate-based security in the self-managed AD environment.
2. Register your certificate authority (CA) certificate into AWS Directory Service and enable client-side LDAPS (purple arrow in diagram above).
3. Test client-side LDAPS using Amazon WorkSpaces and AWS Directory Service (yellow arrows in diagram above).

## Step one: Set up prerequisites

To follow the steps described in this blog, you will need:

1. A self-managed AD deployment to store your user identities. You can find setup guidance in “Step 1: Set Up Your Environment for Trusts” of the Tutorial: Creating a Trust from AWS Managed Microsoft AD to a Self-Managed Active Directory Installation on Amazon EC2.
2. A server authentication certificate installed on your self-managed AD domain controller. Creating the certificate is typically done one of two ways:
1. Using Active Directory Certificate Services (AD CS) in Windows Server to deploy an in-house CA for issuing server certificates. For help with setting up an AD CS deployment that supports LDAPS, see Microsoft’s LDAP over SSL (LDAPS) Certificate.
2. Purchasing SSL certificates from a commercial CA like Verisign or AWS Certificate Manager. For help using commercial certificates with AD, see How to enable LDAP over SSL with a third-party certification authority.
3. An AWS Directory Service directory, either AWS Managed Microsoft AD or AD Connector, to act as a bridge from AWS to your self-managed AD. See the documentation for AWS Managed Microsoft AD or AD Connector for detailed steps and tutorials. If you’re using AWS Managed Microsoft AD, also set up a two-way trust with your self-managed AD using Tutorial: Creating a Trust from AWS Managed Microsoft AD to a Self-Managed Active Directory Installation on Amazon EC2.
4. Amazon WorkSpaces connected to your AWS Directory Service directory to look up and authenticate users. See the WorkSpaces documentation for detailed steps on using AWS Managed Microsoft AD with a Trusted Domain or AD Connector.

The remainder of this post assumes you have:

1. Created an AWS Managed Microsoft AD instance called corp.example.com
2. Connected corp.example.com via two-way trust to an EC2-based self-managed AD called example.local
3. Deployed an AD CS enterprise root certificate authority in example.local with the common name Example SelfManaged CA.

When you perform the steps described below, you should replace these names with the names you selected.

## Step two: Configure client-side LDAPS in AWS Directory Service

Now, you’ll retrieve the CA certificate — which represents the issuing certificate authority — from your self-managed AD and use it to enable client-side LDAPS in AWS Directory Service. To review CA certificate requirements for AWS Directory Service, see the client-side LDAPS documentation for AWS Managed Microsoft AD or AD Connector.

1. Export the CA certificate from the example.local CA:
1. To open the Certification Authority MMC snap-in, on the example.local server hosting AD CS, right-click the Windows icon, select Run, type certsrv.msc, and select OK.
2. Right-click the name of the CA (in this case, Example SelfManaged CA) and select Properties.
3. In the Properties window, on the General tab, under CA certificates, select the CA certificate listed, and then select View Certificate.

Figure 2: View the CA certificate

4. In the Certificate window, on the Details tab, select Copy to File.
5. In the Certificate Export Wizard, select Next.
6. In the Export File Format screen, select Base-64 encoded X.509 (.CER), and then select Next. This saves the file in the format required by AWS.

Figure 3: Select the base-64 encoded export file format

7. Select Browse, and then select a file name and save location for the CA certificate.
8. Select Save, and then click Next.
9. Select Finish, then select OK to complete the export process.
10. Copy the file to a location accessible by the machine where you will be performing the AWS Directory Service configuration.
2. Register the example.local CA certificate in AWS Directory Service:
1. In the AWS Management Console, select Directory Service, and then select the Directory ID link for the AWS Directory Service directory connected to example.local (in this case, corp.example.com).

Figure 4: Select the Directory ID

2. On the Directory details page, in the Networking & security tab, in the Client-side LDAPS section (shown in Figure 5), select the Actions menu, and then select Register certificate.

Figure 5: Select “Register certificate”

3. In the Register a CA certificate dialog box, select Browse, navigate to the location where you stored the CA certificate for your AD CS certificate authority, select Open, and then select Register certificate.

Figure 6: Register a CA certificate

3. Enable client-side LDAPS in AWS Directory Service:
1. In the Client-side LDAPS section, once the Registration status field for the certificate reads Registered, select the Enable button. Click the Refresh button for updated status.

Figure 7: Check the “Registration status” and then select “Enable”

2. In the Enable client-side LDAPS dialog box, select Enable.
3. In the Client-side LDAPS section, under Status, when the status field changes to Enabled, LDAPS is successfully configured. Click the Refresh button for updated status.

Figure 8: LDAPS successfully configured

## Step three: Test client-side LDAPS with Amazon WorkSpaces

The last step is to test client-side LDAPS with an AWS application. Now that client-side LDAPS has been configured, all LDAP traffic to the self-managed AD will be encrypted and travel over port 636.

Note: Ensure that AWS security group, network firewall, and Windows firewall settings applied to the AWS Directory Service directory (outbound) and self-managed AD (inbound) allow TCP communications on port 636.

To test your client-side LDAPS configuration, perform a WorkSpaces user look up:

1. In the AWS Management Console, choose WorkSpaces, and then click Launch WorkSpaces.
2. On the Select a Directory screen, pick corp.example.com and then select Next Step.
3. On the Identify Users screen, In the Select trust from forest menu, select example.local, and then select Show All Users (see Figure 9 for an example). This search will be executed over LDAPS.

Figure 9: Searching users from a trusted domain with client-side LDAPS

## Summary

In this post, we’ve explored how client-side LDAPS support in AWS Managed Microsoft AD and AD Connector improves LDAP security for AWS applications and services like Amazon WorkSpaces, AWS Single Sign-On, and Amazon QuickSight by encrypting sensitive network traffic between AWS and Active Directory.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

### Dave Martinez

Dave is a Senior Product Manager working on AWS Directory Service. Outside of work he enjoys Seattle sports and coaching his son’s Little League baseball team.

# The NSA Warns of TLS Inspection

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/11/the_nsa_warns_o.html

The NSA has released a security advisory warning of the dangers of TLS inspection:

Transport Layer Security Inspection (TLSI), also known as TLS break and inspect, is a security process that allows enterprises to decrypt traffic, inspect the decrypted content for threats, and then re-encrypt the traffic before it enters or leaves the network. Introducing this capability into an enterprise enhances visibility within boundary security products, but introduces new risks. These risks, while not inconsequential, do have mitigations.

[…]

The primary risk involved with TLSI’s embedded CA is the potential abuse of the CA to issue unauthorized certificates trusted by the TLS clients. Abuse of a trusted CA can allow an adversary to sign malicious code to bypass host IDS/IPSs or to deploy malicious services that impersonate legitimate enterprise services to the hosts.

[…]

A further risk of introducing TLSI is that an adversary can focus their exploitation efforts on a single device where potential traffic of interest is decrypted, rather than try to exploit each location where the data is stored.Setting a policy to enforce that traffic is decrypted and inspected only as authorized, and ensuring that decrypted traffic is contained in an out-of-band, isolated segment of the network prevents unauthorized access to the decrypted traffic.

[…]

To minimize the risks described above, breaking and inspecting TLS traffic should only be conducted once within the enterprise network. Redundant TLSI, wherein a client-server traffic flow is decrypted, inspected, and re-encrypted by one forward proxy and is then forwarded to a second forward proxy for more of the same,should not be performed.Inspecting multiple times can greatly complicate diagnosing network issues with TLS traffic. Also, multi-inspection further obscures certificates when trying to ascertain whether a server should be trusted. In this case, the “outermost” proxy makes the decisions on what server certificates or CAs should be trusted and is the only location where certificate pinning can be performed.Finally, a single TLSI implementation is sufficient for detecting encrypted traffic threats; additional TLSI will have access to the same traffic. If the first TLSI implementation detected a threat, killed the session, and dropped the traffic, then additional TLSI implementations would be rendered useless since they would not even receive the dropped traffic for further inspection. Redundant TLSI increases the risk surface, provides additional opportunities for adversaries to gain unauthorized access to decrypted traffic, and offers no additional benefits.

Nothing surprising or novel. No operational information about who might be implementing these attacks. No classified information revealed.

News article.

# Even faster connection establishment with QUIC 0-RTT resumption

Post Syndicated from Alessandro Ghedini original https://blog.cloudflare.com/even-faster-connection-establishment-with-quic-0-rtt-resumption/

One of the more interesting features introduced by TLS 1.3, the latest revision of the TLS protocol, was the so called “zero roundtrip time connection resumption”, a mode of operation that allows a client to start sending application data, such as HTTP requests, without having to wait for the TLS handshake to complete, thus reducing the latency penalty incurred in establishing a new connection.

The basic idea behind 0-RTT connection resumption is that if the client and server had previously established a TLS connection between each other, they can use information cached from that session to establish a new one without having to negotiate the connection’s parameters from scratch. Notably this allows the client to compute the private encryption keys required to protect application data before even talking to the server.

However, in the case of TLS, “zero roundtrip” only refers to the TLS handshake itself: the client and server are still required to first establish a TCP connection in order to be able to exchange TLS data.

### Zero means zero

QUIC goes a step further, and allows clients to send application data in the very first roundtrip of the connection, without requiring any other handshake to be completed beforehand.

After all, QUIC already shaved a full round-trip off of a typical connection’s handshake by merging the transport and cryptographic handshakes into one. By reducing the handshake by an additional roundtrip, QUIC achieves real 0-RTT connection establishment.

It literally can’t get any faster!

### Attack of the clones

Unfortunately, 0-RTT connection resumption is not all smooth sailing, and it comes with caveats and risks, which is why Cloudflare does not enable 0-RTT connection resumption by default. Users should consider the risks involved and decide whether to use this feature or not.

For starters, 0-RTT connection resumption does not provide forward secrecy, meaning that a compromise of the secret parameters of a connection will trivially allow compromising the application data sent during the 0-RTT phase of new connections resumed from it. Data sent after the 0-RTT phase, meaning after the handshake has been completed, would still be safe though, as TLS 1.3 (and QUIC) will still perform the normal key exchange algorithm (which is forward secret) for data sent after the handshake completion.

More worryingly, application data sent during 0-RTT can be captured by an on-path attacker and then replayed multiple times to the same server. In many cases this is not a problem, as the attacker wouldn’t be able to decrypt the data, which is why 0-RTT connection resumption is useful, but in some cases this can be dangerous.

For example, imagine a bank that allows an authenticated user (e.g. using HTTP cookies, or other HTTP authentication mechanisms) to send money from their account to another user by making an HTTP request to a specific API endpoint. If an attacker was able to capture that request when 0-RTT connection resumption was used, they wouldn’t be able to see the plaintext and get the user’s credentials, because they wouldn’t know the secret key used to encrypt the data; however they could still potentially drain that user’s bank account by replaying the same request over and over:

Of course this problem is not specific to banking APIs: any non-idempotent request has the potential to cause undesired side effects, ranging from slight malfunctions to serious security breaches.

In order to help mitigate this risk, Cloudflare will always reject 0-RTT requests that are obviously not idempotent (like POST or PUT requests), but in the end it’s up to the application sitting behind Cloudflare to decide which requests can and cannot be allowed with 0-RTT connection resumption, as even innocuous-looking ones can have side effects on the origin server.

To help origins detect and potentially disallow specific requests, Cloudflare also follows the techniques described in RFC8470. Notably, Cloudflare will add the Early-Data: 1 HTTP header to requests received during 0-RTT resumption that are forwarded to origins.

Origins able to understand this header can then decide to answer the request with the 425 (Too Early) HTTP status code, which will instruct the client that originated the request to retry sending the same request but only after the TLS or QUIC handshake have fully completed, at which point there is no longer any risk of replay attacks. This could even be implemented as part of a Cloudflare Worker.

This makes it possible for origins to allow 0-RTT requests for endpoints that are safe, such as a website’s index page which is where 0-RTT is most useful, as that is typically the first request a browser makes after establishing a connection, while still protecting other endpoints such as APIs and form submissions. But if an origin does not provide any of those non-idempotent endpoints, no action is required.

### One stop shop for all your 0-RTT needs

Just like we previously did for TLS 1.3, we now support 0-RTT resumption for QUIC as well. In honor of this event, we have dusted off the user-interface controls that allow Cloudflare users to enable this feature for their websites, and introduced a dedicated toggle to control whether 0-RTT connection resumption is enabled or not, which can be found under the “Network” tab on the Cloudflare dashboard:

When TLS 1.3 and/or QUIC (via the HTTP/3 toggle) are enabled, 0-RTT connection resumption will be automatically offered to clients that support it, and the replay mitigation mentioned above will also be applied to the connections making use of this feature.

In addition, if you are a user of our open-source HTTP/3 patch for NGINX, after updating the patch to the latest version, you’ll be able to enable support for 0-RTT connection resumption in your own NGINX-based HTTP/3 deployment by using the built-in “ssl_early_data” option, which will work for both TLS 1.3 and QUIC+HTTP/3.

# Post-quantum TLS now supported in AWS KMS

Post Syndicated from Andrew Hopkins original https://aws.amazon.com/blogs/security/post-quantum-tls-now-supported-in-aws-kms/

AWS Key Management Service (AWS KMS) now supports post-quantum hybrid key exchange for the Transport Layer Security (TLS) network encryption protocol that is used when connecting to KMS API endpoints. In this post, I’ll tell you what post-quantum TLS is, what hybrid key exchange is, why it’s important, how to take advantage of this new feature, and how to give us feedback.

## What is post-quantum TLS?

Post-quantum TLS is a feature that adds new, post-quantum cipher suites to the protocol. AWS implements TLS using s2n, a streamlined open source implementation of TLS. In June, 2019, AWS introduced post-quantum s2n, which implements two proposed post-quantum hybrid cipher suites specified in this IETF draft. The cipher suites specify a key exchange that provides the security protections of both the classical and post-quantum schemes.

## Why is this important?

A large-scale quantum computer would break the current public key cryptography that is used for key exchange in every TLS connection. While a large-scale quantum computer is not available today, it’s still important to think about and plan for your long-term security needs. TLS traffic recorded today could be decrypted by a large-scale quantum computer in the future. If you’re developing applications that rely on the long-term confidentiality of data passed over a TLS connection, you should consider a plan to migrate to post-quantum cryptography before a large-scale quantum computer is available for use by potential adversaries. AWS is working to prepare for this future, and we want you to be well-prepared, too.

We’re offering this feature now instead of waiting so you’ll have a way to measure the potential performance impact to your applications, and you’ll have the additional benefit of the protection afforded by the proposed post-quantum schemes today. While we believe the use of this feature raises the already high security bar for connecting to KMS endpoints, these new cipher suites will have an impact on bandwidth utilization, latency, and could also create issues for intermediate systems that proxy TLS connections. We’d like to get feedback from you on the effectiveness of our implementation so we can improve it over time.

### Some background on post-quantum TLS

Today, all requests to AWS KMS use TLS with one of two key exchange schemes:

FFDHE and ECDHE are industry standards for secure key exchange. KMS uses only ephemeral keys for TLS key negotiation; this ensures every connection uses a unique key and the compromise of one connection does not affect the security of another connection. They are secure today against known cryptanalysis techniques which use classic computers; however, they’re not secure against known attacks which use a large-scale quantum computer. In the future a sufficiently capable large-scale quantum computer could run Shor’s Algorithm to recover the TLS session key of a recorded session, and therefore gain access to the data inside. Protecting against a large-scale quantum computer requires using a post-quantum key exchange algorithm during the TLS handshake.

The possibility of large-scale quantum computing has spurred the development of new quantum-resistant cryptographic algorithms. The National Institute for Science and Technology (NIST) has started the process of standardizing post-quantum cryptographic algorithms. AWS contributed to two NIST submissions:

BIKE and SIKE are Key Encapsulation Mechanisms (KEMs); a KEM is a type of key exchange used to establish a shared symmetric key. Post-quantum s2n only uses ephemeral BIKE and SIKE keys.

The NIST standardization process isn’t expected to complete until 2024. Until then, there is a risk that the exclusive use of proposed algorithms like BIKE and SIKE could expose data in TLS connections to security vulnerabilities not yet discovered. To mitigate this risk and use these new post-quantum schemes safely today, we need a way to combine classical algorithms with the expected post-quantum security of the new algorithms submitted to NIST. The Hybrid Post-Quantum Key Encapsulation Methods for Transport Layer Security 1.2 IETF draft describes how to combine BIKE and SIKE with ECDHE to create two new cipher suites for TLS.

These two cipher suites use a hybrid key exchange that performs two independent key exchanges during the TLS handshake and then cryptographically combines the keys into a single TLS session key. This strategy combines the high assurance of a classical key exchange with the security of the proposed post-quantum key exchanges.

## The effect of hybrid post-quantum TLS on performance

Post-quantum cipher suites have a different performance profile and bandwidth requirements than traditional cipher suites. We measured the latency and bandwidth for a single handshake on an EC2 C5 2x.large. This provides a baseline for what to expect when you connect to KMS with the SDK. Your exact results will depend on your hardware (CPU speed and number of cores), existing workloads (how often you call KMS and what other work your application performs), and your network (location and capacity).

BIKE and SIKE have different performance tradeoffs: BIKE has faster computations and large keys, and SIKE has slower computations and smaller keys. The tables below show the results of the AWS measurements. ECDHE, a classic cryptographic key exchange algorithm, is included by itself for comparison.

Table 1
TLS MessageECDHE (bytes)ECDHE w/ BIKE (bytes)ECDHE w/SIKE (bytes)
ClientHello139147147
ServerKeyExchange3292,875711
ClientKeyExchange662,610470

Table 1 shows the amount of data (in bytes) sent in each TLS message. The ClientHello message is larger for post-quantum cipher suites because they include a new ClientHello extension. The key exchange messages are larger because they include BIKE or SIKE messages.

Table 2
ItemECDHE (ms)ECDHE w/ BIKE (ms)ECDHE w/SIKE (ms)
Server processing time0.1120.2695.53
Client processing time0.100.3957.05
Total handshake time1.1925.58155.08

Table 2 shows the time (in milliseconds) a client and server in the same region take to complete a handshake. Server processing time includes: key generation, signing the server key exchange message, and processing the client key exchange message. The client processing time includes: verifying the server’s certificate, processing the server key exchange message, and generating the client key exchange message. The total time was measured on the client from the start of the handshake to the end and includes network transfer time. All connections used RSA authentication with a 2048-bit key, and ECDHE used the secp256r1 curve. The BIKE test used the BIKE-1 Level 1 parameter and the SIKE test used the SIKEp503 parameter.

A TLS handshake is only performed once to setup a new connection. The SDK will reuse connections for multiple KMS requests when possible. This means that you don’t want to include measurements of subsequent round-trips under an existing TLS session, otherwise you will skew your performance data.

## How to use hybrid post-quantum cipher suites

Note: The “AWS CRT HTTP Client” in the aws-crt-dev-preview branch of the aws-sdk-java-v2 repository is a beta release. This beta release and your use are subject to Section 1.10 (“Beta Service Participation”) of the AWS Service Terms.

To use the post-quantum cipher suites with AWS KMS, you’ll need the Developer Previews of the Java SDK 2.0 and the AWS Common Runtime. You’ll need to configure the AWS Common Runtime HTTP client to use s2n’s post-quantum hybrid cipher suites, and configure the AWS Java SDK 2.0 to use that HTTP client. This client can then be used when connecting to any KMS endpoints, but only those endpoints that are not using FIPS 140-2 validated crypto for the TLS termination. For example, kms.<region>.amazonaws.com supports the use of post-quantum cipher suites, while kms-fips.<region>.amazonaws.com does not.

To see a complete example of everything setup check out the example application here.

Figure 1: GitHub and package layout

Figure 1 shows the GitHub and package layout. The steps below will walk you through building and configuring the SDK.


$git clone [email protected]:aws/aws-sdk-java-v2.git --branch aws-crt-dev-preview$ cd aws-sdk-java-v2


2. Build the aws-crt-client JAR:

mvn install -Pquick  3. In your project add the AWS Common Runtime client to your Maven Dependencies:  <dependency> <groupId>software.amazon.awssdk</groupId> <artifactId>aws-crt-client</artifactId> <version>2.10.7-SNAPSHOT</version> </dependency>  4. Configure the new SDK and cipher suite in your application’s existing initialization code:  if(!TLS_CIPHER_KMS_PQ_TLSv1_0_2019_06.isSupported()){ throw new RuntimeException("Post Quantum Ciphers not supported on this Platform"); } SdkAsyncHttpClient awsCrtHttpClient = AwsCrtAsyncHttpClient.builder() .tlsCipherPreference(TLS_CIPHER_KMS_PQ_TLSv1_0_2019_06) .build(); KmsAsyncClient kms = KmsAsyncClient.builder() .httpClient(awsCrtHttpClient) .build(); ListKeysResponse response = kms.listKeys().get();  Now, all connections made to AWS KMS in supported regions will use the new hybrid post-quantum cipher suites. ## Things to try Here are some ideas about how to use this post-quantum-enabled client: • Run load tests and benchmarks. These new cipher suites perform differently than traditional key exchange algorithms. You might need to adjust your connection timeouts to allow for the longer handshake times or, if you’re running inside an AWS Lambda function, extend the execution timeout setting. • Try connecting from different locations. Depending on the network path your request takes, you might discover that intermediate hosts, proxies, or firewalls with deep packet inspection (DPI) block the request. This could be due to the new cipher suites in the ClientHello or the larger key exchange messages. If this is the case, you might need to work with your Security team or IT administrators to update the relevant configuration to unblock the new TLS cipher suites. We’d like to hear from your about how your infrastructure interacts with this new variant of TLS traffic. ## More info If you’re interested to learn more about post-quantum cryptography check out: ## Conclusion In this blog post, I introduced you to the topic of post-quantum security and covered what AWS and NIST are doing to address the issue. I also showed you how to begin experimenting with hybrid post-quantum key exchange algorithms for TLS when connecting to KMS endpoints. If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about how to configure the HTTP client or its interaction with KMS endpoints, please start a new thread on the AWS KMS discussion forum. # The TLS Post-Quantum Experiment Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/the-tls-post-quantum-experiment/ In June, we announced a wide-scale post-quantum experiment with Google. We implemented two post-quantum (i.e., not yet known to be broken by quantum computers) key exchanges, integrated them into our TLS stack and deployed the implementation on our edge servers and in Chrome Canary clients. The goal of the experiment was to evaluate the performance and feasibility of deployment in TLS of two post-quantum key agreement ciphers. In our previous blog post on post-quantum cryptography, we described differences between those two ciphers in detail. In case you didn’t have a chance to read it, we include a quick recap here. One characteristic of post-quantum key exchange algorithms is that the public keys are much larger than those used by “classical” algorithms. This will have an impact on the duration of the TLS handshake. For our experiment, we chose two algorithms: isogeny-based SIKE and lattice-based HRSS. The former has short key sizes (~330 bytes) but has a high computational cost; the latter has larger key sizes (~1100 bytes), but is a few orders of magnitude faster. During NIST’s Second PQC Standardization Conference, Nick Sullivan presented our approach to this experiment and some initial results. Quite accurately, he compared NTRU-HRSS to an ostrich and SIKE to a turkey—one is big and fast and the other is small and slow. ## Setup & Execution We based our experiment on TLS 1.3. Cloudflare operated the server-side TLS connections and Google Chrome (Canary and Dev builds) represented the client side of the experiment. We enabled both CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE/p434 + X25519) key-agreement algorithms on all TLS-terminating edge servers. Since the post-quantum algorithms are considered experimental, the X25519 key exchange serves as a fallback to ensure the classical security of the connection. Clients participating in the experiment were split into 3 groups—those who initiated TLS handshake with post-quantum CECPQ2, CECPQ2b or non post-quantum X25519 public keys. Each group represented approximately one third of the Chrome Canary population participating in the experiment. In order to distinguish between clients participating in or excluded from the experiment, we added a custom extension to the TLS handshake. It worked as a simple flag sent by clients and echoed back by Cloudflare edge servers. This allowed us to measure the duration of TLS handshakes only for clients participating in the experiment. For each connection, we collected telemetry metrics. The most important metric was a TLS server-side handshake duration defined as the time between receiving the Client Hello and Client Finished messages. The diagram below shows details of what was measured and how post-quantum key exchange was integrated with TLS 1.3. The experiment ran for 53 days in total, between August and October. During this time we collected millions of data samples, representing 5% of (anonymized) TLS connections that contained the extension signaling that the client was part of the experiment. We carried out the experiment in two phases. In the first phase of the experiment, each client was assigned to use one of the three key exchange groups, and each client offered the same key exchange group for every connection. We collected over 10 million records over 40 days. In the second phase of the experiment, client behavior was modified so that each client randomly chose which key exchange group to offer for each new connection, allowing us to directly compare the performance of each algorithm on a per-client basis. Data collection for this phase lasted 13 days and we collected 270 thousand records. ## Results We now describe our server-side measurement results. Client-side results are described at https://www.imperialviolet.org/2019/10/30/pqsivssl.html. ### What did we find? The primary metric we collected for each connection was the server-side handshake duration. The below histograms show handshake duration timings for all client measurements gathered in the first phase of the experiment, as well as breakdowns into the top five operating systems. The operating system breakdowns shown are restricted to only desktop/laptop devices except for Android, which consists of only mobile devices. It’s clear from the above plots that for most clients, CECPQ2b performs worse than CECPQ2 and CONTROL. Thus, the small key size of CECPQ2b does not make up for its large computational cost—the ostrich outpaces the turkey. ### Digging a little deeper This means we’re done, right? Not quite. We are interested in determining if there are any populations of TLS clients for which CECPQ2b consistency outperforms CECPQ2. This requires taking a closer look at the long tail of handshake durations. The below plots show cumulative distribution functions (CDFs) of handshake timings zoomed in on the 80th percentile (e.g., showing the top 20% of slowest handshakes). Here, we start to see something interesting. For Android, Linux, and Windows devices, there is a crossover point where CECPQ2b actually starts to outperform CECPQ2 (Android: ~94th percentile, Linux: ~92nd percentile, Windows: ~95th percentile). macOS and ChromeOS do not appear to have these crossover points. These effects are small but statistically significant in some cases. The below table shows approximate 95% confidence intervals for the 50th (median), 95th, and 99th percentiles of handshake durations for each key exchange group and device type, calculated using Maritz-Jarrett estimators. The numbers within square brackets give the lower and upper bounds on our estimates for each percentile of the “true” distribution of handshake durations based on the samples collected in the experiment. For example, with a 95% confidence level we can say that the 99th percentile of handshake durations for CECPQ2 on Android devices lies between 4057ms and 4478ms, while the 99th percentile for CECPQ2b lies between 3276ms and 3646ms. Since the intervals do not overlap, we say that with statistical significance, the experiment indicates that CECPQ2b performs better than CECPQ2 for the slowest 1% of Android connections. Configurations where CECPQ2 or CECPQ2b outperforms the other with statistical significance are marked with green in the table. ### Per-client comparison A second phase of the experiment directly examined the performance of each key exchange algorithm for individual clients, where a client is defined to be a unique (anonymized) IP address and user agent pair. Instead of choosing a single key exchange algorithm for the duration of the experiment, clients randomly selected one of the experiment configurations for each new connection. Although the duration and sample size were limited for this phase of the experiment, we collected at least three handshake measurements for each group configuration from 3900 unique clients. The plot below shows for each of these clients the difference in latency between CECPQ2 and CECPQ2b, taking the minimum latency sample for each key exchange group as the representative value. The CDF plot shows that for 80% of clients, CECPQ2 outperformed or matched CECPQ2b, and for 99% of clients, the latency gap remained within 70ms. At a high level, this indicates that very few clients performed significantly worse with CECPQ2 over CECPQ2b. ### Do other factors impact the latency gap? We looked at a number of other factors—including session resumption, IP version, and network location—to see if they impacted the latency gap between CECPQ2 and CECPQ2b. These factors impacted the overall handshake latency, but we did not find that any made a significant impact on the latency gap between post-quantum ciphers. We share some interesting observations from this analysis below. #### Session resumption Approximately 53% of all connections in the experiment were completed with TLS handshake resumption. However, the percentage of resumed connections varied significantly based on the device configuration. Connections from mobile devices were only resumed ~25% of the time, while between 40% and 70% of connections from laptop/desktop devices were resumed. Additionally, resumption provided between a 30% and 50% speedup for all device types. #### IP version We also examined the impact of IP version on handshake latency. Only 12.5% of the connections in the experiment used IPv6. These connections were 20-40% faster than IPv4 connections for desktop/laptop devices, but ~15% slower for mobile devices. This could be an artifact of IPv6 being generally deployed on newer devices with faster processors. For Android, the experiment was only run on devices with more modern processors, which perhaps eliminated the bias. #### Network location The slow connections making up the long tail of handshake durations were not isolated to a few countries, Autonomous Systems (ASes), or subnets, but originated from a globally diverse set of clients. We did not find a correlation between the relative performance of the two post-quantum key exchange algorithms based on these factors. ## Discussion We found that CECPQ2 (the ostrich) outperformed CECPQ2 (the turkey) for the majority of connections in the experiment, indicating that fast algorithms with large keys may be more suitable for TLS than slow algorithms with small keys. However, we observed the opposite—that CECPQ2b outperformed CECPQ2—for the slowest connections on some devices, including Windows computers and Android mobile devices. One possible explanation for this is packet fragmentation and packet loss. The maximum size of TCP packets that can be sent across a network is limited by the maximum transmission unit (MTU) of the network path, which is often ~1400 bytes. During the TLS handshake the server responds to the client with its public key and ciphertext, the combined size of which exceeds the MTU, so it is likely that handshake messages must be split across multiple TCP packets. This increases the risk of lost packets and delays due to retransmission. A repeat of this experiment that includes collection of fine-grained TCP telemetry could confirm this hypothesis. A somewhat surprising result of this experiment is just how fast HRSS performs for the majority of connections. Recall that the CECPQ2 cipher performs key exchange operations for both X25519 and HRSS, but the additional overhead of HRSS is barely noticeable. Comparing benchmark results, we can see that HRSS will be faster than X25519 on the server side and slower on the client side. In our design, the client side performs two operations—key generation and KEM decapsulation. Looking at those two operations we can see that the key generation is a bottleneck here. Key generation: 3553.5 [ops/sec] KEM decapsulation: 17186.7 [ops/sec]  In algorithms with quotient-style keys (like NTRU), the key generation algorithm performs an inversion in the quotient ring—an operation that is quite computationally expensive. Alternatively, a TLS implementation could generate ephemeral keys ahead of time in order to speed up key exchange. There are several other lattice-based key exchange candidates that may be worth experimenting with in the context of TLS key exchange, which are based on different underlying principles than the HRSS construction. These candidates have similar key sizes and faster key generation algorithms, but have their own drawbacks. For now, HRSS looks like the more promising algorithm for use in TLS. In the case of SIKE, we implemented the most recent version of the algorithm, and instantiated it with the most performance-efficient parameter set for our experiment. The algorithm is computationally expensive, so we were required to use assembly to optimize it. In order to ensure best performance on Intel, most performance-critical operations have two different implementations; the library detects CPU capabilities and uses faster instructions if available, but otherwise falls back to a slightly slower generic implementation. We developed our own optimizations for 64-bit ARM CPUs. Nevertheless, our results show that SIKE incurred a significant overhead for every connection, especially on devices with weaker processors. It must be noted that high-performance isogeny-based public key cryptography is arguably much less developed than its lattice-based counterparts. Some ideas to develop this are floating around, and we hope to see performance improvements in the future. # New Reductor Nation-State Malware Compromises TLS Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/10/new_reductor_na.html Kaspersky has a detailed blog post about a new piece of sophisticated malware that it’s calling Reductor. The malware is able to compromise TLS traffic by infecting the computer with hacked TLS engine substituted on the fly, “marking” infected TLS handshakes by compromising the underlining random-number generator, and adding new digital certificates. The result is that the attacker can identify, intercept, and decrypt TLS traffic from the infected computer. The Kaspersky Attribution Engine shows strong code similarities between this family and the COMPfun Trojan. Moreover, further research showed that the original COMpfun Trojan most probably is used as a downloader in one of the distribution schemes. Based on these similarities, we’re quite sure the new malware was developed by the COMPfun authors. The COMpfun malware was initially documented by G-DATA in 2014. Although G-DATA didn’t identify which actor was using this malware, Kaspersky tentatively linked it to the Turla APT, based on the victimology. Our telemetry indicates that the current campaign using Reductor started at the end of April 2019 and remained active at the time of writing (August 2019). We identified targets in Russia and Belarus. […] Turla has in the past shown many innovative ways to accomplish its goals, such as using hijacked satellite infrastructure. This time, if we’re right that Turla is the actor behind this new wave of attacks, then with Reductor it has implemented a very interesting way to mark a host’s encrypted TLS traffic by patching the browser without parsing network packets. The victimology for this new campaign aligns with previous Turla interests. We didn’t observe any MitM functionality in the analyzed malware samples. However, Reductor is able to install digital certificates and mark the targets’ TLS traffic. It uses infected installers for initial infection through HTTP downloads from warez websites. The fact the original files on these sites are not infected also points to evidence of subsequent traffic manipulation. The attribution chain from Reductor to COMPfun to Turla is thin. Speculation is that the attacker behind all of this is Russia. # Towards Post-Quantum Cryptography in TLS Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/towards-post-quantum-cryptography-in-tls/ We live in a completely connected society. A society connected by a variety of devices: laptops, mobile phones, wearables, self-driving or self-flying things. We have standards for a common language that allows these devices to communicate with each other. This is critical for wide-scale deployment – especially in cryptography where the smallest detail has great importance. One of the most important standards-setting organizations is the National Institute of Standards and Technology (NIST), which is hugely influential in determining which standardized cryptographic systems see worldwide adoption. At the end of 2016, NIST announced it would hold a multi-year open project with the goal of standardizing new post-quantum (PQ) cryptographic algorithms secure against both quantum and classical computers. Many of our devices have very different requirements and capabilities, so it may not be possible to select a “one-size-fits-all” algorithm during the process. NIST mathematician, Dustin Moody, indicated that institute will likely select more than one algorithm: “There are several systems in use that could be broken by a quantum computer – public-key encryption and digital signatures, to take two examples – and we will need different solutions for each of those systems.” Initially, NIST selected 82 candidates for further consideration from all submitted algorithms. At the beginning of 2019, this process entered its second stage. Today, there are 26 algorithms still in contention. ### Post-quantum cryptography: what is it really and why do I need it? In 1994, Peter Shor made a significant discovery in quantum computation. He found an algorithm for integer factorization and computing discrete logarithms, both believed to be hard to solve in classical settings. Since then it has become clear that the ‘hard problems’ on which cryptosystems like RSA and elliptic curve cryptography (ECC) rely – integer factoring and computing discrete logarithms, respectively – are efficiently solvable with quantum computing. A quantum computer can help to solve some of the problems that are intractable on a classical computer. In theory, they could efficiently solve some fundamental problems in mathematics. This amazing computing power would be highly beneficial, which is why companies are actually trying to build quantum computers. At first, Shor’s algorithm was merely a theoretical result – quantum computers powerful enough to execute it did not exist – but this is quickly changing. In March 2018, Google announced a 72-qubit universal quantum computer. While this is not enough to break say RSA-2048 (still more is needed), many fundamental problems have already been solved. In anticipation of wide-spread quantum computing, we must start the transition from classical public-key cryptography primitives to post-quantum (PQ) alternatives. It may be that consumers will never get to hold a quantum computer, but a few powerful attackers who will get one can still pose a serious threat. Moreover, under the assumption that current TLS handshakes and ciphertexts are being captured and stored, a future attacker could crack these stored individual session keys and use those results to decrypt the corresponding individual ciphertexts. Even strong security guarantees, like forward secrecy, do not help out much there. In 2006, the academic research community launched a conference series dedicated to finding alternatives to RSA and ECC. This so-called post-quantum cryptography should run efficiently on a classical computer, but it should also be secure against attacks performed by a quantum computer. As a research field, it has grown substantially in popularity. Several companies, including Google, Microsoft, Digicert and Thales, are already testing the impact of deploying PQ cryptography. Cloudflare is involved in some of this, but we want to be a company that leads in this direction. The first thing we need to do is understand the real costs of deploying PQ cryptography, and that’s not obvious at all. ### What options do we have? Many submissions to the NIST project are still under study. Some are very new and little understood; others are more mature and already standardized as RFCs. Some have been broken or withdrawn from the process; others are more conservative or illustrate how far classical cryptography would need to be pushed so that a quantum computer could not crack it within a reasonable cost. Some are very slow and big; others are not. But most cryptographic schemes can be categorized into these families: lattice-based, multivariate, hash-based (signatures only), code-based and isogeny-based. For some algorithms, nevertheless, there is a fear they may be too inconvenient to use with today’s Internet. We must also be able to integrate new cryptographic schemes with existing protocols, such as SSH or TLS. To do that, designers of PQ cryptosystems must consider these characteristics: • Latency caused by encryption and decryption on both ends of the communication channel, assuming a variety of devices from big and fast servers to slow and memory constrained IoT (Internet of Things) devices • Small public keys and signatures to minimize bandwidth • Clear design that allows cryptanalysis and determining weaknesses that could be exploited • Use of existing hardware for fast implementation The work on post-quantum public key cryptosystems must be done in a full view of organizations, governments, cryptographers, and the public. Emerging ideas must be properly vetted by this community to ensure widespread support. ### Helping Build a Better Internet To better understand the post-quantum world, Cloudflare began experimenting with these algorithms and used them to provide confidentiality in TLS connections. With Google, we are proposing a wide-scale experiment that combines client- and server-side data collection to evaluate the performance of key-exchange algorithms on actual users’ devices. We hope that this experiment helps choose an algorithm with the best characteristics for the future of the Internet. With Cloudflare’s highly distributed network of access points and Google’s Chrome browser, both companies are in a very good position to perform this experiment. Our goal is to understand how these algorithms act when used by real clients over real networks, particularly candidate algorithms with significant differences in public-key or ciphertext sizes. Our focus is on how different key sizes affect handshake time in the context of Transport Layer Security (TLS) as used on the web over HTTPS. Our primary candidates are an NTRU-based construction called HRSS-SXY (by Hülsing – Rijneveld – Schanck – Schwabe, and Tsunekazu Saito – Keita Xagawa – Takashi Yamakawa) and an isogeny-based Supersingular Isogeny Key Encapsulation (SIKE). Details of both algorithms are described in more detail below in section “Dive into post-quantum cryptography”. This table shows a few characteristics for both algorithms. Performance timings were obtained by running the BoringSSL speed test on an Intel Skylake CPU. KEMPublic Key size (bytes)Ciphertext (bytes)Secret size (bytes)KeyGen (op/sec)Encaps (op/sec)Decaps (op/sec)NIST level HRSS-SXY11381138323952.376034.721905.81 SIKE/p43433034616367.1228.0209.31 Currently the most commonly used key exchange algorithm (according to Cloudflare’s data) is the non-quantum X25519. Its public keys are 32 bytes and BoringSSL can generate 49301.2 key pairs, and is able to perform 19628.6 key agreements every second on my Skylake CPU. Note that HRSS-SXY shows a significant speed advantage, while SIKE has a size advantage. In our experiment, we will deploy these two algorithms on both the server side using Cloudflare’s infrastructure, and the client side using Chrome Canary; both sides will collect telemetry information about TLS handshakes using these two PQ algorithms to see how they perform in practice. ### What do we expect to find? In 2018, Adam Langley conducted an experiment with the goal of evaluating the likely latency impact of a post-quantum key exchange in TLS. Chrome was augmented with the ability to include a dummy, arbitrarily-sized extension in the TLS ClientHello (fixed number of bytes of random noise). After taking into account the performance and key size offered by different types key-exchange schemes, he concluded that constructs based on structured lattices may be most suitable for future use in TLS. However, Langley also observed a peculiar phenomenon; client connections measured at 95th percentile had much higher latency than the median. It means that in those cases, isogeny-based systems may be a better choice. In the Dive into post-quantum cryptography, we describe the difference between isogeny-based SIKE and lattice-based NTRU cryptosystems. In our experiment, we want to more thoroughly evaluate and ascribe root causes to these unexpected latency increases. We would particularly like to learn more about the characteristics of those networks: what causes increased latency? how does the performance cost of isogeny-based algorithms impact the TLS handshake? We want to answer key questions, like: • What is a good ratio for speed-to-key size (or how much faster could SIKE get to achieve the client-perceived performance of HRSS)? • How do network middleboxes behave when clients use new PQ algorithms, and which networks have problematic middleboxes? • How do the different properties of client networks affect TLS performance with different PQ key exchanges? Can we identify specific autonomous systems, device configurations, or network configurations that favor one algorithm over another? How is performance affected in the long tail? ### Experiment Design Our experiment will involve both server- and client-side performance statistics collection from real users around the world (all the data is anonymized). Cloudflare is operating the server-side TLS connections. We will enable the CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE + X25519) key-agreement algorithms on all TLS-terminating edge servers. In this experiment, the ClientHello will contain a CECPQ2 or CECPQ2b public key (but never both). Additionally, Chrome will always include X25519 for servers that do not support post-quantum key exchange. The post-quantum key exchange will only be negotiated in TLS version 1.3 when both sides support it. Since Cloudflare only measures the server side of the connection, it is impossible to determine the time it takes for a ClientHello sent from Chrome to reach Cloudflare’s edge servers; however, we can measure the time it takes for the TLS ServerHello message containing post-quantum key exchange, to reach the client and for the client to respond. On the client side, Chrome Canary will operate the TLS connection. Google will enable either CECPQ2 or CECPQ2b in Chrome for the following mix of architecture and OSes: • x86-64: Windows, Linux, macOS, ChromeOS • aarch64: Android Our high-level expectation is to get similar results as Langley’s original experiment in 2018 — slightly increased latency for the 50th percentile and higher latency for the 95th. Unfortunately, data collected purely from real users’ connections may not suffice for diagnosing the root causes of why some clients experience excessive slowdown. To this end, we will perform follow-up experiments based on per-client information we collect server-side. Our primary hypothesis is that excessive slowdowns, like those Langley observed, are largely due to in-network events, such as middleboxes or bloated/lossy links. As a first-pass analysis, we will investigate whether the slowed-down clients share common network features, like common ASes, common transit networks, common link types, and so on. To determine this, we will run a traceroute from vantage points close to our servers back toward the clients (not overloading any particular links or hosts) and study whether some client locations are subject to slowdowns for all destinations or just for some. ### Dive into post-quantum cryptography Be warned: the details of PQ cryptography may be quite complicated. In some cases it builds on classical cryptography, and in other cases it is completely different math. It would be rather hard to describe details in a single blog post. Instead, we are giving you an intuition of post-quantum cryptography, rather than provide deep academic-level descriptions. We’re skipping a lot of details for the sake of brevity. Nevertheless, settle in for a bit of an epic journey because we have a lot to cover. ### Key encapsulation mechanism NIST requires that all key-agreement algorithms have a form of key-encapsulation mechanism (KEM). The KEM is a simplified form of public key encryption (PKE). As PKE, it also allows agreement on a secret, but in a slightly different way. The idea is that the session key is an output of the encryption algorithm, conversely to public key encryption schemes where session key is an input to the algorithm. In a KEM, Alice generates a random key and uses the pre-generated public key from Bob to encrypt (encapsulate) it. This results in a ciphertext sent to Bob. Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the random key. The idea was initially introduced by Cramer and Shoup. Experience shows that such constructs are easier to design, analyze, and implement as the scheme is limited to communicating a fixed-size session key. Leonardo Da Vinci said, “Simplicity is the ultimate sophistication,” which is very true in cryptography. The key exchange (KEX) protocol, like Diffie-Hellman, is yet a different construct: it allows two parties to agree on a shared secret that can be used as a symmetric encryption key. For example, Alice generates a key pair and sends a public key to Bob. Bob does the same and uses his own key pair with Alice’s public key to generate the shared secret. He then sends his public key to Alice who can now generate the same shared secret. What’s worth noticing is that both Alice and Bob perform exactly the same operations. KEM construction can be converted to KEX. Alice performs key generation and sends the public key to Bob. Bob uses it to encapsulate a symmetric session key and sends it back to Alice. Alice decapsulates the ciphertext received from Bob and gets the symmetric key. This is actually what we do in our experiment to make integration with the TLS protocol less complicated. ### NTRU Lattice-based Encryption We will enable the CECPQ2 implemented by Adam Langley from Google on our servers. He described this implementation in detail here. This key exchange uses the HRSS algorithm, which is based on the NTRU (N-Th Degree TRUncated Polynomial Ring) algorithm. Foregoing too much detail, I am going to explain how NTRU works and give simplified examples, and finally, compare it to HRSS. NTRU is a cryptosystem based on a polynomial ring. This means that we do not operate on numbers modulo a prime (like in RSA), but on polynomials of degree $$N$$ , where the degree of a polynomial is the highest exponent of its variable. For example, $$x^7 + 6x^3 + 11x^2$$ has degree of 7. One can add polynomials in the ring in the usual way, by simply adding theirs coefficients modulo some integer. In NTRU this integer is called $$q$$. Polynomials can also be multiplied, but remember, you are operating in the ring, therefore the result of a multiplication is always a polynomial of degree less than $$N$$. It basically means that exponents of the resulting polynomial are added to modulo $$N$$. In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working with a set of numbers less than N, you are working with a set of polynomials with a degree less than N. To instantiate the NTRU cryptosystem, three domain parameters must be chosen: • $$N$$ – degree of the polynomial ring, in NTRU the principal objects are polynomials of degree $$N-1$$. • $$p$$ – small modulus used during key generation and decryption for reducing message coefficients. • $$q$$ – large modulus used during algorithm execution for reducing coefficients of the polynomials. First, we generate a pair of public and private keys. To do that, two polynomials $$f$$ and $$g$$ are chosen from the ring in a way that their randomly generated coefficients are much smaller than $$q$$. Then key generation computes two inverses of the polynomial: $$f_p= f^{-1} \bmod{p} \\ f_q= f^{-1} \bmod{q}$$ The last step is to compute $$pk = p\cdot f_q\cdot g \bmod q$$, which we will use as public key pk. The private key consists of $$f$$ and $$f_p$$. The $$f_q$$ is not part of any key, however it must remain secret. It might be the case that after choosing $$f$$, the inverses modulo $$p$$ and $$q$$ do not exist. In this case, the algorithm has to start from the beginning and generate another $$f$$. That’s unfortunate because calculating the inverse of a polynomial is a costly operation. HRSS brings an improvement to this issue since it ensures that those inverses always exist, making key generation faster than as proposed initially in NTRU. The encryption of a message $$m$$ proceeds as follows. First, the message $$m$$ is converted to a ring element $$pt$$ (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial $$b$$ called blinder. The goal of the blinder is to generate different ciphertexts per encyption. Thus, the ciphetext $$ct$$ is obtained as $$ct = (b\cdot pk + pt ) \bmod q$$ Decryption looks a bit more complicated but it can also be easily understood. Decryption uses both the secret value $$f$$ and to recover the plaintext as $$v = f \cdot ct \bmod q \\ pt = v \cdot f_p \bmod p$$ This diagram demonstrates why and how decryption works. After obtaining $$pt$$, the message $$m$$ is recovered by inverting the conversion function. The underlying hard assumption is that given two polynomials: $$f$$ and $$g$$ whose coefficients are short compared to the modulus $$q$$, it is difficult to distinguish $$pk = \frac{f}{g}$$ from a random element in the ring. It means that it’s hard to find $$f$$ and $$g$$ given only public key pk. ### Lattices NTRU cryptosystem is a grandfather of lattice-based encryption schemes. The idea of using difficult problems for cryptographic purposes was due to Ajtai. His work evolved into a whole area of research with the goal of creating more practical, lattice-based cryptosystems. ### What is a lattice and why it can be used for post-quantum crypto? The picture below visualizes lattice as points in a two-dimensional space. A lattice is defined by the origin $$O$$ and base vectors $$\{ b_1 , b_2\}$$. Every point on the lattice is represented as a linear combination of the base vectors, for example $$V = -2b_1+b_2$$. There are two classical NP-hard problems in lattice-based cryptography: 1. Shortest Vector Problem (SVP): Given a lattice, to find the shortest non-zero vector in the lattice. In the graph, the vector $$s$$ is the shortest one. The SVP problem is NP-hard only under some assumptions. 2. Closest Vector Problem (CVP). Given a lattice and a vector $$V$$ (not necessarily in the lattice), to find the closest vector to $$V$$. For example, the closest vector to $$t$$ is $$z$$. In the graph above, it is easy for us to solve SVP and CVP by simple inspection. However, the lattices used in cryptography have higher dimensions, say above 1000, as well as highly non-orthogonal basis vectors. On these instances, the problems get extremely hard to solve. It’s even believed future quantum computers will have it tough. ### NTRU vs HRSS HRSS, which we use in our experiment, is based on NTRU, but a slightly better instantiation. The main improvements are: • Faster key generation algorithm. • NTRU encryption can produce ciphertexts that are impossible to decrypt (true for many lattice-based schemes). But HRSS fixes this problem. • HRSS is a key encapsulation mechanism. ### CECPQ2b – Isogeny-based Post-Quantum TLS Following CECPQ2, we have integrated into BoringSSL another hybrid key exchange mechanism relying on SIKE. It is called CECPQ2b and we will use it in our experimentation in TLS 1.3. SIKE is a key encapsulation method based on Supersingular Isogeny Diffie-Hellman (SIDH). Read more about SIDH in our previous post. The math behind SIDH is related to elliptic curves. A comparison between SIDH and the classical Elliptic Curve Diffie-Hellman (ECDH) is given. An elliptic curve is a set of points that satisfy a specific mathematical equation. The equation of an elliptic curve may have multiple forms, the standard form is called the Weierstrass equation $$y^2 = x^3 +ax +b$$ and its shape can look like the red curve. An interesting fact about elliptic curves is have a group structure. That is, the set of points on the curve have associated a binary operation called point addition. The set of points on the elliptic curve is closed under addition. Thus, adding two points results in another point that is also on the elliptic curve. If we can add two different points on a curve, then we can also add one point to itself. And if we do it multiple times, then the resulting operations is known as a scalar multiplication and denoted as $$Q = k\cdot P = P+P+\dots+P$$ for an integer $$k$$. Multiplication of scalars is commutative. It means that two scalar multiplications can be evaluated in any order $$\color{darkred}{k_a}\cdot\color{darkgreen}{k_b} = \color{darkgreen}{k_b}\cdot\color{darkred}{k_a}$$; this an important property that makes ECDH possible. It turns out that carefully if choosing an elliptic curve “correctly”, scalar multiplication is easy to compute but extremely hard to reverse. Meaning, given two points $$Q$$ and $$P$$ such that $$Q=k\cdot P$$, finding the integer k is a difficult task known as the Elliptic Curve Discrete Logarithm problem (ECDLP). This problem is suitable for cryptographic purposes. Alice and Bob agree on a secret key as follows. Alice generates a private key $$k_a$$. Then, she uses some publicly known point $$P$$ and calculates her public key as $$Q_a = k_a\cdot P$$. Bob proceeds in similar fashion and gets $$k_b$$ and $$Q_b = k_b\cdot P$$. To agree on a shared secret, each party multiplies their private key with the public key of the other party. The result of this is the shared secret. Key agreement as described above, works thanks to the fact that scalars can commute: $$\color{darkgreen}{k_a} \cdot Q_b = \color{darkgreen}{k_a} \cdot \color{darkred}{k_b} \cdot P \iff \color{darkred}{k_b} \cdot \color{darkgreen}{k_a} \cdot P = \color{darkred}{k_b} \cdot Q_a$$ There is a vast theory behind elliptic curves. An introduction to elliptic curve cryptography was posted before and more details can be found in this book. Now, lets describe SIDH and compare with ECDH. ### Isogenies on Elliptic Curves Before explaining the details of SIDH key exchange, I’ll explain the 3 most important concepts, namely: j-invariant, isogeny and its kernel. Each curve has a number that can be associated to it. Let’s call this number a j-invariant. This number is not unique per curve, meaning many curves have the same value of j-invariant, but it can be viewed as a way to group multiple elliptic curves into disjoint sets. We say that two curves are isomorphic if they are in the same set, called the isomorphism class. The j-invariant is a simple criterion to determine whether two curves are isomorphic. The j-invariant of a curve $$E$$ in Weierstrass form $$y^2 = x^3 + ax + b$$ is given as $$j(E) = 1728\frac{4a^3}{4^3 +27b^2}$$ When it comes to isogeny, think about it as a map between two curves. Each point on some curve $$E$$ is mapped by isogeny to the point on isogenous curve $$E’$$. We denote mapping from curve $$E$$ to $$E’$$ by isogeny $$\phi$$ as: $$\phi: E \rightarrow E’$$ It depends on the map if those two curves are isomorphic or not. Isogeny can be visualised as: There may exist many of those mappings, each curve used in SIDH has small number of isogenies to other curves. Natural question is how do we compute such isogeny. Here is where the kernel of an isogeny comes. The kernel uniquely determines isogeny (up to isomorphism class). Formulas for calculating isogeny from its kernel were initially given by J. Vélu and the idea of calculating them efficiently was extended . To finish, I will summarize what was said above with a picture. There are two isomorphism classes on the picture above. Both curves $$E_1$$ and $$E_2$$ are isomorphic and have j-invariant = 6. As curves $$E_3$$ and $$E_4$$ have j-invariant=13, they are in a different isomorphism class. There exists an isogeny $$\phi_2$$ between curve $$E_3$$ and $$E_2$$, so they both are isogeneous. Curves $$\phi_1$$ and $$E_2$$ are isomorphic and there is isogeny $$\phi_1$$ between them. Curves $$E_1$$ and $$E_4$$ are neither isomorphic nor isogeneus. For brevity I’m skipping many important details, like details of the finite field, the fact that isogenies must be separable and that the kernel is finite. But curious readers can find a number of academic research papers available on the Internet. ### Big picture: similarities with ECDH Let’s generalize the ECDH algorithm described above, so that we can swap some elements and try to use Supersingular Isogeny Diffie-Hellman. Note that what actually happens during an ECDH key exchange is: • We have a set of points on elliptic curve, set S • We have another group of integers used for point multiplication, G • We use an element from Z to act on an element from S to get another element from S: $$G \cdot S \rightarrow S$$ Now the question is: what is our G and S in an SIDH setting? For SIDH to work, we need a big set of elements and something secret that will act on the elements from that set. This “group action” must also be resistant to attacks performed by quantum computers. In the SIDH setting, those two sets are defined as: • Set S is a set (graph) of j-invariants, such that all the curves are supersingular: $$S = [j(E_1), j(E_2), j(E_3), …. , j(E_n)]$$ • Set G is a set of isogenies acting on elliptic curves and transforming, for example, the elliptic curve $$E_1$$ into $$E_n$$: ### Random walk on supersingular graph When we talk about Isogeny Based Cryptography, as a topic distinct from Elliptic Curve Cryptography, we usually mean algorithms and protocols that rely fundamentally on the structure of isogeny graphs. An example of such a (small) graph is pictured below. Each vertex of the graph represents a different j-invariant of a set of supersingular curves. The edges between vertices represent isogenies converting one elliptic curve to another. As you can notice, the graph is strongly connected, meaning every vertex can be reached from every other vertex. In the context of isogeny-based crypto, we call such a graph a supersingular isogeny graph. I’ll skip some technical details about the construction of this graph (look for those here or here), but instead describe ideas about how it can be used. As the graph is strongly connected, it is possible to walk a whole graph by starting from any vertex, randomly choosing an edge, following it to the next vertex and then start the process again on a new vertex. Such a way of visiting edges of this graph is called a random walk. The random walk is a key concept that makes isogeny based crypto feasible. When you look closely at the graph, you can notice that each vertex has a small number of edges incident to it, this is why we can compute the isogenies efficiently. But also for any vertex there is only a limited number of isogenies to choose from, which doesn’t look like good base for a cryptographic scheme. The key question is – where does the security of the scheme come from exactly? In order to get it, it is necessary to visit a couple hundred vertices. What it means in practice is that secret isogeny (of large degree) is constructed as a composition of multiple isogenies (of small, prime degree). Which means, the secret isogeny is: This property and properties of the isogeny graph are what makes some of us believe that scheme has a good chance to be secure. More specifically, there is no efficient way of finding a path that connects $$E_0$$ with $$E_n$$, even with quantum computer at hand. The security level of a system depends on value n – the number of steps taken during the walk. The random walk is a core process used when both generating public keys and computing shared secrets. It starts with party generating random value m (see more below), starting curve $$E_0$$ and points P and Q on this curve. Those values are used to compute the kernel of an isogeny $$R_1$$ in the following way: $$R_1 = P + m \cdot Q$$ Thanks to formulas given by Vélu we can now use the point $$R_1$$ to compute the isogeny, the party will choose to move from a vertex to another one. After the isogeny $$\phi_{R_1}$$ is calculated it is applied to $$E_0$$ which results in a new curve $$E_1$$: $$\phi_{R_1}: E_0 \rightarrow E_1$$ Isogeny is also applied to points P and Q. Once on $$E_1$$ the process is repeated. This process is applied n times, and at the end a party ends up on some curve $$E_n$$ which defines isomorphism class, so also j-invariant. ### Supersingular Isogeny Diffie-Hellman The core idea in SIDH is to compose two random walks on an isogeny graph of elliptic curves in such a way that the end node of both ways of composing is the same. In order to do it, scheme sets public parameters – starting curve $$E_0$$ and 2 pairs of base points on this curve $$(PA,QA)$$ , $$(PB,QB)$$. Alice generates her random secret keys m, and calculates a secret isogeny $$\phi_q$$ by performing a random walk as described above. The walk finishes with 3 values: elliptic curve $$E_a$$ she has ended up with and pair of points $$\phi_a(PB)$$ and $$\phi_a(QB)$$ after pushing through Alice’s secret isogeny. Bob proceeds analogously which results in the triple $${E_b, \phi_b(PA), \phi_b(QA)}$$. The triple forms a public key which is exchanged between parties. The picture below visualizes the operation. The black dots represent curves grouped in the same isomorphism classes represented by light blue circles. Alice takes the orange path ending up on a curve $$E_a$$ in a separate isomorphism class than Bob after taking his dark blue path ending on $$E_b$$. SIDH is parametrized in a way that Alice and Bob will always end up in different isomorphism classes. Upon receipt of triple $${ E_a, \phi_a(PB), \phi_a(QB) }$$ from Alice, Bob will use his secret value m to calculate a new kernel – but instead of using point $$PA$$ and $$QA$$ to calculate an isogeny kernel, he will now use images $$\phi_a(PB)$$ and $$\phi_a(QB)$$ received from Alice: $$R’_1 = \phi_a(PB) + m \cdot \phi_a(QB)$$ Afterwards, he uses $$R’_1$$ to start the walk again resulting in the isogeny $$\phi’_b: E_a \rightarrow E_{ab}$$. Allice proceeds analogously resulting in the isogeny $$\phi’_a: E_b \rightarrow E_{ba}$$. With isogenies calculated this way, both Alice and Bob will converge in the same isomorphism class. The math math may seem complicated, hopefully the picture below makes it easier to understand. Bob computes a new isogeny and starts his random walk from $$E_a$$ received from Alice. He ends up on some curve $$E_{ba}$$. Similarly, Alice calculates a new isogeny, applies it on $$E_b$$ received from Bob and her random walk ends on some curve $$E_{ab}$$. Curves $$E_{ab}$$ and $$E_{ba}$$ are not likely to be the same, but construction guarantees that they are isomorphic. As mentioned earlier, isomorphic curves have the same value of j-invariant, hence the shared secret is a value of j-invariant $$j(E_{ab})$$. Coming back to differences between SIDH and ECDH – we can split them into four categories: the elements of the group we are operating on, the cornerstone computation required to agree on a shared secret, the elements representing secret values, and the difficult problem on which the security relies. In ECDH there is a secret key which is an integer scalar, in case of SIDH it is a secret isogeny, which also is generated from an integer scalar. In the case of ECDH one multiplies a point on a curve by a scalar, in the case of SIDH it is a random walk in an isogeny graph. In the case of ECDH, the public key is a point on a curve, in the case of SIDH, the public part is a curve itself and the image of some points after applying isogeny. The shared secret in the case of ECDH is a point on a curve, in the case of SIDH it is a j-invariant. ### SIKE: Supersingular Isogeny Key Encapsulation SIDH could potentially be used as a drop-in replacement of the ECDH protocol. We have actually implemented a proof-of-concept and added it to our implementation of TLS 1.3 in the tls-tris library and described (together with Mozilla) implementation details in this draft. Nevertheless, there is a problem with SIDH – the keys can be used only once. In 2016, a few researchers came up with an active attack on SIDH which works only when public keys are reused. In the context of TLS, it is not a big problem, because for each session a fresh key pair is generated (ephemeral keys), but it may not be true for other applications. SIKE is an isogeny key encapsulation which solves this problem. Bob can generate SIKE keys, upload the public part somewhere in the Internet and then anybody can use it whenever he wants to communicate with Bob securely. SIKE reuses SIDH – internally both sides of the connection always perform SIDH key generation, SIDH key agreement and apply some other cryptographic primitives in order to convert SIDH to KEM. SIKE is implemented in a few variants – each variant corresponds to the security levels using 128-, 192- and 256-bit secret keys. Higher security level means longer running time. More details about SIKE can be found here. SIKE is also one of the candidates in NIST post-quantum “competition“. I’ve skipped many important details to give a brief description of how isogeny based crypto works. If you’re curious and hungry for details, look at either of these Cloudflare meetups, where Deirdre Connolly talked about isogeny-based cryptography or this talk by Chloe Martindale during PQ Crypto School 2017. And if you would like to know more about quantum attacks on this scheme, I highly recommend this work. ## Conclusion Quantum computers that can break meaningful cryptographic parameter settings do not exist, yet. They won’t be built for at least the next few years. Nevertheless, they have already changed the way we look at current cryptographic deployments. There are at least two reasons it’s worth investing in PQ cryptography: • It takes a lot of time to build secure cryptography and we don’t actually know when today’s classical cryptography will be broken. There is a need for a good mathematical base: an initial idea of what may be secure against something that doesn’t exist yet. If you have an idea, you also need good implementation, constant time, resistance to things like time and cache side-channels, DFA, DPA, EM, and a bunch of other abbreviations indicating side-channel resistance. There is also deployment of, for example, algorithms based on elliptic curves were introduced in ’85, but started to really be used in production only during the last decade, 20 or so years later. Obviously, the implementation must be blazingly fast! Last, but not least, integration: we need time to develop standards to allow integration of PQ cryptography with protocols like TLS. • Even though efficient quantum computers probably won’t exist for another few years, the threat is real. Data encrypted with current cryptographic algorithms can be recorded now with hopes of being broken in the future. Cloudflare is motivated to help build the Internet of tomorrow with the tools at hand today. Our interest is in cryptographic techniques that can be integrated into existing protocols and widely deployed on the Internet as seamlessly as possible. PQ cryptography, like the rest of cryptography, includes many cryptosystems that can be used for communications in today’s Internet; Alice and Bob need to perform some computation, but they do not need to buy new hardware to do that. Cloudflare sees great potential in those algorithms and believes that some of them can be used as a safe replacement for classical public-key cryptosystems. Time will tell if we’re justified in this belief! # Towards Post-Quantum Cryptography in TLS Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/towards-post-quantum-cryptography-in-tls/ We live in a completely connected society. A society connected by a variety of devices: laptops, mobile phones, wearables, self-driving or self-flying things. We have standards for a common language that allows these devices to communicate with each other. This is critical for wide-scale deployment – especially in cryptography where the smallest detail has great importance. One of the most important standards-setting organizations is the National Institute of Standards and Technology (NIST), which is hugely influential in determining which standardized cryptographic systems see worldwide adoption. At the end of 2016, NIST announced it would hold a multi-year open project with the goal of standardizing new post-quantum (PQ) cryptographic algorithms secure against both quantum and classical computers. Many of our devices have very different requirements and capabilities, so it may not be possible to select a “one-size-fits-all” algorithm during the process. NIST mathematician, Dustin Moody, indicated that institute will likely select more than one algorithm: “There are several systems in use that could be broken by a quantum computer – public-key encryption and digital signatures, to take two examples – and we will need different solutions for each of those systems.” Initially, NIST selected 82 candidates for further consideration from all submitted algorithms. At the beginning of 2019, this process entered its second stage. Today, there are 26 algorithms still in contention. ### Post-quantum cryptography: what is it really and why do I need it? In 1994, Peter Shor made a significant discovery in quantum computation. He found an algorithm for integer factorization and computing discrete logarithms, both believed to be hard to solve in classical settings. Since then it has become clear that the ‘hard problems’ on which cryptosystems like RSA and elliptic curve cryptography (ECC) rely – integer factoring and computing discrete logarithms, respectively – are efficiently solvable with quantum computing. A quantum computer can help to solve some of the problems that are intractable on a classical computer. In theory, they could efficiently solve some fundamental problems in mathematics. This amazing computing power would be highly beneficial, which is why companies are actually trying to build quantum computers. At first, Shor’s algorithm was merely a theoretical result – quantum computers powerful enough to execute it did not exist – but this is quickly changing. In March 2018, Google announced a 72-qubit universal quantum computer. While this is not enough to break say RSA-2048 (still more is needed), many fundamental problems have already been solved. In anticipation of wide-spread quantum computing, we must start the transition from classical public-key cryptography primitives to post-quantum (PQ) alternatives. It may be that consumers will never get to hold a quantum computer, but a few powerful attackers who will get one can still pose a serious threat. Moreover, under the assumption that current TLS handshakes and ciphertexts are being captured and stored, a future attacker could crack these stored individual session keys and use those results to decrypt the corresponding individual ciphertexts. Even strong security guarantees, like forward secrecy, do not help out much there. In 2006, the academic research community launched a conference series dedicated to finding alternatives to RSA and ECC. This so-called post-quantum cryptography should run efficiently on a classical computer, but it should also be secure against attacks performed by a quantum computer. As a research field, it has grown substantially in popularity. Several companies, including Google, Microsoft, Digicert and Thales, are already testing the impact of deploying PQ cryptography. Cloudflare is involved in some of this, but we want to be a company that leads in this direction. The first thing we need to do is understand the real costs of deploying PQ cryptography, and that’s not obvious at all. ### What options do we have? Many submissions to the NIST project are still under study. Some are very new and little understood; others are more mature and already standardized as RFCs. Some have been broken or withdrawn from the process; others are more conservative or illustrate how far classical cryptography would need to be pushed so that a quantum computer could not crack it within a reasonable cost. Some are very slow and big; others are not. But most cryptographic schemes can be categorized into these families: lattice-based, multivariate, hash-based (signatures only), code-based and isogeny-based. For some algorithms, nevertheless, there is a fear they may be too inconvenient to use with today’s Internet. We must also be able to integrate new cryptographic schemes with existing protocols, such as SSH or TLS. To do that, designers of PQ cryptosystems must consider these characteristics: • Latency caused by encryption and decryption on both ends of the communication channel, assuming a variety of devices from big and fast servers to slow and memory constrained IoT (Internet of Things) devices • Small public keys and signatures to minimize bandwidth • Clear design that allows cryptanalysis and determining weaknesses that could be exploited • Use of existing hardware for fast implementation The work on post-quantum public key cryptosystems must be done in a full view of organizations, governments, cryptographers, and the public. Emerging ideas must be properly vetted by this community to ensure widespread support. ### Helping Build a Better Internet To better understand the post-quantum world, Cloudflare began experimenting with these algorithms and used them to provide confidentiality in TLS connections. With Google, we are proposing a wide-scale experiment that combines client- and server-side data collection to evaluate the performance of key-exchange algorithms on actual users’ devices. We hope that this experiment helps choose an algorithm with the best characteristics for the future of the Internet. With Cloudflare’s highly distributed network of access points and Google’s Chrome browser, both companies are in a very good position to perform this experiment. Our goal is to understand how these algorithms act when used by real clients over real networks, particularly candidate algorithms with significant differences in public-key or ciphertext sizes. Our focus is on how different key sizes affect handshake time in the context of Transport Layer Security (TLS) as used on the web over HTTPS. Our primary candidates are an NTRU-based construction called HRSS-SXY (by Hülsing – Rijneveld – Schanck – Schwabe, and Tsunekazu Saito – Keita Xagawa – Takashi Yamakawa) and an isogeny-based Supersingular Isogeny Key Encapsulation (SIKE). Details of both algorithms are described in more detail below in section “Dive into post-quantum cryptography”. This table shows a few characteristics for both algorithms. Performance timings were obtained by running the BoringSSL speed test on an Intel Skylake CPU. KEMPublic Key size (bytes)Ciphertext (bytes)Secret size (bytes)KeyGen (op/sec)Encaps (op/sec)Decaps (op/sec)NIST level HRSS-SXY11381138323952.376034.721905.81 SIKE/p43433034616367.1228.0209.31 Currently the most commonly used key exchange algorithm (according to Cloudflare’s data) is the non-quantum X25519. Its public keys are 32 bytes and BoringSSL can generate 49301.2 key pairs, and is able to perform 19628.6 key agreements every second on my Skylake CPU. Note that HRSS-SXY shows a significant speed advantage, while SIKE has a size advantage. In our experiment, we will deploy these two algorithms on both the server side using Cloudflare’s infrastructure, and the client side using Chrome Canary; both sides will collect telemetry information about TLS handshakes using these two PQ algorithms to see how they perform in practice. ### What do we expect to find? In 2018, Adam Langley conducted an experiment with the goal of evaluating the likely latency impact of a post-quantum key exchange in TLS. Chrome was augmented with the ability to include a dummy, arbitrarily-sized extension in the TLS ClientHello (fixed number of bytes of random noise). After taking into account the performance and key size offered by different types key-exchange schemes, he concluded that constructs based on structured lattices may be most suitable for future use in TLS. However, Langley also observed a peculiar phenomenon; client connections measured at 95th percentile had much higher latency than the median. It means that in those cases, isogeny-based systems may be a better choice. In the “Dive into post-quantum cryptography”, we describe the difference between isogeny-based SIKE and lattice-based NTRU cryptosystems. In our experiment, we want to more thoroughly evaluate and ascribe root causes to these unexpected latency increases. We would particularly like to learn more about the characteristics of those networks: what causes increased latency? how does the performance cost of isogeny-based algorithms impact the TLS handshake? We want to answer key questions, like: • What is a good ratio for speed-to-key size (or how much faster could SIKE get to achieve the client-perceived performance of HRSS)? • How do network middleboxes behave when clients use new PQ algorithms, and which networks have problematic middleboxes? • How do the different properties of client networks affect TLS performance with different PQ key exchanges? Can we identify specific autonomous systems, device configurations, or network configurations that favor one algorithm over another? How is performance affected in the long tail? ### Experiment Design Our experiment will involve both server- and client-side performance statistics collection from real users around the world (all the data is anonymized). Cloudflare is operating the server-side TLS connections. We will enable the CECPQ2 (HRSS + X25519) and CECPQ2b (SIKE + X25519) key-agreement algorithms on all TLS-terminating edge servers. In this experiment, the ClientHello will contain a CECPQ2 or CECPQ2b public key (but never both). Additionally, Chrome will always include X25519 for servers that do not support post-quantum key exchange. The post-quantum key exchange will only be negotiated in TLS version 1.3 when both sides support it. Since Cloudflare only measures the server side of the connection, it is impossible to determine the time it takes for a ClientHello sent from Chrome to reach Cloudflare’s edge servers; however, we can measure the time it takes for the TLS ServerHello message containing post-quantum key exchange, to reach the client and for the client to respond. On the client side, Chrome Canary will operate the TLS connection. Google will enable either CECPQ2 or CECPQ2b in Chrome for the following mix of architecture and OSes: • x86-64: Windows, Linux, macOS, ChromeOS • aarch64: Android Our high-level expectation is to get similar results as Langley’s original experiment in 2018 — slightly increased latency for the 50th percentile and higher latency for the 95th. Unfortunately, data collected purely from real users’ connections may not suffice for diagnosing the root causes of why some clients experience excessive slowdown. To this end, we will perform follow-up experiments based on per-client information we collect server-side. Our primary hypothesis is that excessive slowdowns, like those Langley observed, are largely due to in-network events, such as middleboxes or bloated/lossy links. As a first-pass analysis, we will investigate whether the slowed-down clients share common network features, like common ASes, common transit networks, common link types, and so on. To determine this, we will run a traceroute from vantage points close to our servers back toward the clients (not overloading any particular links or hosts) and study whether some client locations are subject to slowdowns for all destinations or just for some. ### Dive into post-quantum cryptography Be warned: the details of PQ cryptography may be quite complicated. In some cases it builds on classical cryptography, and in other cases it is completely different math. It would be rather hard to describe details in a single blog post. Instead, we are giving you an intuition of post-quantum cryptography, rather than provide deep academic-level descriptions. We’re skipping a lot of details for the sake of brevity. Nevertheless, settle in for a bit of an epic journey because we have a lot to cover. ### Key encapsulation mechanism NIST requires that all key-agreement algorithms have a form of key-encapsulation mechanism (KEM). The KEM is a simplified form of public key encryption (PKE). As PKE, it also allows agreement on a secret, but in a slightly different way. The idea is that the session key is an output of the encryption algorithm, conversely to public key encryption schemes where session key is an input to the algorithm. In a KEM, Alice generates a random key and uses the pre-generated public key from Bob to encrypt (encapsulate) it. This results in a ciphertext sent to Bob. Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the random key. The idea was initially introduced by Cramer and Shoup. Experience shows that such constructs are easier to design, analyze, and implement as the scheme is limited to communicating a fixed-size session key. Leonardo Da Vinci said, “Simplicity is the ultimate sophistication,” which is very true in cryptography. The key exchange (KEX) protocol, like Diffie-Hellman, is yet a different construct: it allows two parties to agree on a shared secret that can be used as a symmetric encryption key. For example, Alice generates a key pair and sends a public key to Bob. Bob does the same and uses his own key pair with Alice’s public key to generate the shared secret. He then sends his public key to Alice who can now generate the same shared secret. What’s worth noticing is that both Alice and Bob perform exactly the same operations. KEM construction can be converted to KEX. Alice performs key generation and sends the public key to Bob. Bob uses it to encapsulate a symmetric session key and sends it back to Alice. Alice decapsulates the ciphertext received from Bob and gets the symmetric key. This is actually what we do in our experiment to make integration with the TLS protocol less complicated. ### NTRU Lattice-based Encryption We will enable the CECPQ2 implemented by Adam Langley from Google on our servers. He described this implementation in detail here. This key exchange uses the HRSS algorithm, which is based on the NTRU (N-Th Degree TRUncated Polynomial Ring) algorithm. Foregoing too much detail, I am going to explain how NTRU works and give simplified examples, and finally, compare it to HRSS. NTRU is a cryptosystem based on a polynomial ring. This means that we do not operate on numbers modulo a prime (like in RSA), but on polynomials of degree $$N$$ , where the degree of a polynomial is the highest exponent of its variable. For example, $$x^7 + 6x^3 + 11x^2$$ has degree of 7. One can add polynomials in the ring in the usual way, by simply adding theirs coefficients modulo some integer. In NTRU this integer is called $$q$$. Polynomials can also be multiplied, but remember, you are operating in the ring, therefore the result of a multiplication is always a polynomial of degree less than $$N$$. It basically means that exponents of the resulting polynomial are added to modulo $$N$$. In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working with a set of numbers less than N, you are working with a set of polynomials with a degree less than N. To instantiate the NTRU cryptosystem, three domain parameters must be chosen: • $$N$$ – degree of the polynomial ring, in NTRU the principal objects are polynomials of degree $$N-1$$. • $$p$$ – small modulus used during key generation and decryption for reducing message coefficients. • $$q$$ – large modulus used during algorithm execution for reducing coefficients of the polynomials. First, we generate a pair of public and private keys. To do that, two polynomials $$f$$ and $$g$$ are chosen from the ring in a way that their randomly generated coefficients are much smaller than $$q$$. Then key generation computes two inverses of the polynomial: $$f_p= f^{-1} \bmod{p} \\ f_q= f^{-1} \bmod{q}$$ The last step is to compute $$pk = p\cdot f_q\cdot g \bmod q$$, which we will use as public key pk. The private key consists of $$f$$ and $$f_p$$. The $$f_q$$ is not part of any key, however it must remain secret. It might be the case that after choosing $$f$$, the inverses modulo $$p$$ and $$q$$ do not exist. In this case, the algorithm has to start from the beginning and generate another $$f$$. That’s unfortunate because calculating the inverse of a polynomial is a costly operation. HRSS brings an improvement to this issue since it ensures that those inverses always exist, making key generation faster than as proposed initially in NTRU. The encryption of a message $$m$$ proceeds as follows. First, the message $$m$$ is converted to a ring element $$pt$$ (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial $$b$$ called blinder. The goal of the blinder is to generate different ciphertexts per encyption. Thus, the ciphetext $$ct$$ is obtained as $$ct = (b\cdot pk + pt ) \bmod q$$ Decryption looks a bit more complicated but it can also be easily understood. Decryption uses both the secret value $$f$$ and to recover the plaintext as $$v = f \cdot ct \bmod q \\ pt = v \cdot f_p \bmod p$$ This diagram demonstrates why and how decryption works. After obtaining $$pt$$, the message $$m$$ is recovered by inverting the conversion function. The underlying hard assumption is that given two polynomials: $$f$$ and $$g$$ whose coefficients are short compared to the modulus $$q$$, it is difficult to distinguish $$pk = \frac{f}{g}$$ from a random element in the ring. It means that it’s hard to find $$f$$ and $$g$$ given only public key pk. ### Lattices NTRU cryptosystem is a grandfather of lattice-based encryption schemes. The idea of using difficult problems for cryptographic purposes was due to Ajtai. His work evolved into a whole area of research with the goal of creating more practical, lattice-based cryptosystems. ### What is a lattice and why it can be used for post-quantum crypto? The picture below visualizes lattice as points in a two-dimensional space. A lattice is defined by the origin $$O$$ and base vectors $$\{ b_1 , b_2\}$$. Every point on the lattice is represented as a linear combination of the base vectors, for example $$V = -2b_1+b_2$$. There are two classical NP-hard problems in lattice-based cryptography: 1. Shortest Vector Problem (SVP): Given a lattice, to find the shortest non-zero vector in the lattice. In the graph, the vector $$s$$ is the shortest one. The SVP problem is NP-hard only under some assumptions. 2. Closest Vector Problem (CVP). Given a lattice and a vector $$V$$ (not necessarily in the lattice), to find the closest vector to $$V$$. For example, the closest vector to $$t$$ is $$z$$. In the graph above, it is easy for us to solve SVP and CVP by simple inspection. However, the lattices used in cryptography have higher dimensions, say above 1000, as well as highly non-orthogonal basis vectors. On these instances, the problems get extremely hard to solve. It’s even believed future quantum computers will have it tough. ### NTRU vs HRSS HRSS, which we use in our experiment, is based on NTRU, but a slightly better instantiation. The main improvements are: • Faster key generation algorithm. • NTRU encryption can produce ciphertexts that are impossible to decrypt (true for many lattice-based schemes). But HRSS fixes this problem. • HRSS is a key encapsulation mechanism. ### CECPQ2b – Isogeny-based Post-Quantum TLS Following CECPQ2, we have integrated into BoringSSL another hybrid key exchange mechanism relying on SIKE. It is called CECPQ2b and we will use it in our experimentation in TLS 1.3. SIKE is a key encapsulation method based on Supersingular Isogeny Diffie-Hellman (SIDH). Read more about SIDH in our previous post. The math behind SIDH is related to elliptic curves. A comparison between SIDH and the classical Elliptic Curve Diffie-Hellman (ECDH) is given. An elliptic curve is a set of points that satisfy a specific mathematical equation. The equation of an elliptic curve may have multiple forms, the standard form is called the Weierstrass equation $$y^2 = x^3 +ax +b$$ and its shape can look like the red curve. An interesting fact about elliptic curves is have a group structure. That is, the set of points on the curve have associated a binary operation called point addition. The set of points on the elliptic curve is closed under addition. Thus, adding two points results in another point that is also on the elliptic curve. If we can add two different points on a curve, then we can also add one point to itself. And if we do it multiple times, then the resulting operations is known as a scalar multiplication and denoted as $$Q = k\cdot P = P+P+\dots+P$$ for an integer $$k$$. Multiplication of scalars is commutative. It means that two scalar multiplications can be evaluated in any order $$\color{darkred}{k_a}\cdot\color{darkgreen}{k_b} = \color{darkgreen}{k_b}\cdot\color{darkred}{k_a}$$; this an important property that makes ECDH possible. It turns out that carefully if choosing an elliptic curve “correctly”, scalar multiplication is easy to compute but extremely hard to reverse. Meaning, given two points $$Q$$ and $$P$$ such that $$Q=k\cdot P$$, finding the integer k is a difficult task known as the Elliptic Curve Discrete Logarithm problem (ECDLP). This problem is suitable for cryptographic purposes. Alice and Bob agree on a secret key as follows. Alice generates a private key $$k_a$$. Then, she uses some publicly known point $$P$$ and calculates her public key as $$Q_a = k_a\cdot P$$. Bob proceeds in similar fashion and gets $$k_b$$ and $$Q_b = k_b\cdot P$$. To agree on a shared secret, each party multiplies their private key with the public key of the other party. The result of this is the shared secret. Key agreement as described above, works thanks to the fact that scalars can commute: $$\color{darkgreen}{k_a} \cdot Q_b = \color{darkgreen}{k_a} \cdot \color{darkred}{k_b} \cdot P \iff \color{darkred}{k_b} \cdot \color{darkgreen}{k_a} \cdot P = \color{darkred}{k_b} \cdot Q_a$$ There is a vast theory behind elliptic curves. An introduction to elliptic curve cryptography was posted before and more details can be found in this book. Now, lets describe SIDH and compare with ECDH. ### Isogenies on Elliptic Curves Before explaining the details of SIDH key exchange, I’ll explain the 3 most important concepts, namely: j-invariant, isogeny and its kernel. Each curve has a number that can be associated to it. Let’s call this number a j-invariant. This number is not unique per curve, meaning many curves have the same value of j-invariant, but it can be viewed as a way to group multiple elliptic curves into disjoint sets. We say that two curves are isomorphic if they are in the same set, called the isomorphism class. The j-invariant is a simple criterion to determine whether two curves are isomorphic. The j-invariant of a curve $$E$$ in Weierstrass form $$y^2 = x^3 + ax + b$$ is given as $$j(E) = 1728\frac{4a^3}{4a^3 +27b^2}$$ When it comes to isogeny, think about it as a map between two curves. Each point on some curve $$E$$ is mapped by isogeny to the point on isogenous curve $$E’$$. We denote mapping from curve $$E$$ to $$E’$$ by isogeny $$\phi$$ as: $$\phi: E \rightarrow E’$$ It depends on the map if those two curves are isomorphic or not. Isogeny can be visualised as: There may exist many of those mappings, each curve used in SIDH has small number of isogenies to other curves. Natural question is how do we compute such isogeny. Here is where the kernel of an isogeny comes. The kernel uniquely determines isogeny (up to isomorphism class). Formulas for calculating isogeny from its kernel were initially given by J. Vélu and the idea of calculating them efficiently was extended. To finish, I will summarize what was said above with a picture. There are two isomorphism classes on the picture above. Both curves $$E_1$$ and $$E_2$$ are isomorphic and have j-invariant = 6. As curves $$E_3$$ and $$E_4$$ have j-invariant=13, they are in a different isomorphism class. There exists an isogeny $$\phi_2$$ between curve $$E_3$$ and $$E_2$$, so they both are isogeneous. Curves $$\phi_1$$ and $$E_2$$ are isomorphic and there is isogeny $$\phi_1$$ between them. Curves $$E_1$$ and $$E_4$$ are not isomorphic. For brevity I’m skipping many important details, like details of the finite field, the fact that isogenies must be separable and that the kernel is finite. But curious readers can find a number of academic research papers available on the Internet. ### Big picture: similarities with ECDH Let’s generalize the ECDH algorithm described above, so that we can swap some elements and try to use Supersingular Isogeny Diffie-Hellman. Note that what actually happens during an ECDH key exchange is: • We have a set of points on elliptic curve, set S • We have another group of integers used for point multiplication, G • We use an element from Z to act on an element from S to get another element from S: $$G \cdot S \rightarrow S$$ Now the question is: what is our G and S in an SIDH setting? For SIDH to work, we need a big set of elements and something secret that will act on the elements from that set. This “group action” must also be resistant to attacks performed by quantum computers. In the SIDH setting, those two sets are defined as: • Set S is a set (graph) of j-invariants, such that all the curves are supersingular: $$S = [j(E_1), j(E_2), j(E_3), …. , j(E_n)]$$ • Set G is a set of isogenies acting on elliptic curves and transforming, for example, the elliptic curve $$E_1$$ into $$E_n$$: ### Random walk on supersingular graph When we talk about Isogeny Based Cryptography, as a topic distinct from Elliptic Curve Cryptography, we usually mean algorithms and protocols that rely fundamentally on the structure of isogeny graphs. An example of such a (small) graph is pictured below. Each vertex of the graph represents a different j-invariant of a set of supersingular curves. The edges between vertices represent isogenies converting one elliptic curve to another. As you can notice, the graph is strongly connected, meaning every vertex can be reached from every other vertex. In the context of isogeny-based crypto, we call such a graph a supersingular isogeny graph. I’ll skip some technical details about the construction of this graph (look for those here or here), but instead describe ideas about how it can be used. As the graph is strongly connected, it is possible to walk a whole graph by starting from any vertex, randomly choosing an edge, following it to the next vertex and then start the process again on a new vertex. Such a way of visiting edges of this graph is called a random walk. The random walk is a key concept that makes isogeny based crypto feasible. When you look closely at the graph, you can notice that each vertex has a small number of edges incident to it, this is why we can compute the isogenies efficiently. But also for any vertex there is only a limited number of isogenies to choose from, which doesn’t look like good base for a cryptographic scheme. The key question is – where does the security of the scheme come from exactly? In order to get it, it is necessary to visit a couple hundred vertices. What it means in practice is that secret isogeny (of large degree) is constructed as a composition of multiple isogenies (of small, prime degree). Which means, the secret isogeny is: This property and properties of the isogeny graph are what makes some of us believe that scheme has a good chance to be secure. More specifically, there is no efficient way of finding a path that connects $$E_0$$ with $$E_n$$, even with quantum computer at hand. The security level of a system depends on value n – the number of steps taken during the walk. The random walk is a core process used when both generating public keys and computing shared secrets. It starts with party generating random value m (see more below), starting curve $$E_0$$ and points P and Q on this curve. Those values are used to compute the kernel of an isogeny $$R_1$$ in the following way: $$R_1 = P + m \cdot Q$$ Thanks to formulas given by Vélu we can now use the point $$R_1$$ to compute the isogeny, the party will choose to move from a vertex to another one. After the isogeny $$\phi_{R_1}$$ is calculated it is applied to $$E_0$$ which results in a new curve $$E_1$$: $$\phi_{R_1}: E_0 \rightarrow E_1$$ Isogeny is also applied to points P and Q. Once on $$E_1$$ the process is repeated. This process is applied n times, and at the end a party ends up on some curve $$E_n$$ which defines isomorphism class, so also j-invariant. ### Supersingular Isogeny Diffie-Hellman The core idea in SIDH is to compose two random walks on an isogeny graph of elliptic curves in such a way that the end node of both ways of composing is the same. In order to do it, scheme sets public parameters – starting curve $$E_0$$ and 2 pairs of base points on this curve $$(PA,QA)$$ , $$(PB,QB)$$. Alice generates her random secret keys m, and calculates a secret isogeny $$\phi_q$$ by performing a random walk as described above. The walk finishes with 3 values: elliptic curve $$E_a$$ she has ended up with and pair of points $$\phi_a(PB)$$ and $$\phi_a(QB)$$ after pushing through Alice’s secret isogeny. Bob proceeds analogously which results in the triple $${E_b, \phi_b(PA), \phi_b(QA)}$$. The triple forms a public key which is exchanged between parties. The picture below visualizes the operation. The black dots represent curves grouped in the same isomorphism classes represented by light blue circles. Alice takes the orange path ending up on a curve $$E_a$$ in a separate isomorphism class than Bob after taking his dark blue path ending on $$E_b$$. SIDH is parametrized in a way that Alice and Bob will always end up in different isomorphism classes. Upon receipt of triple $${ E_a, \phi_a(PB), \phi_a(QB) }$$ from Alice, Bob will use his secret value m to calculate a new kernel – but instead of using point $$PA$$ and $$QA$$ to calculate an isogeny kernel, he will now use images $$\phi_a(PB)$$ and $$\phi_a(QB)$$ received from Alice: $$R’_1 = \phi_a(PB) + m \cdot \phi_a(QB)$$ Afterwards, he uses $$R’_1$$ to start the walk again resulting in the isogeny $$\phi’_b: E_a \rightarrow E_{ab}$$. Allice proceeds analogously resulting in the isogeny $$\phi’_a: E_b \rightarrow E_{ba}$$. With isogenies calculated this way, both Alice and Bob will converge in the same isomorphism class. The math math may seem complicated, hopefully the picture below makes it easier to understand. Bob computes a new isogeny and starts his random walk from $$E_a$$ received from Alice. He ends up on some curve $$E_{ba}$$. Similarly, Alice calculates a new isogeny, applies it on $$E_b$$ received from Bob and her random walk ends on some curve $$E_{ab}$$. Curves $$E_{ab}$$ and $$E_{ba}$$ are not likely to be the same, but construction guarantees that they are isomorphic. As mentioned earlier, isomorphic curves have the same value of j-invariant, hence the shared secret is a value of j-invariant $$j(E_{ab})$$. Coming back to differences between SIDH and ECDH – we can split them into four categories: the elements of the group we are operating on, the cornerstone computation required to agree on a shared secret, the elements representing secret values, and the difficult problem on which the security relies. In ECDH there is a secret key which is an integer scalar, in case of SIDH it is a secret isogeny, which also is generated from an integer scalar. In the case of ECDH one multiplies a point on a curve by a scalar, in the case of SIDH it is a random walk in an isogeny graph. In the case of ECDH, the public key is a point on a curve, in the case of SIDH, the public part is a curve itself and the image of some points after applying isogeny. The shared secret in the case of ECDH is a point on a curve, in the case of SIDH it is a j-invariant. ### SIKE: Supersingular Isogeny Key Encapsulation SIDH could potentially be used as a drop-in replacement of the ECDH protocol. We have actually implemented a proof-of-concept and added it to our implementation of TLS 1.3 in the tls-tris library and described (together with Mozilla) implementation details in this draft. Nevertheless, there is a problem with SIDH – the keys can be used only once. In 2016, a few researchers came up with an active attack on SIDH which works only when public keys are reused. In the context of TLS, it is not a big problem, because for each session a fresh key pair is generated (ephemeral keys), but it may not be true for other applications. SIKE is an isogeny key encapsulation which solves this problem. Bob can generate SIKE keys, upload the public part somewhere in the Internet and then anybody can use it whenever he wants to communicate with Bob securely. SIKE reuses SIDH – internally both sides of the connection always perform SIDH key generation, SIDH key agreement and apply some other cryptographic primitives in order to convert SIDH to KEM. SIKE is implemented in a few variants – each variant corresponds to the security levels using 128-, 192- and 256-bit secret keys. Higher security level means longer running time. More details about SIKE can be found here. SIKE is also one of the candidates in NIST post-quantum “competition“. I’ve skipped many important details to give a brief description of how isogeny based crypto works. If you’re curious and hungry for details, look at either of these Cloudflare meetups, where Deirdre Connolly talked about isogeny-based cryptography or this talk by Chloe Martindale during PQ Crypto School 2017. And if you would like to know more about quantum attacks on this scheme, I highly recommend this work. ## Conclusion Quantum computers that can break meaningful cryptographic parameter settings do not exist, yet. They won’t be built for at least the next few years. Nevertheless, they have already changed the way we look at current cryptographic deployments. There are at least two reasons it’s worth investing in PQ cryptography: • It takes a lot of time to build secure cryptography and we don’t actually know when today’s classical cryptography will be broken. There is a need for a good mathematical base: an initial idea of what may be secure against something that doesn’t exist yet. If you have an idea, you also need good implementation, constant time, resistance to things like time and cache side-channels, DFA, DPA, EM, and a bunch of other abbreviations indicating side-channel resistance. There is also deployment of, for example, algorithms based on elliptic curves were introduced in ’85, but started to really be used in production only during the last decade, 20 or so years later. Obviously, the implementation must be blazingly fast! Last, but not least, integration: we need time to develop standards to allow integration of PQ cryptography with protocols like TLS. • Even though efficient quantum computers probably won’t exist for another few years, the threat is real. Data encrypted with current cryptographic algorithms can be recorded now with hopes of being broken in the future. Cloudflare is motivated to help build the Internet of tomorrow with the tools at hand today. Our interest is in cryptographic techniques that can be integrated into existing protocols and widely deployed on the Internet as seamlessly as possible. PQ cryptography, like the rest of cryptography, includes many cryptosystems that can be used for communications in today’s Internet; Alice and Bob need to perform some computation, but they do not need to buy new hardware to do that. Cloudflare sees great potential in those algorithms and believes that some of them can be used as a safe replacement for classical public-key cryptosystems. Time will tell if we’re justified in this belief! # Introducing CIRCL: An Advanced Cryptographic Library Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/introducing-circl/ As part of Crypto Week 2019, today we are proud to release the source code of a cryptographic library we’ve been working on: a collection of cryptographic primitives written in Go, called CIRCL. This library includes a set of packages that target cryptographic algorithms for post-quantum (PQ), elliptic curve cryptography, and hash functions for prime groups. Our hope is that it’s useful for a broad audience. Get ready to discover how we made CIRCL unique. ### Cryptography in Go We use Go a lot at Cloudflare. It offers a good balance between ease of use and performance; the learning curve is very light, and after a short time, any programmer can get good at writing fast, lightweight backend services. And thanks to the possibility of implementing performance critical parts in Go assembly, we can try to ‘squeeze the machine’ and get every bit of performance. Cloudflare’s cryptography team designs and maintains security-critical projects. It’s not a secret that security is hard. That’s why, we are introducing the Cloudflare Interoperable Reusable Cryptographic Library – CIRCL. There are multiple goals behind CIRCL. First, we want to concentrate our efforts to implement cryptographic primitives in a single place. This makes it easier to ensure that proper engineering processes are followed. Second, Cloudflare is an active member of the Internet community – we are trying to improve and propose standards to help make the Internet a better place. Cloudflare’s mission is to help build a better Internet. For this reason, we want CIRCL helps the cryptographic community to create proof of concepts, like the post-quantum TLS experiments we are doing. Over the years, lots of ideas have been put on the table by cryptographers (for example, homomorphic encryption, multi-party computation, and privacy preserving constructions). Recently, we’ve seen those concepts picked up and exercised in a variety of contexts. CIRCL’s implementations of cryptographic primitives creates a powerful toolbox for developers wishing to use them. The Go language provides native packages for several well-known cryptographic algorithms, such as key agreement algorithms, hash functions, and digital signatures. There are also packages maintained by the community under golang.org/x/crypto that provide a diverse set of algorithms for supporting authenticated encryption, stream ciphers, key derivation functions, and bilinear pairings. CIRCL doesn’t try to compete with golang.org/x/crypto in any sense. Our goal is to provide a complementary set of implementations that are more aggressively optimized, or may be less commonly used but have a good chance at being very useful in the future. ### Unboxing CIRCL Our cryptography team worked on a fresh proposal to augment the capabilities of Go users with a new set of packages. You can get them by typing:  go get github.com/cloudflare/circl

The contents of CIRCL is split across different categories, summarized in this table:

CategoryAlgorithmsDescriptionApplications
Post-Quantum CryptographySIDHIsogeny-based cryptography.SIDH provides key exchange mechanisms using ephemeral keys.
SIKESIKE is a key encapsulation mechanism (KEM).Key agreement protocols.
Key ExchangeX25519, X448RFC-7748 provides new key exchange mechanisms based on Montgomery elliptic curves.TLS 1.3. Secure Shell.
FourQOne of the fastest elliptic curves at 128-bit security level.Experimental for key agreement and digital signatures.
Digital SignaturesEd25519RFC-8032 provides new digital signature algorithms based on twisted Edwards curves.Digital certificates and authentication methods.
Hash to Elliptic Curve GroupsSeveral algorithms: Elligator2, Ristretto, SWU, Icart.Protocols based on elliptic curves require hash functions that map bit strings to points on an elliptic curve.Useful in protocols such as Privacy Pass. OPAQUE.
PAKE.
Verifiable random functions.
OptimizationCurve P-384Our optimizations reduce the burden when moving from P-256 to P-384.ECDSA and ECDH using Suite B at top secret level.

### SIKE, a Post-Quantum Key Encapsulation Method

To better understand the post-quantum world, we started experimenting with post-quantum key exchange schemes and using them for key agreement in TLS 1.3. CIRCL contains the sidh package, an implementation of Supersingular Isogeny-based Diffie-Hellman (SIDH), as well as CCA2-secure Supersingular Isogeny-based Key Encapsulation (SIKE), which is based on SIDH.

CIRCL makes playing with PQ key agreement very easy. Below is an example of the SIKE interface that can be used to establish a shared secret between two parties for use in symmetric encryption. The example uses a key encapsulation mechanism (KEM). For our example in this scheme, Alice generates a random secret key, and then uses Bob’s pre-generated public key to encrypt (encapsulate) it. The resulting ciphertext is sent to Bob. Then, Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the secret key. See more details about SIKE in this Cloudflare blog.

Let’s see how to do this with CIRCL:

// Bob's key pair
prvB := NewPrivateKey(Fp503, KeyVariantSike)
pubB := NewPublicKey(Fp503, KeyVariantSike)

// Generate private key
// Generate public key
prvB.GeneratePublicKey(pubB)

var publicKeyBytes = make([]array, pubB.Size())
var privateKeyBytes = make([]array, prvB.Size())

pubB.Export(publicKeyBytes)
prvB.Export(privateKeyBytes)

// Encode public key to JSON
// Save privateKeyBytes on disk


Bob uploads the public key to a location accessible by anybody. When Alice wants to establish a shared secret with Bob, she performs encapsulation that results in two parts: a shared secret and the result of the encapsulation, the ciphertext.

// Read JSON to bytes

// Alice's key pair
pubB := NewPublicKey(Fp503, KeyVariantSike)
pubB.Import(publicKeyBytes)

kem.Encapsulate(ciphertext, sharedSecret, pubB)

// send ciphertext to Bob


Bob now receives ciphertext from Alice and decapsulates the shared secret:

var kem := sike.NewSike503(rand.Reader)
kem.Decapsulate(sharedSecret, prvA, pubA, ciphertext)


At this point, both Alice and Bob can derive a symmetric encryption key from the secret generated.

SIKE implementation contains:

• Two different field sizes: Fp503 and Fp751. The choice of the field is a trade-off between performance and security.
• Code optimized for AMD64 and ARM64 architectures, as well as generic Go code. For AMD64, we detect the micro-architecture and if it’s recent enough (e.g., it supports ADOX/ADCX and BMI2 instruction sets), we use different multiplication techniques to make an execution even faster.
• Code implemented in constant time, that is, the execution time doesn’t depend on secret values.

We also took care of low heap-memory footprint, so that the implementation uses a minimal amount of dynamically allocated memory. In the future, we plan to provide multiple implementations of post-quantum schemes. Currently, our focus is on algorithms useful for key exchange in TLS.

SIDH/SIKE are interesting because the key sizes produced by those algorithms are relatively small (comparing with other PQ schemes). Nevertheless, performance is not all that great yet, so we’ll continue looking. We plan to add lattice-based algorithms, such as NTRU-HRSS and Kyber, to CIRCL. We will also add another more experimental algorithm called cSIDH, which we would like to try in other applications. CIRCL doesn’t currently contain any post-quantum signature algorithms, which is also on our to-do list. After our experiment with TLS key exchange completes, we’re going to look at post-quantum PKI. But that’s a topic for a future blog post, so stay tuned.

Last, we must admit that our code is largely based on the implementation from the NIST submission along with the work of former intern Henry De Valence, and we would like to thank both Henry and the SIKE team for their great work.

### Elliptic Curve Cryptography

Elliptic curve cryptography brings short keys sizes and faster evaluation of operations when compared to algorithms based on RSA. Elliptic curves were standardized during the early 2000s, and have recently gained popularity as they are a more efficient way for securing communications.

Elliptic curves are used in almost every project at Cloudflare, not only for establishing TLS connections, but also for certificate validation, certificate revocation (OCSP), Privacy Pass, certificate transparency, and AMP Real URL.

The Go language provides native support for NIST-standardized curves, the most popular of which is P-256. In a previous post, Vlad Krasnov described the relevance of optimizing several cryptographic algorithms, including P-256 curve. When working at Cloudflare scale, little issues around performance are significantly magnified. This is one reason why Cloudflare pushes the boundaries of efficiency.

A similar thing happened on the chained validation of certificates. For some certificates, we observed performance issues when validating a chain of certificates. Our team successfully diagnosed this issue: certificates which had signatures from the P-384 curve, which is the curve that corresponds to the 192-bit security level, were taking up 99% of CPU time! It is common for certificates closer to the root of the chain of trust to rely on stronger security assumptions, for example, using larger elliptic curves. Our first-aid reaction comes in the form of an optimized implementation written by Brendan McMillion that reduced the time of performing elliptic curve operations by a factor of 10. The code for P-384 is also available in CIRCL.

The latest developments in elliptic curve cryptography have caused a shift to use elliptic curve models with faster arithmetic operations. The best example is undoubtedly Curve25519; other examples are the Goldilocks and FourQ curves. CIRCL supports all of these curves, allowing instantiation of Diffie-Hellman exchanges and Edwards digital signatures. Although it slightly overlaps the Go native libraries, CIRCL has architecture-dependent optimizations.

### Hashing to Groups

Many cryptographic protocols rely on the hardness of solving the Discrete Logarithm Problem (DLP) in special groups, one of which is the integers reduced modulo a large integer. To guarantee that the DLP is hard to solve, the modulus must be a large prime number. Increasing its size boosts on security, but also makes operations more expensive. A better approach is using elliptic curve groups since they provide faster operations.

In some cryptographic protocols, it is common to use a function with the properties of a cryptographic hash function that maps bit strings into elements of the group. This is easy to accomplish when, for example, the group is the set of integers modulo a large prime. However, it is not so clear how to perform this function using elliptic curves. In cryptographic literature, several methods have been proposed using the terms hashing to curves or hashing to point indistinctly.

The main issue is that there is no general method for deterministically finding points on any elliptic curve, the closest available are methods that target special curves and parameters. This is a problem for implementers of cryptographic algorithms, who have a hard time figuring out on a suitable method for hashing to points of an elliptic curve. Compounding that, chances of doing this wrong are high. There are many different methods, elliptic curves, and security considerations to analyze. For example, a vulnerability on WPA3 handshake protocol exploited a non-constant time hashing method resulting in a recovery of keys. Currently, an IETF draft is tracking work in-progress that provides hashing methods unifying requirements with curves and their parameters.

Corresponding to this problem, CIRCL will include implementations of hashing methods for elliptic curves. Our development is accompanying the evolution of the IEFT draft. Therefore, users of CIRCL will have this added value as the methods implement a ready-to-go functionality, covering the needs of some cryptographic protocols.

### Update on Bilinear Pairings

Bilinear pairings are sometimes regarded as a tool for cryptanalysis, however pairings can also be used in a constructive way by allowing instantiation of advanced public-key algorithms, for example, identity-based encryption, attribute-based encryption, blind digital signatures, three-party key agreement, among others.

An efficient way to instantiate a bilinear pairing is to use elliptic curves. Note that only a special class of curves can be used, thus so-called pairing-friendly curves have specific properties that enable the efficient evaluation of a pairing.

Some families of pairing-friendly curves were introduced by Barreto-Naehrig (BN), Kachisa-Schaefer-Scott (KSS), and Barreto-Lynn-Scott (BLS). BN256 is a BN curve using a 256-bit prime and is one of the fastest options for implementing a bilinear pairing. The Go native library supports this curve in the package golang.org/x/crypto/bn256. In fact, the BN256 curve is used by Cloudflare’s Geo Key Manager, which allows distributing encrypted keys around the world. At Cloudflare, high-performance is a must and with this motivation, in 2017, we released an optimized implementation of the BN256 package that is 8x faster than the Go’s native package. The success of these optimizations reached several other projects such as the Ethereum protocol and the Randomness Beacon project.

Recent improvements in solving the DLP over extension fields, GF(pᵐ) for p prime and m>1, impacted the security of pairings, causing recalculation of the parameters used for pairing-friendly curves.

Before these discoveries, the BN256 curve provided a 128-bit security level, but now larger primes are needed to target the same security level. That does not mean that the BN256 curve has been broken, since BN256 gives a security of 100 bits, that is, approximately 2¹⁰⁰ operations are required to cause a real danger, which is still unfeasible with current computing power.

With our CIRCL announcement, we want to announce our plans for research and development to obtain efficient curve(s) to become a stronger successor of BN256. According to the estimation by Barbulescu-Duquesne, a BN curve must use primes of at least 456 bits to match a 128-bit security level. However, the impact on the recalculation of parameters brings back to the main scene BLS and KSS curves as efficient alternatives. To this end a standardization effort at IEFT is in progress with the aim of defining parameters and pairing-friendly curves that match different security levels.

Note that regardless of the curve(s) chosen, there is an unavoidable performance downgrade when moving from BN256 to a stronger curve. Actual timings were presented by Aranha, who described the evolution of the race for high-performance pairing implementations. The purpose of our continuous development of CIRCL is to minimize this impact through fast implementations.

### Optimizations

Go itself is a very easy to learn and use for system programming and yet makes it possible to use assembly so that you can stay close “to the metal”. We have blogged about improving performance in Go few times in the past (see these posts about encryption, ciphersuites, and image encoding).

When developing CIRCL, we crafted the code to get the best possible performance from the machine. We leverage the capabilities provided by the architecture and the architecture-specific instructions. This means that in some cases we need to get our hands dirty and rewrite parts of the software in Go assembly, which is not easy, but definitely worth the effort when it comes to performance. We focused on x86-64, as this is our main target, but we also think that it’s worth looking at ARM architecture, and in some cases (like SIDH or P-384), CIRCL has optimized code for this platform.

We also try to ensure that code uses memory efficiently – crafting it in a way that fast allocations on the stack are preferred over expensive heap allocations. In cases where heap allocation is needed, we tried to design the APIs in a way that, they allow pre-allocating memory ahead of time and reuse it for multiple operations.

### Security

The CIRCL library is offered as-is, and without a guarantee. Therefore, it is expected that changes in the code, repository, and API occur in the future. We recommend to take caution before using this library in a production application since part of its content is experimental.

As new attacks and vulnerabilities arise over the time, security of software should be treated as a continuous process. In particular, the assessment of cryptographic software is critical, it requires the expertise of several fields, not only computer science. Cryptography engineers must be aware of the latest vulnerabilities and methods of attack in order to defend against them.

The development of CIRCL follows best practices on the secure development. For example, if time execution of the code depends on secret data, the attacker could leverage those irregularities and recover secret keys. In our code, we take care of writing constant-time code and hence prevent timing based attacks.

Developers of cryptographic software must also be aware of optimizations performed by the compiler and/or the processor since these optimizations can lead to insecure binary codes in some cases. All of these issues could be exploited in real attacks aimed at compromising systems and keys. Therefore, software changes must be tracked down through thorough code reviews. Also static analyzers and automated testing tools play an important role on the security of the software.

### Summary

CIRCL is envisioned as an effective tool for experimenting with modern cryptographic algorithms yet providing high-performance implementations. Today is marked as the starting point of a continuous machinery of innovation and retribution to the community in the form of a cryptographic library. There are still several other applications such as homomorphic encryption, multi-party computation, and privacy-preserving protocols that we would like to explore.

We are team of cryptography, security, and software engineers working to improve and augment Cloudflare products. Our team keeps the communication channels open for receiving comments, including improvements, and merging contributions. We welcome opinions and contributions! If you would like to get in contact, you should check out our github repository for CIRCL github.com/cloudflare/circl. We want to share our work and hope it makes someone else’s job easier as well.

Finally, special thanks to all the contributors who has either directly or indirectly helped to implement the library – Ko Stoffelen, Brendan McMillion, Henry de Valence, Michael McLoughlin and all the people who invested their time in reviewing our code.

# Introducing CIRCL: An Advanced Cryptographic Library

Post Syndicated from Kris Kwiatkowski original https://blog.cloudflare.com/introducing-circl/

As part of Crypto Week 2019, today we are proud to release the source code of a cryptographic library we’ve been working on: a collection of cryptographic primitives written in Go, called CIRCL. This library includes a set of packages that target cryptographic algorithms for post-quantum (PQ), elliptic curve cryptography, and hash functions for prime groups. Our hope is that it’s useful for a broad audience. Get ready to discover how we made CIRCL unique.

### Cryptography in Go

We use Go a lot at Cloudflare. It offers a good balance between ease of use and performance; the learning curve is very light, and after a short time, any programmer can get good at writing fast, lightweight backend services. And thanks to the possibility of implementing performance critical parts in Go assembly, we can try to ‘squeeze the machine’ and get every bit of performance.

Cloudflare’s cryptography team designs and maintains security-critical projects. It’s not a secret that security is hard. That’s why, we are introducing the Cloudflare Interoperable Reusable Cryptographic Library – CIRCL. There are multiple goals behind CIRCL. First, we want to concentrate our efforts to implement cryptographic primitives in a single place. This makes it easier to ensure that proper engineering processes are followed. Second, Cloudflare is an active member of the Internet community – we are trying to improve and propose standards to help make the Internet a better place.

Cloudflare’s mission is to help build a better Internet. For this reason, we want CIRCL helps the cryptographic community to create proof of concepts, like the post-quantum TLS experiments we are doing. Over the years, lots of ideas have been put on the table by cryptographers (for example, homomorphic encryption, multi-party computation, and privacy preserving constructions). Recently, we’ve seen those concepts picked up and exercised in a variety of contexts. CIRCL’s implementations of cryptographic primitives creates a powerful toolbox for developers wishing to use them.

The Go language provides native packages for several well-known cryptographic algorithms, such as key agreement algorithms, hash functions, and digital signatures. There are also packages maintained by the community under golang.org/x/crypto that provide a diverse set of algorithms for supporting authenticated encryption, stream ciphers, key derivation functions, and bilinear pairings. CIRCL doesn’t try to compete with golang.org/x/crypto in any sense. Our goal is to provide a complementary set of implementations that are more aggressively optimized, or may be less commonly used but have a good chance at being very useful in the future.

### Unboxing CIRCL

Our cryptography team worked on a fresh proposal to augment the capabilities of Go users with a new set of packages.  You can get them by typing:

$go get github.com/cloudflare/circl The contents of CIRCL is split across different categories, summarized in this table: CategoryAlgorithmsDescriptionApplications Post-Quantum CryptographySIDHIsogeny-based cryptography.SIDH provides key exchange mechanisms using ephemeral keys. SIKESIKE is a key encapsulation mechanism (KEM).Key agreement protocols. Key ExchangeX25519, X448RFC-7748 provides new key exchange mechanisms based on Montgomery elliptic curves.TLS 1.3. Secure Shell. FourQOne of the fastest elliptic curves at 128-bit security level.Experimental for key agreement and digital signatures. Digital SignaturesEd25519RFC-8032 provides new digital signature algorithms based on twisted Edwards curves.Digital certificates and authentication methods. Hash to Elliptic Curve GroupsSeveral algorithms: Elligator2, Ristretto, SWU, Icart.Protocols based on elliptic curves require hash functions that map bit strings to points on an elliptic curve.Useful in protocols such as Privacy Pass. OPAQUE. PAKE. Verifiable random functions. OptimizationCurve P-384Our optimizations reduce the burden when moving from P-256 to P-384.ECDSA and ECDH using Suite B at top secret level. ### SIKE, a Post-Quantum Key Encapsulation Mechanism To better understand the post-quantum world, we started experimenting with post-quantum key exchange schemes and using them for key agreement in TLS 1.3. CIRCL contains the sidh package, an implementation of Supersingular Isogeny-based Diffie-Hellman (SIDH), as well as CCA2-secure Supersingular Isogeny-based Key Encapsulation (SIKE), which is based on SIDH. CIRCL makes playing with PQ key agreement very easy. Below is an example of the SIKE interface that can be used to establish a shared secret between two parties for use in symmetric encryption. The example uses a key encapsulation mechanism (KEM). For our example in this scheme, Alice generates a random secret key, and then uses Bob’s pre-generated public key to encrypt (encapsulate) it. The resulting ciphertext is sent to Bob. Then, Bob uses his private key to decrypt (decapsulate) the ciphertext and retrieve the secret key. See more details about SIKE in this Cloudflare blog. Let’s see how to do this with CIRCL: // Bob's key pair prvB := NewPrivateKey(Fp503, KeyVariantSike) pubB := NewPublicKey(Fp503, KeyVariantSike) // Generate private key prvB.Generate(rand.Reader) // Generate public key prvB.GeneratePublicKey(pubB) var publicKeyBytes = make([]array, pubB.Size()) var privateKeyBytes = make([]array, prvB.Size()) pubB.Export(publicKeyBytes) prvB.Export(privateKeyBytes) // Encode public key to JSON // Save privateKeyBytes on disk  Bob uploads the public key to a location accessible by anybody. When Alice wants to establish a shared secret with Bob, she performs encapsulation that results in two parts: a shared secret and the result of the encapsulation, the ciphertext. // Read JSON to bytes // Alice's key pair pubB := NewPublicKey(Fp503, KeyVariantSike) pubB.Import(publicKeyBytes) var kem := sike.NewSike503(rand.Reader) kem.Encapsulate(ciphertext, sharedSecret, pubB) // send ciphertext to Bob  Bob now receives ciphertext from Alice and decapsulates the shared secret: var kem := sike.NewSike503(rand.Reader) kem.Decapsulate(sharedSecret, prvA, pubA, ciphertext)  At this point, both Alice and Bob can derive a symmetric encryption key from the secret generated. SIKE implementation contains: • Two different field sizes: Fp503 and Fp751. The choice of the field is a trade-off between performance and security. • Code optimized for AMD64 and ARM64 architectures, as well as generic Go code. For AMD64, we detect the micro-architecture and if it’s recent enough (e.g., it supports ADOX/ADCX and BMI2 instruction sets), we use different multiplication techniques to make an execution even faster. • Code implemented in constant time, that is, the execution time doesn’t depend on secret values. We also took care of low heap-memory footprint, so that the implementation uses a minimal amount of dynamically allocated memory. In the future, we plan to provide multiple implementations of post-quantum schemes. Currently, our focus is on algorithms useful for key exchange in TLS. SIDH/SIKE are interesting because the key sizes produced by those algorithms are relatively small (comparing with other PQ schemes). Nevertheless, performance is not all that great yet, so we’ll continue looking. We plan to add lattice-based algorithms, such as NTRU-HRSS and Kyber, to CIRCL. We will also add another more experimental algorithm called cSIDH, which we would like to try in other applications. CIRCL doesn’t currently contain any post-quantum signature algorithms, which is also on our to-do list. After our experiment with TLS key exchange completes, we’re going to look at post-quantum PKI. But that’s a topic for a future blog post, so stay tuned. Last, we must admit that our code is largely based on the implementation from the NIST submission along with the work of former intern Henry De Valence, and we would like to thank both Henry and the SIKE team for their great work. ### Elliptic Curve Cryptography Elliptic curve cryptography brings short keys sizes and faster evaluation of operations when compared to algorithms based on RSA. Elliptic curves were standardized during the early 2000s, and have recently gained popularity as they are a more efficient way for securing communications. Elliptic curves are used in almost every project at Cloudflare, not only for establishing TLS connections, but also for certificate validation, certificate revocation (OCSP), Privacy Pass, certificate transparency, and AMP Real URL. The Go language provides native support for NIST-standardized curves, the most popular of which is P-256. In a previous post, Vlad Krasnov described the relevance of optimizing several cryptographic algorithms, including P-256 curve. When working at Cloudflare scale, little issues around performance are significantly magnified. This is one reason why Cloudflare pushes the boundaries of efficiency. A similar thing happened on the chained validation of certificates. For some certificates, we observed performance issues when validating a chain of certificates. Our team successfully diagnosed this issue: certificates which had signatures from the P-384 curve, which is the curve that corresponds to the 192-bit security level, were taking up 99% of CPU time! It is common for certificates closer to the root of the chain of trust to rely on stronger security assumptions, for example, using larger elliptic curves. Our first-aid reaction comes in the form of an optimized implementation written by Brendan McMillion that reduced the time of performing elliptic curve operations by a factor of 10. The code for P-384 is also available in CIRCL. The latest developments in elliptic curve cryptography have caused a shift to use elliptic curve models with faster arithmetic operations. The best example is undoubtedly Curve25519; other examples are the Goldilocks and FourQ curves. CIRCL supports all of these curves, allowing instantiation of Diffie-Hellman exchanges and Edwards digital signatures. Although it slightly overlaps the Go native libraries, CIRCL has architecture-dependent optimizations. ### Hashing to Groups Many cryptographic protocols rely on the hardness of solving the Discrete Logarithm Problem (DLP) in special groups, one of which is the integers reduced modulo a large integer. To guarantee that the DLP is hard to solve, the modulus must be a large prime number. Increasing its size boosts on security, but also makes operations more expensive. A better approach is using elliptic curve groups since they provide faster operations. In some cryptographic protocols, it is common to use a function with the properties of a cryptographic hash function that maps bit strings into elements of the group. This is easy to accomplish when, for example, the group is the set of integers modulo a large prime. However, it is not so clear how to perform this function using elliptic curves. In cryptographic literature, several methods have been proposed using the terms hashing to curves or hashing to point indistinctly. The main issue is that there is no general method for deterministically finding points on any elliptic curve, the closest available are methods that target special curves and parameters. This is a problem for implementers of cryptographic algorithms, who have a hard time figuring out on a suitable method for hashing to points of an elliptic curve. Compounding that, chances of doing this wrong are high. There are many different methods, elliptic curves, and security considerations to analyze. For example, a vulnerability on WPA3 handshake protocol exploited a non-constant time hashing method resulting in a recovery of keys. Currently, an IETF draft is tracking work in-progress that provides hashing methods unifying requirements with curves and their parameters. Corresponding to this problem, CIRCL will include implementations of hashing methods for elliptic curves. Our development is accompanying the evolution of the IEFT draft. Therefore, users of CIRCL will have this added value as the methods implement a ready-to-go functionality, covering the needs of some cryptographic protocols. ### Update on Bilinear Pairings Bilinear pairings are sometimes regarded as a tool for cryptanalysis, however pairings can also be used in a constructive way by allowing instantiation of advanced public-key algorithms, for example, identity-based encryption, attribute-based encryption, blind digital signatures, three-party key agreement, among others. An efficient way to instantiate a bilinear pairing is to use elliptic curves. Note that only a special class of curves can be used, thus so-called pairing-friendly curves have specific properties that enable the efficient evaluation of a pairing. Some families of pairing-friendly curves were introduced by Barreto-Naehrig (BN), Kachisa-Schaefer-Scott (KSS), and Barreto-Lynn-Scott (BLS). BN256 is a BN curve using a 256-bit prime and is one of the fastest options for implementing a bilinear pairing. The Go native library supports this curve in the package golang.org/x/crypto/bn256. In fact, the BN256 curve is used by Cloudflare’s Geo Key Manager, which allows distributing encrypted keys around the world. At Cloudflare, high-performance is a must and with this motivation, in 2017, we released an optimized implementation of the BN256 package that is 8x faster than the Go’s native package. The success of these optimizations reached several other projects such as the Ethereum protocol and the Randomness Beacon project. Recent improvements in solving the DLP over extension fields, GF(pᵐ) for p prime and m>1, impacted the security of pairings, causing recalculation of the parameters used for pairing-friendly curves. Before these discoveries, the BN256 curve provided a 128-bit security level, but now larger primes are needed to target the same security level. That does not mean that the BN256 curve has been broken, since BN256 gives a security of 100 bits, that is, approximately 2¹⁰⁰ operations are required to cause a real danger, which is still unfeasible with current computing power. With our CIRCL announcement, we want to announce our plans for research and development to obtain efficient curve(s) to become a stronger successor of BN256. According to the estimation by Barbulescu-Duquesne, a BN curve must use primes of at least 456 bits to match a 128-bit security level. However, the impact on the recalculation of parameters brings back to the main scene BLS and KSS curves as efficient alternatives. To this end a standardization effort at IEFT is in progress with the aim of defining parameters and pairing-friendly curves that match different security levels. Note that regardless of the curve(s) chosen, there is an unavoidable performance downgrade when moving from BN256 to a stronger curve. Actual timings were presented by Aranha, who described the evolution of the race for high-performance pairing implementations. The purpose of our continuous development of CIRCL is to minimize this impact through fast implementations. ### Optimizations Go itself is a very easy to learn and use for system programming and yet makes it possible to use assembly so that you can stay close “to the metal”. We have blogged about improving performance in Go few times in the past (see these posts about encryption, ciphersuites, and image encoding). When developing CIRCL, we crafted the code to get the best possible performance from the machine. We leverage the capabilities provided by the architecture and the architecture-specific instructions. This means that in some cases we need to get our hands dirty and rewrite parts of the software in Go assembly, which is not easy, but definitely worth the effort when it comes to performance. We focused on x86-64, as this is our main target, but we also think that it’s worth looking at ARM architecture, and in some cases (like SIDH or P-384), CIRCL has optimized code for this platform. We also try to ensure that code uses memory efficiently – crafting it in a way that fast allocations on the stack are preferred over expensive heap allocations. In cases where heap allocation is needed, we tried to design the APIs in a way that, they allow pre-allocating memory ahead of time and reuse it for multiple operations. ### Security The CIRCL library is offered as-is, and without a guarantee. Therefore, it is expected that changes in the code, repository, and API occur in the future. We recommend to take caution before using this library in a production application since part of its content is experimental. As new attacks and vulnerabilities arise over the time, security of software should be treated as a continuous process. In particular, the assessment of cryptographic software is critical, it requires the expertise of several fields, not only computer science. Cryptography engineers must be aware of the latest vulnerabilities and methods of attack in order to defend against them. The development of CIRCL follows best practices on the secure development. For example, if time execution of the code depends on secret data, the attacker could leverage those irregularities and recover secret keys. In our code, we take care of writing constant-time code and hence prevent timing based attacks. Developers of cryptographic software must also be aware of optimizations performed by the compiler and/or the processor since these optimizations can lead to insecure binary codes in some cases. All of these issues could be exploited in real attacks aimed at compromising systems and keys. Therefore, software changes must be tracked down through thorough code reviews. Also static analyzers and automated testing tools play an important role on the security of the software. ### Summary CIRCL is envisioned as an effective tool for experimenting with modern cryptographic algorithms yet providing high-performance implementations. Today is marked as the starting point of a continuous machinery of innovation and retribution to the community in the form of a cryptographic library. There are still several other applications such as homomorphic encryption, multi-party computation, and privacy-preserving protocols that we would like to explore. We are team of cryptography, security, and software engineers working to improve and augment Cloudflare products. Our team keeps the communication channels open for receiving comments, including improvements, and merging contributions. We welcome opinions and contributions! If you would like to get in contact, you should check out our github repository for CIRCL github.com/cloudflare/circl. We want to share our work and hope it makes someone else’s job easier as well. Finally, special thanks to all the contributors who has either directly or indirectly helped to implement the library – Ko Stoffelen, Brendan McMillion, Henry de Valence, Michael McLoughlin and all the people who invested their time in reviewing our code. # Securing Certificate Issuance using Multipath Domain Control Validation Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/secure-certificate-issuance/ Trust on the Internet is underpinned by the Public Key Infrastructure (PKI). PKI grants servers the ability to securely serve websites by issuing digital certificates, providing the foundation for encrypted and authentic communication. Certificates make HTTPS encryption possible by using the public key in the certificate to verify server identity. HTTPS is especially important for websites that transmit sensitive data, such as banking credentials or private messages. Thankfully, modern browsers, such as Google Chrome, flag websites not secured using HTTPS by marking them “Not secure,” allowing users to be more security conscious of the websites they visit. Now that we know what certificates are used for, let’s talk about where they come from. ### Certificate Authorities Certificate Authorities (CAs) are the institutions responsible for issuing certificates. When issuing a certificate for any given domain, they use Domain Control Validation (DCV) to verify that the entity requesting a certificate for the domain is the legitimate owner of the domain. With DCV the domain owner: 1. creates a DNS resource record for a domain; 2. uploads a document to the web server located at that domain; OR 3. proves ownership of the domain’s administrative email account. The DCV process prevents adversaries from obtaining private-key and certificate pairs for domains not owned by the requestor. Preventing adversaries from acquiring this pair is critical: if an incorrectly issued certificate and private-key pair wind up in an adversary’s hands, they could pose as the victim’s domain and serve sensitive HTTPS traffic. This violates our existing trust of the Internet, and compromises private data on a potentially massive scale. For example, an adversary that tricks a CA into mis-issuing a certificate for gmail.com could then perform TLS handshakes while pretending to be Google, and exfiltrate cookies and login information to gain access to the victim’s Gmail account. The risks of certificate mis-issuance are clearly severe. ### Domain Control Validation To prevent attacks like this, CAs only issue a certificate after performing DCV. One way of validating domain ownership is through HTTP validation, done by uploading a text file to a specific HTTP endpoint on the webserver they want to secure. Another DCV method is done using email verification, where an email with a validation code link is sent to the administrative contact for the domain. ### HTTP Validation Suppose Alice buys the domain name aliceswonderland.com and wants to get a dedicated certificate for this domain. Alice chooses to use Let’s Encrypt as their certificate authority. First, Alice must generate their own private key and create a certificate signing request (CSR). She sends the CSR to Let’s Encrypt, but the CA won’t issue a certificate for that CSR and private key until they know Alice owns aliceswonderland.com. Alice can then choose to prove that she owns this domain through HTTP validation. When Let’s Encrypt performs DCV over HTTP, they require Alice to place a randomly named file in the /.well-known/acme-challenge path for her website. The CA must retrieve the text file by sending an HTTP GET request to http://aliceswonderland.com/.well-known/acme-challenge/<random_filename>. An expected value must be present on this endpoint for DCV to succeed. For HTTP validation, Alice would upload a file to http://aliceswonderland.com/.well-known/acme-challenge/YnV0dHNz where the body contains: curl http://aliceswonderland.com/.well-known/acme-challenge/YnV0dHNz GET /.well-known/acme-challenge/LoqXcYV8...jxAjEuX0 Host: aliceswonderland.com HTTP/1.1 200 OK Content-Type: application/octet-stream YnV0dHNz.TEST_CLIENT_KEY  The CA instructs them to use the Base64 token YnV0dHNz. TEST_CLIENT_KEY in an account-linked key that only the certificate requestor and the CA know. The CA uses this field combination to verify that the certificate requestor actually owns the domain. Afterwards, Alice can get her certificate for her website! ### DNS Validation Another way users can validate domain ownership is to add a DNS TXT record containing a verification string or token from the CA to their domain’s resource records. For example, here’s a domain for an enterprise validating itself towards Google: $ dig TXT aliceswonderland.com


Here, Alice chooses to create a TXT DNS resource record with a specific token value. A Google CA can verify the presence of this token to validate that Alice actually owns her website.

### Types of BGP Hijacking Attacks

Certificate issuance is required for servers to securely communicate with clients. This is why it’s so important that the process responsible for issuing certificates is also secure. Unfortunately, this is not always the case.

Researchers at Princeton University recently discovered that common DCV methods are vulnerable to attacks executed by network-level adversaries. If Border Gateway Protocol (BGP) is the “postal service” of the Internet responsible for delivering data through the most efficient routes, then Autonomous Systems (AS) are individual post office branches that represent an Internet network run by a single organization. Sometimes network-level adversaries advertise false routes over BGP to steal traffic, especially if that traffic contains something important, like a domain’s certificate.

Bamboozling Certificate Authorities with BGP highlights five types of attacks that can be orchestrated during the DCV process to obtain a certificate for a domain the adversary does not own. After implementing these attacks, the authors were able to (ethically) obtain certificates for domains they did not own from the top five CAs: Let’s Encrypt, GoDaddy, Comodo, Symantec, and GlobalSign. But how did they do it?

### Attacking the Domain Control Validation Process

There are two main approaches to attacking the DCV process with BGP hijacking:

1. Sub-Prefix Attack
2. Equally-Specific-Prefix Attack

These attacks create a vulnerability when an adversary sends a certificate signing request for a victim’s domain to a CA. When the CA verifies the network resources using an HTTP GET  request (as discussed earlier), the adversary then uses BGP attacks to hijack traffic to the victim’s domain in a way that the CA’s request is rerouted to the adversary and not the domain owner. To understand how these attacks are conducted, we first need to do a little bit of math.

Every device on the Internet uses an IP (Internet Protocol) address as a numerical identifier. IPv4 addresses contain 32 bits and follow a slash notation to indicate the size of the prefix. So, in the network address 123.1.2.0/24, “/24” refers to how many bits the network contains. This means that there are 8 bits left that contain the host addresses, for a total of 256 host addresses. The smaller the prefix number, the more host addresses remain in the network. With this knowledge, let’s jump into the attacks!

#### Attack one: Sub-Prefix Attack

When BGP announces a route, the router always prefers to follow the more specific route. So if 123.0.0.0/8 and 123.1.2.0/24 are advertised, the router will use the latter as it is the more specific prefix. This becomes a problem when an adversary makes a BGP announcement to a specific IP address while using the victim’s domain IP address. Let’s say the IP address for our victim, leagueofentropy.com, is 123.0.0.0/8. If an adversary announces the prefix 123.1.2.0/24, then they will capture the victim’s traffic, launching a sub-prefix hijack attack.

For example, in an attack during April 2018, routes were announced with the more specific /24 vs. the existing /23. In the diagram below, /23 is Texas and /24 is the more specific Austin, Texas. The new (but nefarious) routes overrode the existing routes for portions of the Internet. The attacker then ran a nefarious DNS server on the normal IP addresses with DNS records pointing at some new nefarious web server instead of the existing server. This attracted the traffic destined for the victim’s domain within the area the nefarious routes were being propagated. The reason this attack was successful was because a more specific prefix is always preferred by the receiving routers.

#### Attack two: Equally-Specific-Prefix Attack

In the last attack, the adversary was able to hijack traffic by offering a more specific announcement, but what if the victim’s prefix is /24 and a sub-prefix attack is not viable? In this case, an attacker would launch an equally-specific-prefix hijack, where the attacker announces the same prefix as the victim. This means that the AS chooses the preferred route between the victim and the adversary’s announcements based on properties like path length. This attack only ever intercepts a portion of the traffic.

There are more advanced attacks that are covered in more depth in the paper. They are fundamentally similar attacks but are more stealthy.

Once an attacker has successfully obtained a bogus certificate for a domain that they do not own, they can perform a convincing attack where they pose as the victim’s domain and are able to decrypt and intercept the victim’s TLS traffic. The ability to decrypt the TLS traffic allows the adversary to completely Monster-in-the-Middle (MITM) encrypted TLS traffic and reroute Internet traffic destined for the victim’s domain to the adversary. To increase the stealthiness of the attack, the adversary will continue to forward traffic through the victim’s domain to perform the attack in an undetected manner.

### DNS Spoofing

Another way an adversary can gain control of a domain is by spoofing DNS traffic by using a source IP address that belongs to a DNS nameserver. Because anyone can modify their packets’ outbound IP addresses, an adversary can fake the IP address of any DNS nameserver involved in resolving the victim’s domain, and impersonate a nameserver when responding to a CA.

This attack is more sophisticated than simply spamming a CA with falsified DNS responses. Because each DNS query has its own randomized query identifiers and source port, a fake DNS response must match the DNS query’s identifiers to be convincing. Because these query identifiers are random, making a spoofed response with the correct identifiers is extremely difficult.

Adversaries can fragment User Datagram Protocol (UDP) DNS packets so that identifying DNS response information (like the random DNS query identifier) is delivered in one packet, while the actual answer section follows in another packet. This way, the adversary spoofs the DNS response to a legitimate DNS query.

In doing so, the answer section of a DNS response (the important part!) can be falsified, and an adversary can trick a CA into mis-issuing a certificate.

### Solution

At first glance, one could think a Certificate Transparency log could expose a mis-issued certificate and allow a CA to quickly revoke it. CT logs, however, can take up to 24 hours to include newly issued certificates, and certificate revocation can be inconsistently followed among different browsers. We need a solution that allows CAs to proactively prevent this attacks, not retroactively address them.

We’re excited to announce that Cloudflare provides CAs a free API to leverage our global network to perform DCV from multiple vantage points around the world. This API bolsters the DCV process against BGP hijacking and off-path DNS attacks.

Given that Cloudflare runs 175+ datacenters around the world, we are in a unique position to perform DCV from multiple vantage points. Each datacenter has a unique path to DNS nameservers or HTTP endpoints, which means that successful hijacking of a BGP route can only affect a subset of DCV requests, further hampering BGP hijacks. And since we use RPKI, we actually sign and verify BGP routes.

This DCV checker additionally protects CAs against off-path, DNS spoofing attacks. An additional feature that we built into the service that helps protect against off-path attackers is DNS query source IP randomization. By making the source IP unpredictable to the attacker, it becomes more challenging to spoof the second fragment of the forged DNS response to the DCV validation agent.

By comparing multiple DCV results collected over multiple paths, our DCV API makes it virtually impossible for an adversary to mislead a CA into thinking they own a domain when they actually don’t. CAs can use our tool to ensure that they only issue certificates to rightful domain owners.

Our multipath DCV checker consists of two services:

1. DCV agents responsible for performing DCV out of a specific datacenter, and
2. a DCV orchestrator that handles multipath DCV requests from CAs and dispatches them to a subset of DCV agents.

When a CA wants to ensure that DCV occurred without being intercepted, it can send a request to our API specifying the type of DCV to perform and its parameters.

The DCV orchestrator then forwards each request to a random subset of over 20 DCV agents in different datacenters. Each DCV agent performs the DCV request and forwards the result to the DCV orchestrator, which aggregates what each agent observed and returns it to the CA.

This approach can also be generalized to performing multipath queries over DNS records, like Certificate Authority Authorization (CAA) records. CAA records authorize CAs to issue certificates for a domain, so spoofing them to trick unauthorized CAs into issuing certificates is another attack vector that multipath observation prevents.

As we were developing our multipath checker, we were in contact with the Princeton research group that introduced the proof-of-concept (PoC) of certificate mis-issuance through BGP hijacking attacks. Prateek Mittal, coauthor of the Bamboozling Certificate Authorities with BGP paper, wrote:

“Our analysis shows that domain validation from multiple vantage points significantly mitigates the impact of localized BGP attacks. We recommend that all certificate authorities adopt this approach to enhance web security. A particularly attractive feature of Cloudflare’s implementation of this defense is that Cloudflare has access to a vast number of vantage points on the Internet, which significantly enhances the robustness of domain control validation.”

Our DCV checker follows our belief that trust on the Internet must be distributed, and vetted through third-party analysis (like that provided by Cloudflare) to ensure consistency and security. This tool joins our pre-existing Certificate Transparency monitor as a set of services CAs are welcome to use in improving the accountability of certificate issuance.

### An Opportunity to Dogfood

Building our multipath DCV checker also allowed us to dogfood multiple Cloudflare products.

The DCV orchestrator as a simple fetcher and aggregator was a fantastic candidate for Cloudflare Workers. We implemented the orchestrator in TypeScript using this post as a guide, and created a typed, reliable orchestrator service that was easy to deploy and iterate on. Hooray that we don’t have to maintain our own dcv-orchestrator  server!

We use Argo Tunnel to allow Cloudflare Workers to contact DCV agents. Argo Tunnel allows us to easily and securely expose our DCV agents to the Workers environment. Since Cloudflare has approximately 175 datacenters running DCV agents, we expose many services through Argo Tunnel, and have had the opportunity to load test Argo Tunnel as a power user with a wide variety of origins. Argo Tunnel readily handled this influx of new origins!

If you and/or your organization are interested in trying our DCV checker, email [email protected] and let us know! We’d love to hear more about how multipath querying and validation bolsters the security of your certificate issuance.

As a new class of BGP and IP spoofing attacks threaten to undermine PKI fundamentals, it’s important that website owners advocate for multipath validation when they are issued certificates. We encourage all CAs to use multipath validation, whether it is Cloudflare’s or their own. Jacob Hoffman-Andrews, Tech Lead, Let’s Encrypt, wrote:

“BGP hijacking is one of the big challenges the web PKI still needs to solve, and we think multipath validation can be part of the solution. We’re testing out our own implementation and we encourage other CAs to pursue multipath as well”

Hopefully in the future, website owners will look at multipath validation support when selecting a CA.

# Kernel 4.17 released

Post Syndicated from corbet original https://lwn.net/Articles/756373/rss

Linus has released the 4.17 kernel, which
will indeed be called “4.17”.
No, I didn’t call it 5.0, even though all the git object count
numerology was in place for that. It will happen in the not _too_
distant future, and I’m told all the release scripts on kernel.org are
ready for it, but I didn’t feel there was any real reason for it.

Headline features in this release include
improved load estimation in the CPU
scheduler,
raw
BPF tracepoints
,
lazytime support in the XFS filesystem,
full in-kernel TLS protocol support,
histogram triggers for tracing,
mitigations for the latest Spectre variants,
and, of course, the removal of support for eight unloved processor
architectures.

Post Syndicated from ris original https://lwn.net/Articles/755667/rss

Security updates have been issued by Arch Linux (bind, libofx, and thunderbird), Debian (thunderbird, xdg-utils, and xen), Fedora (procps-ng), Mageia (gnupg2, mbedtls, pdns, and pdns-recursor), openSUSE (bash, GraphicsMagick, icu, and kernel), Oracle (thunderbird), Red Hat (java-1.7.1-ibm, java-1.8.0-ibm, and thunderbird), Scientific Linux (thunderbird), and Ubuntu (curl).

Post Syndicated from ris original https://lwn.net/Articles/755076/rss

Security updates have been issued by Arch Linux (lib32-curl, lib32-libcurl-compat, lib32-libcurl-gnutls, libcurl-compat, and libcurl-gnutls), CentOS (firefox), Debian (imagemagick), Fedora (exiv2, LibRaw, and love), Gentoo (chromium), Mageia (kernel, librelp, and miniupnpc), openSUSE (curl, enigmail, ghostscript, libvorbis, lilypond, and thunderbird), Red Hat (Red Hat OpenStack Platform director), and Ubuntu (firefox).

# AWS IoT 1-Click – Use Simple Devices to Trigger Lambda Functions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-1-click-use-simple-devices-to-trigger-lambda-functions/

We announced a preview of AWS IoT 1-Click at AWS re:Invent 2017 and have been refining it ever since, focusing on simplicity and a clean out-of-box experience. Designed to make IoT available and accessible to a broad audience, AWS IoT 1-Click is now generally available, along with new IoT buttons from AWS and AT&T.

I sat down with the dev team a month or two ago to learn about the service so that I could start thinking about my blog post. During the meeting they gave me a pair of IoT buttons and I started to think about some creative ways to put them to use. Here are a few that I came up with:

Help Request – Earlier this month I spent a very pleasant weekend at the HackTillDawn hackathon in Los Angeles. As the participants were hacking away, they occasionally had questions about AWS, machine learning, Amazon SageMaker, and AWS DeepLens. While we had plenty of AWS Solution Architects on hand (decked out in fashionable & distinctive AWS shirts for easy identification), I imagined an IoT button for each team. Pressing the button would alert the SA crew via SMS and direct them to the proper table.

Camera ControlTim Bray and I were in the AWS video studio, prepping for the first episode of Tim’s series on AWS Messaging. Minutes before we opened the Twitch stream I realized that we did not have a clean, unobtrusive way to ask the camera operator to switch to a closeup view. Again, I imagined that a couple of IoT buttons would allow us to make the request.

Remote Dog Treat Dispenser – My dog barks every time a stranger opens the gate in front of our house. While it is great to have confirmation that my Ring doorbell is working, I would like to be able to press a button and dispense a treat so that Luna stops barking!

Homes, offices, factories, schools, vehicles, and health care facilities can all benefit from IoT buttons and other simple IoT devices, all managed using AWS IoT 1-Click.

As I said earlier, we have been focusing on simplicity and a clean out-of-box experience. Here’s what that means:

Architects can dream up applications for inexpensive, low-powered devices.

Developers don’t need to write any device-level code. They can make use of pre-built actions, which send email or SMS messages, or write their own custom actions using AWS Lambda functions.

Installers don’t have to install certificates or configure cloud endpoints on newly acquired devices, and don’t have to worry about firmware updates.

Administrators can monitor the overall status and health of each device, and can arrange to receive alerts when a device nears the end of its useful life and needs to be replaced, using a single interface that spans device types and manufacturers.

I’ll show you how easy this is in just a moment. But first, let’s talk about the current set of devices that are supported by AWS IoT 1-Click.

Who’s Got the Button?
We’re launching with support for two types of buttons (both pictured above). Both types of buttons are pre-configured with X.509 certificates, communicate to the cloud over secure connections, and are ready to use.

The AWS IoT Enterprise Button communicates via Wi-Fi. It has a 2000-click lifetime, encrypts outbound data using TLS, and can be configured using BLE and our mobile app. It retails for $19.99 (shipping and handling not included) and can be used in the United States, Europe, and Japan. The AT&T LTE-M Button communicates via the LTE-M cellular network. It has a 1500-click lifetime, and also encrypts outbound data using TLS. The device and the bundled data plan is available an an introductory price of$29.99 (shipping and handling not included), and can be used in the United States.

We are very interested in working with device manufacturers in order to make even more shapes, sizes, and types of devices (badge readers, asset trackers, motion detectors, and industrial sensors, to name a few) available to our customers. Our team will be happy to tell you about our provisioning tools and our facility for pushing OTA (over the air) updates to large fleets of devices; you can contact them at [email protected].

AWS IoT 1-Click Concepts
I’m eager to show you how to use AWS IoT 1-Click and the buttons, but need to introduce a few concepts first.

Device – A button or other item that can send messages. Each device is uniquely identified by a serial number.

Placement Template – Describes a like-minded collection of devices to be deployed. Specifies the action to be performed and lists the names of custom attributes for each device.

Placement – A device that has been deployed. Referring to placements instead of devices gives you the freedom to replace and upgrade devices with minimal disruption. Each placement can include values for custom attributes such as a location (“Building 8, 3rd Floor, Room 1337”) or a purpose (“Coffee Request Button”).

Action – The AWS Lambda function to invoke when the button is pressed. You can write a function from scratch, or you can make use of a pair of predefined functions that send an email or an SMS message. The actions have access to the attributes; you can, for example, send an SMS message with the text “Urgent need for coffee in Building 8, 3rd Floor, Room 1337.”

Getting Started with AWS IoT 1-Click
Let’s set up an IoT button using the AWS IoT 1-Click Console:

If I didn’t have any buttons I could click Buy devices to get some. But, I do have some, so I click Claim devices to move ahead. I enter the device ID or claim code for my AT&T button and click Claim (I can enter multiple claim codes or device IDs if I want):

The AWS buttons can be claimed using the console or the mobile app; the first step is to use the mobile app to configure the button to use my Wi-Fi:

Then I scan the barcode on the box and click the button to complete the process of claiming the device. Both of my buttons are now visible in the console:

I am now ready to put them to use. I click on Projects, and then Create a project:

I name and describe my project, and click Next to proceed:

Now I define a device template, along with names and default values for the placement attributes. Here’s how I set up a device template (projects can contain several, but I just need one):

The action has two mandatory parameters (phone number and SMS message) built in; I add three more (Building, Room, and Floor) and click Create project:

I’m almost ready to ask for some coffee! The next step is to associate my buttons with this project by creating a placement for each one. I click Create placements to proceed. I name each placement, select the device to associate with it, and then enter values for the attributes that I established for the project. I can also add additional attributes that are peculiar to this placement:

I can inspect my project and see that everything looks good:

I click on the buttons and the SMS messages appear:

I can monitor device activity in the AWS IoT 1-Click Console:

And also in the Lambda Console:

The Lambda function itself is also accessible, and can be used as-is or customized:

As you can see, this is the code that lets me use {{*}}include all of the placement attributes in the message and {{Building}} (for example) to include a specific placement attribute.

Now Available
I’ve barely scratched the surface of this cool new service and I encourage you to give it a try (or a click) yourself. Buy a button or two, build something cool, and let me know all about it!

Pricing is based on the number of enabled devices in your account, measured monthly and pro-rated for partial months. Devices can be enabled or disabled at any time. See the AWS IoT 1-Click Pricing page for more info.