
How to tune TLS for hybrid post-quantum cryptography with Kyber

Post Syndicated from Brian Jarvis original https://aws.amazon.com/blogs/security/how-to-tune-tls-for-hybrid-post-quantum-cryptography-with-kyber/

We are excited to offer hybrid post-quantum TLS with Kyber for AWS Key Management Service (AWS KMS) and AWS Certificate Manager (ACM). In this blog post, we share the performance characteristics of our hybrid post-quantum Kyber implementation, show you how to configure a Maven project to use it, and discuss how to prepare your connection settings for Kyber post-quantum cryptography (PQC).

After five years of intensive research and cryptanalysis among partners from academia, the cryptographic community, and the National Institute of Standards and Technology (NIST), NIST has selected Kyber for post-quantum key encapsulation mechanism (KEM) standardization. This marks the beginning of the next generation of public key encryption. In time, the classical key establishment algorithms we use today, like RSA and elliptic curve cryptography (ECC), will be replaced by quantum-secure alternatives. At AWS Cryptography, we’ve been researching and analyzing the candidate KEMs through each round of the NIST selection process. We began supporting Kyber in round 2 and continue that support today.

A cryptographically relevant quantum computer that is capable of breaking RSA and ECC does not yet exist. However, we are offering hybrid post-quantum TLS with Kyber today so that customers can see how the performance differences of PQC affect their workloads. We also believe that the use of PQC raises the already-high security bar for connecting to AWS KMS and ACM, making this feature attractive for customers with long-term confidentiality needs.

Performance of hybrid post-quantum TLS with Kyber

Hybrid post-quantum TLS incurs a latency and bandwidth overhead compared to classical cryptography alone. To quantify this overhead, we measured how long s2n-tls takes to negotiate hybrid post-quantum (ECDHE + Kyber) key establishment compared to ECDHE alone. We performed the tests with the Linux perf subsystem on an Amazon Elastic Compute Cloud (Amazon EC2) c6i.4xlarge instance in the US East (Northern Virginia) AWS Region, and we initiated 2,000 TLS connections to a test server running in the US West (Oregon) Region, to include typical internet latencies.

Figure 1 shows the latencies of a TLS handshake that uses classical ECDHE and hybrid post-quantum (ECDHE + Kyber) key establishment. The columns are separated to illustrate the CPU time spent by the client and server compared to the time spent sending data over the network.

Figure 1: Latency of classical compared to hybrid post-quantum TLS handshake

Figure 2 shows the bytes sent and received during the TLS handshake, as measured by the client, for both classical ECDHE and hybrid post-quantum (ECDHE + Kyber) key establishment.

Figure 2: Bandwidth of classical compared to hybrid post-quantum TLS handshake

This data shows that the overhead for using hybrid post-quantum key establishment is 0.25 ms on the client, 0.23 ms on the server, and an additional 2,356 bytes on the wire. Intra-Region tests would result in lower network latency. Your latencies also might vary depending on network conditions, CPU performance, server load, and other variables.

The results show that Kyber's performance is strong; its additional latency is among the lowest of the NIST PQC candidates that we analyzed in a previous blog post. In fact, the performance of these ciphers improved in our latest tests, because x86-64 assembly-optimized versions of these ciphers are now available for use.

Configure a Maven project for hybrid post-quantum TLS

In this section, we provide a Maven configuration and code example that will show you how to get started using our assembly-optimized, hybrid post-quantum TLS configuration with Kyber.

To configure a Maven project for hybrid post-quantum TLS

  1. Get the preview release of the AWS Common Runtime HTTP client for the AWS SDK for Java 2.x. Your Maven dependency configuration should specify version 2.17.69-PREVIEW or newer, as shown in the following code sample.
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>aws-crt-client</artifactId>
        <version>[2.17.69-PREVIEW,]</version>
    </dependency>

  2. Configure the desired cipher suite in your code’s initialization. The following code sample configures an AWS KMS client to use the latest hybrid post-quantum cipher suite.
    // Imports used by this example (at the top of your class file)
    import static software.amazon.awssdk.crt.io.TlsCipherPreference.TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05;
    import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
    import software.amazon.awssdk.http.crt.AwsCrtAsyncHttpClient;
    import software.amazon.awssdk.services.kms.KmsAsyncClient;

    // Check that this platform supports the hybrid post-quantum cipher suite
    if (!TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05.isSupported()) {
        throw new RuntimeException("Hybrid post-quantum cipher suites are not supported.");
    }
    
    // Configure HTTP client   
    SdkAsyncHttpClient awsCrtHttpClient = AwsCrtAsyncHttpClient.builder()
              .tlsCipherPreference(TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05)
              .build();
    
    // Create the AWS KMS async client
    KmsAsyncClient kmsAsync = KmsAsyncClient.builder()
             .httpClient(awsCrtHttpClient)
             .build();

With that, all calls made with your AWS KMS client will use hybrid post-quantum TLS. You can use the latest hybrid post-quantum cipher suite with ACM by following the preceding example but using an AcmAsyncClient instead.
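As a minimal sketch, the ACM variant reuses the awsCrtHttpClient built in step 2 (AcmAsyncClient is the ACM analogue of the KMS client shown above):

    import software.amazon.awssdk.services.acm.AcmAsyncClient;

    // Reuse the same CRT-based HTTP client so that ACM calls also
    // negotiate hybrid post-quantum TLS
    AcmAsyncClient acmAsync = AcmAsyncClient.builder()
            .httpClient(awsCrtHttpClient)
            .build();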

Tune connection settings for hybrid post-quantum TLS

Although hybrid post-quantum TLS has some latency and bandwidth overhead on the initial handshake, that cost is amortized over the duration of the TLS session, and you can fine-tune your connection settings to help further reduce the cost. In this section, you learn three ways to reduce the impact of hybrid PQC on your TLS connections: connection pooling, connection timeouts, and TLS session resumption.

Connection pooling

Connection pools manage the number of active connections to a server. They allow a connection to be reused without closing and reopening it, which amortizes the cost of connection establishment over time. Part of a connection’s setup time is the TLS handshake, so you can use connection pools to help reduce the impact of an increase in handshake latency.

To illustrate this, we wrote a test application that generates approximately 200 transactions per second to a test server. We varied the maximum concurrency setting of the HTTP client and measured the latency of the test request. In the AWS CRT HTTP client, this is the maxConcurrency setting. If the connection pool doesn’t have an idle connection available, the request latency includes establishing a new connection. Using Wireshark, we captured the network traffic to observe the number of TLS handshakes that took place over the duration of the application. Figure 3 shows the request latency and number of TLS handshakes as the maxConcurrency setting is increased.

Figure 3: Median request latency and number of TLS handshakes as concurrency pool size increases

Figure 3: Median request latency and number of TLS handshakes as concurrency pool size increases

The biggest latency benefit occurred with a maxConcurrency value greater than 1; beyond that, increasing the pool size yielded diminishing returns. For maxConcurrency values of 10 and below, additional TLS handshakes still took place as new connections were established, but they didn't have much impact on median latency. These inflection points will depend on your application's request volume. The takeaway is that connection pooling allows connections to be reused, thereby spreading the cost of any increased TLS negotiation time over many requests.

More detail about using the maxConcurrency option can be found in the AWS SDK for Java API Reference.
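As a minimal sketch, setting the pool size on the AWS CRT HTTP client from the earlier example looks like the following (the value 10 is illustrative; tune it to your request volume):

    SdkAsyncHttpClient pooledClient = AwsCrtAsyncHttpClient.builder()
            .tlsCipherPreference(TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05)
            .maxConcurrency(10) // allow up to 10 pooled connections to be reused
            .build();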

Connection timeouts

Connection timeouts work in conjunction with connection pooling. Even if you use a connection pool, there is a limit to how long idle connections stay open before the pool closes them. You can adjust this time limit to save on connection establishment overhead.

A nice way to visualize this setting is to imagine bursty traffic patterns: despite a well-tuned connection pool, your connections keep closing because the gap between bursts is longer than the idle time limit. By increasing the maximum idle time, you can reuse these connections across bursts.

To simulate the impact of connection timeouts, we wrote a test application that starts 10 threads, each of which activates at the same time on a periodic schedule, every 5 seconds for one minute. We set maxConcurrency to 10 so that each thread could have its own connection. We set the connectionMaxIdleTime of the AWS CRT HTTP client to 1 second for the first test and to 10 seconds for the second.

When the maximum idle time was 1 second, the connections for all 10 threads closed during the time between each burst. As a result, 100 total connections were formed over the life of the test, causing a median request latency of 20.3 ms. When we changed the maximum idle time to 10 seconds, the 10 initial connections were reused by each subsequent burst, reducing the median request latency to 5.9 ms.

By setting the connectionMaxIdleTime appropriately for your application, you can reduce connection establishment overhead, including TLS negotiation time, to help achieve time savings throughout the life of your application.

More detail about using the connectionMaxIdleTime option can be found in the AWS SDK for Java API Reference.
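As a minimal sketch, the configuration from the second test above (a 10-second idle limit) looks like the following:

    import java.time.Duration;

    SdkAsyncHttpClient burstTolerantClient = AwsCrtAsyncHttpClient.builder()
            .tlsCipherPreference(TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05)
            .maxConcurrency(10)
            .connectionMaxIdleTime(Duration.ofSeconds(10)) // keep idle connections alive between bursts
            .build();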

TLS session resumption

TLS session resumption allows a client and server to bypass the key agreement that is normally performed to arrive at a new shared secret. Instead, communication quickly resumes by using a shared secret that was previously negotiated, or one that was derived from a previous secret (the implementation details depend on the version of TLS in use). This feature requires that both the client and server support it, but if available, TLS session resumption allows the TLS handshake time and bandwidth increases associated with hybrid PQ to be amortized over the life of multiple connections.

Conclusion

As you learned in this post, hybrid post-quantum TLS with Kyber is available for AWS KMS and ACM. This new cipher suite raises the security bar and allows you to prepare your workloads for post-quantum cryptography. Hybrid key agreement has some additional overhead compared to classical ECDHE, but you can mitigate these increases by tuning your connection settings, including connection pooling, connection timeouts, and TLS session resumption. Begin using hybrid key agreement today with AWS KMS and ACM.

 

Brian Jarvis

Brian is a Senior Software Engineer at AWS Cryptography. His interests are in post-quantum cryptography and cryptographic hardware. Previously, Brian worked in AWS Security, developing internal services used throughout the company. Brian holds a Bachelor’s degree from Vanderbilt University and a Master’s degree in Computer Engineering from George Mason University. He plans to finish his PhD “some day”.

Round 2 Hybrid Post-Quantum TLS Benchmarks

Post Syndicated from Alex Weibel original https://aws.amazon.com/blogs/security/round-2-hybrid-post-quantum-tls-benchmarks/

AWS Cryptography has completed benchmarks of the Round 2 versions of the Bit Flipping Key Encapsulation (BIKE) and Supersingular Isogeny Key Encapsulation (SIKE) hybrid post-quantum Transport Layer Security (TLS) algorithms. Both of these algorithms have been submitted to the National Institute of Standards and Technology (NIST) as part of NIST’s Post-Quantum Cryptography standardization process.

In the first hybrid post-quantum TLS blog post, we announced that AWS Key Management Service (KMS) had launched support for hybrid post-quantum TLS 1.2 using Round 1 versions of BIKE and SIKE. In this blog post, we are announcing AWS Cryptography’s benchmark results for Round 2 versions of BIKE and SIKE with hybrid post-quantum TLS 1.2 against an HTTP web service. The Round 2 versions of BIKE and SIKE include performance improvements, parameter tuning, and algorithm updates made in response to NIST’s comments on the Round 1 versions. I’ll give a refresher on hybrid post-quantum TLS 1.2, go over our Round 2 hybrid post-quantum TLS 1.2 benchmark results, and then describe our benchmarking methodology.

This blog post is intended to inform software developers, AWS customers, and cryptographic researchers about the potential upcoming performance differences between classical and hybrid post-quantum TLS.

Refresher on Hybrid Post-Quantum TLS 1.2

Some of this section is repeated from the previous hybrid post-quantum TLS 1.2 launch announcement for KMS. If you are already familiar with hybrid post-quantum TLS, feel free to skip to the Benchmark Results section.

What is Hybrid Post-Quantum TLS 1.2?

Hybrid post-quantum TLS 1.2 is a proposed extension to the TLS 1.2 protocol, implemented by Amazon’s open source TLS library s2n, that provides the security protections of both the classical and post-quantum schemes. It does this by performing two independent key exchanges (one classical and one post-quantum), and then cryptographically combining both keys into a single TLS master secret.
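As an illustration only (a sketch, not s2n’s actual code), the combining step can be thought of as concatenating the two shared secrets before TLS key derivation, so that an attacker must break both key exchanges to recover the master secret:

    // Illustrative sketch: form a hybrid premaster secret by concatenating
    // the classical and post-quantum shared secrets; the real s2n code also
    // follows the hybrid TLS 1.2 draft's framing and key derivation.
    static byte[] hybridPremasterSecret(byte[] classicalSecret, byte[] pqSecret) {
        byte[] combined = new byte[classicalSecret.length + pqSecret.length];
        System.arraycopy(classicalSecret, 0, combined, 0, classicalSecret.length);
        System.arraycopy(pqSecret, 0, combined, classicalSecret.length, pqSecret.length);
        return combined;
    }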

Why is Post-Quantum TLS Important?

Hybrid post-quantum TLS allows connections to remain secure even if one of the key exchanges (either classical or post-quantum) performed during the TLS Handshake is compromised in the future. For example, if a sufficiently large-scale quantum computer were to be built, it could break the current classical public-key cryptography that is used for key exchange in every TLS connection today. Encrypted TLS traffic recorded today could be decrypted in the future with a large-scale quantum computer if post-quantum TLS is not used to protect it.

Round 2 Hybrid Post-Quantum TLS Benchmark Results

Figure 2: Latency in relation to HTTP request count for four key exchange algorithms

Key Exchange Algorithm | Server PQ Implementation | TLS Handshake + 1 HTTP Request | TLS Handshake + 2 HTTP Requests | TLS Handshake + 10 HTTP Requests | TLS Handshake + 25 HTTP Requests
ECDHE Only | N/A | 10.8 ms | 15.1 ms | 52.6 ms | 124.2 ms
ECDHE + BIKE1‑CCA‑L1‑R2 | C | 19.9 ms | 24.4 ms | 61.4 ms | 133.2 ms
ECDHE + SIKE‑P434‑R2 | C | 169.6 ms | 180.3 ms | 219.1 ms | 288.1 ms
ECDHE + SIKE‑P434‑R2 | x86-64 Assembly | 20.1 ms | 24.5 ms | 62.0 ms | 133.3 ms

Table 1 shows the time (in milliseconds) that a client and server in the same Region take to complete a TCP handshake, a TLS handshake, and varying numbers of HTTP requests sent to an HTTP web service running on an i3en.12xlarge host.

Key Exchange Algorithm | Client Hello | Server Key Exchange | Client Key Exchange | Other TLS Handshake | Total
ECDHE Only | 218 | 338 | 75 | 2,430 | 3,061
ECDHE + BIKE1‑CCA‑L1‑R2 | 220 | 3,288 | 3,023 | 2,430 | 8,961
ECDHE + SIKE‑P434‑R2 | 214 | 672 | 423 | 2,430 | 3,739

Table 2 shows the amount of data (in bytes) used by different messages in the TLS Handshake for each Key Exchange algorithm.

 | 1 HTTP Request | 2 HTTP Requests | 10 HTTP Requests | 25 HTTP Requests
HTTP Request Bytes | 878 | 1,761 | 8,825 | 22,070
HTTP Response Bytes | 698 | 1,377 | 6,809 | 16,994
Total HTTP Bytes | 1,576 | 3,138 | 15,634 | 39,064

Table 3 shows the amount of data (in bytes) sent and received through each TLS connection for varying numbers of HTTP requests.

Benchmark Results Analysis

In general, we find that the major trade-off between BIKE and SIKE is data usage versus processing time: BIKE needs to send more bytes but requires less time to process them, while SIKE sends fewer bytes but requires more processing time. At the time we integrated these algorithms for our benchmarks, an x86-64 assembly-optimized implementation of BIKE1-CCA-L1-R2 was not available in s2n.

Our results show that when only a single HTTP request is sent, completing a BIKE1-CCA-L1-R2 hybrid TLS 1.2 handshake takes approximately 84% more time than a non-hybrid TLS connection, and completing an x86-64 assembly-optimized SIKE-P434-R2 hybrid TLS 1.2 handshake takes approximately 86% more time than non-hybrid. However, at 25 HTTP requests per TLS connection, when using the fastest available implementation of both BIKE and SIKE, the increased TLS handshake latency is amortized, and only 7% more total time is needed for both BIKE and SIKE compared to a classical TLS connection.
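For example, the single-request BIKE figure follows directly from Table 1: (19.9 ms − 10.8 ms) / 10.8 ms ≈ 0.84, or about 84% more time; at 25 requests the overhead shrinks to (133.2 ms − 124.2 ms) / 124.2 ms ≈ 0.07, or about 7%.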

Our results also show that BIKE1-CCA-L1-R2 hybrid TLS handshakes used 5,900 more bytes than a classical TLS handshake, while SIKE-P434-R2 hybrid TLS handshakes used 678 more bytes than classical TLS.

In the AWS EC2 network, using modern x86-64 CPUs with the fastest available algorithm implementations, we found that BIKE and SIKE performed similarly: their maximum latency difference was only 0.6 milliseconds, with BIKE the faster of the two in every benchmark. However, when compared to SIKE’s C implementation, which would be used on hosts without the ADX and MULX x86-64 instructions that SIKE’s assembly implementation requires, BIKE performed significantly better, with a maximum improvement of 157 milliseconds over SIKE.

Hybrid Post-Quantum TLS Benchmark Details and Methodology

Hybrid Post-Quantum TLS Client

Figure 3: Architecture diagram of the AWS SDK Java Client using Java Native Interface (JNI) to communicate with the native AWS Common Runtime (CRT)

Our post-quantum TLS client uses the aws-crt-dev-preview branch of the AWS SDK for Java v2, which has Java Native Interface (JNI) bindings to the AWS Common Runtime (AWS CRT) written in C. The AWS Common Runtime uses s2n for TLS negotiation on Linux platforms.

Our client was a single EC2 i3en.6xlarge host using v0.5.1 of the AWS Common Runtime (AWS CRT) Java bindings with commit f3abfaba of s2n, and it used the x86-64 assembly implementation for all SIKE-P434-R2 benchmarks.

Hybrid Post-Quantum TLS Server

Our server was a single EC2 i3en.12xlarge host running a RESTful HTTP web service that used s2n to terminate TLS connections. To measure the latency of the SIKE-P434-R2 C implementation on these hosts, we used an s2n compile-time flag to build a second version of s2n with SIKE’s x86-64 assembly optimization disabled, and we reran our benchmarks with that version.

We chose i3en.12xlarge as our host type because it is optimized for high I/O usage, provides high network bandwidth, has the high vCPU count typical of many web service endpoints, and has a modern x86-64 CPU with the ADX and MULX instructions needed by the high-performance Round 2 SIKE x86-64 assembly implementation. Additional TLS handshake benchmarks performed on other modern EC2 host types, such as the C5 and M5 instance families, showed latency results similar to those generated on the i3en family.

Post-Quantum Algorithm Implementation Details

The implementations of the post-quantum algorithms used in these benchmarks can be found in the pq-crypto directory of the s2n GitHub Repository. Our Round 2 BIKE implementation uses portable optimized C code, and our Round 2 SIKE implementation uses an optimized implementation in x86-64 assembly when available, and falls back to a portable optimized C implementation otherwise.

Key Exchange Algorithm | s2n Client Cipher Preference | s2n Server Cipher Preference | Negotiated Cipher
ECDHE Only | ELBSecurityPolicy-TLS-1-1-2017-01 | KMS-PQ-TLS-1-0-2020-02 | ECDHE-RSA-AES256-GCM-SHA384
ECDHE + BIKE1‑CCA‑L1‑R2 | KMS-PQ-TLS-1-0-2020-02 | KMS-PQ-TLS-1-0-2020-02 | ECDHE-BIKE-RSA-AES256-GCM-SHA384
ECDHE + SIKE‑P434‑R2 | PQ-SIKE-TEST-TLS-1-0-2020-02 | KMS-PQ-TLS-1-0-2020-02 | ECDHE-SIKE-RSA-AES256-GCM-SHA384

Table 4 shows the client and server TLS cipher configuration names used to negotiate each key exchange algorithm.
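For reference, the client-side preference in Table 4 can be selected through the CRT Java bindings. The following is a sketch under the assumption that the preview bindings exposed a TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_02 enum value for the KMS-PQ-TLS-1-0-2020-02 policy (exact names may differ by version):

    import software.amazon.awssdk.crt.io.TlsCipherPreference;
    import software.amazon.awssdk.crt.io.TlsContextOptions;

    // Sketch: select the hybrid PQ cipher preference from Table 4 when the
    // platform supports it (enum name is an assumption about the preview bindings)
    TlsCipherPreference pref = TlsCipherPreference.TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_02;
    if (pref.isSupported()) {
        TlsContextOptions tlsOpts = TlsContextOptions.createDefaultClient()
                .withCipherPreference(pref);
        // ...pass tlsOpts to the CRT client used by the benchmark driver
    }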

Hybrid Post-Quantum TLS Benchmark Methodology

Figure 4: Benchmarking Methodology Client/Server Architecture Diagram

Our benchmarks were run with a single client host connecting to a single host running an HTTP web service in a different Availability Zone within the same AWS Region (us-east-1), through a TCP load balancer.

We chose to include varying numbers of HTTP requests in our latency benchmarks, rather than TLS handshakes alone, because customers are unlikely to establish a secure TLS connection and then let it sit idle performing no work. Customers use TLS connections to send and receive data securely, and HTTP web services are among the most common workloads secured by TLS. We also placed our EC2 server behind a TCP load balancer to more closely approximate how an HTTP web service would be deployed in a typical setup.

Latency was measured at the client in Java, starting before the TCP connection was established and ending after the final HTTP response was received, and it includes all network transfer time. All connections used RSA certificate authentication with a 2048-bit key, and the ECDHE key exchange used the secp256r1 curve. All latency values listed in Table 1 above are the median (50th percentile) of 60 minutes of continuous single-threaded measurements between the EC2 client and server.
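A minimal sketch of that measurement loop, with a hypothetical newConnectionAndSendRequests() helper standing in for the CRT-based client code, might look like the following:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Sketch: single-threaded measurement loop with a median (p50) calculation.
    // newConnectionAndSendRequests() is a hypothetical helper that opens a TCP
    // connection, completes the TLS handshake, and sends N HTTP requests.
    List<Long> samplesMs = new ArrayList<>();
    long deadline = System.nanoTime() + 60L * 60L * 1_000_000_000L; // 60 minutes
    while (System.nanoTime() < deadline) {
        long start = System.nanoTime();
        newConnectionAndSendRequests(requestCount);
        samplesMs.add((System.nanoTime() - start) / 1_000_000);
    }
    Collections.sort(samplesMs);
    long medianMs = samplesMs.get(samplesMs.size() / 2);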


Conclusion

In this blog post, I gave a refresher on hybrid post-quantum TLS, presented our hybrid post-quantum TLS 1.2 benchmark results, and described our benchmarking methodology. Our benchmarks found that BIKE and SIKE performed similarly when using s2n’s fastest available implementations on modern CPUs, but that BIKE performed better than SIKE when both used their generic C implementations.


Author

Alex Weibel

Alex is a Senior Software Development Engineer on the AWS Crypto Algorithms team. He’s one of the maintainers of Amazon’s TLS library s2n. Previously, Alex worked on TLS termination and request proxying for S3 and the Elastic Load Balancing service, developing new features for customers. Alex holds a Bachelor of Science degree in Computer Science from the University of Texas at Austin.