Tag Archives: TLS

Experiment with post-quantum cryptography today

Post Syndicated from Bas Westerbaan original https://blog.cloudflare.com/experiment-with-pq/

Practically all data sent over the Internet today is at risk in the future if a sufficiently large and stable quantum computer is ever built: anyone who captures that data now could decrypt it later.

Luckily, there is a solution: we can switch to so-called post-quantum (PQ) cryptography, which is designed to be secure against attacks by quantum computers. After a six-year worldwide selection process, NIST announced in July 2022 that they will standardize Kyber, a post-quantum key agreement scheme. The standard will be ready in 2024, but we want to help drive the adoption of post-quantum cryptography.

Today we have added support for the X25519Kyber512Draft00 and X25519Kyber768Draft00 hybrid post-quantum key agreements to a number of test domains, including pq.cloudflareresearch.com.

Do you want to experiment with post-quantum cryptography on your test website for free? Mail [email protected] to enroll your test website, but read the fine print below first.

What does it mean to enable post-quantum on your website?

If you enroll your website in the post-quantum beta, we will add support for these two extra key agreements alongside the existing classical key agreements, such as X25519. If your browser doesn’t support these post-quantum key agreements (and at the time of writing, none do), then your browser will continue working with a classically secure, but not quantum-resistant, connection.

Then how to test it?

We have open-sourced a fork of BoringSSL and Go that has support for these post-quantum key agreements. With those and an enrolled test domain, you can check how your application performs with post-quantum key exchanges. We are working on support for more libraries and languages.

What to look for?

Kyber and classical key agreements such as X25519 have different performance characteristics: Kyber requires less computation, but has bigger keys and needs a bit more RAM. Used on its own, it could very well make the connection faster.

We are not using Kyber on its own, though, but in a hybrid. That means we perform both an X25519 and a Kyber key agreement, so that the connection remains secure as long as at least one of the two is unbroken. It also means that connections will be a bit slower. In our experiments, the difference is very small, but it’s best to check for yourself.
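
To make the hybrid idea concrete, here is a minimal Go sketch of the construction (an illustration only, not our implementation): run both key agreements and derive the session secret from both outputs. The kyberSharedSecret function is a stand-in for a real Kyber KEM, and in the actual TLS integration the concatenated secrets feed into the handshake key schedule rather than a single hash.

package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// kyberSharedSecret stands in for a real Kyber encapsulation; any KEM
// implementation could be plugged in here. (Hypothetical helper.)
func kyberSharedSecret() []byte {
	ss := make([]byte, 32)
	if _, err := rand.Read(ss); err != nil {
		panic(err)
	}
	return ss
}

func main() {
	// Classical part: an ordinary X25519 key agreement.
	alice, _ := ecdh.X25519().GenerateKey(rand.Reader)
	bob, _ := ecdh.X25519().GenerateKey(rand.Reader)
	classical, _ := alice.ECDH(bob.PublicKey())

	// Post-quantum part: the shared secret from the Kyber KEM.
	postQuantum := kyberSharedSecret()

	// Hybrid: combine both secrets. The session is only compromised if
	// BOTH X25519 and Kyber are broken.
	combined := sha256.Sum256(append(classical, postQuantum...))
	fmt.Printf("hybrid secret: %x\n", combined)
}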

The fine print

Cloudflare’s post-quantum cryptography support is a beta service for experimental use only. Enabling post-quantum on your website will subject the website to Cloudflare’s Beta Services terms and will impact other Cloudflare services on the website as described below.

No stability or support guarantees

Over the coming months, both Kyber and the way it’s integrated into TLS will change for several reasons, including:

  1. Kyber will see small, but backward-incompatible changes in the coming months.
  2. We want to be compatible with other early adopters and will change our integration accordingly.
  3. As we, together with the cryptography community, find issues, we will add workarounds in our integration.

We will update our forks accordingly, but cannot guarantee any long-term stability or continued support. PQ support may become unavailable at any moment. We will post updates on pq.cloudflareresearch.com.

Features in enrolled domains

For the moment, we are running enrolled zones on a slightly different infrastructure for which not all features, notably QUIC, are available.

With that out of the way, it’s…

Demo time!

BoringSSL

With the following commands, you can build our fork of BoringSSL and create a TLS connection with pq.cloudflareresearch.com using the compiled bssl tool. Note that we do not enable the post-quantum key agreements by default, so you have to pass the -curves flag.

$ git clone https://github.com/cloudflare/boringssl-pq
[snip]
$ cd boringssl-pq && mkdir build && cd build && cmake .. -GNinja && ninja
[snip]
$ ./tool/bssl client -connect pq.cloudflareresearch.com -server-name pq.cloudflareresearch.com -curves Xyber512D00
Connecting to [2606:4700:7::a29f:8a55]:443
Connected.
  Version: TLSv1.3
  Resumed session: no
  Cipher: TLS_AES_128_GCM_SHA256
  ECDHE curve: X25519Kyber512Draft00
  Signature algorithm: ecdsa_secp256r1_sha256
  Secure renegotiation: yes
  Extended master secret: yes
  Next protocol negotiated: 
  ALPN protocol: 
  OCSP staple: no
  SCT list: no
  Early data: no
  Encrypted ClientHello: no
  Cert subject: CN = *.pq.cloudflareresearch.com
  Cert issuer: C = US, O = Let's Encrypt, CN = E1

Go

Our Go fork doesn’t enable the post-quantum key agreements by default either. The following simple Go program enables PQ for the default http transport and GETs pq.cloudflareresearch.com.

package main

import (
  "crypto/tls"
  "fmt"
  "net/http"
)

func main() {
  http.DefaultTransport.(*http.Transport).TLSClientConfig = &tls.Config{
    CurvePreferences: []tls.CurveID{tls.X25519Kyber512Draft00, tls.X25519},
    CFEventHandler: func(ev tls.CFEvent) {
      switch e := ev.(type) {
      case tls.CFEventTLS13HRR:
        fmt.Printf("HelloRetryRequest\n")
      case tls.CFEventTLS13NegotiatedKEX:
        switch e.KEX {
        case tls.X25519Kyber512Draft00:
          fmt.Printf("Used X25519Kyber512Draft00\n")
        default:
          fmt.Printf("Used %d\n", e.KEX)
        }
      }
    },
  }

  if _, err := http.Get("https://pq.cloudflareresearch.com"); err != nil {
    fmt.Println(err)
  }
}

To run it, we first need to compile our Go fork:

$ git clone https://github.com/cloudflare/go
[snip]
$ cd go/src && ./all.bash
[snip]
$ ../bin/go run path/to/example.go
Used X25519Kyber512Draft00

On the wire

So what does this look like on the wire? With Wireshark, we can capture the packet flow. First, a non-post-quantum HTTP/2 connection using X25519:

[Wireshark capture: TLS 1.3 handshake using classical X25519]

This is a normal TLS 1.3 handshake: the client sends a ClientHello with an X25519 keyshare, which fits in a single packet. In return, the server sends its own 32-byte X25519 keyshare. It also sends various other messages, such as the certificate chain, which requires two packets in total.

Let’s check out Kyber:

[Wireshark capture: TLS 1.3 handshake using hybrid X25519 + Kyber]

As you can see, the ClientHello is a bit bigger, but it still fits within a single packet. The response now takes three packets instead of two, because of the larger server keyshare.

Under the hood

Want to add client support yourself? We are using a hybrid of X25519 and Kyber version 3.02. We are writing out the details of the latter in version 00 of this CFRG IETF draft, hence the name. We are using TLS group identifiers 0xfe30 and 0xfe31 for X25519Kyber512Draft00 and X25519Kyber768Draft00 respectively.
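
In Go terms, those code points look as follows. Our fork predefines them; with a stock toolchain you would have to carry the constants yourself, and stock crypto/tls still wouldn’t know how to negotiate them:

package pqtls

import "crypto/tls"

// TLS named-group code points for the draft hybrid key agreements,
// as assigned above. Not part of stock crypto/tls.
const (
	X25519Kyber512Draft00 tls.CurveID = 0xfe30
	X25519Kyber768Draft00 tls.CurveID = 0xfe31
)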

There are some differences between our Go and BoringSSL forks that are interesting to compare.

  • Our Go fork uses our fast AVX2-optimized implementation of Kyber from CIRCL. In contrast, our BoringSSL fork uses the simpler portable reference implementation. Without the AVX2 optimizations it’s easier to evaluate. The downside is that it’s slower. Don’t be mistaken: it is still very fast, but you can check for yourself.
  • Our Go fork only sends one keyshare. If the server doesn’t support it, it will respond with a HelloRetryRequest message and the client will fall back to one the server does support. This adds a roundtrip.
    Our BoringSSL fork, on the other hand, will send two keyshares: the post-quantum hybrid and a classical one (if a classical key agreement is still enabled). If the server doesn’t recognize the first, it will be able to use the second. In this way we avoid a roundtrip if the server does not support the post-quantum key agreement.

Looking ahead

The quantum future is here. In the coming years the Internet will move to post-quantum cryptography. Today we are offering our customers the tools to get a head start and test post-quantum key agreements. We’d love to hear your feedback: e-mail it to [email protected].

This is just a small, but important first step. We will continue our efforts to move towards a secure and private quantum-secure Internet. Much more to come — watch this space.

How to tune TLS for hybrid post-quantum cryptography with Kyber

Post Syndicated from Brian Jarvis original https://aws.amazon.com/blogs/security/how-to-tune-tls-for-hybrid-post-quantum-cryptography-with-kyber/

We are excited to offer hybrid post-quantum TLS with Kyber for AWS Key Management Service (AWS KMS) and AWS Certificate Manager (ACM). In this blog post, we share the performance characteristics of our hybrid post-quantum Kyber implementation, show you how to configure a Maven project to use it, and discuss how to prepare your connection settings for Kyber post-quantum cryptography (PQC).

After five years of intensive research and cryptanalysis among partners from academia, the cryptographic community, and the National Institute of Standards and Technology (NIST), NIST has selected Kyber for post-quantum key encapsulation mechanism (KEM) standardization. This marks the beginning of the next generation of public key encryption. In time, the classical key establishment algorithms we use today, like RSA and elliptic curve cryptography (ECC), will be replaced by quantum-secure alternatives. At AWS Cryptography, we’ve been researching and analyzing the candidate KEMs through each round of the NIST selection process. We began supporting Kyber in round 2 and continue that support today.

A cryptographically relevant quantum computer that is capable of breaking RSA and ECC does not yet exist. However, we are offering hybrid post-quantum TLS with Kyber today so that customers can see how the performance differences of PQC affect their workloads. We also believe that the use of PQC raises the already-high security bar for connecting to AWS KMS and ACM, making this feature attractive for customers with long-term confidentiality needs.

Performance of hybrid post-quantum TLS with Kyber

Hybrid post-quantum TLS incurs a latency and bandwidth overhead compared to classical cryptography alone. To quantify this overhead, we measured how long s2n-tls takes to negotiate hybrid post-quantum (ECDHE + Kyber) key establishment compared to ECDHE alone. We performed the tests with the Linux perf subsystem on an Amazon Elastic Compute Cloud (Amazon EC2) c6i.4xlarge instance in the US East (Northern Virginia) AWS Region, and we initiated 2,000 TLS connections to a test server running in the US West (Oregon) Region, to include typical internet latencies.

Figure 1 shows the latencies of a TLS handshake that uses classical ECDHE and hybrid post-quantum (ECDHE + Kyber) key establishment. The columns are separated to illustrate the CPU time spent by the client and server compared to the time spent sending data over the network.

Figure 1: Latency of classical compared to hybrid post-quantum TLS handshake

Figure 2 shows the bytes sent and received during the TLS handshake, as measured by the client, for both classical ECDHE and hybrid post-quantum (ECDHE + Kyber) key establishment.

Figure 2: Bandwidth of classical compared to hybrid post-quantum TLS handshake

This data shows that the overhead for using hybrid post-quantum key establishment is 0.25 ms on the client, 0.23 ms on the server, and an additional 2,356 bytes on the wire. Intra-Region tests would result in lower network latency. Your latencies also might vary depending on network conditions, CPU performance, server load, and other variables.

The results show that the performance of Kyber is strong; its additional latency is among the lowest of the NIST PQC candidates that we analyzed in a previous blog post. In fact, the performance of these ciphers has improved since our earlier tests, because x86-64 assembly-optimized versions of these ciphers are now available for use.

Configure a Maven project for hybrid post-quantum TLS

In this section, we provide a Maven configuration and code example that will show you how to get started using our assembly-optimized, hybrid post-quantum TLS configuration with Kyber.

To configure a Maven project for hybrid post-quantum TLS

  1. Get the preview release of the AWS Common Runtime HTTP client for the AWS SDK for Java 2.x. Your Maven dependency configuration should specify version 2.17.69-PREVIEW or newer, as shown in the following code sample.
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>aws-crt-client</artifactId>
        <version>[2.17.69-PREVIEW,]</version>
    </dependency>

  2. Configure the desired cipher suite in your code’s initialization. The following code sample configures an AWS KMS client to use the latest hybrid post-quantum cipher suite.
    // Check platform support
    if (!TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05.isSupported()) {
        throw new RuntimeException("Hybrid post-quantum cipher suites are not supported.");
    }

    // Configure the HTTP client
    SdkAsyncHttpClient awsCrtHttpClient = AwsCrtAsyncHttpClient.builder()
            .tlsCipherPreference(TLS_CIPHER_PREF_PQ_TLSv1_0_2021_05)
            .build();

    // Create the AWS KMS async client
    KmsAsyncClient kmsAsync = KmsAsyncClient.builder()
            .httpClient(awsCrtHttpClient)
            .build();

With that, all calls made with your AWS KMS client will use hybrid post-quantum TLS. You can use the latest hybrid post-quantum cipher suite with ACM by following the preceding example but using an AcmAsyncClient instead.

Tune connection settings for hybrid post-quantum TLS

Although hybrid post-quantum TLS has some latency and bandwidth overhead on the initial handshake, that cost is amortized over the duration of the TLS session, and you can fine-tune your connection settings to help further reduce the cost. In this section, you learn three ways to reduce the impact of hybrid PQC on your TLS connections: connection pooling, connection timeouts, and TLS session resumption.

Connection pooling

Connection pools manage the number of active connections to a server. They allow a connection to be reused without closing and reopening it, which amortizes the cost of connection establishment over time. Part of a connection’s setup time is the TLS handshake, so you can use connection pools to help reduce the impact of an increase in handshake latency.
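
The code samples in this post use the Java SDK, but the mechanism is the same in any HTTP client. As a rough illustration, here is a Go sketch (with a placeholder URL) that pools up to 10 connections per host and traces whether each request reused a connection, and therefore skipped the TLS handshake:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
)

func main() {
	// Roughly analogous to maxConcurrency: keep up to 10 idle
	// connections per host available for reuse.
	client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: 10}}

	// After the first request, GotConn should report Reused: true,
	// meaning no new TCP+TLS setup was needed.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("connection reused: %v\n", info.Reused)
		},
	}
	for i := 0; i < 3; i++ {
		req, _ := http.NewRequest("GET", "https://example.com", nil)
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			fmt.Println(err)
			return
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
	}
}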

To illustrate this, we wrote a test application that generates approximately 200 transactions per second to a test server. We varied the maximum concurrency setting of the HTTP client and measured the latency of the test request. In the AWS CRT HTTP client, this is the maxConcurrency setting. If the connection pool doesn’t have an idle connection available, the request latency includes establishing a new connection. Using Wireshark, we captured the network traffic to observe the number of TLS handshakes that took place over the duration of the application. Figure 3 shows the request latency and number of TLS handshakes as the maxConcurrency setting is increased.

Figure 3: Median request latency and number of TLS handshakes as concurrency pool size increases

The biggest latency benefit occurred with a maxConcurrency value greater than 1. Beyond that, the latencies were past the point of diminishing returns. For all maxConcurrency values of 10 and below, additional TLS handshakes took place within the connections, but they didn’t have much impact on median latency. These inflection points will depend on your application’s request volume. The takeaway is that connection pooling allows connections to be reused, thereby spreading the cost of any increased TLS negotiation time over many requests.

More detail about using the maxConcurrency option can be found in the AWS SDK for Java API Reference.

Connection timeouts

Connection timeouts work in conjunction with connection pooling. Even if you use a connection pool, there is a limit to how long idle connections stay open before the pool closes them. You can adjust this time limit to save on connection establishment overhead.

A nice way to visualize this setting is to imagine bursty traffic patterns. Despite tuning the connection pool concurrency, your connections keep closing because the time between bursts is longer than the idle time limit. By increasing the maximum idle time, you can reuse these connections despite the bursty behavior.
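
In Go’s standard library, the equivalent knob is IdleConnTimeout (shown here only to illustrate the concept; in the AWS CRT HTTP client the setting is connectionMaxIdleTime):

package tuning

import (
	"net/http"
	"time"
)

// Client keeps idle connections open for 10 seconds, so bursts arriving
// every 5 seconds find a warm connection instead of paying for a new
// TLS handshake. Mirrors the second test described below.
var Client = &http.Client{Transport: &http.Transport{
	MaxIdleConnsPerHost: 10,
	IdleConnTimeout:     10 * time.Second,
}}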

To simulate the impact of connection timeouts, we wrote a test application that starts 10 threads, each of which activates at the same time on a periodic schedule every 5 seconds for one minute. We set maxConcurrency to 10 so that each thread has its own connection. We set the connectionMaxIdleTime of the AWS CRT HTTP client to 1 second for the first test, and to 10 seconds for the second test.

When the maximum idle time was 1 second, the connections for all 10 threads closed during the time between each burst. As a result, 100 total connections were formed over the life of the test, causing a median request latency of 20.3 ms. When we changed the maximum idle time to 10 seconds, the 10 initial connections were reused by each subsequent burst, reducing the median request latency to 5.9 ms.

By setting the connectionMaxIdleTime appropriately for your application, you can reduce connection establishment overhead, including TLS negotiation time, to help achieve time savings throughout the life of your application.

More detail about using the connectionMaxIdleTime option can be found in the AWS SDK for Java API Reference.

TLS session resumption

TLS session resumption allows a client and server to bypass the key agreement that is normally performed to arrive at a new shared secret. Instead, communication quickly resumes by using a shared secret that was previously negotiated, or one that was derived from a previous secret (the implementation details depend on the version of TLS in use). This feature requires that both the client and server support it, but if available, TLS session resumption allows the TLS handshake time and bandwidth increases associated with hybrid PQ to be amortized over the life of multiple connections.
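
On the client side, enabling resumption is often just a matter of attaching a session cache. Here is a hedged Go sketch (example.com is a placeholder; whether the second handshake actually resumes depends on the server issuing a ticket and, in TLS 1.3, on the client reading it after the handshake):

package main

import (
	"crypto/tls"
	"fmt"
	"io"
)

func main() {
	cfg := &tls.Config{
		// An LRU cache of session state enables resumption: the second
		// connection can skip the full (hybrid post-quantum) key agreement.
		ClientSessionCache: tls.NewLRUClientSessionCache(64),
	}
	for i := 0; i < 2; i++ {
		conn, err := tls.Dial("tcp", "example.com:443", cfg)
		if err != nil {
			fmt.Println(err)
			return
		}
		// A request/response round trip gives the client a chance to pick
		// up the session ticket, which TLS 1.3 delivers after the handshake.
		conn.Write([]byte("HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"))
		io.Copy(io.Discard, conn)
		fmt.Printf("resumed: %v\n", conn.ConnectionState().DidResume)
		conn.Close()
	}
}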

Conclusion

As you learned in this post, hybrid post-quantum TLS with Kyber is available for AWS KMS and ACM. This new cipher suite raises the security bar and allows you to prepare your workloads for post-quantum cryptography. Hybrid key agreement has some additional overhead compared to classical ECDHE, but you can mitigate these increases by tuning your connection settings, including connection pooling, connection timeouts, and TLS session resumption. Begin using hybrid key agreement today with AWS KMS and ACM.

 
Brian Jarvis

Brian is a Senior Software Engineer at AWS Cryptography. His interests are in post-quantum cryptography and cryptographic hardware. Previously, Brian worked in AWS Security, developing internal services used throughout the company. Brian holds a Bachelor’s degree from Vanderbilt University and a Master’s degree from George Mason University in Computer Engineering. He plans to finish his PhD “some day”.

TLS 1.2 to become the minimum TLS protocol level for all AWS API endpoints

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/tls-1-2-required-for-aws-endpoints/

At Amazon Web Services (AWS), we continuously innovate to deliver a cloud computing environment that helps meet the requirements of the most security-sensitive organizations. To respond to evolving technology and regulatory standards for Transport Layer Security (TLS), we will be updating the TLS configuration for all AWS service API endpoints to a minimum of version TLS 1.2. This means that, by June 28, 2023, you will no longer be able to use TLS versions 1.0 and 1.1 with any AWS API in any AWS Region. In this post, we will tell you how to check your TLS version, and what to do to prepare.

We have continued AWS support for TLS versions 1.0 and 1.1 to maintain backward compatibility for customers with older or difficult-to-update clients, such as embedded devices. Furthermore, we have active mitigations in place that help protect your data against the issues identified in these older versions. Now is the right time to retire TLS 1.0 and 1.1: increasing numbers of customers have requested this change to help simplify part of their regulatory compliance, and fewer and fewer customers are using these older versions.

If you are one of the more than 95% of AWS customers who are already using TLS 1.2 or later, you will not be impacted by this change. You are almost certainly already using TLS 1.2 or later if your client software application was built after 2014 using an AWS Software Development Kit (AWS SDK), AWS Command Line Interface (AWS CLI), Java Development Kit (JDK) 8 or later, or another modern development environment. If you are using earlier application versions, or have not updated your development environment since before 2014, you will likely need to update.

If you are one of the customers still using TLS 1.0 or 1.1, then you must update your client software to use TLS 1.2 or later to maintain your ability to connect. It is important to understand that you already have control over the TLS version used when connecting. When connecting to AWS API endpoints, your client software negotiates its preferred TLS version, and AWS uses the highest mutually agreed upon version.

To minimize the availability impact of requiring TLS 1.2, AWS is rolling out the changes on an endpoint-by-endpoint basis over the next year, starting now and ending in June 2023. Before making these potentially breaking changes, we monitor for connections that are still using TLS 1.0 or TLS 1.1. If you are one of the AWS customers who may be impacted, we will notify you on your AWS Health Dashboard, and by email. After June 28, 2023, AWS will update our API endpoint configuration to remove TLS 1.0 and TLS 1.1, even if you still have connections using these versions.

What should you do to prepare for this update?

To minimize your risk, you can self-identify if you have any connections using TLS 1.0 or 1.1. If you find any connections using TLS 1.0 or 1.1, you should update your client software to use TLS 1.2 or later.

AWS CloudTrail records are especially useful for identifying whether you are using the outdated TLS versions. You can now search for the TLS version used by your connections with the recently added tlsDetails field. The tlsDetails structure in each CloudTrail record contains the TLS version, the cipher suite, and the fully qualified domain name (FQDN) used for the API call. You can then use the data in the records to help you pinpoint the client software responsible for the TLS 1.0 or 1.1 calls, and update it accordingly. Nearly half of AWS services currently provide the TLS information in the CloudTrail tlsDetails field, and we are continuing to roll this out for the remaining services in the coming months.

We recommend you use one of the following options for running your CloudTrail TLS queries:

  1. AWS CloudTrail Lake: You can follow the steps, and use the sample TLS query, in the blog post Using AWS CloudTrail Lake to identify older TLS connections. There is also a built-in sample CloudTrail TLS query available in the AWS CloudTrail Lake console.
  2. Amazon CloudWatch Log Insights: There are two built-in CloudWatch Log Insights sample CloudTrail TLS queries that you can use, as shown in Figure 1.
     
    Figure 1: Available sample TLS queries for CloudWatch Log Insights

  3. Amazon Athena: You can query AWS CloudTrail logs in Amazon Athena, and we will be adding support for querying the TLS values in your CloudTrail logs in the coming months. Look for updates and announcements about this in future AWS Security Blog posts.

In addition to using CloudTrail data, you can also identify the TLS version used by your connections by performing code, network, or log analysis as described in the blog post TLS 1.2 will be required for all AWS FIPS endpoints. Note that while this post refers to the FIPS API endpoints, the information about querying for TLS versions is applicable to all API endpoints.

Will I be notified if I am using TLS 1.0 or TLS 1.1?

If we detect that you are using TLS 1.0 or 1.1, you will be notified on your AWS Health Dashboard, and you will receive email notifications. However, you will not receive a notification for connections you make anonymously to AWS shared resources, such as a public Amazon Simple Storage Service (Amazon S3) bucket, because we cannot identify anonymous connections. Furthermore, while we will make every effort to identify and notify every customer, there is a possibility that we may not detect infrequent connections, such as those that occur less than monthly.

How do I update my client to use TLS 1.2 or TLS 1.3?

If you are using an AWS Software Development Kit (AWS SDK) or the AWS Command Line Interface (AWS CLI), follow the detailed guidance in the blog post TLS 1.2 to become the minimum for FIPS endpoints about how to examine your client software code and properly configure the TLS version used.

We encourage you to be proactive in order to avoid an impact to availability. Also, we recommend that you test configuration changes in a staging environment before you introduce them into production workloads.

What is the most common use of TLS 1.0 or TLS 1.1?

The most common users of TLS 1.0 or 1.1 are .NET Framework versions earlier than 4.6.2. If you use the .NET Framework, please confirm that you are using version 4.6.2 or later. For information about how to update and configure the .NET Framework to support TLS 1.2, see How to enable TLS 1.2 on clients in the .NET Configuration Manager documentation.

What is Transport Layer Security (TLS)?

Transport Layer Security (TLS) is a cryptographic protocol that secures internet communications. Your client software can be set to use TLS version 1.0, 1.1, 1.2, or 1.3, or a subset of these, when connecting to service endpoints. You should ensure that your client software supports TLS 1.2 or later.
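
For example, a Go client can enforce this floor and report what was negotiated (a sketch; substitute the endpoint that you use):

package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Refuse anything below TLS 1.2 and print the negotiated version.
	conn, err := tls.Dial("tcp", "kms.us-east-1.amazonaws.com:443", &tls.Config{
		MinVersion: tls.VersionTLS12,
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	defer conn.Close()
	fmt.Println("negotiated:", tls.VersionName(conn.ConnectionState().Version))
}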

Is there more assistance available to help verify or update my client software?

If you have any questions or issues, you can start a new thread on the AWS re:Post community, or you can contact AWS Support or your Technical Account Manager (TAM).

Additionally, you can use AWS IQ to find, securely collaborate with, and pay AWS certified third-party experts for on-demand assistance to update your TLS client components. To find out how to submit a request, get responses from experts, and choose the expert with the right skills and experience, see the AWS IQ page. Sign in to the AWS Management Console and select Get Started with AWS IQ to start a request.

What if I can’t update my client software?

If you are unable to update to use TLS 1.2 or TLS 1.3, contact AWS Support or your Technical Account Manager (TAM) so that we can work with you to identify the best solution.

Author

Janelle Hopper

Janelle is a Senior Technical Program Manager in AWS Security with over 25 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve the AWS security posture.

Author

Daniel Salzedo

Daniel is a Senior Specialist Technical Account Manager – Security. He has over 25 years of professional experience in IT in industries as diverse as video game development, manufacturing, banking, and used car sales. He loves working with our wonderful AWS customers to help them solve their complex security challenges at scale.

Author

Ben Sherman

Ben is a Software Development Engineer in AWS Security, where he focuses on automation to support AWS compliance obligations. He enjoys experimenting with computing and web services both at work and in his free time.

Introducing: Backup Certificates

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/introducing-backup-certificates/

At Cloudflare, we pride ourselves on giving every customer the ability to provision a TLS certificate for their Internet application — for free. Today, we are responsible for managing the certificate lifecycle of almost 45 million certificates, from issuance to deployment to renewal. As we build out the most resilient, robust platform, we want it to be “future-proof” and resilient against events we can’t predict.

Events that cause us to re-issue certificates for our customers, like key compromises, vulnerabilities, and mass revocations, require immediate action. Otherwise, customers can be left insecure or offline. When one of these events happens, we want to be ready to mitigate the impact immediately. But how?

By having a backup certificate ready to deploy — wrapped with a different private key and issued from a different Certificate Authority than the primary certificate that we serve.

Events that lead to certificate re-issuance

Cloudflare re-issues certificates every day — we call this a certificate renewal. Because certificates come with an expiration date, when Cloudflare sees that a certificate is expiring soon, we initiate a new certificate renewal order. This way, by the time the certificate expires, we already have an updated certificate deployed and ready to use for TLS termination.

Unfortunately, not all certificate renewals are initiated by the expiration date. Sometimes, unforeseeable events like key compromises lead to certificate renewals: a new private key needs to be generated, and therefore a new corresponding certificate as well.

Key Compromises

A key compromise is when an unauthorized person or system obtains the private key that is used to encrypt and decrypt secret information — security personnel’s worst nightmare. Key compromises can be the result of a vulnerability, such as Heartbleed, where a bug in a system can cause the private key to be leaked. They can also be the result of malicious actions, such as a rogue employee accessing unauthorized information. In the event of a key compromise, it’s crucial that (1) new private keys are immediately issued, (2) new certificates are deployed, and (3) the old certificates are revoked.

The Heartbleed Vulnerability

In 2014, the Heartbleed vulnerability was exposed. It allowed attackers to extract the private key of the TLS certificate for any server running an affected version of OpenSSL, a popular encryption library. We patched the bug and then, as a precaution, quickly reissued private keys and TLS certificates belonging to all of our customers, even though none of our keys were leaked. Cloudflare’s ability to act quickly protected our customers’ data from being exposed.

Heartbleed was a wake-up call. At the time, Cloudflare’s scale was an order of magnitude smaller. A similar vulnerability at today’s scale would take us weeks, not hours, to re-issue all of our customers’ certificates.

Now, with backup certificates, we don’t need to worry about initiating a mass re-issuance in a small time frame. Instead, customers will already have a certificate that we’ll be able to instantly deploy. Not just that, but the backup certificate will also be wrapped with a different key than the primary certificate, preventing it from being impacted by a key compromise.

Key compromises are one of the main reasons certificates need to be re-issued at scale. But other events can prompt re-issuance as well, including mass revocations by Certificate Authorities.

Mass Revocations from CAs

Today, the Certificate Authority/Browser Forum (CA/B Forum) is the governing body that sets the rules and standards for certificates. One of the Baseline Requirements set by the CA/B Forum states that Certificate Authorities are required to revoke certificates whose keys are at risk of being compromised within 24 hours. For less immediate issues, such as certificate misuse or violation of a CA’s Certificate Policy, certificates need to be revoked within five days. In both scenarios, certificates will be revoked by the CA in a short timeframe and immediate re-issuance of certificates is required.

While mass revocations aren’t commonly initiated by CAs, there have been a few occurrences throughout the last few years. Recently, Let’s Encrypt had to revoke roughly 2.7 million certificates when they found a non-compliance in their implementation of a DCV challenge. In this case, Cloudflare customers were unaffected.

Another time, one of the Certificate Authorities that we use found that they were renewing certificates based on validation tokens that did not comply with the CA/B Forum standards. This caused them to invoke a mass revocation, impacting about five thousand Cloudflare-managed domains. We worked with our customers and the CA to issue new certificates before the revocation, resulting in minimal impact.

We understand that mistakes happen, and we have been lucky enough that as these issues have come up, our engineering teams were able to mitigate quickly so that no customers were impacted. But that’s not enough: our systems need to be future-proof so that a revocation of 45 million certificates will have no impact on our customers. With backup certificates, we’ll be ready for a mass re-issuance, no matter the scale.

To be resilient against mass revocations initiated by our CAs, we are going to issue every backup certificate from a different CA than the primary certificate. This will add a layer of protection in case one of our CAs has to invoke a mass revocation — something that, when initiated, is a ticking time bomb.

Challenges when Renewing Certificates

Scale: With great power comes great responsibility

When the Heartbleed vulnerability was exposed, we had to re-issue about 100,000 certificates. At the time, this wasn’t a challenge for Cloudflare. Now, we are responsible for tens of millions of certificates. Even if our systems are able to handle this scale, we rely on our Certificate Authority partners to be able to handle it as well. In the case of an emergency, we don’t want to rely on systems that we do not control. That’s why it’s important for us to issue the certificates ahead of time, so that during a disaster, all we need to worry about is getting the backup certificates deployed.

Manual intervention for completing DCV

Another challenge that comes with re-issuing certificates is Domain Control Validation (DCV). DCV is a check used to validate the ownership of a domain before a Certificate Authority can issue a certificate for it. When customers onboard to Cloudflare, they can either delegate Cloudflare to be their DNS provider, or they can choose to use Cloudflare as a proxy while maintaining their current DNS provider.

When Cloudflare acts as the DNS provider for a domain, we can add Domain Control Validation (DCV) records on our customer’s behalf. This makes the certificate issuance and renewal process much simpler.

Domains that don’t use Cloudflare as their DNS provider — we call them partial zones — have to rely on other methods for completing DCV. When those domains proxy their traffic through us, we can complete HTTP DCV on their behalf, serving the HTTP DCV token from our Edge. However, customers that want their certificate issued before proxying their traffic need to manually complete DCV. In an event where Cloudflare has to re-issue thousands or millions of certificates, but cannot complete DCV on behalf of the customer, manual intervention will be required. While completing DCV is not an arduous task, it’s not something that we should rely on our customers to do in an emergency, when they have a small time frame, with high risk involved.
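
Mechanically, completing HTTP DCV amounts to serving a CA-supplied token at a well-known path on the domain being validated. A minimal sketch of that idea in Go (the path and token here are illustrative; the exact location and contents are dictated by the CA and validation method):

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical token and path; a real CA dictates both.
	const token = "ca-supplied-validation-token"
	http.HandleFunc("/.well-known/pki-validation/token.txt",
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, token)
		})
	log.Fatal(http.ListenAndServe(":80", nil))
}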

This is where backup certificates come into play. From now on, every certificate issuance will fire two orders: one for a certificate from the primary CA and one for the backup certificate. When we can complete the DCV on behalf of the customer, we will do so for both CAs.

Today, we’re only issuing backup certificates for domains that use Cloudflare as an Authoritative DNS provider. In the future, we’ll order backup certificates for partial zones. That means that for backup certificates for which we are unable to complete DCV, we will give customers the corresponding DCV records to get the certificate issued.

Backup Certificates Deployment Plan

We are happy to announce that Cloudflare has started deploying backup certificates on Universal Certificate orders for Free customers that use Cloudflare as an Authoritative DNS provider. We have been slowly ramping up the number of backup certificate orders and in the next few weeks, we expect every new Universal certificate pack order initiated on a Free, Pro, or Biz account to include a backup certificate, wrapped with a different key and issued from a different CA than the primary certificate.

At the end of April we will start issuing backup certificates for our Enterprise customers. If you’re an Enterprise customer and have any questions about backup certificates, please reach out to your Account Team.

Next Up: Backup Certificates for All

Today, Universal certificates make up 72% of the certificates in our pipeline. But we want full coverage! That’s why our team will continue building out our backup certificates pipeline to support Advanced Certificates and SSL for SaaS certificates. In the future, we will also issue backup certificates for certificates that our customers upload themselves, so they can have a backup they can rely on.

In addition, we will continue to improve our pipeline to make the deployment of backup certificates instantaneous — leaving our customers secure and online in an emergency.

At Cloudflare, our mission is to help build a better Internet. With backup certificates, we’re helping build a secure, reliable Internet that’s ready for any disaster. Interested in helping us out? We’re hiring.

Sizing Up Post-Quantum Signatures

Post Syndicated from Bas Westerbaan original https://blog.cloudflare.com/sizing-up-post-quantum-signatures/

Quantum computers are a boon and a bane. Originally conceived by Manin and Feynman to simulate nature efficiently, large-scale quantum computers will speed up innovation in material sciences by orders of magnitude. Consider the technical advances enabled by the discovery of new materials (with bronze, iron, steel and silicon each ascribed their own age!); quantum computers could help to unlock the next age of innovation. Unfortunately, they will also break the majority of the cryptography that’s currently used in TLS to protect our web browsing. That cryptography falls into two categories:

  1. Digital signatures, such as RSA, which ensure you’re talking to the right server.
  2. Key exchanges, such as Diffie–Hellman, which are used to agree on encryption keys.

Using Shor’s algorithm, even a moderately sized stable quantum computer will easily break the signatures and key exchanges currently used in TLS. Luckily this can be fixed: over the last two decades, there has been great progress in so-called post-quantum cryptography. “Post quantum”, abbreviated PQ, means secure against quantum computers. Five years ago, the standards institute NIST started a public process to standardise post-quantum signature schemes and key exchanges. The outcome is expected to be announced in early 2022.

At Cloudflare, we’re not just following this process closely, but are also testing the real-world performance of PQ cryptography. In our 2019 experiment with Google, we saw that we can switch to a PQ key exchange with little performance impact. Among the NIST finalists, there are many with even better performance. This is good news, as we would like to switch to PQ key exchanges as soon as possible — indeed, an attacker could intercept sensitive data today, then keep and decrypt it years into the future using a quantum computer.

Why worry about PQ signatures today

One would think we can take it easy with signatures for TLS: we only need to have them replaced before a large quantum computer is built. The situation, however, is more complicated.

  • The lead time to change signatures is higher. Not only do we need to change the browsers and servers, we also need to change certificate authorities (CAs) and everyone’s certificate management.
  • TLS is addicted to small and fast signatures. For the page that you’re viewing, we sent six signatures: two in the certificate chain, one handshake signature, one OCSP staple, and finally two SCTs used for certificate transparency.
  • PQ signature schemes have wildly varying performance trade-offs and quirks (as we’ll see below) which stack up quickly with six signatures, which all have slightly different requirements.

One might ask: can’t we be clever and get rid of some of these signatures? We think so! For instance, we can replace the handshake signature with a smaller key exchange or suppress intermediate certificates. Such fundamental changes take years to be adopted. That is why we are also investigating the performance of plain TLS with drop-in PQ signatures.

So, what are our options?

The zoo of PQ signatures

The three finalists of the NIST competition are Dilithium, Falcon and Rainbow. In the table below we compare them against RSA and ECDSA, both of which are in common use today, and a selection of other PQ schemes that might see standardisation in the future.

[Table: signature and public-key sizes and signing/verification times for RSA, ECDSA, and the PQ schemes discussed]
(* There are many caveats to this table. We compare instances of PQC security level 1. Signing and verification times vary considerably by hardware platform and implementation constraints. They should be taken as a rough indication only. The signing time of Falcon512 is discussed later on. We do not list all relevant variants of the NIST alternates or promising schemes. This instance of XMSS can only sign a million messages, is stateful, requires quite a bit of storage for quick signing, is not standardised and thus far from a drop-in replacement. Rainbow has one other variant, which has smaller private keys.)

None of these PQ signatures is a clear-cut drop-in replacement. To start, all have (much) larger signatures, except for Rainbow, GeMSS and SQISign. Rainbow and GeMSS have huge public keys and SQISign is very slow.

TLS signatures

To confuse matters even more, the signatures within TLS are not all the same:

  • Online. Only the handshake signature is created with every incoming TLS connection, and so signing needs to be fast. Dilithium fits this role well.
  • Offline. All other signatures are made months/years in advance, and so signing time is not that important. This group splits in two:
    • With a public key. The certificate chain includes signatures and their public keys. Here Falcon seems most suited.
    • Without a public key. The remaining three (SCTs and OCSP staple) are just signatures. For these, Rainbow seems optimal, as its large public keys are not transmitted.

Using Dilithium, Falcon, and Rainbow together allows optimization for both speed and size simultaneously, which seems like a great idea. However, combining different signatures at the same time has disadvantages:

  • A security issue in the design or implementation of one of the signatures compromises the whole.
  • Clients need to implement multiple cryptographic algorithms, in this case three of them, which is troublesome for smaller devices — especially if separate hardware support is needed for each of them.

So do we really need to eke out every byte and every cycle of performance? Or can we stick to a single signature scheme for simplicity and security?

Can we pick just one?

If we stick to one signature scheme, looking just at the numbers, Falcon512 seems like a reasonable option. It needs 5kB of extra space (compared to a classical handshake), about the same as the Dilithium–Falcon–Rainbow chimera above. Unfortunately, Falcon comes with a caveat: creating signatures efficiently requires constant-time 64-bit floating-point arithmetic. Without it, signing is 20x slower. But speed alone is not enough; it has to run in constant time. Without that, one can slowly learn the secret key by measuring the time it takes to create a signature.

Although PCs typically have a sufficiently constant-time floating-point unit, many smaller devices do not. Thus, Falcon seems ill-suited for general purpose online signatures.

What about Dilithium2? It needs 17kB extra — let’s find out if that makes a big difference.

Evidence by Experiment

All the different variables and constraints clearly complicate an already challenging puzzle. The best thing is to just try the options. Over the last few years several interesting papers have appeared studying the various options, such as SKD20, PST20, SKD21 and PKNLN22. These are great starts, but don’t provide a complete picture:

  • SCTs and OCSP staples have yet to be considered. Leaving half (three) of the signatures out changes the results significantly.
  • The networks tested or emulated offer insights, but are far from representative of real-world conditions. All tests were conducted between two datacenters (which does not include real-world last-mile conditions such as Wi-Fi or spotty mobile connections); or a network was simulated with unrealistic packet loss rates.

Here, Cloudflare can contribute. One of the things we like to do is to put new ideas in the community to the test on a global scale.

In this case we’re just taking a first step. Setting up a real-world experiment with a modified browser is quite involved, especially when we consider the many possible variations. Instead, as a first step, we decided to investigate the most striking variable, the size, and try to answer the question:

How do larger signatures affect the TLS handshake?

There are two parts to this: how fast are they, and, more importantly, do they work at all?

Experimental setup

We need some way to emulate bigger signatures without having to modify the clients. We considered several options. The first idea we had was to pad a valid certificate with a dummy extension. That would require a custom certificate for each size to test, which is cumbersome. Then we considered responding with a dummy ServerHello extension. This is, however, not allowed by TLS 1.2 without a corresponding ClientHello extension. In the end, we went for adding dummy certificates.

Dummy certificates

These dummy certificates are 1kB self-signed invalid certificates that have nothing to do with the certificate chain. To vary the size to test, we simply add more copies. Adding unrelated certificates to the certificate chain is a common misconfiguration and clients have learnt to ignore them. In fact, TLS 1.3 stipulates that these (in rfc-speak) SHOULD be ignored by the client. Testing out hundreds of browsers, we saw no issues.
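
To give a sense of what such a dummy certificate involves, here is a sketch of generating a self-signed certificate in Go (not the code used in the experiment; a minimal certificate like this comes out around half a kilobyte, and padding, for example a longer subject, brings it to the 1kB used here):

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"time"
)

func main() {
	// Throwaway key; the certificate is self-signed and deliberately
	// unrelated to any real chain.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "dummy"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
}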

Standards and reality don’t always agree: when inserting dummy certificates on actual traffic, we saw issues with a small, but not insignificant number of clients. We don’t want to ruin anyone’s connection, and so we decided to use separate connections for this purpose.

Using challenge pages to launch probes

So what did we actually do? On a small percentage of the challenge pages (those with the CAPTCHA), we pick a number n and a random key and send this key in two separate background requests to:

  • 0.tls-size-experiment-c.cloudflareresearch.com
  • [n].tls-size-experiment-1.cloudflareresearch.com

The first, the control, is a normal webpage that stores the TLS handshake time under the key that’s been sent. The real action happens at the second, the live, which adds the n dummy certificates to its chain. The live also stores handshake time under the given key. We could call it “experimental” instead of “live”, but the benign control connection is also an important part of the experiment. Indeed, it allows us to see if live connections are missing. These endpoints were a breeze to write using Cloudflare Workers and KV.

How much dummy data to test?

Before launching the experiment, we tested several libraries and browsers on the live endpoint to see whether they would error due to the dummy certificates. None rejected a single certificate, but how far can we go? TLS 1.3 theoretically allows a certificate chain of 16MB, but in practice many clients reject a much shorter chain. OpenSSL, for instance, rejects one of 102kB. The most stingy we found is Go’s TLS client, which rejects a handshake larger than 64kB. Because of this, we tested with between 1 and 59 dummy certificates.

Intermezzo: TCP’s congestion window

So, what did we find? The graphs are in the next section, have a peek! Before diving right in, we would like to explain a crucial concept, the TCP congestion window, that helps us read the results.

Data sent over the Internet is broken down into packets of around 1.4kB that traverse many routers to reach their destination. Sometimes a router has more incoming packets than it can handle and it has to drop them — this is called congestion. To avoid causing congestion, TCP initially sends just a few packets (typically ten, so ~14kB). Then, with every acknowledgement received in return, the TCP sender will very quickly ramp up the number of packets that it keeps in flight. This number is called the congestion window (cwnd). When it gets too high, congestion occurs, packets are dropped and in response the sender backs off by dialing down the congestion window. Any dropped packet is seen as a sign of congestion by TCP. For this reason, Wi-Fi has its own retransmission mechanism transparent to TCP.
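
To get a feel for the numbers, here is a toy model (our own simplification, not a faithful TCP simulation) of how many round trips a server flight of a given size needs during slow start:

package main

import "fmt"

// roundTrips estimates how many round trips it takes to deliver `bytes`
// of server data, assuming ~1.4kB packets, no loss, and a congestion
// window that doubles every round trip.
func roundTrips(bytes, initCwnd int) int {
	packets := (bytes + 1399) / 1400
	trips, cwnd := 0, initCwnd
	for packets > 0 {
		trips++
		packets -= cwnd
		cwnd *= 2
	}
	return trips
}

func main() {
	fmt.Println(roundTrips(14_000, 10)) // 1: a classical flight fits in ten packets
	fmt.Println(roundTrips(40_000, 10)) // 2: ~26kB of extra signatures costs a round trip
	fmt.Println(roundTrips(40_000, 30)) // 1: a larger initial cwnd absorbs it
}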

Considering all this, we would expect to see two effects with larger signatures:

  • Gentle slope. Every single packet needs some extra time to transmit, due to limited bandwidth and possible physical-layer retransmissions. This slope isn’t so gentle if your internet connection is slow or spotty.
  • cwnd wall. Once we fill the congestion window, we have to wait for a whole roundtrip before we can continue. This effect is stronger if the roundtrip time (RTT) is higher.

The strength of the two effects can differ. With a fast connection and high RTT we expect to see the graph below on the left. With a slow connection and low RTT, we expect the one on the right.

[Figure: expected handshake time as added data grows, for a fast connection with high RTT (left) and a slow connection with low RTT (right)]

There might be other unknown effects. The best thing is to have a look.

In PQ research, the second effect has gained the most attention. The larger signatures simply do not fit in the initial congestion windows used today. A common suggestion in response has been to simply increase the initial congestion window to accommodate the larger signatures. This is far from a simple change to make globally, and we have to understand if this solves the problem to begin with.

Results

Over 24 days we’ve received 964,499 live connections from 454,218 different truncated IPs (to 24 bits, “/24”, for IPv4 and 48 bits for IPv6) and 11,239 different ASNs. First, let’s check how many clients had trouble with the bigger handshakes.

Can clients handle the larger handshakes?

The control connection was missing for 2.4% of the live connections. This is not alarming: we expect some connections to be missing for harmless reasons, such as the user browsing away from the challenge page. There are, however, significantly more control connections without a matching live connection: 3.6%.

In the graph below on the left we break the number of received live connections down by the number of dummy certificates added. Because we pick the number of certificates randomly, the graph is noisy. To get a clearer picture, we started storing the number of certificates added in the corresponding control request, which gives us the graph on the right. The bumps at 10kB and 30kB suggest that there are clients or middleboxes that cannot handle these handshake sizes.

[Figure: number of live connections received, broken down by the amount of dummy data added (left: original data; right: with the certificate count stored in the control request)]

Handshake times with larger signatures

What is the effect on the handshake time? The graph on the left shows the weighted median and 75th percentile TLS handshake times for different amounts of dummy data added. We use the weight so that every truncated IP contributes equally. On the right we show the slowdowns for each size, relative to the handshake time of the control connection.

[Figure: weighted median and 75th percentile handshake times by amount of dummy data added (left), and slowdown relative to the control connection (right)]

We can see the not-so-gentle slope until 40kB, where we hit a little wall that corresponds to Cloudflare’s default initial congestion window of 30 packets.

Adding 35kB still fits within our initial congestion window. Nonetheless, the median handshake with 35kB extra is 40% slower. The slowest 10% are even worse off, taking 60% longer. Thus even though we stay within the congestion window, the added data is not free at all.

We can now translate these insights back to concrete PQ signatures. For example, using Dilithium2 as a drop-in replacement, we need around 17kB extra. That also fits within our initial congestion window with a median slowdown of 20%, which gets worse for the tail-end of users. For the normal initial congestion window of ten, we expect the slowdown to be much worse — around 60–80%.

There are several caveats to point out:

  • These experiments used an initial congestion window of 30 packets instead of ten. With a smaller initial congestion window of ten, which is the default for most systems, we would expect the wall to move from 40kB to around 10kB.
  • Because of our presence all across the world, our RTTs are fairly low. Thus the effect of the cwnd wall is smaller for us.
  • Challenge pages are served, by design, to those clients that we expect to be bots. This adds a significant bias because bots are generally hosted at well-connected providers, and so are closer than users.
  • HTTP/3 was not supported by the server we used for the endpoint. Support for IPv6 was only added ten days into the experiment and accounts for 10.9% of the measurements.
  • Actual TLS handshakes differ in size much more than tested in this setup due to differences in certificate sizes and extensions and other factors.

What have we learned?

The TLS handshake is just one step (~5–20%) in a long chain required to show you a webpage. Casually browsing, it would be hard to notice a TLS handshake that’s 60% slower. But such differences add up. To make a website really fast, you need many seemingly insignificant speedups. Browser developers take this seriously: only in exceptional cases does Chrome allow a change that slows down any microbenchmark by even a percent.

Because of the many parties and complexities involved, we should avoid waiting too long to adopt post-quantum signatures in TLS. That’s a hard sell if it comes at the price of a double-digit slowdown, not just for content servers, but for browser vendors and clients as well.

A timely adoption of PQ signatures on the web would be great. Our evidence so far suggests that this will be easiest if six signatures and two public keys fit in 9kB.

We will continue our efforts to help build a post-quantum secure Internet. To follow along, keep an eye on this blog or have a look at research.cloudflare.com.

Bas Westerbaan is co-submitter of the SPHINCS+ signature scheme.

References

SKD20: Sikeridis, Kampanakis, Devetsikiotis. Assessing the Overhead of Post-Quantum Cryptography in TLS 1.3 and SSH. CoNEXT 2020.
PST20: Paquin, Stebila, Tamvada. Benchmarking Post-Quantum Cryptography in TLS. PQCrypto 2020.
SKD21: Sikeridis, Kampanakis, Devetsikiotis. Post-Quantum Authentication in TLS 1.3: A Performance Study. NDSS 2020.
PKNLN22: Paul, Kuzovkova, Lahr, Niederhagen. Mixed Certificate Chains for the Transition to Post-Quantum Authentication in TLS 1.3. To appear at AsiaCCS 2022.

Cloudflare for SaaS for All, now Generally Available!

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/cloudflare-for-saas-for-all-now-generally-available/

During Developer Week a few months ago, we opened up the Beta for Cloudflare for SaaS: a one-stop shop for SaaS providers looking to provide fast load times, unparalleled redundancy, and the strongest security to their customers.

Since then, we’ve seen numerous developers integrate with our technology, allowing them to spend their time building out their solution instead of focusing on the burdens of running a fast, secure, and scalable infrastructure — after all, that’s what we’re here for.

Today, we are very excited to announce that Cloudflare for SaaS is generally available, so that every customer, big and small, can use Cloudflare for SaaS to continue scaling and building their SaaS business.

What is Cloudflare for SaaS?

If you’re running a SaaS company, you have customers that are fully reliant on you for your service. That means you’re responsible for keeping their domain fast, secure, and protected. But this isn’t simple. There’s a long checklist you need to get through to put a solution in your customers’ hands:

  • Set up an origin server
  • Encrypt your customers’ traffic
  • Keep your customers online
  • Boost the performance of global customers
  • Support vanity domains
  • Protect against attacks and bots
  • Scale for growth
  • Provide insights and analytics

And on top of that, you need to also focus on building out your solution and your business. As a developer or startup with limited resources, this can delay your product launch by weeks or months.

That’s what we’re here to help with! We have numerous engineering teams whose sole focus is to work on products that take care of each one of these tasks, so you don’t have to!

The Cloudflare solution:

  • Set up an origin server  → Workers
  • Encrypt your customers’ traffic →  SSL for SaaS
  • Keep your customers online → Cloudflare’s global Anycast network
  • Boost the performance of global customers → Argo Smart Routing/Cache
  • Support vanity domains → Custom Hostnames
  • Protect against attacks and bots → WAF and Bot Management
  • Scale for growth → Workers
  • Provide insights and analytics → Custom Hostname Analytics

Pricing, Made for Developers

Starting today, Cloudflare for SaaS is available to purchase on Free, Pro, and Business plans. We wanted to make sure that the pricing made sense for developers. At the time of building, you don’t know how many customers you’ll have, so we wanted to offer flexibility by keeping the pricing as simple as possible: only pay for the customers you use.

Each customer domain using the service is called a Custom Hostname. For each Custom Hostname, we automatically provision a TLS certificate. But not just that!  Beyond the TLS certificate, each of your Custom Hostnames inherits the full suite of Cloudflare products that you set up on your SaaS zone. From Bot Management to Argo Smart Routing, you can extend these add-ons that protect and accelerate your domain to your customers.

Custom Hostnames cost two dollars per month. We only charge you once each Custom Hostname has been onboarded, prorated according to when you created it. That means that if you created 10 Custom Hostnames at the start of the month and 10 Custom Hostnames halfway through, at the end of the month you will be billed $30.
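To make the proration concrete, here is a small illustrative sketch (our own arithmetic, not Cloudflare’s billing code):

```go
package main

import "fmt"

func main() {
	const pricePerMonth = 2.0 // dollars per Custom Hostname per month

	// prorated returns the charge for `count` hostnames that existed for
	// the given fraction of the billing month.
	prorated := func(count int, fractionOfMonth float64) float64 {
		return float64(count) * pricePerMonth * fractionOfMonth
	}

	// 10 hostnames for the whole month, plus 10 created halfway through.
	total := prorated(10, 1.0) + prorated(10, 0.5)
	fmt.Printf("$%.2f\n", total) // $30.00
}
```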

This way, you’re only charged for the Custom Hostnames that you provision. It’s also a great incentive to make sure you clean up after your churned customers.

If you’re an Enterprise customer and want to learn more about the benefits that you can get from Cloudflare for SaaS, make sure you check out our blog post about the latest developments.

Show us what you’re building!

During the beta alone, we’ve seen incredible projects built out on the platform. We wanted to showcase these developers to show you what’s possible. And even better, some of these have been built on our Workers platform! We’d love to see what you’re working on. Join our Discord channel and showcase your work! Have feature requests for us? Let us know!

mmm.page: Simple Personal Websites


mmm.page is a drag-and-drop website builder that makes it dead simple to create auto-responsive, collage-like websites: websites with overlapping text, images, GIFs, YouTube videos, Spotify embeds, and (a lot) more. To make it easier, all the standard website tedium — uptime, usability, performance, reliability, responsiveness, SEO, etc. — are handled under the hood so all you have to worry about is adding content and arranging it how you want.

Under their hood is Cloudflare. Cloudflare’s CDN allows both the flexibility of server-side pages as well as the instant loading times of static pages — not to mention an 80% reduction in server costs. Custom Hostnames alone saved months of development time by handling domain names and SSL management (which are otherwise tricky to get perfect and reliable).

They’ve used Workers for a growing number of tasks that would’ve otherwise taken an order of magnitude more time if implemented with their current backend monolith — the ease of deployment and comparatively low cost of Workers is something that keeps them coming back.

The longer-term hope is for pages to be used as a sort of beacon signal, an easy-to-make yet unbounded way to express to others the things you’re interested in, especially for things that aren’t so easily describable or captured in words. They look forward to a world of a ton more DIY micro-sites. Cloudflare has been crucial in taking care of much of the difficult technical plumbing and giving them more time to work on designs and features that get them closer to this hope.

Lightfunnels


Lightfunnels is a performance-driven e-commerce and lead generation platform. It focuses on delivering fast, reliable, and high-converting sales funnels to its users and their customers.

With Cloudflare for SaaS, Lightfunnels allows users to preserve their brand by easily connecting their own domain names with SSL to use on their funnels.

The platform handles large e-commerce traffic volume through Cloudflare Workers. This helps Lightfunnels serve pages from the closest edge to the customer, wherever they are in the world, allowing for blazing fast page load speeds.

Workers also come with a powerful caching API that eliminates a great percentage of back-end trips and reduces the stress on their servers.

“Our aim is to build the best performing e-commerce and lead generation platform on the market. Page load speeds play a significant role in performance. Using Cloudflare for SaaS along with Cloudflare Workers made building a reliable, secure, and fast infrastructure a breeze.”
Yassir Ennazk, Co-founder & CEO at Lightfunnels

Ventrata

Ventrata is a SaaS multi-channel booking platform for large attractions and tour operators. They power booking sites and B2B booking portals for clients that run on other domains. Cloudflare for SaaS has allowed them to leverage all of Cloudflare’s tools, including Firewall, image caching, Workers, and free TLS certificates on Custom Hostnames, while allowing their clients to keep full control of their brand. Their implementation involved just 4 lines of code without any infrastructure/DevOps help required, which would have been impossible before.

Currently a part of the Beta?

If you were accepted as a part of the Cloudflare for SaaS Beta, you will get a notice next week about migrating to the paid version.  

Help build a better Internet

Want to be a part of the Cloudflare team and work on the products that power Cloudflare for SaaS? We’re hiring!

Revisiting BetterTLS: Certificate Path Building

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/revisiting-bettertls-certificate-path-building-4c978b79843f

By Ian Haken

Last year the AddTrust root certificate expired and lots of clients had a bad time. Some Roku devices weren’t working right, Heroku had problems, and some folks couldn’t even curl. In the aftermath Ryan Sleevi wrote a really great blog post not just about the issue of this one certificate’s expiry, but the problem that so many TLS implementations have in general with certificate path building. If you haven’t read that blog post, you should. This post is probably going to make a lot more sense if you’ve read that one first, so go ahead and read it now.

To recap that previous AddTrust root certificate expiry, there was a certificate graph that looked like this:

The AddTrust certificate graph

This is a real example, and you can see the five certificates in the above graph here:

  1. www.agwa.name (leaf certificate)
  2. Sectigo RSA Domain Validation Secure Server CA (intermediate CA)
  3. USERTrust RSA Certification Authority (intermediate CA)
  4. USERTrust RSA Certification Authority (self-signed)
  5. AddTrust External CA Root (self-signed)

The important thing to understand about a certificate graph is that the boxes represent entities (meaning an X.500 Distinguished Name and public key). Entities are things you trust (or don’t, as the case may be). The arrows between entities represent certificates: a way to extend trust from one entity to another. This means that if you trust either the “USERTrust RSA Certification Authority” entity or the “AddTrust External CA Root” entity, you should be able to discover a chain of trust from that trusted entity (the “trust anchor”) down to “www.agwa.name”, the “end-entity”.

(Note that the self-signed certificates (4 and 5) are often useful for defining trusted entities, but aren’t going to be important in the context of path building.)

The problem that occurred last summer started because certificate 3 expired. The “USERTrust RSA Certification Authority” entity was relatively new and not directly trusted by many clients, and so most servers would send certificates 1, 2, and 3 to clients. If a client only trusted “AddTrust External CA Root” then this would be fine; that client can build a certificate chain 1 ← 2 ← 3 and see that it should trust www.agwa.name’s public key. On the other hand, if the client trusts “USERTrust RSA Certification Authority” then that’s also fine; it only needs to build a chain 1 ← 2.

The problem that arose was that some clients weren’t good at certificate path building (even though this is a fairly simple case of path building compared to the next example below). Those clients didn’t realize that they could stop building a chain at 2 if they trusted “USERTrust RSA Certification Authority”. So when certificate 3 expired on May 30, 2020, these clients treated the entire collection of certificates sent by the server as invalid and would no longer establish trust in the end-entity.

Even though that story is a year old and was well covered then, I’m retelling it here because a couple of weeks ago something kind of similar happened: a certificate for the Let’s Encrypt R3 CA expired (certificate 2 below) on September 30, 2021. This should have been fine; the Let’s Encrypt R3 entity also has a certificate signed by the ISRG Root X1 CA (3) which nowadays is trusted by most clients.

The Let’s Encrypt R3 Certificate Graph
  1. src.agwa.name (leaf certificate)
  2. Let’s Encrypt R3 (signed by DST Root CA X3)
  3. Let’s Encrypt R3 (signed by ISRG Root X1)
  4. DST Root CA X3 (self-signed)
  5. ISRG Root X1 (self-signed)

But predictably, even though it’s been a year since Ryan’s post, lots of services and clients had issues. You should read Scott Helme’s full post-mortem on the event to understand some of the contributing factors, but one big problem is that most TLS implementations still aren’t very good at path building. As a result, servers generally can’t send a complete collection of certificates down to clients (containing different possible paths to different trust anchors) which makes it hard to host a service that both old and new devices can talk to.

Maybe it’s because I saw history repeating or maybe it’s because I had just listened to Ryan Sleevi talk about the history of web PKI, but the whole episode really made me want to get back to something I had been wanting to do for a while. So over the last couple of weeks I set some time aside, started reading some RFCs, had to get more coffee, finished reading some RFCs, and finally started making certificates. The end result is the first major update to BetterTLS since its first release: a new suite of tests to exercise TLS implementations’ certificate path building. As a bonus, it also checks whether TLS implementations apply certain validity checks. Some of the checks are part of RFCs, like Basic Constraints, while others are not fully standardized, like distrusting deprecated signature algorithms and enforcing EKU constraints on CAs.

I found the results of applying these tests to various TLS implementations pretty interesting, but before I get into those results let me give you a few more details about why TLS implementations should be doing good path building and why we care about it.

What is Certificate Path Building?

If you want the really detailed answer to “What is Certificate Path Building” you can take a look at RFC 4158. But in short, certificate path building is the process of building a chain of trust from some end entity (like the blue boxes in the examples above) back to a trust anchor (like the ISRG Root X1 CA) given a collection of certificates. In the context of TLS, that collection of certificates is sent from the server to the client as part of the TLS handshake. A lot of the time, that collection of certificates is actually already an ordered sequence of certificates from end-entity to trust anchor, such as in the first example where servers would send certificates 1, 2, 3. This happens to already be a chain from “www.agwa.name” to “AddTrust External CA Root”.

But what happens if we can’t be sure what trust anchor the client has, such as the second example above where the server doesn’t know if the client will trust DST Root CA X3 or ISRG Root X1? In this case the server could send all the certificates (1, 2, and 3) and let the client figure out which path makes sense (either 1 ← 2, or 1 ← 3). But if the client expects the server’s certificates to simply be a chain already, the sequence 1 ← 2 ← 3 is going to fail to validate.
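To make this concrete, here’s a minimal sketch of branching path building over the entity graph, with a deliberately simplified Cert type of our own invention. A real implementation must also verify signatures, expiry, and constraints on each candidate chain; this only shows the search:

```go
package main

import "fmt"

// Cert is a simplified stand-in for an X.509 certificate.
type Cert struct {
	Subject string
	Issuer  string
}

// buildPaths returns every path from leaf to a trusted entity, considering
// all candidate issuers rather than assuming the pool is already a chain.
func buildPaths(leaf Cert, pool []Cert, trusted, seen map[string]bool) [][]Cert {
	if seen[leaf.Subject] {
		return nil // guard against cycles in the graph (cf. RFC 4158 figure 7)
	}
	seen[leaf.Subject] = true
	defer delete(seen, leaf.Subject)

	if trusted[leaf.Issuer] {
		return [][]Cert{{leaf}}
	}
	var paths [][]Cert
	for _, c := range pool {
		if c.Subject == leaf.Issuer && c.Subject != c.Issuer { // skip self-signed
			for _, rest := range buildPaths(c, pool, trusted, seen) {
				paths = append(paths, append([]Cert{leaf}, rest...))
			}
		}
	}
	return paths
}

func main() {
	pool := []Cert{
		{Subject: "Let's Encrypt R3", Issuer: "DST Root CA X3"},
		{Subject: "Let's Encrypt R3", Issuer: "ISRG Root X1"},
	}
	leaf := Cert{Subject: "src.agwa.name", Issuer: "Let's Encrypt R3"}
	trusted := map[string]bool{"ISRG Root X1": true}
	for _, p := range buildPaths(leaf, pool, trusted, map[string]bool{}) {
		fmt.Println("found a path of", len(p), "certificates to a trust anchor")
	}
}
```

Run against the Let’s Encrypt example, the search tries the DST Root CA X3 branch, finds no trust anchor there, backtracks, and succeeds through ISRG Root X1 — exactly the backtracking that a client expecting a simple chain never performs.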

Why Does This Matter?

The most important reason for clients to support robust path building is that it allows for agility in the web PKI ecosystem. For example, we can add additional certificates that conform to new requirements such as SHA-1 deprecation, validity length restrictions, or trust anchor removal, all while leaving existing certificates in place to preserve legacy client functionality. This allows static, infrequently updated, or intentionally end-of-lifed clients to continue working while browsers (which frequently enforce new constraints like the ones mentioned above) can take advantage of the additional certificates in the handshake that conform to the new requirements.

In particular, Netflix runs on a lot of devices. Millions of them. The reality though is that the above description applies to many of them. Some devices only run older versions of Android or iOS. Some are embedded devices that no longer receive updates. Regardless of the specifics, the update cadence (if one exists) for those devices is outside of our control. But ideally we’d love it if every device that used to work just kept working. To that end, it’s helpful to know what trade-offs we can make in terms of agility versus retaining support for every device. Are those devices stuck using certain signature algorithms or cipher suites? Will those devices accept a certificate set that includes extra certificates with other alternate signature algorithms?

As service owners, having test suites that can answer these questions can guide decision making. On the other hand, TLS library implementers using these test suites can ensure that applications built with their libraries operate reliably throughout churn in the web PKI ecosystem.

An Aside About Agility

More than 4 years passed between publication of the first draft of the TLS 1.3 specification and the final version. An impressive amount of consideration went into the design of all of the versions of the TLS and SSL protocols and it speaks to the designers’ foresight and diligence that a single server can support clients speaking SSL 3.0 (final specification released 1996) all the way up to TLS 1.3 (final specification released 2018).

(Although I should say that in practice, supporting such a broad set of protocol versions on a single server is probably not a good idea.)

The reason the TLS protocol can support this is because agility has been designed into the system. The client advertises the TLS versions, cipher suites, and extensions it supports and the server can make decisions about the best supported version of those options and negotiate the details in its response. Over time this has allowed the ecosystem to evolve gracefully, supporting new cryptographic primitives (like elliptic curve cryptography) and deprecating old ones (like the MD5 hash algorithm).

Unfortunately the TLS specification has not enabled the same agility for the certificates that it relies on in practice. While there are great specifications like RFC 4158 for how to think about certificate path building, TLS specifications up to 1.2 only allowed the server to present “the chain”:

This is a sequence (chain) of certificates. The sender’s certificate MUST come first in the list. Each following certificate MUST directly certify the one preceding it.

Only in TLS 1.3 did the specification allow for greater flexibility:

The sender’s certificate MUST come in the first CertificateEntry in the list. Each following certificate SHOULD directly certify the one immediately preceding it.

Note: Prior to TLS 1.3, “certificate_list” ordering required each certificate to certify the one immediately preceding it; however, some implementations allowed some flexibility. Servers sometimes send both a current and deprecated intermediate for transitional purposes, and others are simply configured incorrectly, but these cases can nonetheless be validated properly. For maximum compatibility, all implementations SHOULD be prepared to handle potentially extraneous certificates and arbitrary orderings from any TLS version, with the exception of the end-entity certificate which MUST be first.

This change to the specification is hugely significant because it’s the first formalization that TLS implementations should be doing robust path building. Implementations which conform to this are far more likely to continue operating in a PKI ecosystem undergoing frequent changes. If more TLS implementations can tolerate changes, then the web PKI ecosystem will be in a place where it is able to undergo those changes. And ultimately this means we will be able to update best practices and retain trust agility as time goes on, making the web a more secure place.

It’s hard to imagine a world where SSL and TLS were so inflexible that we wouldn’t have been able to gracefully transition off of MD5 or transition to PFS cipher suites. I’m hopeful that this update to the TLS specification will help bring the same agility that has existed in the TLS protocol itself to the web PKI ecosystem.

Test Results

So what does the new test suite in BetterTLS tell us about the state of certificate path building in TLS implementations? The good news is that there has been some improvement in the state of the world since Ryan’s roundup last year. The bad news is that that improvement isn’t everywhere.

The test suite both identifies what relevant features a TLS implementation supports (like default distrust of deprecated signing algorithms) and evaluates correctness. Here’s a quick enumeration of what features this test suite looks for:

  • Branching Path Building: Implementations that support “branching” path building can handle cases like the Let’s Encrypt R3 example above where an entity has multiple issuing certificates and the client needs to check multiple possible paths to find a route to a trust anchor. Importantly, as invalid certificates are found during path building (for all of the reasons listed below) the implementation should be able to pick an alternate issuer certificate to continue building a path. This is the primary feature of interest in this test suite.
  • Certificate expiration: Implementations should treat expired certificates as invalid during path building. This is a pretty straightforward expectation and fortunately all the tested implementations were properly verifying this.
  • Name constraints: Implementations should treat certificates with a name constraint extension in conflict with the end entity’s identity as invalid. Check out BetterTLS’s name constraints test suite for a more thorough evaluation of this behavior. All of the implementations tested below correctly evaluated the simple name constraints check in this test suite.
  • Bad Extended Key Usage (EKU) on CAs: This check tests whether an implementation rejects CA certificates with an Extended Key Usage extension that is incompatible with the end-entity’s use of the certificate for TLS server authentication. The Mozilla Certificate Policy FAQ states:

Inclusion of EKU in CA certificates is generally allowed. NSS and CryptoAPI both treat the EKU extension in intermediate certificates as a constraint on the permitted EKU OIDs in end-entity certificates. Browsers and certificate client software have been using EKU in intermediate certificates, and it has been common for enterprise subordinate CAs in Windows environments to use EKU in their intermediate certificates to constrain certificate issuance.

While many implementations support the semantics of an incompatible EKU in CAs as a reason to treat a certificate as invalid, RFCs do not require this behavior so we do see several implementations below not applying this verification.

  • Missing Basic Constraints Extension: This check tests whether the implementation rejects paths where a CA is missing the Basic Constraints extension. RFC 5280 requires that all CAs have this extension, so all implementations should support this.
  • Not a CA: This check tests whether the implementation rejects paths where a CA has a Basic Constraints extension, but that extension does not identify the certificate as being a CA. Similarly to the above, all implementations should support this and fortunately all of the implementations tested applied this check correctly.
  • Deprecated Signing Algorithm: This check tests whether the implementation rejects certificates that have been signed with an algorithm that is considered deprecated (in particular, with an algorithm using SHA-1). Enforcement of SHA-1 deprecation is not universally present in all TLS implementations at this point, so we see a mix of implementations below that do and do not apply it.
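For a concrete sense of what these tests exercise, this is roughly how you would ask Go’s crypto/x509 to do branching path building over a pool of extra handshake certificates. The wrapper below is our own sketch; the Verify API itself is real and returns every valid chain it finds, not just the first:

```go
package pathcheck

import (
	"crypto/x509"
	"fmt"
)

// verifyWithPathBuilding asks Go's verifier to find chains from leaf through
// the intermediates pool to any of the given trust anchors, applying
// expiration, name constraint, and (post-search) EKU checks along the way.
func verifyWithPathBuilding(leaf *x509.Certificate, intermediates, roots *x509.CertPool) ([][]*x509.Certificate, error) {
	chains, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,         // trust anchors
		Intermediates: intermediates, // extra certificates from the handshake
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	})
	if err != nil {
		return nil, fmt.Errorf("no valid path to a trust anchor: %w", err)
	}
	return chains, nil
}
```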

For more information about these checks, check out the repository’s README. Now on to the results!

webpki

webpki is a Rust library for validating web PKI certificates. It’s the underlying validation mechanism for the rustls library, which is what I actually tested. webpki shows up as the hero of the non-browser TLS implementations, supporting all of the features and having a 100% test pass rate. webpki is primarily maintained by Brian Smith, who also worked on the mozilla::pkix codebase that’s used by Firefox.

Go

Go didn’t distrust deprecated signature algorithms by default (although looking at the issue tracker, an update was merged to change this long before I started working on this test suite; it should land in Go 1.18), but otherwise supported all the features in the test suite. However, while it supported EKU constraints on CAs, the test suite discovered a bug that causes path building to fail under certain conditions when only a subset of paths have an EKU constraint.

Upon inspection, the Go x509 library validates most certificate constraints (like expiration and name constraints) as it builds paths, but EKU constraints are only applied after candidate paths are found. This looks to be a violation of Sleevi’s Rule, which probably explains why the EKU corner case causes Go to have a bad time:

Even if a library supports path building, doing some form of depth-first search in the PKI graph, the next most common mistake is still treating path building and path verification as separable, independent steps. That is, the path builder finds “a chain” that is rooted in a trusted CA, and then completes. The completed chain is then handed to a path verifier, which asks “Does this chain meet all the caller’s/application’s requirements”, and returns a “Yes/No” answer. If the answer is “No”, you want the path builder to consider those other paths in the graph, to see if there are any “Yes” paths. Yet if the path building and verification steps are different, you’re bound to have a bad time.

Java

I didn’t evaluate JDKs other than OpenJDK, but the latest version of OpenJDK 11 actually performed quite well. This JDK didn’t enforce EKU constraints on CAs or distrust certificates signed with SHA-1 algorithms. Otherwise, the JDK did a good job of building certificate paths.

PKI.js

The PKI.js library is a JavaScript library that can perform a number of PKI-related operations, including certificate verification. It’s unclear if the “certificate chain validator” is meant to support complex certificate sets or if it was only meant to handle pre-validated paths, but the implementation fared poorly against the test suite. It didn’t support EKU constraints, didn’t distrust deprecated signature algorithms, didn’t perform any branching path building, and failed to validate even a simple “chain” when a parent certificate had expired but the intermediate was already trusted (this is the same issue OpenSSL ran into with the expired AddTrust certificate last year).

Even worse, when the certificate pool had a cycle (like in RFC 4158 figure 7), the validator got stuck in an infinite loop.

OpenSSL

In short, OpenSSL doesn’t appear to have changed significantly since Ryan’s roundup last year. OpenSSL does support the less ubiquitous validation checks (such as EKU constraints on CAs and distrusting deprecated signing algorithms), but it still doesn’t support branching path building (only non-branching chains).

LibreSSL

LibreSSL showed significant improvement over last year’s evaluation, which appears to be largely attributable to Bob Beck’s work on a new x509 verifier in LibreSSL 3.2.2 based on Go’s verifier. It supported path building and passed all of the non-skipped tests. As with other implementations it didn’t distrust deprecated algorithms by default. The one big surprise though is that it also didn’t distrust certificates missing the Basic Constraints extension, which as we described above is strictly required by the RFC 5280 spec:

If the basic constraints extension is not present in a version 3 certificate, or the extension is present but the cA boolean is not asserted, then the certified public key MUST NOT be used to verify certificate signatures.

BoringSSL

BoringSSL performed similarly to OpenSSL. Not only did it not support any path building (only non-branching chains), but it also didn’t distrust deprecated signature algorithms.

GnuTLS

GnuTLS looked just like OpenSSL in its results. It also supported all the validation checks in the test suite (EKU constraints, deprecated signature algorithms) but didn’t support branching path building.

Browsers

By and large, browsers (or the operating system libraries they utilize) do a good job of path building.

Firefox (all platforms)

Firefox didn’t distrust deprecated signature algorithms, but otherwise supported path building and passed all tests.

Chrome (all platforms)

Chrome supported all validation cases and passed all tests.

Microsoft Edge (Windows)

Edge supported all validation cases and passed all tests.

Safari (MacOS)

Safari didn’t support EKU constraints on CAs but did pass simple branching path building test cases. However, it failed most of the more complicated path building test cases (such as cases with cycles).

Summary

+-----------+-----------+-------------------+--------+-------------------+
|           | Supports  | Distrusts SHA-1   | EKU on | Has other errors? |
|           | branching?| signing algs?     | CAs?   |                   |
+-----------+-----------+-------------------+--------+-------------------+
| webpki    | ✓         | ✓                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| Go        | ✓         | ✖ (fixed in 1.18) | ✓      | EKU bug           |
+-----------+-----------+-------------------+--------+-------------------+
| Java      | ✓         | ✖                 | ✖      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| PKI.js    | ✖         | ✖                 | ✖      | Fails even non-   |
|           |           |                   |        | branching path    |
|           |           |                   |        | building cases,   |
|           |           |                   |        | has infinite loop |
+-----------+-----------+-------------------+--------+-------------------+
| OpenSSL   | ✖         | ✓                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| LibreSSL  | ✓         | ✖                 | ✓      | Doesn't require   |
|           |           |                   |        | Basic Constraints |
+-----------+-----------+-------------------+--------+-------------------+
| BoringSSL | ✖         | ✖                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| GnuTLS    | ✖         | ✓                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| Firefox   | ✓         | ✖                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| Chrome    | ✓         | ✓                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| Edge      | ✓         | ✓                 | ✓      |                   |
+-----------+-----------+-------------------+--------+-------------------+
| Safari    | Kind of?  | ✓                 | ✖      | Failed complex    |
|           |           |                   |        | path finding cases|
+-----------+-----------+-------------------+--------+-------------------+

Closing Thoughts

For most of the history of TLS, implementations have been pretty poor at certificate path building (if they supported it at all). In fairness, until recently the TLS specifications asserted that servers MUST behave in such a way that didn’t require clients to implement certificate path building.

However the evolution of the web PKI ecosystem has necessitated flexibility and this has been more directly codified in the TLS 1.3 specification. If you work on a TLS implementation, you really really ought to take heed of these new expectations in the TLS 1.3 specification. We’re going to have a lot more stability on the web if implementations can do good path building.

To be clear, it doesn’t need to be every implementation’s goal to pass every test in this suite. I’ll be the first to admit that the test suite contains some more pathological test cases than you’re likely to see in web PKI in practice. But at a minimum you should look at the changes that have occurred in the web PKI ecosystem in the past decade and be confident that your library supports enough path building to easily handle transitions (such as servers sending multiple possible paths to a trust anchor). And passing all of the tests in the BetterTLS test suite is a useful metric for establishing that confidence.

It’s important to make sure clients are forward-compatible with changes to the web PKI, because it’s not a matter of “if” but “when.” In Scott’s own words:

One thing that’s certain is that this event is coming again. Over the next few years we’re going to see a wide selection of Root Certificates expiring for all of the major CAs and we’re likely to keep experiencing the exact same issues unless something changes in the wider ecosystem.

If you are in a position to choose between different client-side TLS libraries, you can use these test results as a point of consideration for which libraries are most likely to weather those changes.

And if you are a service owner, it is important to know your customers. Will they be able to handle a transition from RSA to ECDSA? Will they be able to handle a transition from ECDSA to a post-quantum signature algorithm? Will they be able to handle having multiple certificates in a handshake when an old trust is expiring or no longer trusted by new clients? Knowing your clients can help you be resilient and set up appropriate configurations and endpoints to support them.

Standards, security baselines, and best practices in web PKI have been rapidly changing over the last few years and are only going to keep changing. Whether you implement TLS or just consume it, whether it’s a distrusted CA, a broken signature algorithm, or just the expiry of a certificate in good standing, it’s important to make sure that your application will be able to handle the next big change. We hope that BetterTLS can play a part in making that easier!



Exported Authenticators: The long road to RFC

Post Syndicated from Jonathan Hoyland original https://blog.cloudflare.com/exported-authenticators-the-long-road-to-rfc/

Exported Authenticators: The long road to RFC


Our earlier blog post talked in general terms about how we work with the IETF. In this post we’re going to talk about a particular IETF project we’ve been working on, Exported Authenticators (EAs). Exported Authenticators is a new extension to TLS that we think will prove really exciting. It unlocks all sorts of fancy new authentication possibilities, from TLS connections with multiple certificates attached, to logging in to a website without ever revealing your password.

Now, you might have thought that, given the innumerable hours that went into the design of TLS 1.3, it couldn’t possibly be improved, but it turns out that there are a number of places where the design falls a little short. TLS allows us to establish a secure connection between a client and a server. During the TLS handshake the server presents a certificate to the browser, which proves the server is authorised to use the name written on the certificate, for example blog.cloudflare.com. One of the most common things we use that ability for is delivering webpages. In fact, if you’re reading this, your browser has already done this for you. The Cloudflare Blog is delivered over TLS, and by presenting a certificate for blog.cloudflare.com the server proves that it’s allowed to deliver Cloudflare’s blog.

When your browser requests blog.cloudflare.com you receive a big blob of HTML that your browser then starts to render. In the dim and distant past, this might have been the end of the story. Your browser would render the HTML, and display it. Nowadays, the web has become more complex, and the HTML your browser receives often tells it to go and load lots of other resources. For example, when I loaded the Cloudflare blog just now, my browser made 73 subrequests.

As we mentioned in our connection coalescing blog post, sometimes those resources are also served by Cloudflare, but on a different domain. In our connection coalescing experiment, we acquired certificates with a special extension, called a Subject Alternative Name (SAN), that tells the browser that the owner of the certificate can act as two different websites. Along with some further shenanigans that you can read about in our blog post, this lets us serve the resources for both the domains over a single TLS connection.

Cloudflare, however, services millions of domains, and we have millions of certificates. It’s possible to generate certificates that cover lots of domains, and in fact this is what Cloudflare used to do. We used to use so-called “cruise-liner” certificates, with dozens of names on them. But for connection coalescing this quickly becomes impractical, as we would need to know what sub-resources each webpage might request, and acquire certificates to match. We switched away from this model because issues with individual domains could affect other customers.

What we’d like to be able to do is serve as much content as possible down a single connection. When a user requests a resource from a different domain they need to perform a new TLS handshake, costing valuable time and resources. Our connection coalescing experiment showed the benefits when we know in advance what resources are likely to be requested, but most of the time we don’t know what subresources are going to be requested until the requests actually arrive. What we’d rather do is attach extra identities to a connection after it’s been established, and we know what extra domains the client actually wants. Because the TLS connection is just a transport mechanism and doesn’t understand the information being sent across it, it doesn’t actually know what domains might subsequently be requested. This is only available to higher-layer protocols such as HTTP. However, we don’t want any website to be able to impersonate another, so we still need to have strong authentication.

Exported Authenticators

Enter Exported Authenticators. They give us even more than we asked for. They allow us to do application layer authentication that’s just as strong as the authentication you get from TLS, and then tie it to the TLS channel. Now that’s a pretty complicated idea, so let’s break it down.

To understand application layer authentication we first need to explain what the application layer is. The application layer is a reference to the OSI model. The OSI model describes the various layers of abstraction we use, to make things work across the Internet. When you’re developing your latest web application you don’t want to have to worry about how light is flickered down a fibre optic cable, or even how the TLS handshake is encoded (although that’s a fascinating topic in its own right, let’s leave that for another time.)

All you want to care about is having your content delivered to your end-user, and using TLS gives you a guaranteed in-order, reliable, authenticated channel over which you can communicate. You just shove bits in one end of the pipe, and after lots of blinky lights, fancy routing, maybe a touch of congestion control, and a little decoding, *poof*, your data arrives at the end-user.

The application layer is the top of the OSI stack, and contains things like HTTP. Because the TLS handshake is lower in the stack, the application is oblivious to this process. So, what Exported Authenticators give us is the ability for the very top of the stack to reliably authenticate their partner.

The seven-layered OSI model

Now let’s jump back a bit, and discuss what we mean when we say that EAs give us authentication that’s as strong as TLS authentication. TLS, as we know, is used to create a secure connection between two endpoints, but lots of us are hazy when we try and pin down exactly what we mean by “secure”. The TLS standard makes eight specific promises, but rather than get buried in that particular ocean of weeds, let’s just pick out the one guarantee that we care about most: Peer Authentication.

Peer authentication: The client's view of the peer identity should reflect the server's identity. [...]

In other words, if the client thinks that it’s talking to example.com then it should, in fact, be talking to example.com.

What we want from EAs is that if I receive an EA then I have cryptographic proof that the person I’m talking to is the person I think I’m talking to. Now at this point you might be wondering what an EA actually looks like, and what it has to do with certificates. Well, an EA is actually a trio of messages, the first of which is a Certificate. The second is a CertificateVerify, a cryptographic proof that the sender knows the private key for the certificate. Finally there is a Finished message, which acts as a MAC, and proves the first two parts of the message haven’t been tampered with. If this structure sounds familiar to you, it’s because it’s the same structure as used by the server in the TLS handshake to prove it is the owner of the certificate.
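Conceptually, then, an authenticator is a bundle like the following. These type and field names are ours for illustration; the actual wire encoding reuses the TLS 1.3 message formats:

```go
// ExportedAuthenticator mirrors the three-part structure described above.
// Illustrative only; this is not the draft's wire format.
type ExportedAuthenticator struct {
	Certificate       [][]byte // the certificate chain being attached
	CertificateVerify []byte   // signature proving possession of the private key
	Finished          []byte   // MAC over the above, proving integrity
}
```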

The final piece of unpacking we need to do is explaining what we mean by tying the authentication to the TLS channel. Because EAs are an application layer construct they don’t provide any transport mechanism. So, whilst I know that the EA was created by the server I want to talk to, without binding the EA to a TLS connection I can’t be sure that I’m talking directly to the server I want.

Without protection, a malicious server can move Exported Authenticators from one connection to another.

For all I know, the TLS server I’m talking to is creating a new TLS connection to the EA Server, and relaying my request, and then returning the response. This would be very bad, because it would allow a malicious server to impersonate any server that supports EAs.

Because EAs are bound to a single TLS connection, if a malicious server copies an EA from one connection to another it will fail to verify.

EAs therefore have an extra security feature. They use the fact that every TLS connection is guaranteed to produce a unique set of keys. EAs take one of these keys and use it to construct the EA. This means that if some malicious third-party copies an EA from one TLS session to another, the recipient wouldn’t be able to validate it. This technique is called channel binding, and is another fascinating topic, but this post is already getting a bit long, so we’ll have to revisit channel binding in a future blog post.
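Go’s crypto/tls exposes this exporter interface directly, which is enough to sketch the idea. The label below is made up for illustration; the labels EAs actually use are fixed by the draft:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

func main() {
	conn, err := tls.Dial("tcp", "blog.cloudflare.com:443", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Derive 32 bytes of keying material unique to this TLS connection.
	// An authenticator keyed with material like this fails to verify if
	// replayed on another connection, whose exporter yields different bytes.
	// NOTE: "EXPERIMENTAL illustrative-label" is our placeholder, not the
	// label the Exported Authenticators draft specifies.
	state := conn.ConnectionState()
	key, err := state.ExportKeyingMaterial("EXPERIMENTAL illustrative-label", nil, 32)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("connection-bound key: %x\n", key)
}
```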

How the sausage is made

OK, now we know what EAs do, let’s talk about how they were designed and built. EAs are going through the IETF standardisation process. Draft standards move through the IETF process starting as Internet Drafts (I-Ds), and ending up as published Requests For Comment (RFCs). RFCs are voluntary standards that underpin much of the global Internet plumbing, and not just for security protocols like TLS. RFCs define DNS, UDP, TCP, and many, many more.

The first step in producing a new IETF standard is coming up with a proposal. Designing security protocols is a very conservative business, firstly because it’s very easy to introduce really subtle bugs, and secondly, because if you do introduce a security issue, things can go very wrong, very quickly. A flaw in the design of a protocol can be especially problematic as it can be replicated across multiple independent implementations — for example the TLS renegotiation vulnerabilities reported in 2009 and the custom EC(DH) parameters vulnerability from 2012. To minimise the risks of design issues, EAs hew closely to the design of the TLS 1.3 handshake.

Security and Assurance

Before making a big change to how authentication works on the Internet, we want as much assurance as possible that we’re not going to break anything. To give us more confidence that EAs are secure, they reuse parts of the design of TLS 1.3. The TLS 1.3 design was carefully examined by dozens of experts, and underwent multiple rounds of formal analysis — more on that in a moment. Using well understood design patterns is a super important part of security protocols. Making something secure is incredibly difficult, because security issues can be introduced in thousands of ways, and an attacker only needs to find one. By starting from a well understood design we can leverage the years of expertise that went into it.

Another vital step in catching design errors early is baked into the IETF process: achieving rough consensus. Although the ins and outs of the IETF process are worthy of their own blog post, suffice it to say the IETF works to ensure that all technical objections get addressed, and even if they aren’t solved they are given due care and attention. Exported Authenticators were proposed way back in 2016, and after many rounds of comments, feedback, and analysis the TLS Working Group (WG) at the IETF has finally reached consensus on the protocol. All that’s left before the EA I-D becomes an RFC is for a final revision of the text to be submitted and sent to the RFC Editors, leading hopefully to a published standard very soon.

As we just mentioned, the WG has to come to a consensus on the design of the protocol. One thing that can hold up achieving consensus are worries about security. After the Snowden revelations there was a barrage of attacks on TLS 1.2, not to mention some even earlier attacks from academia. Changing how trust works on the Internet can be pretty scary, and the TLS WG didn’t want to be caught flat-footed. Luckily this coincided with the maturation of some tools and techniques we can use to get mathematical guarantees that a protocol is secure. This class of techniques is known as formal methods. To help ensure that people are confident in the security of EAs I performed a formal analysis.

Formal Analysis

Formal analysis is a special technique that can be used to examine security protocols. It creates a mathematical description of the protocol, the security properties we want it to have, and a model attacker. Then, aided by some sophisticated software, we create a proof that the protocol has the properties we want even in the presence of our model attacker. This approach is able to catch incredibly subtle edge cases, which, if not addressed, could lead to attacks, as has happened before. Trotting out a formal analysis gives us strong assurances that we haven’t missed any horrible issues. By sticking as closely as possible to the design of TLS 1.3 we were able to repurpose much of the original analysis for EAs, giving us a big leg up in our ability to prove their security. Our EA model is available in Bitbucket, along with the proofs. You can check it out using Tamarin, a theorem prover for security protocols.

Formal analysis, and formal methods in general, give very strong guarantees that rule out entire classes of attack. However, they are not a panacea. TLS 1.3 was subject to a number of rounds of formal analysis, and yet an attack was still found. However, this attack in many ways confirms our faith in formal methods. The attack was found in a blind spot of the proof, showing that attackers have been pushed to the very edges of the protocol. As our formal analyses get more and more rigorous, attackers will have fewer and fewer places to search for attacks. As formal analysis has become more and more practical, more and more groups at the IETF have been asking to see proofs of security before standardising new protocols. This hopefully will mean that future attacks on protocol design will become rarer and rarer.

Once the EA I-D becomes an RFC, then all sorts of cool stuff gets unlocked — for example OPAQUE-EAs, which will allow us to do password-based login on the web without the server ever seeing the password! Watch this space.


Introducing SSL/TLS Recommender

Post Syndicated from Suleman Ahmad original https://blog.cloudflare.com/ssl-tls-recommender/

Introducing SSL/TLS Recommender


Seven years ago, Cloudflare made HTTPS availability for any Internet property easy and free with Universal SSL. At the time, few websites — other than those that processed sensitive data like passwords and credit card information — were using HTTPS because of how difficult it was to set up.

However, as we all started using the Internet for more and more private purposes (communication with loved ones, financial transactions, shopping, healthcare, etc.) the need for encryption became apparent. Tools like Firesheep demonstrated how easily attackers could snoop on people using public Wi-Fi networks at coffee shops and airports. The Snowden revelations showed the ease with which governments could listen in on unencrypted communications at scale. We have seen attempts by browser vendors to increase HTTPS adoption such as the recent announcement by Chromium for loading websites on HTTPS by default. Encryption has become a vital part of the modern Internet, not just to keep your information safe, but to keep you safe.

When it was launched, Universal SSL doubled the number of sites on the Internet using HTTPS. We are building on that with SSL/TLS Recommender, a tool that guides you to stronger configurations for the backend connection from Cloudflare to origin servers. Recommender has been available in the SSL/TLS tab of the Cloudflare dashboard since August 2020 for self-serve customers. Over 500,000 zones are currently signed up. As of today, it is available for all customers!

How Cloudflare connects to origin servers

Cloudflare operates as a reverse proxy between clients (“visitors”) and customers’ web servers (“origins”), so that Cloudflare can protect origin sites from attacks and improve site performance. This happens, in part, because visitor requests to websites proxied by Cloudflare are processed by an “edge” server located in a data center close to the client. The edge server either responds directly back to the visitor, if the requested content is cached, or creates a new request to the origin server to retrieve the content.


The backend connection to the origin can be made with an unencrypted HTTP connection or with an HTTPS connection where requests and responses are encrypted using the TLS protocol (historically known as SSL). HTTPS is the secured form of HTTP and should be used whenever possible to avoid leaking information or allowing content tampering by third-party entities. The origin server can further authenticate itself by presenting a valid TLS certificate to prevent active monster-in-the-middle attacks. Such a certificate can be obtained from a certificate authority such as Let’s Encrypt or Cloudflare’s Origin CA. Origins can also set up authenticated origin pull, which ensures that any HTTPS requests outside of Cloudflare will not receive a response from your origin.

Cloudflare Tunnel provides an even more secure option for the connection between Cloudflare and origins. With Tunnel, users run a lightweight daemon on their origin servers that proactively establishes secure and private tunnels to the nearest Cloudflare data centers. With this configuration, users can completely lock down their origin servers to only receive requests routed through Cloudflare. While we encourage customers to set up tunnels if feasible, it’s important to encourage origins with more traditional configurations to adopt the strongest possible security posture.

Detecting HTTPS support

You might wonder, why doesn’t Cloudflare always connect to origin servers with a secure TLS connection? To start, some origin servers have no TLS support at all (for example, certain shared hosting providers and even government sites have been slow adopters) and rely on Cloudflare to ensure that the client request is at least encrypted over the Internet from the browser to Cloudflare’s edge.

Then why don’t we simply probe the origin to determine if TLS is supported? It turns out that many sites only partially support HTTPS, making the problem non-trivial. A single customer site can be served from multiple separate origin servers with differing levels of TLS support. For instance, some sites support HTTPS on their landing page but serve certain resources only over unencrypted HTTP. Further, site content can differ when accessed over HTTP versus HTTPS (for example, http://example.com and https://example.com can return different results).

Such content differences can arise due to misconfiguration on the origin server, accidental mistakes by developers when migrating their servers to HTTPS, or can even be intentional depending on the use case.

A study by researchers at Northeastern University, the Max Planck Institute for Informatics, and the University of Maryland highlights reasons for some of these inconsistencies. They found that 1.5% of surveyed sites had at least one page that was unavailable over HTTPS — despite the protocol being supported on other pages — and 3.7% of sites served different content over HTTP versus HTTPS for at least one page. Thus, always using the most secure TLS setting detected on a particular resource could result in unforeseen side effects and usability issues for the entire site.
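A naive version of the content-matching half of that check is easy to sketch; the hard part the study highlights is everything around it (redirects, dynamic content, crawling at scale). Our illustration below assumes a site that serves stable content, and note that http.Get follows redirects, so an HTTP page that redirects to HTTPS will trivially match:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetchHash returns a hash of the response body for the given URL.
func fetchHash(url string) ([32]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return [32]byte{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(body), nil
}

func main() {
	httpHash, err := fetchHash("http://example.com/")
	if err != nil {
		log.Fatal(err)
	}
	httpsHash, err := fetchHash("https://example.com/")
	if err != nil {
		log.Fatal(err) // e.g. no TLS support at the origin at all
	}
	fmt.Println("content matches over HTTP and HTTPS:", httpHash == httpsHash)
}
```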

We wanted to tackle all such issues and maximize the number of TLS connections to origin servers, but without compromising a website’s functionality and performance.

Content differences on sites when loaded over HTTPS vs HTTP; images taken from https://www.cs.umd.edu/~dml/papers/https_tma20.pdf with author permission

Configuring the SSL/TLS encryption mode

Cloudflare relies on customers to indicate the level of TLS support at their origins via the zone’s SSL/TLS encryption mode. The following SSL/TLS encryption modes can be configured from the Cloudflare dashboard:

  • Off indicates that client requests reaching Cloudflare as well as Cloudflare’s requests to the origin server should only use unencrypted HTTP. This option is never recommended, but is still in use by a handful of customers for legacy reasons or testing.
  • Flexible allows clients to connect to Cloudflare’s edge via HTTPS, but requests to the origin are over HTTP only. This is the most common option for origins that do not support TLS. However, we encourage customers to upgrade their origins to support TLS whenever possible and only use Flexible as a last resort.
  • Full enables encryption for requests to the origin when clients connect via HTTPS, but Cloudflare does not attempt to validate the certificate. This is useful for origins that have a self-signed or otherwise invalid certificate at the origin, but leaves open the possibility for an active attacker to impersonate the origin server with a fake certificate. Client HTTP requests result in HTTP requests to the origin.
  • Full (strict) indicates that Cloudflare should validate the origin certificate to fully secure the connection. The origin certificate can either be issued by a public CA or by Cloudflare Origin CA. HTTP requests from clients result in HTTP requests to the origin, exactly the same as in Full mode. We strongly recommend Full (strict) over weaker options if supported by the origin.
  • Strict (SSL-Only Origin Pull) causes all traffic to the origin to go over HTTPS, even if the client request was HTTP. This differs from Full (strict) in that HTTP client requests will result in an HTTPS request to the origin, not HTTP. Most customers do not need to use this option, and it is available only to Enterprise customers. The preferred way to ensure that no HTTP requests reach your origin is to enable Always Use HTTPS in conjunction with Full or Full (strict) to redirect visitor HTTP requests to the HTTPS version of the content.
SSL/TLS encryption modes determine how Cloudflare connects to origins

The SSL/TLS encryption mode is a zone-wide setting, meaning that Cloudflare applies the same policy to all subdomains and resources. If required, you can configure this setting more granularly via Page Rules. Misconfiguring this setting can make site resources unavailable. For instance, suppose your website loads certain assets from an HTTP-only subdomain. If you set your zone to Full or Full (strict), you might make these assets unavailable for visitors that request the content over HTTPS, since the HTTP-only subdomain lacks HTTPS support.
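If you manage zones programmatically, the zone-wide setting can also be changed through the Cloudflare API’s zone settings endpoint. A sketch in Go; the endpoint shape matches the v4 API as we understand it, but double-check the current API docs before relying on it:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"strings"
)

func main() {
	zoneID := os.Getenv("CF_ZONE_ID")
	token := os.Getenv("CF_API_TOKEN")

	// Set the zone-wide SSL/TLS encryption mode to Full (strict),
	// which the API names "strict".
	url := fmt.Sprintf("https://api.cloudflare.com/client/v4/zones/%s/settings/ssl", zoneID)
	req, err := http.NewRequest(http.MethodPatch, url, strings.NewReader(`{"value":"strict"}`))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```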

Importance of secure origin connections

When an end-user visits a site proxied by Cloudflare, there are two connections to consider: the front-end connection between the visitor and Cloudflare and the back-end connection between Cloudflare and the customer origin server. The front-end connection typically presents the largest attack surface (for example, think of the classic example of an attacker snooping on a coffee shop’s Wi-Fi network), but securing the back-end connection is equally important. While all SSL/TLS encryption modes (except Off) secure the front-end connection, less secure modes leave open the possibility of malicious activity on the backend.

Consider a zone set to Flexible where the origin is connected to the Internet via an untrustworthy ISP. In this case, spyware deployed by the customer’s ISP in an on-path middlebox could inspect the plaintext traffic from Cloudflare to the origin server, potentially resulting in privacy violations or leaks of confidential information. Upgrading the zone to Full or a stronger mode to encrypt traffic to the ISP would help prevent this basic form of snooping.

Similarly, consider a zone set to Full where the origin server is hosted in a shared hosting provider facility. An attacker colocated in the same facility could generate a fake certificate for the origin (since the certificate isn’t validated for Full) and deploy an attack technique such as ARP spoofing to direct traffic intended for the origin server to an attacker-owned machine instead. The attacker could then leverage this setup to inspect and filter traffic intended for the origin, resulting in site breakage or content unavailability. The attacker could even inject malicious JavaScript into the response served to the visitor to carry out other nefarious goals. Deploying a valid Cloudflare-trusted certificate on the origin and configuring the zone to use Full (strict) would prevent Cloudflare from trusting the attacker’s fake certificate in this scenario, preventing the hijack.

Since a secure backend only improves your website's security, we strongly encourage setting your zone to the highest SSL/TLS encryption mode your origin supports.

Balancing functionality and security

When Universal SSL was launched, Cloudflare’s goal was to get as many sites away from the status quo of HTTP as possible. To accomplish this, Cloudflare provisioned TLS certificates for all customer domains to secure the connection between the browser and the edge. Customer sites that did not already have TLS support were defaulted to Flexible, to preserve existing site functionality. Although Flexible is not recommended for most zones, we continue to support this option, as some Cloudflare customers still rely on it for origins that do not yet support TLS; disabling it would make these sites unavailable. Currently, the default option for newly onboarded zones is Full if we detect a TLS certificate on the origin, and Flexible otherwise.

Further, the SSL/TLS encryption mode configured at the time of zone sign-up can become suboptimal as a site evolves. For example, a zone might switch to a hosting provider that supports origin certificate installation. An origin server that is able to serve all content over TLS should at least be on Full. An origin server that has a valid TLS certificate installed should use Full (strict) to ensure that communication between Cloudflare and the origin server is not susceptible to monster-in-the-middle attacks.

The Research team combined lessons from academia with our engineering efforts to make encryption easy while ensuring the highest possible level of security for our customers. In pursuit of that goal, we’re proud to introduce the SSL/TLS Recommender.

SSL/TLS Recommender

Cloudflare’s mission is to help build a better Internet, and that includes ensuring that requests from visitors to our customers’ sites are as secure as possible. To that end, we began by asking ourselves the following question: how can we detect when a customer is able to use a more secure SSL/TLS encryption mode without impacting site functionality?

To answer this question, we built the SSL/TLS Recommender. Customers can enable Recommender for a zone via the SSL/TLS tab of the Cloudflare dashboard. Using a zone’s currently configured SSL/TLS option as the baseline for expected site functionality, the Recommender performs a series of checks to determine if an upgrade is possible. If so, we email the zone owner with the recommendation. If a zone is currently misconfigured — for example, an HTTP-only origin configured on Full — Recommender will not recommend a downgrade.


The checks that Recommender runs are determined by the site’s currently configured SSL/TLS option.

The simplest check is to determine if a customer can upgrade from Full to Full (strict). In this case, all site resources are already served over HTTPS, so the check comprises a few simple tests of the validity of the TLS certificate for the domain and all subdomains (which can be on separate origin servers).
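For intuition, the core of such a validity test can be approximated with the Python standard library. This is a simplified sketch for a single hostname, not the Recommender’s actual implementation, which also covers subdomains on separate origins:

import socket
import ssl

def presents_valid_certificate(hostname: str, port: int = 443) -> bool:
    # create_default_context() verifies the chain against the system trust
    # store and checks the hostname, mirroring a Full (strict) validation
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except (ssl.SSLError, OSError):
        return False

print(presents_valid_certificate("example.com"))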

The check to determine if a customer can upgrade from Off or Flexible to Full is more complex. A site can be upgraded if all resources on the site are available over HTTPS and the content matches when served over HTTP versus HTTPS. Recommender carries out this check as follows (a rough sketch of the fetch-and-compare steps appears after the list):

  • Crawl customer sites to collect links. For large sites where it is impractical to scan every link, Recommender tests only a subset of links (up to some threshold), leading to a trade-off between performance and potential false positives. Similarly, for sites where the crawl turns up an insufficient number of links, we augment our results with a sample of links from recent visitor requests to the zone to provide a high-confidence recommendation. The crawler uses the user agent Cloudflare-SSLDetector and has been added to Cloudflare’s list of known good bots. Similar to other Cloudflare crawlers, Recommender ignores robots.txt (except for rules explicitly targeting the crawler’s user agent) to avoid negatively impacting the accuracy of the recommendation.
  • Download the content of each link over both HTTP and HTTPS. Recommender makes only idempotent GET requests when scanning origin servers to avoid modifying server resource state.
  • Run a content similarity algorithm to determine if the content matches. The algorithm is adapted from a research paper called “A Deeper Look at Web Content Availability and Consistency over HTTP/S” (TMA Conference 2020) and is designed to provide an accurate similarity score even for sites with dynamic content.
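As a rough illustration of the fetch-and-compare steps, the sketch below downloads one path over both schemes and computes a toy similarity ratio; the production algorithm is the adapted one from the paper above, not this stand-in:

import difflib

import requests  # third-party HTTP library

def content_matches(host_and_path: str, threshold: float = 0.9) -> bool:
    # Idempotent GETs over both schemes, as Recommender performs
    http_body = requests.get("http://" + host_and_path, timeout=10).text
    https_body = requests.get("https://" + host_and_path, timeout=10).text
    # Toy similarity score standing in for the paper's algorithm
    ratio = difflib.SequenceMatcher(None, http_body, https_body).ratio()
    return ratio >= threshold

print(content_matches("example.com/"))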

Recommender is conservative with recommendations, erring on the side of maintaining current site functionality rather than risking breakage and usability issues. If a zone is non-functional, if the zone owner blocks all types of bots, or if misconfigured SSL-specific Page Rules are applied to the zone, then Recommender will not be able to complete its scans and provide a recommendation. It is therefore not intended to resolve issues with website or domain functionality, but rather to maximize your zone’s security when possible.

Please send questions and feedback to [email protected]. We’re excited to continue this line of work to improve the security of customer origins!

Mentions

While this work is led by the Research team, we have been extremely privileged to get support from all across the company!

Special thanks to the incredible team of interns that contributed to SSL/TLS Recommender. Suleman Ahmad (now full-time), Talha Paracha, and Ananya Ghose built the current iteration of the project and Matthew Bernhard helped to lay the groundwork in a previous iteration of the project.

Handshake Encryption: Endgame (an ECH update)

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/handshake-encryption-endgame-an-ech-update/


Privacy and security are fundamental to Cloudflare, and we believe in and champion the use of cryptography to help provide these fundamentals for customers, end-users, and the Internet at large. In the past, we helped specify, implement, and ship TLS 1.3, the latest version of the transport security protocol underlying the web, to all of our users. TLS 1.3 vastly improved upon prior versions of the protocol with respect to security, privacy, and performance: simpler cryptographic algorithms, more handshake encryption, and fewer round trips are just a few of the many great features of this protocol.

TLS 1.3 was a tremendous improvement over TLS 1.2, but there is still room for improvement. Sensitive metadata relating to application or user intent is still visible in plaintext on the wire. In particular, all client parameters, including the name of the target server the client is connecting to, are visible in plaintext. For obvious reasons, this is problematic from a privacy perspective: Even if your application traffic to crypto.cloudflare.com is encrypted, the fact you’re visiting crypto.cloudflare.com can be quite revealing.

And so, in collaboration with other participants in the standardization community and members of industry, we embarked on a solution for encrypting all sensitive TLS metadata in transit. The result: TLS Encrypted ClientHello (ECH), an extension that protects this sensitive metadata during connection establishment.

Last year, we described the current status of this standard and its relation to the TLS 1.3 standardization effort, as well as ECH’s predecessor, Encrypted SNI (ESNI). The protocol has come a long way since then, but how will we know when it’s ready? There are many ways to measure a protocol. Is it implementable? Is it easy to enable? Does it seamlessly integrate with existing protocols or applications? To answer these questions and see if the Internet is ready for ECH, the community needs deployment experience. Hence, for the past year, we’ve been focused on making the protocol stable, interoperable, and, ultimately, deployable. And today, we’re pleased to announce that we’ve begun our initial deployment of TLS ECH.

What does ECH mean for connection security and privacy on the network? How does it relate to similar technologies and concepts such as domain fronting? In this post, we’ll dig into the details of ECH and describe how this protocol moves the needle toward a better Internet.

Connection privacy

For most Internet users, connections are made to perform some type of task, such as loading a web page, sending a message to a friend, purchasing some items online, or accessing bank account information. Each of these connections reveals some limited information about user behavior. For example, a connection to a messaging platform reveals that one might be trying to send or receive a message. Similarly, a connection to a bank or financial institution reveals when the user typically makes financial transactions. Individually, this metadata might seem harmless. But consider what happens when it accumulates: does the set of websites you visit on a regular basis uniquely identify you as a user? The safe answer is: yes.

This type of metadata is privacy-sensitive, and ultimately something that should only be known by two entities: the user who initiates the connection, and the service which accepts the connection. However, the reality today is that this metadata is known to more than those two entities.

Making this information private is no easy feat. The nature or intent of a connection, i.e., the name of the service such as crypto.cloudflare.com, is revealed in multiple places during the course of connection establishment: during DNS resolution, wherein clients map service names to IP addresses; and during connection establishment, wherein clients indicate the service name to the target server. (Note: there are other small leaks, though DNS and TLS are the primary problems on the Internet today.)

As is common in recent years, the solution to this problem is encryption. DNS-over-HTTPS (DoH) is a protocol for encrypting DNS queries and responses to hide this information from on-path observers. Encrypted ClientHello (ECH) is the complementary protocol for TLS.

The TLS handshake begins when the client sends a ClientHello message to the server over a TCP connection (or, in the context of QUIC, over UDP) with relevant parameters, including those that are sensitive. The server responds with a ServerHello, encrypted parameters, and all that’s needed to finish the handshake.


The goal of ECH is as simple as its name suggests: to encrypt the ClientHello so that privacy-sensitive parameters, such as the service name, are unintelligible to anyone listening on the network. The client encrypts this message using a public key it learns by making a DNS query for a special record known as the HTTPS resource record. This record advertises the server’s various TLS and HTTPS capabilities, including ECH support. The server decrypts the encrypted ClientHello using the corresponding secret key.
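You can inspect this record yourself. A minimal sketch using the dnspython library (version 2.1 or later, which understands the HTTPS record type):

import dns.resolver  # dnspython >= 2.1

# The HTTPS record advertises the server's TLS and HTTPS capabilities;
# when ECH is supported, its parameters carry the ECH configuration
answers = dns.resolver.resolve("crypto.cloudflare.com", "HTTPS")
for rdata in answers:
    print(rdata)  # priority, target, and parameters such as alpn and ech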

Conceptually, DoH and ECH are somewhat similar. With DoH, clients establish an encrypted connection (HTTPS) to a DNS recursive resolver such as 1.1.1.1 and, within that connection, perform DNS transactions.


With ECH, clients establish an encrypted connection to a TLS-terminating server such as crypto.cloudflare.com, and within that connection, request resources for an authorized domain such as cloudflareresearch.com.


There is one very important difference between DoH and ECH that is worth highlighting. Whereas a DoH recursive resolver is specifically designed to allow queries for any domain, a TLS server is configured to allow connections for a select set of authorized domains. Typically, the set of authorized domains for a TLS server comprises those that appear on its certificate, as these constitute the set of names for which the server is authorized to terminate a connection.

Basically, this means the DNS resolver is open, whereas the ECH client-facing server is closed. And this closed set of authorized domains is informally referred to as the anonymity set. (This will become important later on in this post.) Moreover, the anonymity set is assumed to be public information. Anyone can query DNS to discover what domains map to the same client-facing server.

Why is this distinction important? It means that one cannot use ECH for the purposes of connecting to an authorized domain and then interacting with a different domain, a practice commonly referred to as domain fronting. When a client connects to a server using an authorized domain but then tries to interact with a different domain within that connection, e.g., by sending HTTP requests for an origin that does not match the domain of the connection, the request will fail.

At a high level, encrypting names in DNS and TLS may seem like a simple feat. However, as we’ll show, ECH demands a different look at security and an updated threat model.

A changing threat model and design confidence

The typical threat model for TLS is known as the Dolev-Yao model, in which an active network attacker can read, write, and delete packets from the network. This attacker’s goal is to derive the shared session key. There has been a tremendous amount of research analyzing the security of TLS to gain confidence that the protocol achieves this goal.

The threat model for ECH is somewhat stronger than considered in previous work. Not only should it be hard to derive the session key, it should also be hard for the attacker to determine the identity of the server from a known anonymity set. That is, ideally, it should have no more advantage in identifying the server than if it simply guessed from the set of servers in the anonymity set. And recall that the attacker is free to read, write, and modify any packet as part of the TLS connection. This means, for example, that an attacker can replay a ClientHello and observe the server’s response. It can also extract pieces of the ClientHello — including the ECH extension — and use them in its own modified ClientHello.


The design of ECH makes this sort of attack virtually impossible by ensuring that the server certificate can only be decrypted by either the client or the client-facing server.

Something else an attacker might try is to masquerade as the server and actively interfere with the client to observe its behavior. If the client reacted differently based on whether the server-provided certificate was correct, this would allow the attacker to test whether a given connection using ECH was for a particular name.


ECH also defends against this attack by ensuring that an attacker without access to the private ECH key material cannot actively inject anything into the connection.

The attacker can also be entirely passive and try to infer encrypted information from other visible metadata, such as packet sizes and timing. (Indeed, traffic analysis is an open problem for ECH and in general for TLS and related protocols.) Passive attackers simply sit and listen to TLS connections, and use what they see and, importantly, what they know to make determinations about the connection contents. For example, if a passive attacker knows that the name of the client-facing server is crypto.cloudflare.com, and it sees a ClientHello with ECH to crypto.cloudflare.com, it can conclude, with reasonable certainty, that the connection is to some domain in the anonymity set of crypto.cloudflare.com.

The number of potential attack vectors is astonishing, and something that the TLS working group has tripped over in prior iterations of the ECH design. Before any sort of real world deployment and experiment, we needed confidence in the design of this protocol. To that end, we are working closely with external researchers on a formal analysis of the ECH design which captures the following security goals:

  1. Use of ECH does not weaken the security properties of TLS without ECH.
  2. TLS connection establishment to a host in the client-facing server’s anonymity set is indistinguishable from a connection to any other host in that anonymity set.

We’ll write more about the model and analysis when they’re ready. Stay tuned!

There are plenty of other subtle security properties we desire for ECH, and some of these drill right into the most important question for a privacy-enhancing technology: Is this deployable?

Focusing on deployability

With confidence in the security and privacy properties of the protocol, we then turned our attention towards deployability. In the past, significant protocol changes to fundamental Internet protocols such as TCP or TLS have been complicated by some form of benign interference. Network software, like any software, is prone to bugs, and sometimes these bugs manifest in ways that we only detect when there’s a change elsewhere in the protocol. For example, TLS 1.3 unveiled middlebox ossification bugs that ultimately led to the middlebox compatibility mode for TLS 1.3.

While ECH is itself just an extension, the risk of it exposing (or introducing!) similar bugs is real. To combat this problem, ECH supports a variant of GREASE whose goal is to ensure that all ECH-capable clients produce syntactically equivalent ClientHello messages. In particular, if a client supports ECH but does not have the corresponding ECH configuration, it uses GREASE. Otherwise, it produces a ClientHello with real ECH support. In both cases, the syntax of the ClientHello messages is equivalent.

This hopefully avoids network bugs that would otherwise be triggered by real or fake ECH. Or, in other words, it helps ensure that all ECH-capable client connections are treated similarly in the presence of benign network bugs or otherwise passive attackers. Interestingly, active attackers can easily distinguish — with some probability — between real and fake ECH. With GREASE, the ClientHello carries an ECH extension whose contents are effectively randomized, whereas a real ClientHello using ECH has information that will match what is contained in DNS. This means an active attacker can simply compare the ClientHello against what’s in the DNS. Indeed, anyone can query DNS and use it to determine if a ClientHello is real or fake:

$ dig +short crypto.cloudflare.com TYPE65
\# 134 0001000001000302683200040008A29F874FA29F884F000500480046 FE0D0042D500200020E3541EC94A36DCBF823454BA591D815C240815 77FD00CAC9DC16C884DF80565F0004000100010013636C6F7564666C 6172652D65736E692E636F6D00000006002026064700000700000000 0000A29F874F260647000007000000000000A29F884F

Despite this obvious distinguisher, the end result isn’t that interesting. If a server is capable of ECH and a client is capable of ECH, then the connection most likely used ECH, and whether clients and servers are capable of ECH is assumed public information. Thus, GREASE is primarily intended to ease deployment against benign network bugs and otherwise passive attackers.

Note, importantly, that GREASE (or fake) ECH ClientHello messages are semantically different from real ECH ClientHello messages. This presents a real problem for networks such as enterprise settings or school environments that otherwise use plaintext TLS information for the purposes of implementing various features like filtering or parental controls. (Encrypted DNS protocols like DoH also encountered similar obstacles in their deployment.) Fundamentally, this problem reduces to the following: How can networks securely disable features like DoH and ECH? Fortunately, there are a number of approaches that might work, with the more promising one centered around DNS discovery. In particular, if clients could securely discover encrypted recursive resolvers that can perform filtering in lieu of it being done at the TLS layer, then TLS-layer filtering might be wholly unnecessary. (Other approaches, such as the use of canary domains to give networks an opportunity to signal that certain features are not permitted, may work, though it’s not clear if these could or would be abused to disable ECH.)

We are eager to collaborate with browser vendors, network operators, and other stakeholders to find a feasible deployment model that works well for users without ultimately stifling connection privacy for everyone else.

Next steps

ECH is rolling out for some FREE zones on our network in select geographic regions. We will continue to expand the set of zones and regions that support ECH slowly, monitoring for failures in the process. Ultimately, the goal is to work with the rest of the TLS working group and IETF towards updating the specification based on this experiment in hopes of making it safe, secure, usable, and, ultimately, deployable for the Internet.

ECH is one part of the connection privacy story. Like a leaky boat, it’s important to look for and plug all the gaps before taking on lots of passengers! Cloudflare Research is committed to these narrow technical problems and their long-term solutions. Stay tuned for more updates on this and related protocols.

Staging TLS Certificate: Make every deployment a safe deployment

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/staging-tls-certificate-every-deployment-safe-deployment/


We are excited to announce that Enterprise customers now have the ability to test custom uploaded certificates in a staging environment before pushing them to production.

With great power comes great responsibility

If you’re running a website or an API that’s behind a popular app, you know your users have high expectations: it can’t just be up and running; it also has to be fast and secure. One of the easiest and most standardized ways to secure connections is with the TLS protocol. To do that, you need to acquire a TLS certificate for your domain.

One way to get a certificate is by using a CDN provider, like Cloudflare. We make the process really easy by issuing certificates on your behalf. Not just that, but when your certificate is getting closer to its expiration date, we are responsible for re-issuing it. But, if you don’t want Cloudflare to issue the certificate on your behalf and want to obtain the certificate yourself, you can do so. You can either keep control of your private key, or generate a Certificate Signing Request (CSR) through Cloudflare, so we maintain the private key, but you can still use the certificate authority (CA) of your choice for the certificate. Once you get your certificate, you upload it to Cloudflare and voilà — your site is secure.

One of the downsides of obtaining your own certificate is that you’re in charge of its renewal, which comes with a great deal of responsibility. You have to ensure that any renewed certificate will not cause TLS errors, which would make your property unreachable for your users.

Now, you’re probably wondering, why is this so risky? What could go wrong?

It’s possible that when you renew a certificate, you forget to include one of your subdomains in the certificate. Another reason why your clients might see failures when you upload the new certificate is that some of those clients are pinning the old certificate.

The problem with pinning certificates

Certificate pinning allows Internet application owners to say “trust this certificate and no other” by pinning the certificate or public key to their service, or even to client devices. While this was originally intended to tighten security, certificate pinning has created many problems. We have seen customers shoot themselves in the foot with certificate pinning numerous times. There are a lot of things that could go wrong: you could pin the wrong certificate or configure the wrong settings, bringing your service down. It’s possible that you rely on a CDN for your certificate renewal and forget to update your pin — again, blocking access to your service in the process. Alternatively, you can pin certificates to clients so that they refuse connections from servers that don’t have the same certificate. But when it comes to renewing that certificate, you have to make sure that your clients have the updated certificate, or they won’t be able to access your server. Instead, you can use tools like Cloudflare’s Certificate Transparency Monitoring, which notifies you when certificates are issued for your domain.

But, some of our customers do rely on this behavior. So for them, switching to a new certificate is a big change — one that can bring down their whole application. For this reason, they need to be able to test the new certificate to identify any problems that may occur before their customers see it.

A staging environment for certificate deployments

When it comes to your traffic, there are things you can’t predict. You can’t control if your customer’s browser is having a bad day, or if some ISP is having a network outage. When it comes to certificate deployment, the last thing you want is to be surprised. Imagine pushing out a new certificate, and realizing you’ve uploaded the wrong one only after customers complain they’ve lost access; or that you never updated your pin to your new certificate to begin with. Mistakes happen, but this type of outage can cause a lot of pain and lose you a lot of money and trust. The good news is that this is preventable.

We are giving customers the ability to find these problems before they’re encountered by real users. We’re doing this by giving customers the ability to test certificate deployment changes against a pair of staging IPs.

Imagine uploading a new certificate, finding any related problems, and fixing them in a way that doesn’t impact production traffic. That’s exactly what Staging Certificates allows you to do!

You can go to the Staging Certificates section under the SSL/TLS tab in the Cloudflare dashboard and upload a new certificate to our staging network. The staging network will replicate your production environment but will only be accessible through the Staging IPs.


Once you upload your certificate, you can make `curl` requests against the staging IPs to verify that it’s being served and that it covers all the hostnames that you need it to. If you’ve decided to pin your previous certificate, the staging environment helps you verify that your pins have been updated correctly and that TLS termination is successful.
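If you prefer a scripted check over `curl`, the sketch below dials a staging IP directly while sending your hostname via SNI, then prints the names the served certificate covers. The IP and hostname are placeholders:

import socket
import ssl

STAGING_IP = "192.0.2.10"  # placeholder; use your zone's staging IPs
HOSTNAME = "example.com"   # a hostname the new certificate must cover

context = ssl.create_default_context()
with socket.create_connection((STAGING_IP, 443), timeout=5) as sock:
    # SNI carries the hostname even though we connected to the IP directly;
    # the handshake itself also exercises chain validation
    with context.wrap_socket(sock, server_hostname=HOSTNAME) as tls:
        print(tls.getpeercert().get("subjectAltName"))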

Once you’re confident in your tests, you can push the certificate out to production.

Oh no, something went wrong

Mistakes happen. It’s possible that you found one client device whose pin you forgot to update. In that case, you can disable the certificate to quickly mitigate the problem by rolling back to the certificate that you were last serving.


What if you realize that this is a bigger issue, and you need more time to test? You can push the disabled certificate back to the staging environment, and when you’ve verified everything works correctly, push it back to production.

Staging changes: We’re just getting started

We want to de-risk every change you make. Staging your custom uploaded certificates is a start, but it doesn’t end there. In the future, we’ll allow you to stage certificate renewals for certificates issued through Cloudflare. We’re also planning to give customers the ability to test TLS configuration changes, like minimum TLS version or cipher suite settings.

If you’re interested in staying aware of these future developments, or you have some settings that you want to test in staging, reach out to your Account team and let them know!

Ready to test your next certificate deployment?

This feature is currently in Beta and available to Enterprise customers. If you want to use it, reach out to your Account team to get it enabled.

How to use ACM Private CA for enabling mTLS in AWS App Mesh

Post Syndicated from Raj Jain original https://aws.amazon.com/blogs/security/how-to-use-acm-private-ca-for-enabling-mtls-in-aws-app-mesh/

Securing east-west traffic in service meshes, such as AWS App Mesh, by using mutual Transport Layer Security (mTLS) adds an additional layer of defense beyond perimeter control. mTLS adds bidirectional peer-to-peer authentication on top of the one-way authentication in normal TLS. This is done by adding a client-side certificate during the TLS handshake, through which a client proves possession of the corresponding private key to the server, and as a result the server trusts the client. This prevents an arbitrary client from connecting to an App Mesh service, because the client wouldn’t possess a valid certificate.

In this blog post, you’ll learn how to enable mTLS in App Mesh by using certificates derived from AWS Certificate Manager Private Certificate Authority (ACM Private CA). You’ll also learn how to reuse AWS CloudFormation templates, which we make available through a companion open-source project, for configuring App Mesh and ACM Private CA.

You’ll first see how server-side certificates are derived from ACM Private CA and consumed by App Mesh internally, using the native integration between the two services. You’ll then see a method and code for installing client-side certificates issued from ACM Private CA into App Mesh; this method is needed because client-side certificates aren’t integrated natively.

You’ll learn how to use AWS Lambda to export a client-side certificate from ACM Private CA and store it in AWS Secrets Manager. You’ll then see Envoy proxies in App Mesh retrieve the certificate from Secrets Manager and use it in an mTLS handshake. The solution is designed to ensure confidentiality of the private key of a client-side certificate, in transit and at rest, as it moves from ACM to Envoy.

The solution described in this blog post simplifies and allows you to automate the configuration and operations of mTLS-enabled App Mesh deployments, because all of the certificates are derived from a single managed private public key infrastructure (PKI) service—ACM Private CA—eliminating the need to run your own private PKI. The solution uses Amazon Elastic Container Service (Amazon ECS) with AWS Fargate as the App Mesh hosting environment, although the design presented here can be applied to any compute environment that is supported by App Mesh.

Solution overview

ACM Private CA provides a highly available managed private PKI service that enables creation of private CA hierarchies—including root and subordinate CAs—without the investment and maintenance costs of operating your own private PKI service. The service allows you to choose among several CA key algorithms and key sizes and makes it easier for you to export and deploy private certificates anywhere by using API-based automation.

App Mesh is a service mesh that provides application-level networking across multiple types of compute infrastructure. It standardizes how your microservices communicate, giving you end-to-end visibility and helping to ensure transport security and high availability for your applications. In order to communicate securely between mesh endpoints, App Mesh directs the Envoy proxy instances that are running within the mesh to use one-way or mutual TLS.

TLS provides authentication, privacy, and data integrity between two communicating endpoints. The authentication in TLS communications is governed by the PKI system. The PKI system allows certificate authorities to issue certificates that are used by clients and servers to prove their identity. The authentication process in TLS happens by exchanging certificates via the TLS handshake protocol. By default, the TLS handshake protocol proves the identity of the server to the client by using X.509 certificates, while the authentication of the client to the server is left to the application layer. This is called one-way TLS. TLS also supports two-way authentication through mTLS. In mTLS, in addition to the one-way TLS server authentication with a certificate, a client presents its certificate and proves possession of the corresponding private key to a server during the TLS handshake.
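In code, the difference between the two modes is small. The Python sketch below shows the client side of an mTLS handshake; the endpoint and file names are illustrative:

import socket
import ssl

SERVER = "colorteller.example.internal"  # illustrative endpoint

# Trust the server's CA, as in one-way TLS...
context = ssl.create_default_context(cafile="colorteller-ca-chain.pem")
# ...and additionally present our own certificate, which makes the TLS mutual
context.load_cert_chain(certfile="client-cert.pem", keyfile="client-key.pem")

with socket.create_connection((SERVER, 443), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=SERVER) as tls:
        print("negotiated", tls.version())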

Example application

The following sections describe one-way and mutual TLS integrations between App Mesh and ACM Private CA in the context of an example application. This example application exposes an API to external clients that returns a text string name of a color—for example, “yellow”. It’s an extension of the Color App that’s used to demonstrate several existing App Mesh examples.

The example application comprises two services running in App Mesh—ColorGateway and ColorTeller. An external client request enters the mesh through the ColorGateway service and is proxied to the ColorTeller service. The ColorTeller service responds to the ColorGateway service with the name of a color, and the ColorGateway service proxies the response to the external client. Figure 1 shows the basic design of the application.
 

Figure 1: App Mesh services in the Color App example application

The two services are mapped onto the following constructs in App Mesh:

  • ColorGateway is mapped as a Virtual gateway. A virtual gateway in App Mesh allows resources that are outside of a mesh to communicate to resources that are inside the mesh. A virtual gateway represents Envoy deployed by itself. In this example, the virtual gateway represents an Envoy proxy that is running as an Amazon ECS service. This Envoy proxy instance acts as a TLS client, since it initiates TLS connections to the Envoy proxy that is running in the ColorTeller service.
  • ColorTeller is mapped as a Virtual node. A virtual node in App Mesh acts as a logical pointer to a particular task group. In this example, the virtual node—ColorTeller—runs as another Amazon ECS service. The service runs two tasks—an Envoy proxy instance and a ColorTeller application instance. The Envoy proxy instance acts as a TLS server, receiving inbound TLS connections from ColorGateway.

Let’s review running the example application in one-way TLS mode. Although optional, starting with one-way TLS allows you to compare the two methods and establish how to look at certain Envoy proxy statistics to distinguish and verify one-way TLS versus mTLS connections.

For practice, you can deploy the example application project in your own AWS account and perform the steps described in your own test environment.

Note: In both the one-way TLS and mTLS descriptions in the following sections, we’re using a flat certificate hierarchy for demonstration purposes. The root CAs are issuing end-entity certificates. The AWS ACM Private CA best practices recommend that root CAs should only be used to issue certificates for intermediate CAs. When intermediate CAs are involved, your certificate chain has multiple certificates concatenated in it, but the mechanisms are the same as those described here.

One-way TLS in App Mesh using ACM Private CA

Because this is a one-way TLS authentication scenario, you need only one Private CA—ColorTeller—and one end-entity certificate issued from it, which is used as the server-side certificate for the ColorTeller virtual node.

Figure 2, following, shows the architecture for this setup, including notations and color codes for certificates and a step-by-step process that shows how the system is configured and functions. Because this architecture uses a server-side certificate only, you use the native integration between App Mesh and ACM Private CA and don’t need an external mechanism for certificate integration.
 

Figure 2: One-way TLS in App Mesh integrated with ACM Private CA

The steps in Figure 2 are:

Step 1: A Private CA instance—ColorTeller—is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA. This certificate is used as the server-side certificate in ColorTeller.

Step 2: The CloudFormation templates configure the ColorGateway to validate server certificates against the ColorTeller private CA certificate chain. As the App Mesh endpoints are starting up, the ColorTeller CA certificate trust chain is ingested into the ColorGateway Envoy instance. The TLS configuration for ColorGateway in App Mesh is shown in Figure 3.
 

Figure 3: One-way TLS configuration in the client policy of ColorGateway

Figure 3 shows that the client policy attributes for outbound transport connections for ColorGateway have been configured as follows:

  • Enforce TLS is set to Enforced. This enforces use of TLS while communicating with backends.
  • TLS validation method is set to AWS Certificate Manager Private Certificate Authority (ACM-PCA hosting). This instructs App Mesh to derive the certificate trust chain from ACM PCA.
  • Certificate is set to the Amazon Resource Name (ARN) of the ColorTeller Private CA, which is the identifier of the certificate trust chain in ACM PCA.

This configuration ensures that ColorGateway makes outbound TLS-only connections towards ColorTeller, extracts the CA trust chain from ACM-PCA, and validates the server certificate presented by the ColorTeller virtual node against the configured CA ARN.

Step 3: The CloudFormation templates configure the ColorTeller virtual node with the ColorTeller end-entity certificate ARN in ACM Private CA. As the App Mesh endpoints are starting up, the ColorTeller end-entity certificate is ingested into the ColorTeller Envoy instance.

The TLS configuration for the ColorTeller virtual node in App Mesh is shown in Figure 4.
 

Figure 4: One-way TLS configuration in the listener configuration of ColorTeller

Figure 4 shows that various TLS-related attributes are configured as follows:

  • Enable TLS termination is on.
  • Mode is set to Strict to limit connections to TLS only.
  • TLS Certificate method is set to AWS Certificate Manager (ACM) hosting as the source of the end-entity certificate.
  • Certificate is set to ARN of the ColorTeller end-entity certificate.

Note: Figure 4 shows an annotation where the certificate ARN has been overlaid with a green cert icon. This icon follows the color convention from Figure 2 and can help you relate how the individual resources are configured to construct the architecture shown in Figure 2. The cert shown (and the associated private key that is not shown) in the diagram is necessary for ColorTeller to run the TLS stack and serve the certificate. The exchange of this material happens over the internal communications between App Mesh and ACM Private CA.

Step 4: The ColorGateway service receives a request from an external client.

Step 5: This step includes multiple sub-steps:

  • The ColorGateway Envoy initiates a one-way TLS handshake towards the ColorTeller Envoy.
  • The ColorTeller Envoy presents its server-side certificate to the ColorGateway Envoy.
  • The ColorGateway Envoy validates the certificate against its configured CA trust chain—the ColorTeller CA trust chain—and the TLS handshake succeeds.

Verifying one-way TLS

To verify that a TLS connection was established and that it is one-way TLS authenticated, run the following command on your bastion host:

$ curl -s http://colorteller.mtls-ec2.svc.cluster.local:9901/stats |grep -E 'ssl.handshake|ssl.no_certificate'

listener.0.0.0.0_15000.ssl.handshake: 1
listener.0.0.0.0_15000.ssl.no_certificate: 1

This command queries the runtime statistics that are maintained in ColorTeller Envoy and filters the output for certain SSL-related counts. The count for ssl.handshake should be one. If the ssl.handshake count is more than one, that means there’s been more than one TLS handshake. The count for ssl.no_certificate should also be one, or equal to the count for ssl.handshake. The ssl.no_certificate count tracks the total successful TLS connections with no client certificate. Since this is a one-way TLS handshake that doesn’t involve client certificates, this count is the same as the count of ssl.handshake.

The preceding statistics verify that a TLS handshake was completed and the authentication was one-way, where the ColorGateway authenticated the ColorTeller but not vice-versa. You’ll see in the next section how the ssl.no_certificate count differs when mTLS is enabled.

Mutual TLS in App Mesh using ACM Private CA

In the one-way TLS discussion in the previous section, you saw that App Mesh and ACM Private CA integration works without needing external enhancements. You also saw that App Mesh retrieved the server-side end-entity certificate in ColorTeller and the root CA trust chain in ColorGateway from ACM Private CA internally, by using the native integration between the two services.

However, a native integration between App Mesh and ACM Private CA isn’t currently available for client-side certificates. Client-side certificates are necessary for mTLS. In this section, you’ll see how you can issue and export client-side certificates from ACM Private CA and ingest them into App Mesh.

The solution uses Lambda to export the client-side certificate from ACM Private CA and store it in Secrets Manager. The solution includes an enhanced startup script embedded in the Envoy image to retrieve the certificate from Secrets Manager and place it on the Envoy file system before the Envoy process is started. The Envoy process reads the certificate, loads it in memory, and uses it in the TLS stack for the client-side certificate exchange of the mTLS handshake.

The choice of Lambda is based on this being an ephemeral workflow that needs to run only during system configuration. You need a short-lived, runtime compute context that lets you run the logic for exporting certificates from ACM Private CA and store them in Secrets Manager. Because this compute doesn’t need to run beyond this step, Lambda is an ideal choice for hosting this logic, for cost and operational effectiveness.

The choice of Secrets Manager for storing certificates is based on the confidentiality requirements of the passphrase that is used for encrypting the private key (PKCS #8) of the certificate. You also need a higher throughput data store that can support secrets retrieval from large meshes. Secrets Manager supports a higher API rate limit than the API for exporting certificates from ACM Private CA, and thus serves as a high-throughput front end for ACM Private CA for serving certificates without compromising data confidentiality.

The resulting architecture is shown in Figure 5. The figure includes notations and color codes for certificates—such as root certificates, endpoint certificates, and private keys—and a step-by-step process showing how the system is configured, started, and functions at runtime. The example uses two CA hierarchies for ColorGateway and ColorTeller to demonstrate an mTLS setup where the client and server belong to separate CA hierarchies but trust each other’s CAs.
 

Figure 5: mTLS in App Mesh integrated with ACM Private CA

The numbered steps in Figure 5 are:

Step 1: A Private CA instance representing the ColorGateway trust hierarchy is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA, which is used as the client-side certificate in ColorGateway.

Step 2: Another Private CA instance representing the ColorTeller trust hierarchy is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA, which is used as the server-side certificate in ColorTeller.

Step 3: As part of running CloudFormation, the Lambda function is invoked. This Lambda function is responsible for exporting the client-side certificate from ACM Private CA and storing it in Secrets Manager. The function begins by requesting a random password from Secrets Manager. This random password is used as the passphrase for encrypting the private key inside ACM Private CA before it’s returned to the function. Requesting the password from Secrets Manager lets you generate it with a specified length and complexity.

Step 4: The Lambda function issues an export certificate request to ACM, requesting the ColorGateway end-entity certificate. The request conveys the private key passphrase retrieved from Secrets Manager in the previous step so that ACM Private CA can use it to encrypt the private key that’s sent in the response.

Step 5: The ACM Private CA responds to the Lambda function. The response carries the following elements of the ColorGateway end-entity certificate.

{
  'Certificate': '..',
  'CertificateChain': '..',
  'PrivateKey': '..'
}   

Step 6: The Lambda function processes the response that is returned from ACM. It extracts individual fields in the JSON-formatted response and stores them in Secrets Manager. The Lambda function stores the following four values in Secrets Manager (a condensed sketch of steps 3 through 6 follows this list):

  • The ColorGateway endpoint certificate
  • The ColorGateway certificate trust chain, which contains the ColorGateway Private CA root certificate
  • The encrypted private key for the ColorGateway end-entity certificate
  • The passphrase that was used to encrypt the private key
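Steps 3 through 6 condense into a few boto3 calls. This sketch uses placeholder ARNs and secret names and assumes the function’s role allows the ACM and Secrets Manager actions involved; it is an outline of the flow, not the companion project’s exact code:

import json

import boto3  # included in the AWS Lambda Python runtimes

acm = boto3.client("acm")
secrets = boto3.client("secretsmanager")

CERT_ARN = "arn:aws:acm:us-east-1:111122223333:certificate/EXAMPLE"  # placeholder
SECRET_ID = "colorgateway/mtls-material"                             # placeholder

# Step 3: request a random passphrase for encrypting the private key
passphrase = secrets.get_random_password(
    PasswordLength=32, ExcludePunctuation=True)["RandomPassword"]

# Steps 4 and 5: export the end-entity certificate; ACM encrypts the
# private key with the passphrase before returning it
export = acm.export_certificate(
    CertificateArn=CERT_ARN, Passphrase=passphrase.encode())

# Step 6: store the four values that the Envoy startup script will read
secrets.put_secret_value(
    SecretId=SECRET_ID,
    SecretString=json.dumps({
        "certificate": export["Certificate"],
        "certificate_chain": export["CertificateChain"],
        "encrypted_private_key": export["PrivateKey"],
        "passphrase": passphrase,
    }),
)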

Step 7: The App Mesh services—ColorGateway and ColorTeller—are started, which then start their Envoy proxy containers. A custom startup script embedded in the Envoy docker image fetches a certificate from Secrets Manager and places it on the Envoy file system.

Note: App Mesh publishes its own custom Envoy proxy Docker container image, which is fully tested and carries the latest vulnerability and performance patches. You’ll notice in the example source code that a custom Envoy image is built on top of the base image published by App Mesh. In this solution, we add an Envoy startup script and certain utilities such as the AWS Command Line Interface (AWS CLI) and jq to help retrieve the certificate from Secrets Manager and place it on the Envoy file system during Envoy startup.
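The retrieval half of step 7 can be pictured with a Python equivalent of that startup script. The secret name and paths are illustrative, and decrypting the private key with the stored passphrase (for example, via openssl) is omitted from the sketch:

import json

import boto3

secret = boto3.client("secretsmanager").get_secret_value(
    SecretId="colorgateway/mtls-material")  # illustrative secret name
material = json.loads(secret["SecretString"])

# Place the certificate material where the Envoy configuration expects it
with open("/certs/cert_chain.pem", "w") as f:  # illustrative paths
    f.write(material["certificate"] + "\n" + material["certificate_chain"])
with open("/certs/private_key.pem", "w") as f:
    f.write(material["encrypted_private_key"])  # still passphrase-encrypted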

Step 8: The CloudFormation scripts configure the client policy for mTLS in ColorGateway in App Mesh, as shown in Figure 6. The following attributes are configured:

  • Provide client certificate is enabled. This ensures that the client certificate is exchanged as part of the mTLS handshake.
  • Certificate method is set to Local file hosting so that the certificate is read from the local file system.
  • Certificate chain is set to the path for the file that contains the ColorGateway certificate chain.
  • Private key is set to the path for the file that contains the private key for the ColorGateway certificate.
Figure 6: Client-side mTLS configuration in ColorGateway

At the end of the custom Envoy startup script described in step 7, the core Envoy process in ColorGateway service is started. It retrieves the ColorTeller CA root certificate from ACM Private CA and configures it internally as a trusted CA. This retrieval happens due to native integration between App Mesh and ACM Private CA. This allows ColorGateway Envoy to validate the certificate presented by ColorTeller Envoy during the TLS handshake.

Step 9: The CloudFormation scripts configure the listener configuration for mTLS in ColorTeller in App Mesh, as shown in Figure 7. The following attributes are configured:

  • Require client certificate is enabled, which enforces mTLS.
  • Validation Method is set to Local file hosting, which causes Envoy to read the certificate from the local file system.
  • Certificate chain is set to the path for the file that contains the ColorGateway certificate chain.
Figure 7: Server-side mTLS configuration in ColorTeller

At the end of the Envoy startup script described in step 7, the core Envoy process in ColorTeller service is started. It retrieves its own server-side end-entity certificate and corresponding private key from ACM Private CA. This retrieval happens internally, driven by the native integration between App Mesh and ACM Private CA. This allows ColorTeller Envoy to present its server-side certificate to ColorGateway Envoy during the TLS handshake.

The system startup concludes with this step, and the application is ready to process external client requests.

Step 10: The ColorGateway service receives a request from an external client.

Step 11: The ColorGateway Envoy initiates a TLS handshake with the ColorTeller Envoy. During the first half of the TLS handshake protocol, the ColorTeller Envoy presents its server-side certificate to the ColorGateway Envoy. The ColorGateway Envoy validates the certificate. Because the ColorGateway Envoy has been configured with the ColorTeller CA trust chain in step 8, the validation succeeds.

Step 12: During the second half of the TLS handshake, the ColorTeller Envoy requests the ColorGateway Envoy to provide its client-side certificate. This step is what distinguishes an mTLS exchange from a one-way TLS exchange.

The ColorGateway Envoy responds with its end-entity certificate that had been placed on its file system in step 7. The ColorTeller Envoy validates the received certificate with its CA trust chain, which contains the ColorGateway root CA that was placed on its file system (in step 7). The validation succeeds, and so an mTLS session is established.

Verifying mTLS

You can now verify that an mTLS exchange happened by running the following command on your bastion host.

$ curl -s http://colorteller.mtls-ec2.svc.cluster.local:9901/stats |grep -E 'ssl.handshake|ssl.no_certificate'

listener.0.0.0.0_15000.ssl.handshake: 1
listener.0.0.0.0_15000.ssl.no_certificate: 0

The count for ssl.handshake should be one. If the ssl.handshake count is more than one, that means that you’ve gone through more than one TLS handshake. It’s important to note that the count for ssl.no_certificate—the total successful TLS connections with no client certificate—is zero. This shows that mTLS configuration is working as expected. Recall that this count was one or higher—equal to the ssl.handshake count—in the previous section that described one-way TLS. The ssl.no_certificate count being zero indicates that this was an mTLS authenticated connection, where the ColorGateway authenticated the ColorTeller and vice-versa.

Certificate renewal

The ACM Private CA certificates that are imported into App Mesh are not eligible for managed renewal, so an external certificate renewal method is needed. This example solution uses the external renewal method recommended in Renewing certificates in a private PKI, which you can adapt for your own implementations.

The certificate renewal mechanism can be broken down into six steps, which are outlined in Figure 8.
 

Figure 8: Certificate renewal process in ACM Private CA and App Mesh on ECS integration

Here are the steps illustrated in Figure 8:

Step 1: ACM generates an Amazon CloudWatch Events event when a certificate is close to expiring.

Step 2: CloudWatch triggers a Lambda function that is responsible for certificate renewal.

Step 3: The Lambda function renews the certificate in ACM and exports the new certificate by calling ACM APIs.

Step 4: The Lambda function writes the certificate to Secrets Manager.

Step 5: The Lambda function triggers a new service deployment in an Amazon ECS cluster. This will cause the ECS services to go through a graceful update process to acquire a renewed certificate.

Step 6: The Envoy processes in App Mesh fetch the client-side certificate from Secrets Manager using external integration, and the server-side certificate from ACM using native integration.
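A compressed view of the Lambda side of this workflow, with placeholder identifiers; the re-export and Secrets Manager write are the same calls as in the provisioning sketch earlier:

import boto3

acm = boto3.client("acm")
ecs = boto3.client("ecs")

CERT_ARN = "arn:aws:acm:us-east-1:111122223333:certificate/EXAMPLE"  # placeholder

# Step 3: renew the exported private certificate in ACM
acm.renew_certificate(CertificateArn=CERT_ARN)

# Steps 3 and 4: re-export the renewed certificate and write it to
# Secrets Manager, exactly as in the provisioning flow (omitted here)

# Step 5: force a new deployment so tasks restart with the new material
ecs.update_service(cluster="mtls-ec2",  # placeholder cluster and service names
                   service="colorgateway",
                   forceNewDeployment=True)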

Conclusion

In this post, you learned a method for enabling mTLS authentication between App Mesh endpoints based on certificates issued by ACM Private CA. mTLS enhances security of App Mesh deployments due to its bidirectional authentication capability. While server-side certificates are integrated natively, you saw how to use Lambda and Secrets Manager to integrate client-side certificates externally. Because ACM Private CA certificates aren’t eligible for managed renewal, you also learned how to implement an external certificate renewal process.

This solution enhances your App Mesh security posture by simplifying configuration of mTLS-enabled App Mesh deployments. It achieves this because all mTLS certificate requirements are met by a single, managed private PKI service—ACM Private CA—which allows you to centrally manage certificates and eliminates the need to run your own private PKI.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Raj Jain

Raj is an engineering leader at Amazon in the FinTech space. He is passionate about building SaaS applications for Amazon internal and external customers using AWS. He is currently working on an AI/ML application in the governance, risk and compliance domain. Raj is a published author in the Bell Labs Technical Journal, has authored 3 IETF standards, and holds 12 patents in internet telephony and applied cryptography. In his spare time, he enjoys the outdoors, cooking, reading, and travel.

Author

Nagmesh Kumar

Nagmesh is a Cloud Architect with the Worldwide Public Sector Professional Services team. He enjoys working with customers to design and implement well-architected solutions in the cloud. He was a researcher who stumbled into IT operations as a database administrator. After spending all day in the cloud, you can spot him in the wild with his family, reading, or gaming.

TLS 1.2 will be required for all AWS FIPS endpoints beginning March 31, 2021

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/tls-1-2-required-for-aws-fips-endpoints/

To help you meet your compliance needs, we’re updating all AWS Federal Information Processing Standard (FIPS) endpoints to a minimum of Transport Layer Security (TLS) 1.2. We have already updated over 40 services to require TLS 1.2, removing support for TLS 1.0 and TLS 1.1. Beginning March 31, 2021, client applications that cannot support TLS 1.2 will experience connection failures. To avoid an interruption in service, we encourage you to act now to ensure that you connect to AWS FIPS endpoints at TLS version 1.2. This change does not affect non-FIPS AWS endpoints.

Amazon Web Services (AWS) continues to notify impacted customers directly via their Personal Health Dashboard and email. However, if you’re connecting anonymously to AWS shared resources, such as through a public Amazon Simple Storage Service (Amazon S3) bucket, then you would not have received a notification, as we cannot identify anonymous connections.

Why are you removing TLS 1.0 and TLS 1.1 support from FIPS endpoints?

At AWS, we’re continually expanding the scope of our compliance programs to meet the needs of customers who want to use our services for sensitive and regulated workloads. Compliance programs, including FedRAMP, require a minimum level of TLS 1.2. To help you meet compliance requirements, we’re updating all AWS FIPS endpoints to a minimum of TLS version 1.2 across all AWS Regions. Following this update, you will not be able to use TLS 1.0 and TLS 1.1 for connections to FIPS endpoints.

How can I detect if I am using TLS 1.0 or TLS 1.1?

To detect the use of TLS 1.0 or 1.1, we recommend that you perform code, network, or log analysis. If you are using an AWS Software Developer Kit (AWS SDK) or Command Line Interface (CLI), we have provided hyperlinks to detailed guidance in our previous TLS blog post about how to examine your client application code and properly configure the TLS version used.

When the application source code is unavailable, you can use a network tool, such as TCPDump (Linux) or Wireshark (Linux or Windows), to analyze your network traffic to find the TLS versions you’re using when connecting to AWS endpoints. For a detailed example of using these tools, see the example below.

If you’re using Amazon S3, you can also use your access logs to view the TLS connection information for these services and identify client connections that are not at TLS 1.2.
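As a quick complement to packet captures, you can also check what TLS version a modern client stack on a given host negotiates with an endpoint. Note that this sketch exercises the local Python stack, not your legacy application, so it supplements rather than replaces the code and log analysis above; the endpoint is an example:

import socket
import ssl

ENDPOINT = "monitoring.us-gov-west-1.amazonaws.com"  # substitute your FIPS endpoint

context = ssl.create_default_context()
with socket.create_connection((ENDPOINT, 443), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=ENDPOINT) as tls:
        print(tls.version())  # e.g. 'TLSv1.2'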

What is the most common use of TLS 1.0 or TLS 1.1?

The most common client applications that use TLS 1.0 or 1.1 are Microsoft .NET Framework versions earlier than 4.6.2. If you use the .NET Framework, please confirm you are using version 4.6.2 or later. For information on how to update and configure .NET Framework to support TLS 1.2, see How to enable TLS 1.2 on clients.

How do I know if I am using an AWS FIPS endpoint?

All AWS services offer TLS 1.2 encrypted endpoints that you can use for all API calls. Some AWS services also offer FIPS 140-2 endpoints for customers who need to use FIPS-validated cryptographic libraries to connect to AWS services. You can check our list of all AWS FIPS endpoints and compare the list to your application code, configuration repositories, DNS logs, or other network logs.

EXAMPLE: TLS version detection using a packet capture

Multiple online sources, such as this article, provide guidance for capturing packets with TCPDump on a Linux operating system. On a Windows operating system, the Wireshark tool provides packet analysis capabilities and can either analyze packets captured with TCPDump or capture packets directly.

In this example, we assume there is a client application with the local IP address 10.25.35.243 that is making API calls to the CloudWatch FIPS API endpoint in the AWS GovCloud (US-West) Region. To analyze the traffic, first we look up the endpoint URL in the AWS FIPS endpoint list. In our example, the endpoint URL is monitoring.us-gov-west-1.amazonaws.com. Then we use NSLookup to find the IP addresses used by this FIPS endpoint.

Figure 1: Use NSLookup to find the IP addresses used by this FIPS endpoint
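
With the endpoint IP addresses in hand, you can limit the capture to the relevant traffic. A representative TCPDump invocation looks like the following; the interface name and endpoint IP address are placeholders for your environment:

$ sudo tcpdump -i eth0 -w fips-capture.pcap host <endpoint-ip> and tcp port 443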

Wireshark is then used to open the captured packets and filter them down to just the packets with the relevant IP address. This can be done automatically by selecting one of the packets in the upper section and then right-clicking to apply the Conversation filter/IPv4 option.

After the results are filtered to only the relevant IP addresses, the next step is to find the packet whose description in the Info column is Client Hello. In the lower packet details area, expand the Transport Layer Security section to find the version, which in this example is set to TLS 1.0 (0x0301). This indicates that the client only supports TLS 1.0 and must be modified to support a TLS 1.2 connection.

Figure 2: After the conversation filter has been applied, select the Client Hello packet in the top pane. Expand the Transport Layer Security section in the lower pane to view the packet details and the TLS version.

Figure 3 shows what it looks like after the client has been updated to support TLS 1.2. This second packet capture confirms we are sending TLS 1.2 (0x0303) in the Client Hello packet.

Figure 3: The client TLS has been updated to support TLS 1.2

Is there more assistance available?

If you have any questions or issues, you can start a new thread on one of the AWS forums, or contact AWS Support or your technical account manager (TAM). The AWS support tiers cover development and production issues for AWS products and services, along with other key stack components. AWS Support doesn’t include code development for client applications.

Additionally, you can use AWS IQ to find, securely collaborate with, and pay AWS-certified third-party experts for on-demand assistance to update your TLS client components. Visit the AWS IQ page for information about how to submit a request, get responses from experts, and choose the expert with the right skills and experience. Log in to your console and select Get Started with AWS IQ to start a request.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Janelle Hopper

Janelle is a Senior Technical Program Manager in AWS Security with over 15 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve AWS’ security posture.

Author

Daniel Salzedo

Daniel is a Senior Specialist Technical Account Manager – Security. He has over 25 years of professional experience in IT in industries as diverse as video game development, manufacturing, banking and used car sales. He loves working with our wonderful AWS customers to help them solve their complex security challenges at scale.

Over 40 services require TLS 1.2 minimum for AWS FIPS endpoints

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/over-40-services-require-tls-1-2-minimum-for-aws-fips-endpoints/

In a March 2020 blog post, we told you about work Amazon Web Services (AWS) was undertaking to update all of our AWS Federal Information Processing Standard (FIPS) endpoints to a minimum of Transport Layer Security (TLS) 1.2 across all AWS Regions. Today, we’re happy to announce that over 40 services have been updated and now require TLS 1.2.

These services no longer support using TLS 1.0 or TLS 1.1 on their FIPS endpoints. To help you meet your compliance needs, we are updating all AWS FIPS endpoints to a minimum of TLS 1.2 across all Regions. We will continue to update our services to support only TLS 1.2 or later on AWS FIPS endpoints, which you can check on the AWS FIPS webpage. This change doesn’t affect non-FIPS AWS endpoints.

When you make a connection from your client application to an AWS service endpoint, the client advertises the minimum and maximum TLS versions it supports. The AWS service endpoint will always select the highest version the client offers that the endpoint supports.
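
In Go, for example, the offered range is controlled by the MinVersion and MaxVersion fields of the TLS configuration. The following minimal sketch (standard library only) never offers anything below TLS 1.2:

package main

import (
	"crypto/tls"
	"net/http"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion: tls.VersionTLS12, // never offer TLS 1.0 or 1.1
				MaxVersion: tls.VersionTLS13, // highest version to offer
			},
		},
	}
	// Example endpoint; the server picks the highest version in this
	// range that it supports.
	resp, err := client.Get("https://monitoring.us-gov-west-1.amazonaws.com")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}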

What is TLS?

TLS is a cryptographic protocol designed to provide secure communication across a computer network. API calls to AWS services are secured using TLS.

What is FIPS 140-2?

FIPS 140-2 is a US and Canadian government standard that specifies the security requirements for cryptographic modules that protect sensitive information.

What are AWS FIPS endpoints?

All AWS services offer TLS 1.2 encrypted endpoints that can be used for all API calls. Some AWS services also offer FIPS 140-2 endpoints for customers who need to use FIPS-validated cryptographic libraries to connect to AWS services.

Why are we upgrading to TLS 1.2?

Our upgrade to TLS 1.2 across all Regions reflects our ongoing commitment to help customers meet their compliance needs.

Is there more assistance available to help verify or update client applications?

If you’re using an AWS software development kit (AWS SDK), you can find information about how to properly configure the minimum and maximum TLS versions for your clients in the TLS configuration topics of the corresponding AWS SDK documentation (one Go example is sketched below).

You can also visit Tools to Build on AWS and browse by programming language to find the relevant SDK. AWS Support tiers cover development and production issues for AWS products and services, along with other key stack components. AWS Support doesn’t include code development for client applications.
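
As one illustration, the AWS SDK for Go (v1) accepts a custom HTTP client, so a TLS 1.2 floor can be enforced for every service client created from a session. This is a sketch, not official guidance; check your SDK’s documentation for the recommended configuration:

package main

import (
	"crypto/tls"
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	// An HTTP client that never offers TLS 1.0 or 1.1.
	httpClient := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
		},
	}

	sess, err := session.NewSession(&aws.Config{
		Region:     aws.String("us-gov-west-1"),
		HTTPClient: httpClient,
	})
	if err != nil {
		panic(err)
	}
	_ = sess // pass this session to service clients, e.g., cloudwatch.New(sess)
}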

If you have any questions or issues, you can start a new thread on one of the AWS forums, or contact AWS Support or your technical account manager (TAM).

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Janelle Hopper

Janelle Hopper is a Senior Technical Program Manager in AWS Security with over 15 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve AWS’ security posture.

Author

Marta Taggart

Marta is a Seattle-native and Senior Program Manager in AWS Security, where she focuses on privacy, content development, and educational programs. Her interest in education stems from two years she spent in the education sector while serving in the Peace Corps in Romania. In her free time, she’s on a global hunt for the perfect cup of coffee.

KEMTLS: Post-quantum TLS without signatures

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/kemtls-post-quantum-tls-without-signatures/

The Transport Layer Security protocol (TLS), which secures most Internet connections, has mainly been a protocol consisting of a key exchange, authenticated by digital signatures, used to encrypt data in transit[1]. Even though it has undergone major changes since 1994, when SSL 1.0 was introduced by Netscape, its main mechanism has remained the same. The key exchange was first based on RSA, and later on traditional Diffie-Hellman (DH) and Elliptic-curve Diffie-Hellman (ECDH). The signatures used for authentication have almost always been RSA-based, though in recent years other kinds of signatures have been adopted, mainly ECDSA and Ed25519. This recent shift to elliptic curve cryptography at both the key exchange and signature levels has resulted in considerable speed and bandwidth benefits in comparison to traditional Diffie-Hellman and RSA.

TLS is the main protocol that protects the connections we use every day. It’s everywhere: we use it when we buy products online, when we register for a newsletter, and when we access any kind of website, IoT device, or mobile-app API. But with the imminent threat of the arrival of quantum computers (a threat that seems to be getting closer and closer), we need to reconsider the future of TLS once again. A wide-scale post-quantum experiment was carried out by Cloudflare and Google: two post-quantum key exchanges were integrated into our TLS stack and deployed at our edge servers as well as in Chrome Canary clients. The goal of that experiment was to evaluate the performance and feasibility of deployment of two post-quantum key exchanges in TLS.

Similar experiments have been proposed for introducing post-quantum algorithms into the TLS handshake itself. Unfortunately, it seems infeasible to replace both the key exchange and signature with post-quantum primitives, because post-quantum cryptographic primitives are bigger, or slower (or both), than their predecessors. The proposed algorithms under consideration in the NIST post-quantum standardization process use mathematical objects that are larger than the ones used for elliptic curves, traditional Diffie-Hellman, or RSA. As a result, the overall size of public keys, signatures and key exchange material is much bigger than those from elliptic curves, Diffie-Hellman, or RSA.

How can we solve this problem? How can we use post-quantum algorithms as part of the TLS handshake without making the material too big to be transmitted? In this blogpost, we will introduce a new mechanism for making this happen. We’ll explain how it can be integrated into the handshake and we’ll cover implementation details. The key observation in this mechanism is that, while post-quantum algorithms have bigger communication sizes than their predecessors, post-quantum key exchanges are somewhat smaller than post-quantum signatures, so we can try to replace signatures with key exchanges in some places to save space. We will only focus on the TLS 1.3 handshake, as it is the TLS version that should currently be used.

Past experiments: making the TLS 1.3 handshake post-quantum

TLS 1.3 was introduced in August 2018, and it brought many security and performance improvements (notably, having only one round-trip to complete the handshake). But TLS 1.3 is designed for a world with classical computers, and some of its functionality will be broken by quantum computers when they do arrive.

The primary goal of TLS 1.3 is to provide authentication (the server side of the channel is always authenticated, the client side is optionally authenticated), confidentiality, and integrity by using a handshake protocol and a record protocol. The handshake protocol, the one of interest for us today, establishes the cryptographic parameters for securing and authenticating a connection. It can be thought of as having three main phases, as defined in RFC8446:

–  The Parameter Negotiation phase (referred to as ‘Server Parameters’ in RFC8446), which establishes other handshake parameters (whether the client is authenticated, application-layer protocol support, etc.).

–  The Key Exchange phase, which establishes shared keying material and selects the cryptographic parameters to be used. Everything after this phase will be encrypted.

–  The Authentication phase, which authenticates the server (and, optionally, the client) and provides key confirmation and handshake integrity.

The main idea of past experiments that introduced post-quantum algorithms into the handshake of TLS 1.3 was to use them in place of classical algorithms by advertising them as part of the supported groups[2] and key share[3] extensions, and, therefore, establishing with them the negotiated connection parameters. Key encapsulation mechanisms (KEMs) are an abstraction of the basic key exchange primitive, and were used to generate the shared secrets. When using a pre-shared key, its symmetric algorithms can be easily replaced by post-quantum KEMs as well; and, in the case of password-authenticated TLS, some ideas have been proposed on how to use post-quantum algorithms with them.
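
To make the KEM abstraction concrete, this is the shape of the primitive, sketched as a Go interface. It is illustrative only, not any particular library’s API:

package kem

// KEM sketches the three operations every key encapsulation mechanism
// provides. Illustrative only; not any particular library's API.
type KEM interface {
	// GenerateKeyPair creates a fresh public/private key pair.
	GenerateKeyPair() (publicKey, privateKey []byte, err error)
	// Encapsulate uses the peer's public key to produce a ciphertext and
	// the fresh shared secret that it encapsulates.
	Encapsulate(publicKey []byte) (ciphertext, sharedSecret []byte, err error)
	// Decapsulate recovers the shared secret from the ciphertext using
	// the matching private key.
	Decapsulate(privateKey, ciphertext []byte) (sharedSecret []byte, err error)
}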

Most of the above ideas only provide what is often called ‘transitional security’: their main focus is quantum-resistant confidentiality, not quantum-resistant authentication. It is possible to use post-quantum signatures for TLS authentication, but the post-quantum signatures are larger than traditional ones. Furthermore, it is worth noting that using post-quantum signatures is much more expensive than using post-quantum KEMs.

We can estimate the impact of such a replacement on network traffic by simply looking at the sum of the cryptographic objects that are transmitted during the handshake. A typical TLS 1.3 handshake using elliptic curve X25519 and RSA-2048 would transmit 1,376 bytes, which would correspond to the public keys for key exchange, the certificate, the signature of the handshake, and the certificate chain. If we were to replace X25519 by the post-quantum KEM Kyber512 and RSA by the post-quantum signature Dilithium II, two of the more efficient proposals, the size of the transmitted data would increase to 10,036 bytes[4]. The increase is mostly due to the size of the post-quantum signature algorithm.

The question then is: how can we achieve full post-quantum security while keeping the handshake efficient?

A more efficient proposal: KEMTLS

There is a long history of other mechanisms, besides signatures, being used for authentication. Modern protocols, such as the Signal protocol, the Noise framework, or WireGuard, rely on key exchange mechanisms for authentication; but they are unsuitable for the TLS 1.3 case as they expect the long-term key material to be known in advance by the interested parties.

The OPTLS proposal by Krawczyk and Wee authenticates the TLS handshake without signatures by using a non-interactive key exchange (NIKE). However, the only somewhat efficient construction for a post-quantum NIKE is CSIDH, the security of which is the subject of an ongoing debate. But we can build on this idea, and use KEMs for authentication. KEMTLS, the current proposed experiment, replaces the handshake signature by a post-quantum KEM key exchange. It was designed and introduced by Peter Schwabe, Douglas Stebila and Thom Wiggers in the publication ‘Post-Quantum TLS Without Handshake Signatures’.

KEMTLS, therefore, achieves the same goals as TLS 1.3 (authentication, confidentiality and integrity) in the face of quantum computers. But there’s one small difference compared to the TLS 1.3 handshake. KEMTLS allows the client to send encrypted application data in the second client-to-server TLS message flow when client authentication is not required, and in the third client-to-server TLS message flow when mutual authentication is required. Note that with TLS 1.3, the server is able to send encrypted and authenticated application data in its first response message (although, in most uses of TLS 1.3, this feature is not actually used). With KEMTLS, when client authentication is not required, the client is able to send its first encrypted application data after the same number of handshake round trips as in TLS 1.3.

Intuitively, the handshake signature in TLS 1.3 proves possession of the private key corresponding to the public key certified in the TLS 1.3 server certificate. For these signature schemes, this is the straightforward way to prove possession; another way to prove possession is through key exchanges. By carefully considering the key derivation sequence, a server can decrypt any messages sent by the client only if it holds the private key corresponding to the certified public key. Therefore, implicit authentication is fulfilled. It is worth noting that KEMTLS still relies on signatures by certificate authorities to authenticate the long-term KEM keys.

With KEMTLS, application data transmitted during the handshake is implicitly authenticated rather than explicitly (as in TLS 1.3), and has slightly weaker downgrade resilience and forward secrecy; but full downgrade resilience and forward secrecy are achieved once the KEMTLS handshake completes.

By replacing the handshake signature by a KEM key exchange, we reduce the size of the data transmitted in the example handshake to 8,344 bytes, using Kyber512 and Dilithium II — a significant reduction. We can reduce the handshake size even for algorithms such as the NTRU-assumption based KEM NTRU and signature algorithm Falcon, which have a less-pronounced size gap. Typically, KEM operations are computationally much lighter than signing operations, which makes the reduction even more significant.

KEMTLS was presented at ACM CCS 2020. You can read more about its details in the paper. It was initially implemented in the Rustls library by Thom Wiggers using optimized C/assembly implementations of the post-quantum algorithms provided by the PQClean and Open Quantum Safe projects.

Cloudflare and KEMTLS: the implementation

As part of our effort to show that TLS can be completely post-quantum safe, we implemented the full KEMTLS handshake in Golang’s TLS 1.3 suite. The implementation was done in several steps:

  1. We first needed to clone our own version of Golang, so we could add different post-quantum algorithms to it. You can find our own version here. This code gets constantly updated with every release of Golang, following these steps.
  2. We needed to implement post-quantum algorithms in Golang, which we did in our own cryptographic library, CIRCL (see the usage sketch after this list).
  3. As we cannot force certificate authorities to use certificates with long-term post-quantum KEM keys, we decided to use Delegated Credentials. A delegated credential is a short-lived key that the certificate’s owner has delegated for use in TLS, which makes it a good fit for carrying post-quantum KEM keys. See its implementation in our Golang code here.
  4. We implemented mutual auth (client and server authentication) KEMTLS by using Delegated Credentials for the authentication process. See its implementation in our Golang code here. You can also check its test for an overview of how it works.
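
To give a feel for what step 2 provides, the sketch below runs a Kyber512 encapsulation and decapsulation with CIRCL. Method names follow recent versions of the library and may differ in yours:

package main

import (
	"bytes"
	"fmt"

	"github.com/cloudflare/circl/kem/kyber/kyber512"
)

func main() {
	scheme := kyber512.Scheme()

	// The server's long-term (or delegated) KEM key pair.
	pk, sk, err := scheme.GenerateKeyPair()
	if err != nil {
		panic(err)
	}

	// The client encapsulates a fresh shared secret to the public key.
	ct, clientSecret, err := scheme.Encapsulate(pk)
	if err != nil {
		panic(err)
	}

	// Only the private-key holder can recover the secret; this is what
	// gives KEMTLS its implicit authentication.
	serverSecret, err := scheme.Decapsulate(sk, ct)
	if err != nil {
		panic(err)
	}
	fmt.Println("secrets match:", bytes.Equal(clientSecret, serverSecret))
}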

Implementing KEMTLS was a straightforward process, although it did require changes to the way Golang handles a TLS 1.3 handshake and how the key schedule works.

A “regular” TLS 1.3 handshake in Golang (from the server perspective) looks like this:

func (hs *serverHandshakeStateTLS13) handshake() error {
	c := hs.c

	// For an overview of the TLS 1.3 handshake, see RFC 8446, Section 2.
	if err := hs.processClientHello(); err != nil {
		return err
	}
	if err := hs.checkForResumption(); err != nil {
		return err
	}
	if err := hs.pickCertificate(); err != nil {
		return err
	}
	c.buffering = true
	if err := hs.sendServerParameters(); err != nil {
		return err
	}
	if err := hs.sendServerCertificate(); err != nil {
		return err
	}
	if err := hs.sendServerFinished(); err != nil {
		return err
	}
	// Note that at this point we could start sending application data without
	// waiting for the client's second flight, but the application might not
	// expect the lack of replay protection of the ClientHello parameters.
	if _, err := c.flush(); err != nil {
		return err
	}
	if err := hs.readClientCertificate(); err != nil {
		return err
	}
	if err := hs.readClientFinished(); err != nil {
		return err
	}

	atomic.StoreUint32(&c.handshakeStatus, 1)

	return nil
}

We had to interrupt the process when the server sends the Certificate (sendServerCertificate()) in order to send the KEMTLS-specific messages. In the same way, we had to add the appropriate KEMTLS messages to the client’s handshake. And, as we didn’t want to change the way Golang handles TLS 1.3 too much, we only added one new constant to the configuration that a server can use to request the client’s certificate (the constant is serverConfig.ClientAuth = RequestClientKEMCert).

The implementation is easy to work with: if a delegated credential or a certificate has a public key of a supported post-quantum KEM algorithm, the handshake will proceed with KEMTLS. If the server requests a Client KEMTLS Certificate, the handshake will use client KEMTLS authentication.
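
Putting that together, server setup is only a few lines. The sketch below targets our Go fork described above; tls.RequestClientKEMCert is the fork’s constant and does not exist in upstream Go, and the delegated credential carrying a KEM public key is assumed to be loaded already:

package kemtlsdemo

import (
	"crypto/tls"
	"log"
	"net"
)

// NewKEMTLSListener sketches KEMTLS server setup against the fork. The
// certificate (or delegated credential) carrying a KEM public key is
// assumed to be loaded into cert.
func NewKEMTLSListener(cert tls.Certificate) net.Listener {
	config := &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequestClientKEMCert, // fork-only: request client KEMTLS auth
	}
	ln, err := tls.Listen("tcp", ":8443", config)
	if err != nil {
		log.Fatal(err)
	}
	// With a KEM key in the credential, the handshake proceeds as KEMTLS
	// automatically, as described above.
	return ln
}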

Running the Experiment

So, what’s next? We’ll take the code we have produced and run it on actual Cloudflare infrastructure to measure how efficiently it works.

Thanks

Many thanks to everyone involved in the project: Chris Wood, Armando Faz-Hernández, Thom Wiggers, Bas Westerbaan, Peter Wu, Peter Schwabe, Goutam Tamvada, Douglas Stebila, Thibault Meunier, and the whole Cloudflare Research team.

1It is worth noting that the RSA key transport in TLS ≤1.2 has the server only authenticated by RSA public key encryption, although the server’s RSA public key is certified using RSA signatures by Certificate Authorities.
2An extension used by the client to indicate which named groups (Elliptic Curve Groups, Finite Field Groups) it supports for key exchange.
3An extension which contains the endpoint’s cryptographic parameters.
4These numbers, as noted in the paper, are based on the round-2 submissions.

Helping build the next generation of privacy-preserving protocols

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/next-generation-privacy-protocols/

Over the last ten years, Cloudflare has become an important part of Internet infrastructure, powering websites, APIs, and web services to help make them more secure and efficient. The Internet is growing in capacity and in the number of people using it, and evolving in design and functionality. As a player in the Internet ecosystem, Cloudflare has a responsibility to help the Internet grow in a way that respects and provides value for its users. Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy.

These initiatives are Encrypted Client Hello (ECH), Oblivious DNS-over-HTTPS (ODoH), and OPAQUE.

Each of these projects impacts an aspect of the Internet that influences our online lives and digital footprints. Whether we know it or not, there is a lot of private information about us and our lives floating around online. This is something we can help fix.

For over a year, we have been working through standards bodies like the IETF and partnering with the biggest names in Internet technology (including Mozilla, Google, Equinix, and more) to design, deploy, and test these new privacy-preserving protocols at Internet scale. Each of these three protocols touches on a critical aspect of our online lives, and we expect them to help make real improvements to privacy online as they gain adoption.

A continuing tradition at Cloudflare

One of Cloudflare’s core missions is to support and develop technology that helps build a better Internet. As an industry, we’ve made exceptional progress in making the Internet more secure and robust. Cloudflare is proud to have played a part in this progress through multiple initiatives over the years.

Here are a few highlights:

  • Universal SSL™. We’ve been one of the driving forces for encrypting the web. We launched Universal SSL in 2014 to give website encryption to our customers for free and have actively been working along with certificate authorities like Let’s Encrypt, web browsers, and website operators to help remove mixed content. Before Universal SSL launched to give all Cloudflare customers HTTPS for free, only 30% of connections to websites were encrypted. Through the industry’s efforts, that number is now 80% — and a much more significant proportion of overall Internet traffic. Along with doing our part to encrypt the web, we have supported the Certificate Transparency project via Nimbus and Merkle Town, which has improved accountability for the certificate ecosystem HTTPS relies on for trust.
  • TLS 1.3 and QUIC. We’ve also been a proponent of upgrading existing security protocols. Take Transport Layer Security (TLS), the underlying protocol that secures HTTPS. Cloudflare engineers helped contribute to the design of TLS 1.3, the latest version of the standard, and in 2016 we launched support for an early version of the protocol. This early deployment helped lead to improvements to the final version of the protocol. TLS 1.3 is now the most widely used encryption protocol on the web and a vital component of the emerging QUIC standard, of which we were also early adopters.
  • Securing Routing, Naming, and Time. We’ve made major efforts to help secure other critical components of the Internet. Our efforts to help secure Internet routing through our RPKI toolkit, measurement studies, and “Is BGP Safe Yet” tool have significantly improved the Internet’s resilience against disruptive route leaks. Our time service (time.cloudflare.com) has helped keep people’s clocks in sync with more secure protocols like NTS and Roughtime. We’ve also made DNS more secure by supporting DNS-over-HTTPS and DNS-over-TLS in 1.1.1.1 at launch, along with one-click DNSSEC in our authoritative DNS service and registrar.

Continuing to improve the security of the systems of trust online is critical to the Internet’s growth. However, there is a more fundamental principle at play: respect. The infrastructure underlying the Internet should be designed to respect its users.

Building an Internet that respects users

When you sign in to a specific website or service with a privacy policy, you know what that site is expected to do with your data. It’s explicit. There is no such visibility to the users when it comes to the operators of the Internet itself. You may have an agreement with your Internet Service Provider (ISP) and the site you’re visiting, but it’s doubtful that you even know which networks your data is traversing. Most people don’t have a concept of the Internet beyond what they see on their screen, so it’s hard to imagine that people would accept or even understand what a privacy policy from a transit wholesaler or an inspection middlebox would even mean.

Without encryption, Internet browsing information is implicitly shared with countless third parties online as information passes between networks. Without secure routing, users’ traffic can be hijacked and disrupted. Without privacy-preserving protocols, users’ online life is not as private as they would think or expect. The infrastructure of the Internet wasn’t built in a way that reflects their expectations.

Normal network flow
Network flow with malicious route leak

The good news is that the Internet is continuously evolving. One of the groups that help guide that evolution is the Internet Architecture Board (IAB). The IAB provides architectural oversight to the Internet Engineering Task Force (IETF), the Internet’s main standard-setting body. The IAB recently published RFC 8890, which states that individual end-users should be prioritized when designing Internet protocols. It says that if there’s a conflict between the interests of end-users and the interests of service providers, corporations, or governments, IETF decisions should favor end users. One of the prime interests of end-users is the right to privacy, and the IAB published RFC 6973 to indicate how Internet protocols should take privacy into account.

Today’s technical blog posts are about improvements to the Internet designed to respect user privacy. Privacy is a complex topic that spans multiple disciplines, so it’s essential to clarify what we mean by “improving privacy.” We are specifically talking about changing the protocols that handle privacy-sensitive information exposed “on-the-wire” and modifying them so that this data is exposed to fewer parties. This data continues to exist. It’s just no longer available or visible to third parties without building a mechanism to collect it at a higher layer of the Internet stack, the application layer. These changes go beyond website encryption; they go deep into the design of the systems that are foundational to making the Internet what it is.

The toolbox: cryptography and secure proxies

Two tools for making sure data can be used without being seen are cryptography and secure proxies.

Cryptography allows information to be transformed into a format that a very limited number of people (those with the key) can understand. Some describe cryptography as a tool that transforms data security problems into key management problems. This is a humorous but fair description. Cryptography makes it easier to reason about privacy because only key holders can view data.

Another tool for protecting access to data is isolation/segmentation. By physically limiting which parties have access to information, you effectively build privacy walls. A popular architecture is to rely on policy-aware proxies to pass data from one place to another. Such proxies can be configured to strip sensitive data or block data transfers between parties according to what the privacy policy says.

Both these tools are useful individually, but they can be even more effective if combined. Onion routing (the cryptographic technique underlying Tor) is one example of how proxies and encryption can be used in tandem to enforce strong privacy. Broadly, if party A wants to send data to party B, they can encrypt the data with party B’s key and encrypt the metadata with a proxy’s key and send it to the proxy.
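
As a toy illustration of that combination, the sketch below wraps a payload in two layers with NaCl box: the proxy can open only the routing layer, and the destination only the payload. The framing and names are made up for the example, and real onion routing uses per-hop circuits and rather different constructions:

package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/nacl/box"
)

func main() {
	// Key pairs for sender A, destination B, and the proxy.
	aPub, aPriv, _ := box.GenerateKey(rand.Reader)
	bPub, bPriv, _ := box.GenerateKey(rand.Reader)
	pPub, pPriv, _ := box.GenerateKey(rand.Reader)

	var n1, n2 [24]byte
	rand.Read(n1[:])
	rand.Read(n2[:])

	// Inner layer: only B can open the payload.
	inner := box.Seal(nil, []byte("hello B"), &n1, bPub, aPriv)

	// Outer layer: only the proxy can open it, revealing routing metadata
	// plus an opaque blob to forward.
	outer := box.Seal(nil, append([]byte("to=B;"), inner...), &n2, pPub, aPriv)

	// The proxy peels its layer: it learns where to forward, not the payload.
	routed, ok := box.Open(nil, outer, &n2, aPub, pPriv)
	if !ok {
		panic("proxy decryption failed")
	}
	fmt.Printf("proxy sees: %s plus %d opaque bytes\n", routed[:5], len(routed)-5)

	// The destination peels the inner layer and reads the payload.
	payload, ok := box.Open(nil, routed[5:], &n1, aPub, bPriv)
	if !ok {
		panic("destination decryption failed")
	}
	fmt.Printf("B reads: %s\n", payload)
}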

Platforms and services built on top of the Internet can build in consent systems, like privacy policies presented through user interfaces. The infrastructure of the Internet relies on layers of underlying protocols. Because these layers of the Internet are so far below where the user interacts with them, it’s almost impossible to build a concept of user consent. In order to respect users and protect them from privacy issues, the protocols that glue the Internet together should be designed with privacy enabled by default.

Data vs. metadata

The transition from a mostly unencrypted web to an encrypted web has done a lot for end-user privacy. For example, the “coffeeshop stalker” is no longer an issue for most sites. When accessing the majority of sites online, users are no longer broadcasting every aspect of their web browsing experience (search queries, browser versions, authentication cookies, etc.) over the Internet for any participant on the path to see. If a site is configured correctly to use HTTPS, users can be confident their data is secure from onlookers and reaches only the intended party, because their connections are both encrypted and authenticated.

However, HTTPS only protects the content of web requests. Even if you only browse sites over HTTPS, that doesn’t mean that your browsing patterns are private. This is because HTTPS fails to encrypt a critical aspect of the exchange: the metadata. When you make a phone call, the metadata is the phone number, not the call’s contents. Metadata is the data about the data.

To illustrate the difference and why it matters, here’s a diagram of what happens when you visit a website like an imageboard. Say you’re going to a specific page on that board (https://<imageboard>.com/room101/) that has specific embedded images hosted on <embarassing>.com.

Page load for an imageboard, returning an HTML page with an image from an embarassing site
Subresource fetch for the image from an embarassing site

The space inside the dotted line here represents the part of the Internet that your data needs to transit. These networks include your local area network or coffee shop’s network, your ISP, an Internet transit provider, and possibly the network portion of the cloud provider that hosts the server. Users often don’t have a relationship with these entities or a contract to prevent these parties from doing anything with the user’s data. And even if those entities don’t look at the data, a well-placed observer intercepting Internet traffic could see anything sent unencrypted. It would be best if they just didn’t see it at all. In this example, the fact that the user visited <imageboard>.com can be seen by an observer, which is expected. However, even though the page content is encrypted, it’s possible to infer which specific page you’ve visited, since <embarassing>.com is also visible.

It’s a general rule that if data is available to on-path parties on the Internet, some of these on-path parties will use this data. It’s also true that these on-path parties need some metadata in order to facilitate the transport of this data. This balance is explored in RFC 8558, which explains how protocols should be designed thoughtfully with respect to the balance between too much metadata (bad for privacy) and too little metadata (bad for operations).

In an ideal world, Internet protocols would be designed with the principle of least privilege. They would provide the minimum amount of information needed for the on-path parties (the pipes) to do the job of transporting the data to the right place and keep everything else confidential by default. Current protocols, including TLS 1.3 and QUIC, are important steps towards this ideal but fall short with respect to metadata privacy.

Knowing both who you are and what you do online can lead to profiling

Today’s announcements reflect two metadata protection levels: the first involves limiting the amount of metadata available to third-party observers (like ISPs). The second involves restricting the amount of metadata that users share with service providers themselves.

Hostnames are an example of metadata that needs to be protected from third-party observers, which DoH and ECH intend to do. However, it doesn’t make sense to hide the hostname from the site you’re visiting. It also doesn’t make sense to hide it from a directory service like DNS. A DNS server needs to know which hostname you’re resolving to resolve it for you!

A privacy issue arises when a service provider knows about both what sites you’re visiting and who you are. Individual websites do not have this dangerous combination of information (except in the case of third party cookies, which are going away soon in browsers), but DNS providers do. Thankfully, it’s not actually necessary for a DNS resolver to know *both* the hostname of the service you’re going to and which IP you’re coming from. Disentangling the two, which is the goal of ODoH, is good for privacy.

The Internet is part of ‘our’ Infrastructure

Roads should be well-paved, well lit, have accurate signage, and be optimally connected. They aren’t designed to stop a car based on who’s inside it. Nor should they be! Like transportation infrastructure, Internet infrastructure is responsible for getting data where it needs to go, not looking inside packets, and making judgments. But the Internet is made of computers and software, and software tends to be written to make decisions based on the data it has available to it.

Privacy-preserving protocols attempt to eliminate the temptation for infrastructure providers and others to peek inside and make decisions based on personal data. A non-privacy preserving protocol like HTTP keeps data and metadata, like passwords, IP addresses, and hostnames, as explicit parts of the data sent over the wire. The fact that they are explicit means that they are available to any observer to collect and act on. A protocol like HTTPS improves upon this by making some of the data (such as passwords and site content) invisible on the wire using encryption.

The three protocols we are exploring today extend this concept.

  • ECH takes most of the unencrypted metadata in TLS (including the hostname) and encrypts it with a key that was fetched ahead of time.
  • ODoH (a new variant of DoH co-designed by Apple, Cloudflare, and Fastly engineers) uses proxies and onion-like encryption to make the source of a DNS query invisible to the DNS resolver. This protects the user’s IP address when resolving hostnames.
  • OPAQUE uses a new cryptographic technique to keep passwords hidden even from the server. Using a construction called an Oblivious Pseudo-Random Function (as seen in Privacy Pass), the server never learns the password; it learns only whether or not the user knows the password.

By making sure Internet infrastructure acts more like physical infrastructure, user privacy is more easily protected. The Internet is more private if private data can only be collected where the user has a chance to consent to its collection.

Doing it together

As much as we’re excited about working on new ways to make the Internet more private, innovation at a global scale doesn’t happen in a vacuum. Each of these projects is the output of a collaborative group of individuals working out in the open in organizations like the IETF and the IRTF. Protocols must come about through a consensus process that involves all the parties that make up the interconnected set of systems that power the Internet. From browser builders to cryptographers, from DNS operators to website administrators, this is truly a global team effort.

We also recognize that sweeping technical changes to the Internet will inevitably impact the technical community. Adopting these new protocols may have legal and policy implications. We are actively working with governments and civil society groups to help educate them about the impact of these potential changes.

We’re looking forward to sharing our work today and hope that more interested parties join in developing these protocols. The projects we are announcing today were designed together by experts from academia, industry, and the hobbyist community, and were built by engineers from Cloudflare Research (including the work of interns, which we will highlight) with support from across Cloudflare.

If you’re interested in this type of work, we’re hiring!

Good-bye ESNI, hello ECH!

Post Syndicated from Christopher Patton original https://blog.cloudflare.com/encrypted-client-hello/

Most communication on the modern Internet is encrypted to ensure that its content is intelligible only to the endpoints, i.e., client and server. Encryption, however, requires a key and so the endpoints must agree on an encryption key without revealing the key to would-be attackers. The most widely used cryptographic protocol for this task, called key exchange, is the Transport Layer Security (TLS) handshake.

In this post we’ll dive into Encrypted Client Hello (ECH), a new extension for TLS that promises to significantly enhance the privacy of this critical Internet protocol. Today, a number of privacy-sensitive parameters of the TLS connection are negotiated in the clear. This leaves a trove of metadata available to network observers, including the endpoints’ identities, how they use the connection, and so on.

ECH encrypts the full handshake so that this metadata is kept secret. Crucially, this closes a long-standing privacy leak by protecting the Server Name Indication (SNI) from eavesdroppers on the network. Keeping the SNI secret is important because it is the clearest signal of which server a given client is communicating with. However, and perhaps more significantly, ECH also lays the groundwork for adding future security features and performance enhancements to TLS while minimizing their impact on the privacy of end users.

ECH is the product of close collaboration, facilitated by the IETF, between academics and tech industry leaders, including Cloudflare, our friends at Fastly and Mozilla (the affiliations of the standard’s co-authors), and many others. This feature represents a significant upgrade to the TLS protocol, one that builds on bleeding-edge technologies, like DNS-over-HTTPS, that are only now coming into their own. As such, the protocol is not yet ready for Internet-scale deployment. This article is intended as a signpost on the road to full handshake encryption.

Background

The story of TLS is the story of the Internet. As our reliance on the Internet has grown, so the protocol has evolved to address ever-changing operational requirements, use cases, and threat models. The client and server don’t just exchange a key: they negotiate a wide variety of features and parameters, including the exact method of key exchange, the encryption algorithm, who is authenticated and how, which application-layer protocol to use after the handshake, and much, much more. All of these parameters impact the security properties of the communication channel in one way or another.

SNI is a prime example of a parameter that impacts the channel’s security. The SNI extension is used by the client to indicate to the server the website it wants to reach. This is essential for the modern Internet, as it’s common nowadays for many origin servers to sit behind a single TLS operator. In this setting, the operator uses the SNI to determine who will authenticate the connection: without it, there would be no way of knowing which TLS certificate to present to the client. The problem is that SNI leaks to the network the identity of the origin server the client wants to connect to, potentially allowing eavesdroppers to infer a lot of information about their communication. (Of course, there are other ways for a network observer to identify the origin — the origin’s IP address, for example. But co-locating with other origins on the same IP address makes it much harder to use this metric to determine the origin than it is to simply inspect the SNI.)

Although protecting SNI is the impetus for ECH, it is by no means the only privacy-sensitive handshake parameter that the client and server negotiate. Another is the ALPN extension, which is used to decide which application-layer protocol to use once the TLS connection is established. The client sends the list of applications it supports — whether it’s HTTPS, email, instant messaging, or the myriad other applications that use TLS for transport security — and the server selects one from this list, and sends its selection to the client. By doing so, the client and server leak to the network a clear signal of their capabilities and what the connection might be used for.

Some features are so privacy-sensitive that their inclusion in the handshake is a non-starter. One idea that has been floated is to replace the key exchange at the heart of TLS with password-authenticated key-exchange (PAKE). This would allow password-based authentication to be used alongside (or in lieu of) certificate-based authentication, making TLS more robust and suitable for a wider range of applications. The privacy issue here is analogous to SNI: servers typically associate a unique identifier to each client (e.g., a username or email address) that is used to retrieve the client’s credentials; and the client must, somehow, convey this identity to the server during the course of the handshake. If sent in the clear, then this personally identifiable information would be easily accessible to any network observer.

A necessary ingredient for addressing all of these privacy leaks is handshake encryption, i.e., the encryption of handshake messages in addition to application data. Sounds simple enough, but this solution presents another problem: how do the client and server pick an encryption key if, after all, the handshake is itself a means of exchanging a key? Some parameters must be sent in the clear, of course, so the goal of ECH is to encrypt all handshake parameters except those that are essential to completing the key exchange.

In order to understand ECH and the design decisions underpinning it, it helps to understand a little bit about the history of handshake encryption in TLS.

Handshake encryption in TLS

TLS had no handshake encryption at all prior to the latest version, TLS 1.3. In the wake of the Snowden revelations in 2013, the IETF community began to consider ways of countering the threat that mass surveillance posed to the open Internet. When the process of standardizing TLS 1.3 began in 2014, one of its design goals was to encrypt as much of the handshake as possible. Unfortunately, the final standard falls short of full handshake encryption, and several parameters, including SNI, are still sent in the clear. Let’s take a closer look to see why.

The TLS 1.3 protocol flow is illustrated in Figure 1. Handshake encryption begins as soon as the client and server compute a fresh shared secret. To do this, the client sends a key share in its ClientHello message, and the server responds in its ServerHello with its own key share. Having exchanged these shares, the client and server can derive a shared secret. Each subsequent handshake message is encrypted using the handshake traffic key derived from the shared secret. Application data is encrypted using a different key, called the application traffic key, which is also derived from the shared secret. These derived keys have different security properties: to emphasize this, they are illustrated with different colors.
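
To see how one shared secret can yield keys with different roles, here is a simplified derivation in Go using HKDF. The real TLS 1.3 key schedule uses HKDF-Expand-Label with transcript hashes, so the labels and structure below are deliberately simplified:

package main

import (
	"crypto/sha256"
	"fmt"
	"io"

	"golang.org/x/crypto/hkdf"
)

// derive expands the shared secret into a 32-byte key bound to a label, so
// handshake and application keys are distinct even though they share a root.
func derive(secret []byte, label string) []byte {
	key := make([]byte, 32)
	io.ReadFull(hkdf.New(sha256.New, secret, nil, []byte(label)), key)
	return key
}

func main() {
	shared := []byte("shared secret from the key exchange")
	fmt.Printf("handshake traffic key:   %x\n", derive(shared, "hs traffic"))
	fmt.Printf("application traffic key: %x\n", derive(shared, "ap traffic"))
}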

The first handshake message that is encrypted is the server’s EncryptedExtensions. The purpose of this message is to protect the server’s sensitive handshake parameters, including the server’s ALPN extension, which contains the application selected from the client’s ALPN list. Key-exchange parameters are sent unencrypted in the ClientHello and ServerHello.

Figure 1: The TLS 1.3 handshake.

All of the client’s handshake parameters, sensitive or not, are sent in the ClientHello. Looking at Figure 1, you might be able to think of ways of reworking the handshake so that some of them can be encrypted, perhaps at the cost of additional latency (i.e., more round trips over the network). However, extensions like SNI create a kind of “chicken-and-egg” problem.

The client doesn’t encrypt anything until it has verified the server’s identity (this is the job of the Certificate and CertificateVerify messages) and the server has confirmed that it knows the shared secret (the job of the Finished message). These measures ensure the key exchange is authenticated, thereby preventing monster-in-the-middle (MITM) attacks in which the adversary impersonates the server to the client in a way that allows it to decrypt messages sent by the client.  Because SNI is needed by the server to select the certificate, it needs to be transmitted before the key exchange is authenticated.

In general, ensuring confidentiality of handshake parameters used for authentication is only possible if the client and server already share an encryption key. But where might this key come from?

Full handshake encryption in the early days of TLS 1.3. Interestingly, full handshake encryption was once proposed as a core feature of TLS 1.3. In early versions of the protocol (draft-10, circa 2015), the server would offer the client a long-lived public key during the handshake, which the client would use for encryption in subsequent handshakes. (This design came from a protocol called OPTLS, which in turn was borrowed from the original QUIC proposal.) Called “0-RTT”, the primary purpose of this mode was to allow the client to begin sending application data prior to completing a handshake. In addition, it would have allowed the client to encrypt its first flight of handshake messages following the ClientHello, including its own EncryptedExtensions, which might have been used to protect the client’s sensitive handshake parameters.

Ultimately this feature was not included in the final standard (RFC 8446, published in 2018), mainly because its usefulness was outweighed by its added complexity. In particular, it does nothing to protect the initial handshake in which the client learns the server’s public key. Parameters that are required for server authentication of the initial handshake, like SNI, would still be transmitted in the clear.

Nevertheless, this scheme is notable as the forerunner of other handshake encryption mechanisms, like ECH, that use public key encryption to protect sensitive ClientHello parameters. The main problem these mechanisms must solve is key distribution.

Before ECH there was (and is!) ESNI

The immediate predecessor of ECH was the Encrypted SNI (ESNI) extension. As its name implies, the goal of ESNI was to provide confidentiality of the SNI. To do so, the client would encrypt its SNI extension under the server’s public key and send the ciphertext to the server. The server would attempt to decrypt the ciphertext using the secret key corresponding to its public key. If decryption were to succeed, then the server would proceed with the connection using the decrypted SNI. Otherwise, it would simply abort the handshake. The high-level flow of this simple protocol is illustrated in Figure 2.

Figure 2: The TLS 1.3 handshake with the ESNI extension. It is identical to the TLS 1.3 handshake, except the SNI extension has been replaced with ESNI.

For key distribution, ESNI relied on another critical protocol: Domain Name Service (DNS). In order to use ESNI to connect to a website, the client would piggy-back on its standard A/AAAA queries a request for a TXT record with the ESNI public key. For example, to get the key for crypto.dance, the client would request the TXT record of _esni.crypto.dance:

$ dig _esni.crypto.dance TXT +short
"/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EAAhMBAQQAAAAAX67XsAAAAABftsCwAAA="

The base64-encoded blob contains an ESNI public key and related parameters such as the encryption algorithm.
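
A few lines of Go are enough to peek at the blob’s framing. The leading two bytes are the ESNIKeys version from the draft (0xff01 for draft-02); full parsing is left out here:

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	record := "/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EAAhMBAQQAAAAAX67XsAAAAABftsCwAAA="
	raw, err := base64.StdEncoding.DecodeString(record)
	if err != nil {
		panic(err)
	}
	// The first two bytes are the ESNIKeys version; the rest encodes the
	// public key, cipher suites, and padding/expiry parameters.
	fmt.Printf("version: %#x, total: %d bytes\n", raw[:2], len(raw))
}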

But what’s the point of encrypting SNI if we’re just going to leak the server name to network observers via a plaintext DNS query? Deploying ESNI this way became feasible with the introduction of DNS-over-HTTPS (DoH), which enables encryption of DNS queries to resolvers that provide the DoH service (1.1.1.1 is an example of such a service). Another crucial feature of DoH is that it provides an authenticated channel for transmitting the ESNI public key from the DoH server to the client. This prevents cache-poisoning attacks that originate from the client’s local network: in the absence of DoH, a local attacker could prevent the client from offering the ESNI extension by returning an empty TXT record, or coerce the client into using ESNI with a key it controls.

While ESNI took a significant step forward, it falls short of our goal of achieving full handshake encryption. Apart from being incomplete — it only protects SNI — it is vulnerable to a handful of sophisticated attacks, which, while hard to pull off, point to theoretical weaknesses in the protocol’s design that need to be addressed.

ESNI was deployed by Cloudflare and enabled by Firefox, on an opt-in basis, in 2018, an experience that laid bare some of the challenges with relying on DNS for key distribution. Cloudflare rotates its ESNI key every hour in order to minimize the collateral damage in case a key ever gets compromised. DNS artifacts are sometimes cached for much longer, the result of which is that there is a decent chance of a client having a stale public key. While Cloudflare’s ESNI service tolerates this to a degree, every key must eventually expire. The question that the ESNI protocol left open is how the client should proceed if decryption fails and it can’t access the current public key, via DNS or otherwise.

Another problem with relying on DNS for key distribution is that several endpoints might be authoritative for the same origin server, but have different capabilities. For example, a request for the A record of “example.com” might return one of two different IP addresses, each operated by a different CDN. The TXT record for “_esni.example.com” would contain the public key for one of these CDNs, but certainly not both. The DNS protocol does not provide a way of atomically tying together resource records that correspond to the same endpoint. In particular, it’s possible for a client to inadvertently offer the ESNI extension to an endpoint that doesn’t support it, causing the handshake to fail. Fixing this problem requires changes to the DNS protocol. (More on this below.)

The future of ESNI. In the next section, we’ll describe the ECH specification and how it addresses the shortcomings of ESNI. Despite its limitations, however, the practical privacy benefit that ESNI provides is significant. Cloudflare intends to continue its support for ESNI until ECH is production-ready.

The ins and outs of ECH

The goal of ECH is to encrypt the entire ClientHello, thereby closing the gap left in TLS 1.3 and ESNI by protecting all privacy-sensitive handshake-parameters. Similar to ESNI, the protocol uses a public key, distributed via DNS and obtained using DoH, for encryption during the client’s first flight. But ECH has improvements to key distribution that make the protocol more robust to DNS cache inconsistencies. Whereas the ESNI server aborts the connection if decryption fails, the ECH server attempts to complete the handshake and supply the client with a public key it can use to retry the connection.

But how can the server complete the handshake if it’s unable to decrypt the ClientHello? As illustrated in Figure 3, the ECH protocol actually involves two ClientHello messages: the ClientHelloOuter, which is sent in the clear, as usual; and the ClientHelloInner, which is encrypted and sent as an extension of the ClientHelloOuter. The server completes the handshake with just one of these ClientHellos: if decryption succeeds, then it proceeds with the ClientHelloInner; otherwise, it proceeds with the ClientHelloOuter.

Figure 3: The TLS 1.3 handshake with the ECH extension.

The ClientHelloInner is composed of the handshake parameters the client wants to use for the connection. This includes sensitive values, like the SNI of the origin server it wants to reach (called the backend server in ECH parlance), the ALPN list, and so on. The ClientHelloOuter, while also a fully-fledged ClientHello message, is not used for the intended connection. Instead, the handshake is completed by the ECH service provider itself (called the client-facing server), signaling to the client that its intended destination couldn’t be reached due to decryption failure. In this case, the service provider also sends along the correct ECH public key with which the client can retry handshake, thereby “correcting” the client’s configuration. (This mechanism is similar to how the server distributed its public key for 0-RTT mode in the early days of TLS 1.3.)

At a minimum, both ClientHellos must contain the handshake parameters that are required for a server-authenticated key-exchange. In particular, while the ClientHelloInner contains the real SNI, the ClientHelloOuter also contains an SNI value, which the client expects to verify in case of ECH decryption failure (i.e., the client-facing server). If the connection is established using the ClientHelloOuter, then the client is expected to immediately abort the connection and retry the handshake with the public key provided by the server. It’s not necessary that the client specify an ALPN list in the ClientHelloOuter, nor any other extension used to guide post-handshake behavior. All of these parameters are encapsulated by the encrypted ClientHelloInner.
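
In code, the server-side decision boils down to a few lines. The types and names below are a toy model of the logic described above, not a real TLS stack’s API:

package main

import "fmt"

// clientHello is a stand-in for the real message; only the SNI matters here.
type clientHello struct{ sni string }

// decryptInner stands in for HPKE decryption of the ECH extension; ok is
// false when, say, the client encrypted to a stale public key.
func decryptInner(ech []byte, ok bool) (clientHello, bool) {
	if !ok {
		return clientHello{}, false
	}
	return clientHello{sni: string(ech)}, true
}

// accept picks which ClientHello the handshake proceeds with.
func accept(outer clientHello, ech []byte, keyMatches bool) clientHello {
	if inner, ok := decryptInner(ech, keyMatches); ok {
		return inner // proceed with the real, encrypted parameters
	}
	// Decryption failed: complete the handshake as the client-facing
	// server and send a fresh ECH public key so the client can retry.
	return outer
}

func main() {
	outer := clientHello{sni: "provider.example"}
	fmt.Println(accept(outer, []byte("backend.example"), true).sni)  // backend.example
	fmt.Println(accept(outer, []byte("backend.example"), false).sni) // provider.example
}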

This design resolves — quite elegantly, I think — most of the challenges for securely deploying handshake encryption encountered by earlier mechanisms. Importantly, the design of ECH was not conceived in a vacuum. The protocol reflects the diverse perspectives of the IETF community, and its development dovetails with other IETF standards that are crucial to the success of ECH.

The first is an important new DNS feature known as the HTTPS resource record type. At a high level, this record type is intended to allow multiple HTTPS endpoints that are authoritative for the same domain name to advertise different capabilities for TLS. This makes it possible to rely on DNS for key distribution, resolving one of the deployment challenges uncovered by the initial ESNI deployment. For a deep dive into this new record type and what it means for the Internet more broadly, check out Alessandro Ghedini’s recent blog post on the subject.
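
For a rough illustration, querying a domain’s HTTPS record with a tool like dig returns the endpoint’s TLS capabilities as a set of parameters. The output below is abbreviated and illustrative, with a placeholder standing in for the ECH configuration (early drafts named this parameter echconfig rather than ech):

$ dig +short example.com HTTPS
1 . alpn="h2,http/1.1" ech=<base64-encoded ECH configuration>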

The second is the CFRG’s Hybrid Public Key Encryption (HPKE) standard, which specifies an extensible framework for building public key encryption schemes suitable for a wide variety of applications. In particular, ECH delegates all of the details of its handshake encryption mechanism to HPKE, resulting in a much simpler and easier-to-analyze specification. (Incidentally, HPKE is also one of the main ingredients of Oblivious DNS-over-HTTPS.)
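
To give a feel for the kind of operation HPKE standardizes, the following Go sketch implements a simplified “encrypt to a public key” flow in the spirit of HPKE’s base mode: encapsulate a shared secret to the recipient’s X25519 public key, derive a symmetric key, and seal the plaintext. This illustrates the general shape only; it is not RFC 9180’s actual key schedule, the function name is my own, and it assumes the golang.org/x/crypto/hkdf package.

package hpkesketch

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"io"

	"golang.org/x/crypto/hkdf"
)

// sealToPublicKey sketches the shape of HPKE base mode: generate an
// ephemeral key pair, encapsulate a shared secret to the recipient's
// public key, derive an AEAD key, and encrypt. Real HPKE also binds a
// ciphersuite ID and richer context into its key schedule.
func sealToPublicKey(recipient *ecdh.PublicKey, plaintext, info []byte) (enc, ct []byte, err error) {
	eph, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	shared, err := eph.ECDH(recipient) // DH with the recipient's key
	if err != nil {
		return nil, nil, err
	}
	// HKDF stands in for HPKE's key schedule here.
	key := make([]byte, 16)
	if _, err := io.ReadFull(hkdf.New(sha256.New, shared, nil, info), key); err != nil {
		return nil, nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, aead.NonceSize()) // zero nonce: safe for a one-shot key
	ct = aead.Seal(nil, nonce, plaintext, nil)
	// The recipient needs the ephemeral public key to recompute "shared".
	return eph.PublicKey().Bytes(), ct, nil
}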

The road ahead

The current ECH specification is the culmination of a multi-year collaboration. At this point, the overall design of the protocol is fairly stable. In fact, the next draft of the specification will be the first to be targeted for interop testing among implementations. Still, there remain a number of details that need to be sorted out. Let’s end this post with a brief overview of the road ahead.

Resistance to traffic analysis

Ultimately, the goal of ECH is to ensure that TLS connections made to different origin servers behind the same ECH service provider are indistinguishable from one another. In other words, when you connect to an origin behind, say, Cloudflare, no one on the network between you and Cloudflare should be able to discern which origin you reached, or which privacy-sensitive handshake parameters you and the origin negotiated. Apart from an immediate privacy boost, this property, if achieved, paves the way for the deployment of new TLS features without compromising privacy.

Encrypting the ClientHello is an important step towards achieving this goal, but we need to do a bit more. An important attack vector we haven’t discussed yet is traffic analysis: the collection and analysis of properties of the communication channel that betray part of the ciphertext’s contents without cracking the underlying encryption scheme. For example, the length of the encrypted ClientHello might leak enough information about the SNI for the adversary to make an educated guess as to its value (this risk is especially high for domain names that are particularly short or particularly long). It is therefore crucial that the length of each ciphertext be independent of the values of privacy-sensitive parameters. The current ECH specification provides some mitigations, but their coverage is incomplete. Improving ECH’s resistance to traffic analysis is thus an important direction for future work.
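
As a concrete example of one such mitigation, a client could pad the serialized ClientHelloInner to a multiple of a fixed block size before encryption, so that the ciphertext length reveals only a coarse bucket rather than the exact length of the SNI. A minimal Go sketch follows; the actual padding policy in the ECH specification is more involved.

// padToMultiple appends zero bytes to msg so that its length is a
// multiple of blockSize. Encrypting the padded message hides the exact
// length of the inner ClientHello, leaking only its length bucket.
func padToMultiple(msg []byte, blockSize int) []byte {
	if rem := len(msg) % blockSize; rem != 0 {
		msg = append(msg, make([]byte, blockSize-rem)...)
	}
	return msg
}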

The spectre of ossification

An important open question for ECH is the impact it will have on network operations.

One of the lessons learned from the deployment of TLS 1.3 is that upgrading a core Internet protocol can trigger unexpected network behavior. Cloudflare was one of the first major TLS operators to deploy TLS 1.3 at scale; when browsers like Firefox and Chrome began to enable it on an experimental basis, they observed a significantly higher rate of connection failures compared to TLS 1.2. The root cause of these failures was network ossification: the tendency of middleboxes (network appliances between clients and servers that monitor and sometimes intercept traffic) to expect traffic to look and behave a certain way. Changing the protocol before middlebox vendors had a chance to update their software led middleboxes to try to parse packets they didn’t recognize, triggering software bugs that, in some instances, caused connections to be dropped completely.

This problem was so widespread that, instead of waiting for network operators to update their software, the design of TLS 1.3 was altered to mitigate the impact of network ossification. The ingenious solution was to make TLS 1.3 “look like” another protocol that middleboxes are known to tolerate. Specifically, the wire format and even the contents of handshake messages were made to resemble TLS 1.2. These two protocols aren’t identical, of course (a curious network observer can still distinguish between them), but they look and behave similarly enough that the majority of existing middleboxes don’t treat them differently. Empirically, this strategy reduced the connection failure rate enough to make deployment of TLS 1.3 viable.

Once again, ECH represents a significant upgrade for TLS for which the spectre of network ossification looms large. The ClientHello contains parameters, like SNI, that have existed in the handshake for a long time, and we don’t yet know what the impact will be of encrypting them. In anticipation of the deployment issues ossification might cause, the ECH protocol has been designed to look as much like a standard TLS 1.3 handshake as possible. The most notable difference is the ECH extension itself: if middleboxes ignore it — as they should, if they are compliant with the TLS 1.3 standard — then the rest of the handshake will look and behave very much as usual.

It remains to be seen whether this strategy will be enough to ensure the wide-scale deployment of ECH. If it is, this new feature will also help to mitigate the impact of future TLS upgrades on network operations: encrypting the full handshake reduces the risk of ossification, since it leaves fewer visible protocol features for software to ossify on. We believe this will be good for the health of the Internet overall.

Conclusion

The old TLS handshake is (unintentionally) leaky. Operational requirements of both the client and server have led to privacy-sensitive parameters, like SNI, being negotiated completely in the clear and available to network observers. The ECH extension aims to close this gap by enabling encryption of the full handshake. This represents a significant upgrade to TLS, one that will help preserve end-user privacy as the protocol continues to evolve.

The ECH standard is a work-in-progress. As this work continues, Cloudflare is committed to doing its part to ensure this important upgrade for TLS reaches Internet-scale deployment.

Round 2 post-quantum TLS is now supported in AWS KMS

Post Syndicated from Alex Weibel original https://aws.amazon.com/blogs/security/round-2-post-quantum-tls-is-now-supported-in-aws-kms/

AWS Key Management Service (AWS KMS) now supports three new hybrid post-quantum key exchange algorithms for the Transport Layer Security (TLS) 1.2 encryption protocol that’s used when connecting to AWS KMS API endpoints. These new hybrid post-quantum algorithms combine the proven security of a classical key exchange with the potential quantum-safe properties of new post-quantum key exchanges undergoing evaluation for standardization. The fastest of these algorithms adds approximately 0.3 milliseconds of overhead compared to a classical TLS handshake. The new post-quantum key exchange algorithms added are Round 2 versions of Kyber, Bit Flipping Key Encapsulation (BIKE), and Supersingular Isogeny Key Encapsulation (SIKE). Each team has submitted its algorithm to the National Institute of Standards and Technology (NIST) as part of NIST’s post-quantum cryptography standardization process, which spans several rounds of evaluation over multiple years and is likely to continue beyond 2021.

In our previous hybrid post-quantum TLS blog post, we announced that AWS KMS had launched hybrid post-quantum TLS 1.2 with Round 1 versions of BIKE and SIKE. The Round 1 post-quantum algorithms are still supported by AWS KMS, but at a lower priority than the Round 2 algorithms. You can choose to upgrade your client to enable negotiation of Round 2 algorithms.

Why post-quantum TLS is important

A large-scale quantum computer would be able to break the current public-key cryptography that’s used for key exchange in classical TLS connections. While a large-scale quantum computer isn’t available today, it’s still important to think about and plan for your long-term security needs. TLS traffic using classical algorithms recorded today could be decrypted by a large-scale quantum computer in the future. If you’re developing applications that rely on the long-term confidentiality of data passed over a TLS connection, you should plan to migrate to post-quantum cryptography before the period during which your data must remain confidential overlaps with the arrival of a large-scale quantum computer. For example, if you believe that a large-scale quantum computer is 25 years away, and your data must be secure for 20 years, you should migrate to post-quantum schemes within the next 5 years. AWS is working to prepare for this future, and we want you to be prepared too.

We’re offering this feature now, instead of waiting for standardization efforts to complete, so you have a way to measure the potential performance impact to your applications, and so you gain the protection afforded by the proposed post-quantum schemes today. While we believe the use of this feature raises the already high security bar for connecting to AWS KMS endpoints, these new cipher suites will impact bandwidth utilization and latency, and could create connection failures for intermediate systems that proxy TLS connections. We’d like your feedback on the effectiveness of our implementation, and reports of any issues found, so we can improve it over time.

Hybrid post-quantum TLS 1.2

Hybrid post-quantum TLS is a feature that provides the security protections of both the classical and post-quantum key exchange algorithms in a single TLS handshake. Figure 1 shows the differences in the connection secret derivation process between classical and hybrid post-quantum TLS 1.2. Hybrid post-quantum TLS 1.2 has three major differences from classical TLS 1.2:

  • The negotiated post-quantum key is appended to the ECDHE key before being used as the hash-based message authentication code (HMAC) key.
  • The text “hybrid”, in its ASCII representation, is prepended to the beginning of the HMAC message.
  • The entire client key exchange message from the TLS handshake is appended to the end of the HMAC message. (These three changes are sketched in code after Figure 1.)
Figure 1: Differences in the connection secret derivation process between classical and hybrid post-quantum TLS 1.2
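
Interpreted as code, those three changes amount to something like the following Go sketch. It is illustrative only: the real derivation happens inside the TLS 1.2 PRF in s2n, and the function and parameter names here are assumptions, not the actual implementation.

package hybridsketch

import (
	"crypto/hmac"
	"crypto/sha256"
)

// hybridPRFBlock sketches how the hybrid draft alters the inputs to
// the HMAC inside the TLS 1.2 PRF: the post-quantum shared secret is
// appended to the ECDHE secret to form the HMAC key, and the message
// is framed with the ASCII text "hybrid" in front and the client key
// exchange message at the end.
func hybridPRFBlock(ecdheSecret, pqSecret, prfMessage, clientKeyExchange []byte) []byte {
	key := append(append([]byte{}, ecdheSecret...), pqSecret...)
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte("hybrid"))   // prepended label
	mac.Write(prfMessage)         // the usual PRF input
	mac.Write(clientKeyExchange)  // appended handshake message
	return mac.Sum(nil)
}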

Some background on post-quantum TLS

Today, all requests to AWS KMS use TLS with key exchange algorithms that provide perfect forward secrecy and use one of the following classical schemes:

  • Finite Field Diffie-Hellman Ephemeral (FFDHE)
  • Elliptic Curve Diffie-Hellman Ephemeral (ECDHE)

While existing FFDHE and ECDHE schemes use perfect forward secrecy to protect against the compromise of the server’s long-term secret key, these schemes don’t protect against large-scale quantum computers. In the future, a sufficiently capable large-scale quantum computer could run Shor’s Algorithm to recover the TLS session key of a recorded classical session, and thereby gain access to the data inside. Using a post-quantum key exchange algorithm during the TLS handshake protects against attacks from a large-scale quantum computer.

The possibility of large-scale quantum computing has spurred the development of new quantum-resistant cryptographic algorithms. NIST has started the process of standardizing post-quantum key encapsulation mechanisms (KEMs). A KEM is a type of key exchange that’s used to establish a shared symmetric key. AWS has chosen three NIST KEM submissions to adopt in our post-quantum efforts:

  • Kyber
  • Bit Flipping Key Encapsulation (BIKE)
  • Supersingular Isogeny Key Encapsulation (SIKE)

Hybrid mode ensures that the negotiated key is as strong as the weakest key agreement scheme. If one of the schemes is broken, the communications remain confidential. The Internet Engineering Task Force (IETF) Hybrid Post-Quantum Key Encapsulation Methods for Transport Layer Security 1.2 draft describes how to combine post-quantum KEMs with ECDHE to create new cipher suites for TLS 1.2.

These cipher suites use a hybrid key exchange that performs two independent key exchanges during the TLS handshake. The key exchange then cryptographically combines the keys from each into a single TLS session key. This strategy combines the proven security of a classical key exchange with the potential quantum-safe properties of new post-quantum key exchanges being analyzed by NIST.

The effect of hybrid post-quantum TLS on performance

Post-quantum cipher suites have a different performance profile and bandwidth usage from traditional cipher suites. AWS has measured bandwidth and latency across 2,000 TLS handshakes between an Amazon Elastic Compute Cloud (Amazon EC2) c5n.4xlarge client and the public AWS KMS endpoint, both in the us-west-2 Region. Your own performance characteristics might differ, and will depend on your environment, including your:

  • Hardware–CPU speed and number of cores.
  • Existing workloads–how often you call AWS KMS and what other work your application performs.
  • Network–location and capacity.

The following graphs and table show latency measurements performed by AWS for all newly supported Round 2 post-quantum algorithms, in addition to the classical ECDHE key exchange algorithm currently used by most customers.

Figure 2 shows the latency of all hybrid post-quantum algorithms compared with classical ECDHE alone: SIKE adds approximately 101 milliseconds of overhead, BIKE approximately 9.5 milliseconds, and Kyber approximately 0.3 milliseconds.

Figure 2: TLS handshake latency at varying percentiles for four key exchange algorithms

Figure 3 shows the latency difference between ECDHE with Kyber and ECDHE alone. The addition of Kyber adds approximately 0.3 milliseconds of overhead.

Figure 3: TLS handshake latency at varying percentiles, with only top two performing key exchange algorithms

The following table shows the total amount of data (in bytes) needed to complete the TLS handshake for each cipher suite, the average latency, and latency at varying percentiles. All measurements were gathered from 2,000 TLS handshakes. The time was measured on the client from the start of the handshake until the handshake was completed, and includes all network transfer time. All connections used RSA authentication with a 2048-bit key, and ECDHE used the secp256r1 curve. All hybrid post-quantum tests used the NIST Round 2 versions. The Kyber test used the Kyber-512 parameter, the BIKE test used the BIKE-1 Level 1 parameter, and the SIKE test used the SIKEp434 parameter.

Item              Bandwidth  Total       Average  p0      p50     p90     p99
                  (bytes)    handshakes  (ms)     (ms)    (ms)    (ms)    (ms)
ECDHE (classic)   3,574      2,000       3.08     2.07    3.02    3.95    4.71
ECDHE + Kyber R2  5,898      2,000       3.36     2.38    3.17    4.28    5.35
ECDHE + BIKE R2   12,456     2,000       14.91    11.59   14.16   18.27   23.58
ECDHE + SIKE R2   4,628      2,000       112.40   103.22  108.87  126.80  146.56

By default, the AWS SDK client performs a TLS handshake once to set up a new TLS connection, and then reuses that TLS connection for multiple requests. This means the increased cost of a hybrid post-quantum TLS handshake is amortized over every request sent over the connection: for example, if the handshake adds 0.3 milliseconds and the connection carries 100 requests, the added cost is only 3 microseconds per request. You should take this amortization into account when evaluating the overall additional cost of using post-quantum algorithms; otherwise, performance data could be skewed.

AWS KMS has chosen Kyber Round 2 as its highest-priority post-quantum algorithm, with BIKE Round 2 and SIKE Round 2 next in priority order. This is because Kyber’s performance is closest to the classical ECDHE performance that most AWS KMS customers use today and are accustomed to.

How to use hybrid post-quantum cipher suites

To use the post-quantum cipher suites with AWS KMS, you need the preview release of the AWS Common Runtime (CRT) HTTP client for the AWS SDK for Java 2.x. Also, you will need to configure the AWS CRT HTTP client to use the s2n post-quantum hybrid cipher suites. Post-quantum TLS for AWS KMS is available in all AWS Regions except for AWS GovCloud (US-East), AWS GovCloud (US-West), AWS China (Beijing) Region operated by Beijing Sinnet Technology Co. Ltd (“Sinnet”), and AWS China (Ningxia) Region operated by Ningxia Western Cloud Data Technology Co. Ltd. (“NWCD”). Since NIST has not yet standardized post-quantum cryptography, connections that require Federal Information Processing Standards (FIPS) compliance cannot use the hybrid key exchange. For example, kms.<region>.amazonaws.com supports the use of post-quantum cipher suites, while kms-fips.<region>.amazonaws.com does not.

  1. If you’re using the AWS SDK for Java 2.x, you must add the preview release of the AWS Common Runtime client to your Maven dependencies.
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>aws-crt-client</artifactId>
        <version>2.14.13-PREVIEW</version>
    </dependency>
    

  2. You then must configure the new SDK and cipher suite in the existing initialization code of your application:
    // Fail fast if this platform's CRT build doesn't support the
    // post-quantum cipher preference.
    if (!TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07.isSupported()) {
        throw new RuntimeException("Post Quantum Ciphers not supported on this Platform");
    }

    // Build an async HTTP client backed by the AWS Common Runtime,
    // configured to offer the hybrid post-quantum cipher suites.
    SdkAsyncHttpClient awsCrtHttpClient = AwsCrtAsyncHttpClient.builder()
            .tlsCipherPreference(TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07)
            .build();

    // Route AWS KMS calls through the CRT client.
    KmsAsyncClient kms = KmsAsyncClient.builder()
            .httpClient(awsCrtHttpClient)
            .build();

    // Calls made with this client now negotiate hybrid post-quantum TLS.
    ListKeysResponse response = kms.listKeys().get();
    

Now, all connections made to AWS KMS in supported Regions will use the new hybrid post-quantum cipher suites! To see a complete example of everything set up, check out the example application here.

Things to try

Here are some ideas about how to use this post-quantum-enabled client:

  • Run load tests and benchmarks. These new cipher suites perform differently than traditional key exchange algorithms. You might need to adjust your connection timeouts to allow for the longer handshake times or, if you’re running inside an AWS Lambda function, extend the execution timeout setting.
  • Try connecting from different locations. Depending on the network path your request takes, you might discover that intermediate hosts, proxies, or firewalls with deep packet inspection (DPI) block the request. This could be due to the new cipher suites in the ClientHello or the larger key exchange messages. If this is the case, you might need to work with your security team or IT administrators to update the relevant configuration to unblock the new TLS cipher suites. We’d like to hear from you about how your infrastructure interacts with this new variant of TLS traffic. If you have questions or feedback, please start a new thread on the AWS KMS discussion forum.

Conclusion

In this blog post, I announced support for Round 2 hybrid post-quantum algorithms in AWS KMS, and showed you how to begin experimenting with hybrid post-quantum key exchange algorithms for TLS when connecting to AWS KMS endpoints.

More info

If you’d like to learn more about post-quantum cryptography, check out:



Author

Alex Weibel

Alex is a Senior Software Engineer on the AWS Crypto Algorithms team. He’s one of the maintainers of Amazon’s TLS library, s2n. Previously, Alex worked on TLS termination and request proxying for S3 and the Elastic Load Balancing Service, developing new features for customers. Alex holds a Bachelor of Science degree in Computer Science from the University of Texas at Austin.

Automated Origin CA for Kubernetes

Post Syndicated from Terin Stock original https://blog.cloudflare.com/automated-origin-ca-for-kubernetes/

In 2016, we launched the Cloudflare Origin CA, a certificate authority optimized for making it easy to secure the connection between Cloudflare and an origin server. Running our own CA has allowed us to support fast issuance and renewal, simple and effective revocation, and wildcard certificates for our users.

Out of the box, managing TLS certificates and keys within Kubernetes can be challenging and error-prone. The secret resources have to be constructed correctly, as components expect secrets with specific fields. Some forms of domain verification require manually rotating secrets to pass. Once you’re successful, don’t forget to renew before the certificate expires!

cert-manager is a project to fill this operational gap, providing Kubernetes resources that manage the lifecycle of a certificate. Today we’re releasing origin-ca-issuer, an extension to cert-manager integrating with Cloudflare Origin CA to easily create and renew certificates for your account’s domains.

Origin CA Integration

Creating an Issuer

After installing cert-manager and origin-ca-issuer, you can create an OriginIssuer resource. This resource creates a binding between cert-manager and the Cloudflare API for an account. Different issuers may be connected to different Cloudflare accounts in the same Kubernetes cluster.

apiVersion: cert-manager.k8s.cloudflare.com/v1
kind: OriginIssuer
metadata:
  name: prod-issuer
  namespace: default
spec:
  signatureType: OriginECC
  auth:
    serviceKeyRef:
      name: service-key
      key: key

This creates a new OriginIssuer named “prod-issuer” that issues certificates using ECDSA signatures and authenticates to the Cloudflare API using the secret “service-key” in the same namespace.
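
For reference, the “service-key” secret could be created with kubectl as shown below. The value is a placeholder for your account’s Origin CA service key, and the data key name (“key”) must match serviceKeyRef.key in the OriginIssuer above:

kubectl create secret generic service-key \
  --namespace default \
  --from-literal=key='<your Origin CA service key>'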

Signing an Origin CA Certificate

After creating an OriginIssuer, we can now create a Certificate with cert-manager. This defines the domains, including wildcards, that the certificate should be issued for, how long the certificate should be valid, and when cert-manager should renew the certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  # The secret name where cert-manager
  # should store the signed certificate.
  secretName: example-com-tls
  dnsNames:
    - example.com
  # Duration of the certificate (168h = 7 days).
  duration: 168h
  # Renew a day before the certificate expiration.
  renewBefore: 24h
  # Reference the Origin CA Issuer you created above,
  # which must be in the same namespace.
  issuerRef:
    group: cert-manager.k8s.cloudflare.com
    kind: OriginIssuer
    name: prod-issuer

Once created, cert-manager begins managing the lifecycle of this certificate, including creating the key material, crafting a certificate signing request (CSR), and constructing a certificate request that will be processed by the origin-ca-issuer.

When signed by the Cloudflare API, the certificate will be made available, along with the private key, in the Kubernetes secret specified within the secretName field. You’ll be able to use this certificate on servers proxied behind Cloudflare.
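
Once issuance succeeds, you can confirm the Certificate’s state and inspect the generated secret with kubectl. These commands are illustrative and use the resource names from the example above:

kubectl get certificate example-com --namespace default
kubectl describe secret example-com-tls --namespace default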

Extra: Ingress Support

If you’re using an Ingress controller, you can use cert-manager’s Ingress support to automatically manage Certificate resources based on your Ingress resource.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/issuer: prod-issuer
    cert-manager.io/issuer-kind: OriginIssuer
    cert-manager.io/issuer-group: cert-manager.k8s.cloudflare.com
  name: example
  namespace: default
spec:
  rules:
    - host: example.com
      http:
        paths:
          - backend:
              service:
                name: examplesvc
                port:
                  number: 80
            path: /
            pathType: Prefix
  tls:
    # specifying a host in the TLS section will tell cert-manager
    # what DNS SANs should be on the created certificate.
    - hosts:
        - example.com
      # cert-manager will create this secret
      secretName: example-tls

Building an External cert-manager Issuer

An external cert-manager issuer is a specialized Kubernetes controller. There’s no direct communication between cert-manager and external issuers at all; this means that you can use any existing tools and best practices for developing controllers to develop an external issuer.

We’ve decided to use the excellent controller-runtime project to build origin-ca-issuer, running two reconciliation controllers.

OriginIssuer Controller

The OriginIssuer controller watches for creation and modification of OriginIssuer custom resources. It creates a Cloudflare API client using the referenced credentials and account details; this client instance is later used to sign certificates through the API. The controller periodically retries creating the API client, and once successful, it updates the OriginIssuer’s status to ready.

CertificateRequest Controller

The CertificateRequest controller watches for the creation and modification of cert-manager’s CertificateRequest resources. These resources are created automatically by cert-manager as needed during a certificate’s lifecycle.

The controller looks for CertificateRequests that reference a known OriginIssuer (this reference is copied by cert-manager from the originating Certificate resource) and ignores all resources that do not match. The controller then verifies that the OriginIssuer is in the ready state before transforming the certificate request into an API request using the previously created client.

On a successful response, the signed certificate is added to the certificate request, which cert-manager then uses to create or update the secret resource. On an unsuccessful request, the controller periodically retries.
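
For a sense of what such a controller looks like, here is a highly simplified reconcile loop using controller-runtime. It is a sketch, not the actual origin-ca-issuer code: the real controller uses cert-manager’s typed API, and the signing step is elided.

package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// CertificateRequestReconciler sketches the controller described above.
type CertificateRequestReconciler struct {
	client.Client
}

func (r *CertificateRequestReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the CertificateRequest (as unstructured data, to keep the
	// sketch free of cert-manager's typed client).
	cr := &unstructured.Unstructured{}
	cr.SetAPIVersion("cert-manager.io/v1")
	cr.SetKind("CertificateRequest")
	if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Ignore requests that don't reference an OriginIssuer.
	group, _, _ := unstructured.NestedString(cr.Object, "spec", "issuerRef", "group")
	if group != "cert-manager.k8s.cloudflare.com" {
		return ctrl.Result{}, nil
	}

	// The real controller verifies the referenced OriginIssuer is
	// ready, sends the CSR to the Cloudflare API, and writes the
	// signed certificate into status.certificate; elided here.
	return ctrl.Result{}, nil
}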

Learn More

Up-to-date documentation and complete installation instructions can be found in our GitHub repository. Feedback and contributions are greatly appreciated. If you’re interested in Kubernetes at Cloudflare, including building controllers like these, we’re hiring.