All posts by Christopher Wood

Stronger than a promise: proving Oblivious HTTP privacy properties

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/stronger-than-a-promise-proving-oblivious-http-privacy-properties/

We recently announced Privacy Gateway, a fully managed, scalable, and performant Oblivious HTTP (OHTTP) relay. Conceptually, OHTTP is a simple protocol: end-to-end encrypted requests and responses are forwarded between client and server through a relay, decoupling who from what was sent. This is a common pattern, as evidenced by deployed technologies like Oblivious DoH and Apple Private Relay. Nevertheless, OHTTP is still new, and as a new protocol it’s imperative that we analyze the protocol carefully.

To that end, we conducted a formal, computer-aided security analysis to complement the ongoing standardization process and deployment of this protocol. In this post, we describe this analysis in more depth, digging deeper into the cryptographic details of the protocol and the model we developed to analyze it. If you’re already familiar with the OHTTP protocol, feel free to skip ahead to the analysis to dive right in. Otherwise, let’s first review what OHTTP sets out to achieve and how the protocol is designed to meet those goals.

Decoupling who from what was sent

OHTTP is a protocol that combines public key encryption with a proxy to separate the contents of an HTTP request (and response) from the sender of an HTTP request. In OHTTP, clients generate encrypted requests and send them to a relay, the relay forwards them to a gateway server, and then finally the gateway decrypts the message to handle the request. The relay only ever sees ciphertext and the client and gateway identities, and the gateway only ever sees the relay identity and plaintext.

In this way, OHTTP is a lightweight application-layer proxy protocol. This means that it proxies application messages rather than network-layer connections. This distinction is important, so let’s make sure we understand the differences. Proxying connections involves a whole other suite of protocols typically built on HTTP CONNECT. (Technologies like VPNs and WireGuard, including Cloudflare WARP, can also be used, but let’s focus on HTTP CONNECT for comparison.)

Connection-oriented proxy depiction

Since the entire TCP connection itself is proxied, connection-oriented proxies are compatible with any application that uses TCP. In effect, they are general purpose proxy protocols that support any type of application traffic. In contrast, proxying application messages is compatible with application use cases that require transferring entire objects (messages) between a client and server.

Message-oriented proxy depiction

Examples include DNS requests and responses, or, in the case of OHTTP, HTTP requests and responses. In other words, OHTTP is not a general purpose proxy protocol: it’s fit for purpose, aimed at transactional interactions between clients and servers (such as app-level APIs). As a result, it is much simpler in comparison.

Applications use OHTTP to ensure that requests are not linked to either of the following:

  1. Client identifying information, including the IP address, TLS fingerprint, and so on. As a proxy protocol, this is a fundamental requirement.
  2. Future requests from the same client. This is necessary for applications that do not carry state across requests.

These two properties make OHTTP a perfect fit for applications that wish to provide privacy to their users without compromising basic functionality. OHTTP has served as the foundation for a widespread deployment of Oblivious DoH for over a year now and, more recently, serves as the foundation for Flo Health’s Anonymous Mode feature.

It’s worth noting that both of these properties could be achieved with a connection-oriented protocol, but at the cost of a new end-to-end TLS connection for each message that clients wish to transmit. This can be prohibitively expensive for all entities that participate in the protocol.

So how exactly does OHTTP achieve these goals? Let’s dig deeper into OHTTP to find out.

Oblivious HTTP protocol design

A single transaction in OHTTP involves the following steps:

  1. A client encapsulates an HTTP request using the public key of the gateway server, and sends it to the relay over a client<>relay HTTPS connection.
  2. The relay forwards the request to the server over its own relay<>gateway HTTPS connection.
  3. The gateway decapsulates the request, forwarding it to the target server which can produce the resource.
  4. The gateway returns an encapsulated response to the relay, which then forwards the result to the client.

Observe that in this transaction the relay only ever sees the client and gateway identities (the client IP address and the gateway URL, respectively), but does not see any application data. Conversely, the gateway sees the application data and the relay IP address, but does not see the client IP address. Neither party has the full picture, and unless the relay and gateway collude, it stays that way.

The HTTP details for forwarding requests and responses in the transaction above are not technically interesting – a message is sent from sender to receiver over HTTPS using a POST – so we’ll skip over them. The fascinating bits are in the request and response encapsulation, which build upon HPKE, a recently ratified standard for hybrid public key encryption.

Let’s begin with request encapsulation, which uses hybrid public key encryption. Clients first transform their HTTP request into a binary format, called Binary HTTP, as specified by RFC 9292. Binary HTTP is, as the name suggests, a binary format for encoding HTTP messages. This representation lets clients encode HTTP requests into binary values and lets the gateway reverse the process, recovering the HTTP request from its binary encoding. Binary encoding is necessary because the public key encryption layer expects binary-encoded inputs.

Once the HTTP request is encoded in binary format, it is then fed into HPKE to produce an encrypted message, which clients then send to the relay to be forwarded to the gateway. The gateway decrypts this message, transforms the binary-encoded request back to its equivalent HTTP request, and then forwards it to the target server for processing.
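
To make this concrete, here is a rough Python sketch of the client’s side. The bhttp_encode and HPKE helpers are hypothetical stand-ins for a Binary HTTP encoder and an HPKE library, and the details (header fields, info string) are simplified relative to the actual OHTTP encoding.

# Illustrative sketch of OHTTP request encapsulation (simplified; not the exact
# OHTTP wire encoding). bhttp_encode and the HPKE helpers are hypothetical
# stand-ins for a Binary HTTP encoder and an HPKE library.

def encapsulate_request(gateway_public_key, http_request):
    # 1. Encode the HTTP request into its Binary HTTP representation.
    binary_request = bhttp_encode(http_request)

    # 2. Set up an HPKE sender context against the gateway's public key. The
    #    value enc is the encapsulated key material the gateway needs in order
    #    to derive the same shared secret.
    enc, sender_context = hpke_setup_base_sender(gateway_public_key,
                                                 info=b"message/bhttp request")

    # 3. Encrypt the binary-encoded request.
    ciphertext = sender_context.seal(aad=b"", plaintext=binary_request)

    # The client sends enc || ciphertext to the relay, which forwards it to the
    # gateway; the client keeps sender_context to decrypt the eventual response.
    return enc + ciphertext, sender_context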

Responses from the gateway are encapsulated back to the client in a very similar fashion. The gateway first encodes the response in an equivalent binary HTTP message, encrypts it using a symmetric key known only to the client and gateway, and then returns it to the relay to be forwarded to the client. The client decrypts and transforms this message to recover the result.
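
A similarly simplified sketch of the gateway’s side is below. The export label, key length, and nonce handling are placeholders; the actual scheme derives the response key and nonce from an exported HPKE secret and a fresh response nonce with specific labels.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encapsulate_response(gateway_context, http_response):
    binary_response = bhttp_encode(http_response)   # hypothetical BHTTP encoder
    # Derive a response key known only to the client and gateway from the HPKE
    # context established by the request (label and length are placeholders).
    key = gateway_context.export(b"message/bhttp response", 16)
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, binary_response, b"")
    # The client, holding the same HPKE context, derives the same key and
    # decrypts nonce || ciphertext to recover the binary-encoded response.
    return nonce + ciphertext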

Simplified model and security goals

In our formal analysis, we set out to make sure that OHTTP’s use of encryption and proxying achieves the desired privacy goals described above.

To motivate the analysis, consider the following simplified model where there exist two clients C1 and C2, one relay R, and one gateway G. OHTTP assumes an attacker that can observe all network activity and can adaptively compromise either R or G, but not C1 or C2. OHTTP assumes that R and G do not collude, and so we assume only one of R and G is compromised. Once compromised, the attacker has access to all session information and private key material for the compromised party. The attacker is prohibited from sending client-identifying information, such as IP addresses, to the gateway. (This would allow the attacker to trivially link a query to the corresponding client.)

In this model, both C1 and C2 send OHTTP requests Q1 and Q2, respectively, through R to G, and G provides answers A1 and A2. The attacker aims to link C1 to (Q1, A1) and C2 to (Q2, A2), respectively. The attacker succeeds if this linkability is possible without any additional interaction. OHTTP prevents such linkability. Informally, this means:

  1. Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.
  2. The gateway cannot distinguish between two identical requests generated from the same client, and two identical requests generated from different clients, in the absence of unique per-client keys.

And informally it might seem clear that OHTTP achieves these properties. But we want to prove this formally, which means showing that the design, if implemented perfectly, would have these properties. This type of formal analysis is distinct from formal verification, where you take a protocol design and prove that some code implements it correctly. While both are useful, they are different processes, and in this blog post we’ll be talking about the former. But first, let’s give some background on formal analysis.

Formal analysis programming model

In our setting, a formal analysis involves producing an algebraic description of the protocol and then using math to prove that the algebraic description has the properties we want. The end result is proof that shows that our idealized algebraic version of the protocol is “secure”, i.e. has the desired properties, with respect to an attacker we want to defend against. In our case, we chose to model our idealized algebraic version of OHTTP using a tool called Tamarin, a security-focused theorem prover and model checker. Tamarin is an intimidating tool to use, but makes intuitive sense once you get familiar with it. We’ll break down the various parts of a Tamarin model in the context of our OHTTP model below.

Modeling the Protocol Behavior

Tamarin uses a technique known as multiset rewriting to describe protocols. A protocol description is formed of a series of “rules” that can “fire” when certain requirements are met. Each rule represents a discrete step in the protocol, and when a rule fires that means the step was taken. For example, we have a rule representing the gateway generating its long-term public encapsulation key, and for different parties in the protocol establishing secure TLS connections. These rules can be triggered pretty much any time as they have no requirements.

Basic rule for OHTTP gateway key generation

Tamarin represents these requirements as “facts”. A rule can be triggered when the right facts are available. Tamarin stores all the available facts in a “bag” or multiset. A multiset is similar to an ordinary set, in that it stores a collection of objects in an unordered fashion, but unlike an ordinary set, duplicate objects are allowed. This is the “multiset” part of “multiset rewriting”.

The rewriting part refers to the output of our rules. When a rule triggers it takes some available facts out of the bag and, when finished, inserts some new facts into the bag. These new facts might fulfill the requirements of some other rule, which can then be triggered, producing even more new facts, and so on.[1] In this way we can represent progress through the protocol. Using input and output facts, we can describe our rule for generating long-term public encapsulation keys: it has no requirements and produces a long-term key fact as output.
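
To make the multiset rewriting idea concrete, here is a toy interpreter in Python. It is purely illustrative (it is not Tamarin syntax), and the fact and rule names are invented for this example.

from collections import Counter

# Toy multiset-rewriting interpreter (purely illustrative; Tamarin's rule
# language is far richer, e.g. it distinguishes persistent facts that are
# never consumed).

bag = Counter()   # the multiset of currently available facts
trace = []        # action facts, recorded each time a rule fires

def fire(rule):
    """Fire a rule if the bag contains all of its input facts."""
    inputs, actions, outputs = rule
    if all(bag[fact] >= count for fact, count in Counter(inputs).items()):
        bag.subtract(inputs)    # consume the input facts
        trace.extend(actions)   # record the rule's action facts
        bag.update(outputs)     # produce the new output facts
        return True
    return False

# Rule: the gateway generates its long-term key pair. It has no input facts,
# so it can fire at any time.
gen_gateway_key = ([], ["GatewayKeyGenerated"], ["GatewayPublicKey"])

# Rule: a client builds an encapsulated request. It requires the gateway's
# public key and a TLS connection to the relay, and outputs a message for the
# relay plus the client state needed to process the eventual response.
client_request = (["GatewayPublicKey", "ClientRelayTLS"],
                  ["ClientSentRequest"],
                  ["RequestForRelay", "ClientState"])

fire(gen_gateway_key)
bag.update(["ClientRelayTLS"])   # pretend the TLS-setup rule already fired
fire(client_request)
print(trace)                     # ['GatewayKeyGenerated', 'ClientSentRequest']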

A rule requirement is satisfied if there exist output facts that match the rule’s input facts. As an example, in OHTTP, one requirement for the client rule for generating a request is that the long-term public encapsulation key exists.

Let’s put some of these pieces together to show a very small but concrete part of OHTTP as an example: the client generating its encapsulated request and sending it to the relay. This step should produce a message for the relay, as well as any corresponding state needed to process the eventual response from the relay. As a precondition, the client requires (1) the gateway public key and (2) a TLS connection to the relay. And as mentioned earlier, generating the public key and TLS connection do not require any inputs, so they can be done at any time.

Modeling events in time

Beyond consuming and producing new facts, each Tamarin rule can also create side effects, called “action facts.” Tamarin records the action facts each time a rule is triggered. An action fact might be something like “a client message containing the contents m was sent at time t.” Sometimes rules can only be triggered in a strict sequence, and we can therefore put their action facts in a fixed time order. At other times multiple rules might have their prerequisites met at the same time, and therefore we can’t put their action facts into a strict time sequence. We can represent this pattern of partially ordered implications as a directed acyclic graph, or DAG for short.

Altogether, multiset rewriting rules describe the steps of a protocol, and the resulting DAG records the actions associated with the protocol description. We refer to the DAG of actions as the action graph. If we’ve done our job well it’s possible to follow these rules and produce every possible combination of messages or actions allowed by the protocol, and their corresponding action graph.

As an example of the action graph, let’s consider what happens when the client successfully finishes the protocol. When the requirements for this rule are satisfied, the rule triggers, marking that the client is done and that the response was valid. Since the protocol is done at this point, there are no output facts produced.

Action graph for terminal client response handler rule

Modeling the attacker

The action graph is core to reasoning about the protocol’s security properties. We can check a graph for various properties, e.g. “does the first action taken by the relay happen after the first action taken by the client?”. Our rules allow for multiple runs of the protocol to happen at the same time. This is very powerful. We can look at a graph and ask “did something bad happen here that might break the protocol’s security properties?”

In particular, we can prove (security and correctness) properties by querying this graph, or by asserting various properties about it. For example, we might say “for all runs of the protocol, if the client finishes the protocol and can decrypt the response from the gateway, then the response must have been generated and encrypted by an entity which has the corresponding shared secret.”

This is a useful statement, but it doesn’t say much about security. What happens if the gateway private key is compromised, for example? In order to prove security properties, we need to define our threat model, which includes the adversary and their capabilities. In Tamarin, we encode the threat model as part of the protocol model. For example, when we define messages being passed from the client to the relay, we can add a special rule that allows the attacker to read it as it goes past. This gives us the ability to describe properties such as “for all runs of the protocol in our language the attacker never learns the secret key.”

For security protocols, we typically give the attacker the ability to read, modify, drop, and replay any message. This is sometimes described as “the attacker controls the network”, or a Dolev-Yao attacker. However, the attacker can also sometimes compromise different entities in a protocol, learning state associated with that entity. This is sometimes called an extended Dolev-Yao attacker, and it is precisely the attacker we consider in our model.

Going back to our model, we give the attacker the ability to compromise long-term key pairs and TLS sessions as needed through different rules. These rules set various action facts marking that the compromise took place.

Action graph for key compromise rule

Putting everything together, we have a way to model the protocol behavior, attacker capabilities, and security properties. Let’s now dive into how we applied these to prove OHTTP secure.

OHTTP Tamarin model

In our model, we give the attacker the ability to compromise the server’s long-term keys and the key between the client and the relay. Against this attacker, we aim to prove the two informal statements from above:

  1. Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.
  2. The gateway cannot distinguish between two requests generated from the same client, and two requests generated from different clients, in the absence of unique per-client keys.

To prove these formally, we express them somewhat differently. First, we assert that the protocol actually completes. This is an important step, because if your model has a bug in it where the protocol can’t even run as intended, then Tamarin is likely to say it’s “secure” because nothing bad ever happens.

For the core security properties, we translate the desired goals into questions we ask about the model. In this way, formal analysis only provides us proof (or disproof!) of the questions we ask, not the questions we should have asked, and so this translation relies on experience and expertise. We break down this translation for each of the questions we want to ask below, starting with gateway authentication.

Gateway authentication
Unless the attacker has compromised the gateway’s long-term keys, if the client completes the protocol and is able to decrypt the gateway’s response, then it knows that: the responder was the gateway it intended to use, the gateway derived the same keys, the gateway saw the request the client sent, and the response the client received is the one the gateway sent.

This tells us that the protocol actually worked, and that the messages sent and received were as they were supposed to be. One aspect of authentication is that the participants agree on some data, so although this property seems to be a bit of a grab bag, all of these statements are part of a single authentication property.

Next, we need to prove that the request and response remain secret. There are several ways in which secrecy may be violated, e.g., if encryption or decryption keys are compromised. We do so by proving the following properties.

Request and response secrecy
The request and response are both secret, i.e., the attacker never learns them, unless the attacker has compromised the gateway’s long-term keys.

In a sense, request and response secrecy covers the case where the gateway is malicious, because if the gateway is malicious then the “attacker” knows the gateway’s long-term keys.

Relay connection security
The contents of the connection between the client and relay are secret unless the attacker has compromised the relay.

We don’t have to worry about the secrecy of the connection if the client is compromised because in that scenario the attacker knows the query before it’s even been sent, and can learn the response by making an honest query itself. If your client is compromised then it’s game over.

AEAD nonce reuse resistance
If the gateway sends a message to the client, and the attacker finds a different message encrypted with the same key and nonce, then either the attacker has already compromised the gateway, or they already knew the query.

In translation, this property means that the response encryption is correct and not vulnerable to attack, such as through AEAD nonce reuse. This would obviously be a disaster for OHTTP, so we were careful to check that this situation never arises, especially as we’d already detected this issue in ODoH.

Finally, and perhaps most importantly, we want to prove that an attacker can’t link a particular query to a client. We prove a slightly different property which effectively argues that, unless the relay and gateway collude, the attacker cannot link an encrypted query to its decrypted counterpart. In particular, we prove the following:

Client unlinkability
If an attacker knows the query and the contents of the connection sent to the relay (i.e. the encrypted query), then it must have compromised both the gateway and the relay.

This doesn’t in general prove indistinguishability. There are two techniques an attacker can use to link two queries: direct inference and statistical analysis. Because of the anonymity trilemma we know that we cannot defend against statistical analysis, so we have to declare it out of scope and move on. To prevent direct inference we need to make sure that the attacker doesn’t compromise either the client, or both the relay and the gateway together, which would let it directly link the queries. So is there anything we can protect against? Thankfully there is one thing: we can make sure that a malicious gateway can’t identify that a single client sent two messages. We prove that by not keeping any state between connections. If a returning client acts in exactly the same way as a new client, and doesn’t carry any state between requests, there’s nothing for the malicious gateway to analyze.

And that’s it! If you want to have a go at proving some of these properties yourself our models and proofs are available on our GitHub, as are our ODoH models and proofs. The Tamarin prover is freely available too, so you can double-check all our work. Hopefully this post has given you a flavor of what we mean when we say that we’ve proven a protocol secure, and inspired you to have a go yourself. If you want to work on great projects like this check out our careers page.


[1] Depending on the model, this process can lead to an exponential blow-up in search space, making it impossible to prove anything automatically. Moreover, if the new output facts do not fulfill the requirements of any remaining rule(s) then the process hangs.

Announcing experimental DDR in 1.1.1.1

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/announcing-ddr-support/

1.1.1.1 sees approximately 600 billion queries per day. However, proportionally, most queries sent to this resolver are in cleartext: 89% over UDP and TCP combined, with the remaining 11% encrypted. We care about end-user privacy and would prefer to see all of these queries sent to us over an encrypted transport using DNS-over-TLS or DNS-over-HTTPS. Having a mechanism by which clients can discover support for encrypted protocols such as DoH or DoT will help drive this number up and lead to more name encryption on the Internet. That’s where DDR – or Discovery of Designated Resolvers – comes into play. As of today, 1.1.1.1 supports the latest version of DDR so clients can automatically upgrade non-secure UDP and TCP connections to secure connections. In this post, we’ll describe the motivations for DDR, how the mechanism works, and, importantly, how you can test it out as a client.

DNS transports and public resolvers

We initially launched our public recursive resolver service 1.1.1.1 over three years ago, and have since seen its usage steadily grow. Today, it is one of the fastest public recursive resolvers available to end-users, supporting the latest security and privacy DNS transports such as HTTP/3 for DNS-over-HTTPS (DoH), as well as Oblivious DoH.

Because 1.1.1.1 is a public resolver, clients of all types are typically configured manually, based on a user’s desired performance, security, and privacy requirements. This choice reflects answers to two separate but related types of questions:

  1. What recursive resolver should be used to answer my DNS queries? Does the resolver perform well? Does the recursive resolver respect my privacy?
  2. What protocol should be used to speak to this particular recursive resolver? How can I keep my DNS data safe from eavesdroppers that should otherwise not have access to it?

The second question primarily concerns technical matters. In particular, whether or not a recursive resolver supports DoH is simple enough to answer. Either the recursive resolver does or does not support it!

In contrast, the first question is primarily a matter of policy. For example, consider the question of choosing between a local network-provided DNS recursive resolver and a public recursive resolver. How do resolver features (including DoH support, for example) influence this decision? How does the resolver’s privacy policy regarding data use and retention influence this decision? More generally, what information about recursive resolver capabilities is available to clients in making this decision and how is this information delivered to clients?

These policy questions have been the topic of substantial debate in the Internet Engineering Task Force (IETF), the standards body where DoH was standardized, and they are one facet of the Adaptive DNS Discovery (ADD) Working Group, which is chartered to work on the following items (among others):

– Define a mechanism that allows clients to discover DNS resolvers that support encryption and that are available to the client either on the public Internet or on private or local networks.

– Define a mechanism that allows communication of DNS resolver information to clients for use in selection decisions. This could be part of the mechanism used for discovery, above.

In other words, the ADD Working Group aims to specify mechanisms by which clients can obtain the information they need to answer question (1). Critically, one of those pieces of information is what encrypted transport protocols the recursive resolver supports, which would answer question (2).

As the answer to question (2) is purely technical and not a matter of policy, the ADD Working Group was able to specify a workable solution that we’ve implemented and tested with existing clients. Before getting into the details of how it works, let’s dig into the problem statement here and see what’s required to address it.

Threat model and problem statement

The DDR problem is relatively straightforward: given the IP address of a DNS recursive resolver, how can one discover parameters necessary for speaking to the same resolver using an encrypted transport? (As above, discovering parameters for a different resolver is a distinctly different problem that pertains to policy and is therefore out of scope.)

This question is only meaningful insofar as using encryption helps protect against some attacker. Otherwise, if the network were trusted, encryption would add no value! A direct consequence is that this question assumes the network – for some definition of “the network” – is untrusted and that encryption helps protect against this network.

But what exactly is the network here? In practice, the topology typically looks like the following:

Typical DNS configuration from DHCP

Again, for DNS discovery to have any meaning, we assume that either the ISP or home network – or both – is untrusted and malicious. The setting here depends on the client and the network they are attached to, but it’s generally simplest to assume the ISP network is untrusted.

This question also makes one important assumption: clients know the desired recursive resolver address. Why is this important? Typically, the IP address of a DNS recursive resolver is provided via Dynamic Host Configuration Protocol (DHCP). When a client joins a network, it uses DHCP to learn information about the network, including the default DNS recursive resolver. However, DHCP is a famously unauthenticated protocol, which means that any active attacker on the network can spoof the information, as shown below.

Unauthenticated DHCP discovery

One obvious attack vector would be for the attacker to redirect DNS traffic from the network’s desired recursive resolver to an attacker-controlled recursive resolver. This has important implications for the threat model for discovery.

First, there is currently no known mechanism for encrypted DNS discovery in the presence of an active attacker that can influence the client’s view of the recursive resolver’s address. In other words, to make any meaningful improvement, DNS discovery assumes the client’s view of the DNS recursive resolver address is correct (and obtained through some secure mechanism). A second implication is that the attacker can simply block any attempt of client discovery, preventing upgrade to encrypted transports. This seems true of any interactive discovery mechanism. As a result, DNS discovery must relax this attacker’s capabilities somewhat: rather than add, drop, or modify packets, the attacker can only add or modify packets.

Altogether, this threat model lets us sharpen the DNS discovery problem statement: given the IP address of a DNS recursive resolver, how can one securely discover parameters necessary for speaking to the same resolver using an encrypted transport in the presence of an active attacker that can add or modify packets? It should be infeasible, for example, for the attacker to redirect the client from the resolver that it knows at the outset to one the attacker controls.

So how does this work, exactly?

DDR mechanics

DDR depends on two mechanisms:

  1. Certificate-based authentication of encrypted DNS resolvers.
  2. SVCB records for encoding and communicating DNS parameters.

Certificates allow resolvers to prove authority for IP addresses. For example, if you view the certificate for one.one.one.one, you’ll see several IP addresses listed under the SubjectAlternativeName extension, including 1.1.1.1.

SubjectAltName list of the one.one.one.one certificate

SVCB records are extensible key-value stores that can be used for conveying information about services to clients. Example information includes the supported application protocols, including HTTP/3, as well as keying material like that used for TLS Encrypted Client Hello.

How does DDR combine these two to solve the discovery problem above? In three simple steps:

  1. Clients query the expected DNS resolver for its designations and their parameters with a special-purpose SVCB record.
  2. Clients open a secure connection to the designated resolver, for example, one.one.one.one, authenticating the resolver against the one.one.one.one name.
  3. Clients check that the designated resolver is additionally authenticated for the IP address of the original resolver. That is, the certificate for one.one.one.one, the designated resolver, must include the IP address 1.1.1.1, the address of the original resolver.

If this validation completes, clients can then use the secure connection to the designated resolver. In pictures, this is as follows:

DDR discovery process

This demonstrates that the encrypted DNS resolver is authoritative for the client’s original DNS resolver. Or, in other words, that the original resolver and the encrypted resolver are effectively  “the same.” An encrypted resolver that does not include the originally requested resolver IP address on its certificate would fail the validation, and clients are not expected to follow the designated upgrade path. This entire process is referred to as “Verified Discovery” in the DDR specification.
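
Here is a rough Python sketch of what this Verified Discovery check could look like in a client. The SVCB lookup helper is hypothetical, since SVCB support varies across DNS libraries, while the certificate check uses the standard library’s ssl module.

import socket
import ssl

# Sketch of the Verified Discovery check (illustrative; error handling is
# elided). query_svcb is a hypothetical helper that returns the designated
# target name and port from the _dns.resolver.arpa SVCB answer.

def verify_designation(resolver_ip="1.1.1.1"):
    target, port = query_svcb(resolver_ip, "_dns.resolver.arpa")  # e.g. ("one.one.one.one", 443)

    # Step 2: open a TLS connection to the designated resolver, authenticating
    # it against its own name as usual.
    context = ssl.create_default_context()
    with socket.create_connection((target, port)) as sock:
        with context.wrap_socket(sock, server_hostname=target) as tls:
            cert = tls.getpeercert()

    # Step 3: additionally require that the certificate lists the IP address of
    # the original resolver among its SubjectAlternativeName entries.
    san_ips = {value for kind, value in cert.get("subjectAltName", ())
               if kind == "IP Address"}
    if resolver_ip not in san_ips:
        raise ValueError("designated resolver is not authoritative for " + resolver_ip)
    return target, port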

Experimental deployment and next steps

To enable more encrypted DNS on the Internet and help the standardization process, 1.1.1.1 now has experimental support for DDR. You can query it directly to find out:

$ dig +short @1.1.1.1 _dns.resolver.arpa type64

QUESTION SECTION
_dns.resolver.arpa.               IN SVCB 

ANSWER SECTION
_dns.resolver.arpa.                           300    IN SVCB  1 one.one.one.one. alpn="h2,h3" port="443" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001" key7="/dns-query{?name}"
_dns.resolver.arpa.                           300    IN SVCB  2 one.one.one.one. alpn="dot" port="853" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001"

ADDITIONAL SECTION
one.one.one.one.                              300    IN AAAA  2606:4700:4700::1111
one.one.one.one.                              300    IN AAAA  2606:4700:4700::1001
one.one.one.one.                              300    IN A     1.1.1.1
one.one.one.one.                              300    IN A     1.0.0.1

This command sends a SVCB query (type64) for the reserved name _dns.resolver.arpa to 1.1.1.1. The output lists the contents of this record, including the DoH and DoT designation parameters. Let’s walk through the contents of this record:

_dns.resolver.arpa.                           300    IN SVCB  1 one.one.one.one. alpn="h2,h3" port="443" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001" key7="/dns-query{?name}"

This says that the DoH target one.one.one.one is accessible over port 443 (port="443") using either HTTP/2 or HTTP/3 (alpn="h2,h3"), and the DoH path (key7) for queries is "/dns-query{?name}".
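
As a small illustration, a client could assemble a DoH endpoint from those fields roughly as follows (simplified; a real client would expand the URI template properly and negotiate HTTP/2 or HTTP/3 via ALPN).

# Assembling a DoH endpoint from the SVCB answer above (simplified).
target = "one.one.one.one"
port = 443                      # from port="443"
dohpath = "/dns-query{?name}"   # from key7

base = "https://{}:{}{}".format(target, port, dohpath.split("{")[0])
query_url = base + "?name=example.com"
print(query_url)   # https://one.one.one.one:443/dns-query?name=example.com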

Moving forward

DDR is a simple mechanism that lets clients automatically upgrade to encrypted transport protocols for DNS queries without any manual configuration. At the end of the day, users running compatible clients will enjoy a more private Internet experience. Happily, both Microsoft and Apple recently announced experimental support for this emerging standard, and we’re pleased to help them and other clients test support.

Going forward, we hope to help add support for DDR to open source DNS resolver software such as dnscrypt-proxy and BIND. If you’re interested in helping us continue to drive adoption of encrypted DNS and related protocols to help build a better Internet, we’re hiring!

HPKE: Standardizing public-key encryption (finally!)

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/hybrid-public-key-encryption/

For the last three years, the Crypto Forum Research Group of the Internet Research Task Force (IRTF) has been working on specifying the next generation of (hybrid) public-key encryption (PKE) for Internet protocols and applications. The result is Hybrid Public Key Encryption (HPKE), published today as RFC 9180.

HPKE was made to be simple, reusable, and future-proof by building upon knowledge from prior PKE schemes and software implementations. It is already in use in a large assortment of emerging Internet standards, including TLS Encrypted Client Hello and Oblivious DNS-over-HTTPS, and has a large assortment of interoperable implementations, including one in CIRCL. This article provides an overview of this new standard, going back to discuss its motivation, design goals, and development process.

A primer on public-key encryption

Public-key cryptography is decades old, with its roots going back to the seminal work of Diffie and Hellman in 1976, entitled “New Directions in Cryptography.” Their proposal – today called Diffie-Hellman key exchange – was a breakthrough. It allowed one to transform small secrets into big secrets for cryptographic applications and protocols. For example, one can bootstrap a secure channel for exchanging messages with confidentiality and integrity using a key exchange protocol.

Unauthenticated Diffie-Hellman key exchange

In this example, Sender and Receiver exchange freshly generated public keys with each other, and then combine their own secret key with their peer’s public key. Algebraically, this yields the same value \(g^{xy} = (g^x)^y = (g^y)^x\). Both parties can then use this as a shared secret for performing other tasks, such as encrypting messages to and from one another.
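
Here is what that exchange looks like with X25519 using the pyca/cryptography library: both parties derive the same shared secret.

from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Unauthenticated Diffie-Hellman over X25519: each party combines its own
# private key with the peer's public key and arrives at the same shared secret.
sender_private = X25519PrivateKey.generate()
receiver_private = X25519PrivateKey.generate()

sender_shared = sender_private.exchange(receiver_private.public_key())     # (g^y)^x
receiver_shared = receiver_private.exchange(sender_private.public_key())   # (g^x)^y
assert sender_shared == receiver_shared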

The Transport Layer Security (TLS) protocol is one such application of this concept. Shortly after Diffie-Hellman was unveiled to the world, RSA came into the fold. The RSA cryptosystem is another public key algorithm that has been used to build digital signature schemes, PKE algorithms, and key transport protocols. A key transport protocol is similar to a key exchange protocol, except that the sender, Alice, generates a random symmetric key herself and then encrypts it under the receiver’s public key. Upon successful decryption, both parties then share this secret key. (This fundamental technique, known as static RSA, was used pervasively in the context of TLS. See this post for details about this old technique in TLS 1.2 and prior versions.)

At a high level, PKE between a sender and receiver is a protocol for encrypting messages under the receiver’s public key. One way to do this is via a so-called non-interactive key exchange protocol.

To illustrate how this might work, let \(g^y\) be the receiver’s public key, and let \(m\) be a message that one wants to send to this receiver. The flow looks like this:

  1. The sender generates a fresh private and public key pair, \((x, g^x)\).
  2. The sender computes \(g^{xy} = (g^y)^x\), which can be done without involvement from the receiver, that is, non-interactively.
  3. The sender then uses this shared secret to derive an encryption key, and uses this key to encrypt \(m\).
  4. The sender packages up \(g^x\) and the encryption of \(m\), and sends both to the receiver.

The general paradigm here is called “hybrid public-key encryption” because it combines a non-interactive key exchange based on public-key cryptography for establishing a shared secret, and a symmetric encryption scheme for the actual encryption. To decrypt \(m\), the receiver computes the same shared secret \(g^{xy} = (g^x)^y\), derives the same encryption key, and then decrypts the ciphertext.
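
The following Python sketch walks through those four steps, plus decryption, using X25519, HKDF, and ChaCha20-Poly1305 from pyca/cryptography. It illustrates the hybrid paradigm only; it is not HPKE itself, and the derivation label is a placeholder.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import (X25519PrivateKey,
                                                              X25519PublicKey)
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def derive_key(shared_secret):
    # Derive a symmetric encryption key from the DH shared secret.
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"toy hybrid pke").derive(shared_secret)

def encrypt(receiver_public_key, message):
    ephemeral = X25519PrivateKey.generate()              # step 1: fresh (x, g^x)
    shared = ephemeral.exchange(receiver_public_key)     # step 2: g^xy, non-interactively
    key = derive_key(shared)                             # step 3: derive encryption key
    nonce = os.urandom(12)
    ciphertext = ChaCha20Poly1305(key).encrypt(nonce, message, b"")
    enc = ephemeral.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    return enc, nonce, ciphertext                        # step 4: send g^x and ciphertext

def decrypt(receiver_private_key, enc, nonce, ciphertext):
    shared = receiver_private_key.exchange(X25519PublicKey.from_public_bytes(enc))
    key = derive_key(shared)
    return ChaCha20Poly1305(key).decrypt(nonce, ciphertext, b"")

receiver = X25519PrivateKey.generate()
enc, nonce, ct = encrypt(receiver.public_key(), b"hello")
assert decrypt(receiver, enc, nonce, ct) == b"hello"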

Conceptually, PKE of this form is quite simple. General designs of this form date back for many years and include the Diffie-Hellman Integrated Encryption System (DHIES) and ElGamal encryption. However, despite this apparent simplicity, there are numerous subtle design decisions one has to make in designing this type of protocol, including:

  • What type of key exchange protocol should be used for computing the shared secret? Should this protocol be based on modern elliptic curve groups like Curve25519? Should it support future post-quantum algorithms?
  • How should encryption keys be derived? Are there other keys that should be derived? How should additional application information be included in the encryption key derivation, if at all?
  • What type of encryption algorithm should be used? What types of messages should be encrypted?
  • How should sender and receiver encode and exchange public keys?

These and other questions are important for a protocol, since their answers are required for interoperability. That is, senders and receivers should be able to communicate without having to use the same source code.

There have been a number of efforts in the past to standardize PKE, most of which focus on elliptic curve cryptography. Some examples of past standards include: ANSI X9.63 (ECIES), IEEE 1363a, ISO/IEC 18033-2, and SECG SEC 1.

Timeline of related standards and software

A paper by Martinez et al. provides a thorough and technical comparison of these different standards. The key points are that all these existing schemes have shortcomings. They either rely on outdated or not-commonly-used primitives such as RIPEMD and CMAC-AES, lack accommodations for moving to modern primitives (e.g., AEAD algorithms), lack proofs of IND-CCA2 security, or, importantly, fail to provide test vectors and interoperable implementations.

The lack of a single standard for public-key encryption has led to inconsistent and often non-interoperable support across libraries. In particular, hybrid PKE implementation support is fractured across the community, ranging from the hugely popular and simple-to-use NaCl box and libsodium box seal implementations based on modern algorithm variants like XSalsa20-Poly1305 for authenticated encryption, to BouncyCastle implementations based on “classical” algorithms like AES and elliptic curves.

Despite the lack of a single standard, this hasn’t stopped the adoption of ECIES instantiations for widespread and critical applications. For example, the Apple and Google Exposure Notification Privacy-preserving Analytics (ENPA) platform uses ECIES for public-key encryption.

When designing protocols and applications that need a simple, reusable, and agile abstraction for public-key encryption, existing standards are not fit for purpose. That’s where HPKE comes into play.

Construction and design goals

HPKE is a public-key encryption construction that is designed from the outset to be simple, reusable, and future-proof. It lets a sender encrypt arbitrary-length messages under a receiver’s public key, as shown below. You can try this out in the browser at Franziskus Kiefer’s blog post on HPKE!

HPKE overview

HPKE is built in stages. It starts with a Key Encapsulation Mechanism (KEM), which is similar to the key transport protocol described earlier and, in fact, can be constructed from the Diffie-Hellman key agreement protocol. A KEM has two algorithms: Encapsulation and Decapsulation, or Encap and Decap for short. The Encap algorithm creates a symmetric secret and wraps it for a public key such that only the holder of the corresponding private key can unwrap it. An attacker knowing this encapsulated key cannot recover even a single bit of the shared secret. Decap takes the encapsulated key and the private key associated with the public key, and computes the original shared secret. From this shared secret, HPKE computes a series of derived keys that are then used to encrypt and authenticate plaintext messages between sender and receiver.
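
Viewed through this interface, the Diffie-Hellman construction from the earlier sketch can be written as an Encap/Decap pair (simplified; a real DH-based KEM also binds the public keys into the secret derivation).

from cryptography.hazmat.primitives.asymmetric.x25519 import (X25519PrivateKey,
                                                              X25519PublicKey)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def encap(receiver_public_key):
    """Create a fresh shared secret and wrap it for the receiver's public key."""
    ephemeral = X25519PrivateKey.generate()
    shared_secret = ephemeral.exchange(receiver_public_key)
    enc = ephemeral.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    return enc, shared_secret       # (encapsulated key, shared secret)

def decap(enc, receiver_private_key):
    """Recover the same shared secret from the encapsulated key."""
    return receiver_private_key.exchange(X25519PublicKey.from_public_bytes(enc))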

This simple construction was driven by several high-level design goals and principles. We will discuss these goals and how they were met below.

Algorithm agility

Different applications, protocols, and deployments have different constraints, and locking any single use case into a specific (set of) algorithm(s) would be overly restrictive. For example, some applications may wish to use post-quantum algorithms when available, whereas others may wish to use different authenticated encryption algorithms for symmetric-key encryption. To accomplish this goal, HPKE is designed as a composition of a Key Encapsulation Mechanism (KEM), Key Derivation Function (KDF), and Authenticated Encryption Algorithm (AEAD). Any combination of the three algorithms yields a valid instantiation of HPKE, subject to certain security constraints about the choice of algorithm.

One important point worth noting here is that HPKE is not a protocol, and therefore does nothing to ensure that sender and receiver agree on the HPKE ciphersuite or shared context information. Applications and protocols that use HPKE are responsible for choosing or negotiating a specific HPKE ciphersuite that fits their purpose. This allows applications to be opinionated about their choice of algorithms to simplify implementation and analysis, as is common with protocols like WireGuard, or be flexible enough to support choice and agility, as is the approach taken with TLS.

Authentication modes

At a high level, public-key encryption ensures that only the holder of the private key can decrypt messages encrypted for the corresponding public key (being able to decrypt the message is an implicit authentication of the receiver). However, there are other ways in which applications may wish to authenticate messages from sender to receiver. For example, if both parties have a pre-shared key, they may wish to ensure that both can demonstrate possession of this pre-shared key as well. It may also be desirable for senders to demonstrate knowledge of their own private key in order for recipients to decrypt the message (this is functionally similar to signing an encryption, but has some subtle and important differences).

To support these various use cases, HPKE admits different modes of authentication, allowing various combinations of pre-shared key and sender private key authentication. The additional private key contributes to the shared secret between the sender and receiver, and the pre-shared key contributes to the derivation of the application data encryption secrets. This process is referred to as the “key schedule”, and a simplified version of it is shown below.

Simplified HPKE key schedule
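
A heavily simplified sketch of this idea: mix the KEM shared secret with an optional pre-shared key, then expand per-purpose secrets from the result. The real key schedule uses labeled extract and expand steps and a mode-specific context string that we omit here.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF, HKDFExpand

def key_schedule(kem_shared_secret, psk=b""):
    # Extract: mix the KEM shared secret with the (possibly empty) pre-shared
    # key into a single secret. (The salt/IKM roles and the labels here are
    # simplifications, not the LabeledExtract/LabeledExpand of the standard.)
    secret = HKDF(algorithm=hashes.SHA256(), length=32, salt=psk or None,
                  info=b"toy key schedule").derive(kem_shared_secret)
    # Expand: derive per-purpose secrets for AEAD encryption and exporting.
    key = HKDFExpand(algorithm=hashes.SHA256(), length=32, info=b"key").derive(secret)
    base_nonce = HKDFExpand(algorithm=hashes.SHA256(), length=12, info=b"base_nonce").derive(secret)
    exporter_secret = HKDFExpand(algorithm=hashes.SHA256(), length=32, info=b"exp").derive(secret)
    return key, base_nonce, exporter_secret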

These modes come at a price, however: not all KEM algorithms will work with all authentication modes. For example, for most post-quantum KEM algorithms there isn’t a private key authentication variant known.

Reusability

The core of HPKE’s construction is its key schedule. It allows secrets produced and shared with KEMs and pre-shared keys to be mixed together to produce additional shared secrets between sender and receiver for performing authenticated encryption and decryption. HPKE allows applications to build on this key schedule without using the corresponding AEAD functionality, for example, by exporting a shared application-specific secret. Using HPKE in an “export-only” fashion allows applications to use other, non-standard AEAD algorithms for encryption, should that be desired. It also allows applications to use a KEM different from those specified in the standard, as is done in the proposed TLS AuthKEM draft.

Interface simplicity

HPKE hides the complexity of message encryption from callers. Encrypting a message with additional authenticated data from sender to receiver for their public key is as simple as the following two calls:

// Create an HPKE context to send messages to the receiver
encapsulatedKey, senderContext = SetupBaseS(receiverPublicKey, "shared application info")

// AEAD encrypt the message using the context
ciphertext = senderContext.Seal(aad, message)

In fact, many implementations are likely to offer a simplified “single-shot” interface that does context creation and message encryption with one function call.

Notice that this interface does not expose anything like nonce (“number used once”) or sequence numbers to the callers. The HPKE context manages nonce and sequence numbers internally, which means the application is responsible for message ordering and delivery. This was an important design decision done to hedge against key and nonce reuse, which can be catastrophic for security.

Consider what would be necessary if HPKE delegated nonce management to the application. The sending application using HPKE would need to communicate the nonce along with each ciphertext value for the receiver to successfully decrypt the message. If this nonce was ever reused, then security of the AEAD may fall apart. Thus, a sending application would necessarily need some way to ensure that nonces were never reused. Moreover, by sending the nonce to the receiver, the application is effectively implementing a message sequencer. The application could just as easily implement and use this sequencer to ensure in-order message delivery and processing. Thus, at the end of the day, exposing the nonce seemed both harmful and, ultimately, redundant.

Wire format

Another hallmark of HPKE is that all messages that do not contain application data are fixed length. This means that serializing and deserializing HPKE messages is trivial and there is no room for application choice. In contrast, some implementations of hybrid PKE deferred choice of wire format details, such as whether to use elliptic curve point compression, to applications. HPKE handles this under the KEM abstraction.

Development process

HPKE is the result of a three-year development cycle between industry practitioners, protocol designers, and academic cryptographers. In particular, HPKE built upon prior art relating to public-key encryption and iterated on its design in a tight specification, implementation, experimentation, and analysis loop, with the ultimate goal of real-world use.

HPKE development process

This process isn’t new. TLS 1.3 and QUIC famously demonstrated this as an effective way of producing high quality technical specifications that are maximally useful for their consumers.

One particular point worth highlighting in this process is the value of interoperability and analysis. From the very first draft, interop between multiple, independent implementations was a goal. And since then, every revision was carefully checked by multiple library maintainers for soundness and correctness. This helped catch a number of mistakes and improved overall clarity of the technical specification.

From a formal analysis perspective, HPKE brought novel work to the community. Unlike protocol design efforts like those around TLS and QUIC, HPKE was simpler, but still came with plenty of sharp edges. As a new cryptographic construction, analysis was needed to ensure that it was sound and, importantly, to understand its limits. This analysis led to a number of important contributions to the community, including a formal analysis of HPKE, new understanding of the limits of ChaChaPoly1305 in a multi-user security setting, as well as a new CFRG specification documenting limits for AEAD algorithms. For more information about the analysis effort that went into HPKE, check out this companion blog by Benjamin Lipp, an HPKE co-author.

HPKE’s future

While HPKE may be a new standard, it has already seen a tremendous amount of adoption in the industry. As mentioned earlier, it’s an essential part of the TLS Encrypted Client Hello and Oblivious DoH standards, both of which are deployed protocols on the Internet today. Looking ahead, it’s also been integrated as part of the emerging Oblivious HTTP, Message Layer Security, and Privacy Preserving Measurement standards. HPKE’s hallmark is its generic construction that lets it adapt to a wide variety of application requirements. If an application needs public-key encryption with a key-committing AEAD, one can simply instantiate HPKE using a key-committing AEAD.

Moreover, there exists a huge assortment of interoperable implementations built on popular cryptographic libraries, including OpenSSL, BoringSSL, NSS, and CIRCL. There are also formally verified implementations in hacspec and F*; check out this blog post for more details. The complete set of known implementations is tracked here. More implementations will undoubtedly follow in their footsteps.

HPKE is ready for prime time. I look forward to seeing how it simplifies protocol design and development in the future. Welcome, RFC 9180.

Handshake Encryption: Endgame (an ECH update)

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/handshake-encryption-endgame-an-ech-update/

Privacy and security are fundamental to Cloudflare, and we believe in and champion the use of cryptography to help provide these fundamentals for customers, end-users, and the Internet at large. In the past, we helped specify, implement, and ship TLS 1.3, the latest version of the transport security protocol underlying the web, to all of our users. TLS 1.3 vastly improved upon prior versions of the protocol with respect to security, privacy, and performance: simpler cryptographic algorithms, more handshake encryption, and fewer round trips are just a few of the many great features of this protocol.

TLS 1.3 was a tremendous improvement over TLS 1.2, but there is still room for improvement. Sensitive metadata relating to application or user intent is still visible in plaintext on the wire. In particular, all client parameters, including the name of the target server the client is connecting to, are visible in plaintext. For obvious reasons, this is problematic from a privacy perspective: Even if your application traffic to crypto.cloudflare.com is encrypted, the fact you’re visiting crypto.cloudflare.com can be quite revealing.

And so, in collaboration with other participants in the standardization community and members of industry, we embarked on a solution for encrypting all sensitive TLS metadata in transit. The result: TLS Encrypted ClientHello (ECH), an extension to protect this sensitive metadata during connection establishment.

Last year, we described the current status of this standard and its relation to the TLS 1.3 standardization effort, as well as ECH’s predecessor, Encrypted SNI (ESNI). The protocol has come a long way since then, but how will we know when it’s ready? There are many ways by which one can measure a protocol. Is it implementable? Is it easy to enable? Does it seamlessly integrate with existing protocols or applications? In order to assess these questions and see if the Internet is ready for ECH, the community needs deployment experience. Hence, for the past year, we’ve been focused on making the protocol stable, interoperable, and, ultimately, deployable. And today, we’re pleased to announce that we’ve begun our initial deployment of TLS ECH.

What does ECH mean for connection security and privacy on the network? How does it relate to similar technologies and concepts such as domain fronting? In this post, we’ll dig into ECH details and describe what this protocol does to move the needle to help build a better Internet.

Connection privacy

For most Internet users, connections are made to perform some type of task, such as loading a web page, sending a message to a friend, purchasing some items online, or accessing bank account information. Each of these connections reveals some limited information about user behavior. For example, a connection to a messaging platform reveals that one might be trying to send or receive a message. Similarly, a connection to a bank or financial institution reveals when the user typically makes financial transactions. Individually, this metadata might seem harmless. But consider what happens when it accumulates: does the set of websites you visit on a regular basis uniquely identify you as a user? The safe answer is: yes.

This type of metadata is privacy-sensitive, and ultimately something that should only be known by two entities: the user who initiates the connection, and the service which accepts the connection. However, the reality today is that this metadata is known to more than those two entities.

Making this information private is no easy feat. The nature or intent of a connection, i.e., the name of the service such as crypto.cloudflare.com, is revealed in multiple places during the course of connection establishment: during DNS resolution, wherein clients map service names to IP addresses; and during connection establishment, wherein clients indicate the service name to the target server. (Note: there are other small leaks, though DNS and TLS are the primary problems on the Internet today.)

As is common in recent years, the solution to this problem is encryption. DNS-over-HTTPS (DoH) is a protocol for encrypting DNS queries and responses to hide this information from on-path observers. Encrypted Client Hello (ECH) is the complementary protocol for TLS.

The TLS handshake begins when the client sends a ClientHello message to the server over a TCP connection (or, in the context of QUIC, over UDP) with relevant parameters, including those that are sensitive. The server responds with a ServerHello, encrypted parameters, and all that’s needed to finish the handshake.

The goal of ECH is as simple as its name suggests: to encrypt the ClientHello so that privacy-sensitive parameters, such as the service name, are unintelligible to anyone listening on the network. The client encrypts this message using a public key it learns by making a DNS query for a special record known as the HTTPS resource record. This record advertises the server’s various TLS and HTTPS capabilities, including ECH support. The server decrypts the encrypted ClientHello using the corresponding secret key.
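
For example, a client can fetch this record with an ordinary DNS library. The sketch below uses dnspython (assuming a version with HTTPS record support) and simply prints the answer; when ECH is enabled, an ech parameter appears among the record's values.

import dns.resolver   # dnspython; HTTPS/SVCB record support needs a recent version

# Look up the HTTPS resource record advertising the server's TLS capabilities.
# When ECH is enabled, an "ech" SvcParam carrying the encryption configuration
# appears among the record's parameters.
answers = dns.resolver.resolve("crypto.cloudflare.com", "HTTPS")
for record in answers:
    print(record.to_text())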

Conceptually, DoH and ECH are somewhat similar. With DoH, clients establish an encrypted connection (HTTPS) to a DNS recursive resolver such as 1.1.1.1 and, within that connection, perform DNS transactions.

With ECH, clients establish an encrypted connection to a TLS-terminating server such as crypto.cloudflare.com, and within that connection, request resources for an authorized domain such as cloudflareresearch.com.

There is one very important difference between DoH and ECH that is worth highlighting. Whereas a DoH recursive resolver is specifically designed to allow queries for any domain, a TLS server is configured to allow connections for a select set of authorized domains. Typically, the set of authorized domains for a TLS server is the set of names that appear on its certificate, as these are the names for which the server is authorized to terminate a connection.

Basically, this means the DNS resolver is open, whereas the ECH client-facing server is closed. And this closed set of authorized domains is informally referred to as the anonymity set. (This will become important later on in this post.) Moreover, the anonymity set is assumed to be public information. Anyone can query DNS to discover what domains map to the same client-facing server.
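
As a rough sketch of why the anonymity set is public information, consider grouping domains by the client-facing server their HTTPS records point to. The lookup below is a stub with made-up values standing in for a real DNS HTTPS-record (type 65) query, like the dig command shown later in this post:

from collections import defaultdict

def https_record_target(domain: str) -> str:
    # Stub for a DNS HTTPS-record lookup returning the client-facing server a
    # domain sits behind. These mappings are hypothetical examples.
    example = {
        "cloudflareresearch.com": "crypto.cloudflare.com",
        "example-blog.net": "crypto.cloudflare.com",
        "some-other-site.org": "other-frontend.example",
    }
    return example.get(domain, "unknown")

def anonymity_sets(domains: list) -> dict:
    groups = defaultdict(list)
    for domain in domains:
        groups[https_record_target(domain)].append(domain)
    return dict(groups)

# Domains that share a client-facing server form one anonymity set.
print(anonymity_sets(["cloudflareresearch.com", "example-blog.net", "some-other-site.org"]))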

Why is this distinction important? It means that one cannot use ECH for the purposes of connecting to an authorized domain and then interacting with a different domain, a practice commonly referred to as domain fronting. When a client connects to a server using an authorized domain but then tries to interact with a different domain within that connection, e.g., by sending HTTP requests for an origin that does not match the domain of the connection, the request will fail.

From a high level, encrypting names in DNS and TLS may seem like a simple feat. However, as we’ll show, ECH demands a different look at security and an updated threat model.

A changing threat model and design confidence

The typical threat model for TLS is known as the Dolev-Yao model, in which an active network attacker can read, write, and delete packets from the network. This attacker’s goal is to derive the shared session key. There has been a tremendous amount of research analyzing the security of TLS to gain confidence that the protocol achieves this goal.

The threat model for ECH is somewhat stronger than considered in previous work. Not only should it be hard to derive the session key, it should also be hard for the attacker to determine the identity of the server from a known anonymity set. That is, ideally, the attacker should have no more advantage in identifying the server than if it simply guessed uniformly from the set of servers in the anonymity set. And recall that the attacker is free to read, write, and modify any packet as part of the TLS connection. This means, for example, that an attacker can replay a ClientHello and observe the server's response. It can also extract pieces of the ClientHello, including the ECH extension, and use them in its own modified ClientHello.

The design of ECH defends against this sort of attack by ensuring that the server certificate is only ever sent encrypted, so it can be decrypted only by the client (or the client-facing server itself); an attacker that replays or splices a ClientHello learns nothing from the server's response about which name was used.

Something else an attacker might try is to masquerade as the server and actively interfere with the client to observe its behavior. If the client reacted differently depending on whether the server-provided certificate was correct, this would allow the attacker to test whether a given connection using ECH was for a particular name.

ECH also defends against this attack by ensuring that an attacker without access to the private ECH key material cannot actively inject anything into the connection.

The attacker can also be entirely passive and try to infer encrypted information from other visible metadata, such as packet sizes and timing. (Indeed, traffic analysis is an open problem for ECH and in general for TLS and related protocols.) Passive attackers simply sit and listen to TLS connections, and use what they see and, importantly, what they know to make determinations about the connection contents. For example, if a passive attacker knows that the name of the client-facing server is crypto.cloudflare.com, and it sees a ClientHello with ECH to crypto.cloudflare.com, it can conclude, with reasonable certainty, that the connection is to some domain in the anonymity set of crypto.cloudflare.com.
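
Put another way, if ECH meets its goal, this is the best a purely passive observer can do: it learns which anonymity set the connection belongs to, and within that set its best strategy is a uniform guess. A toy illustration, assuming the observer has no side information such as traffic analysis, and using hypothetical domain names:

import random

def passive_observer_guess(anonymity_set: list) -> str:
    # With ECH working as intended and no side channels, a uniform guess is
    # the observer's best strategy for naming the backend.
    return random.choice(anonymity_set)

anonymity_set = ["cloudflareresearch.com", "example-blog.net", "another-site.example"]
print(passive_observer_guess(anonymity_set))
print(f"identification probability: {1 / len(anonymity_set):.2f}")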

The number of potential attack vectors is astonishing, and something that the TLS working group has tripped over in prior iterations of the ECH design. Before any sort of real world deployment and experiment, we needed confidence in the design of this protocol. To that end, we are working closely with external researchers on a formal analysis of the ECH design which captures the following security goals:

  1. Use of ECH does not weaken the security properties of TLS without ECH.
  2. TLS connection establishment to a host in the client-facing server’s anonymity set is indistinguishable from a connection to any other host in that anonymity set.

We’ll write more about the model and analysis when they’re ready. Stay tuned!

There are plenty of other subtle security properties we desire for ECH, and some of these drill right into the most important question for a privacy-enhancing technology: Is this deployable?

Focusing on deployability

With confidence in the security and privacy properties of the protocol, we then turned our attention towards deployability. In the past, significant protocol changes to fundamental Internet protocols such as TCP or TLS have been complicated by some form of benign interference. Network software, like any software, is prone to bugs, and sometimes these bugs manifest in ways that we only detect when there’s a change elsewhere in the protocol. For example, TLS 1.3 unveiled middlebox ossification bugs that ultimately led to the middlebox compatibility mode for TLS 1.3.

While itself just an extension, the risk of ECH exposing (or introducing!) similar bugs is real. To combat this problem, ECH supports a variant of GREASE whose goal is to ensure that all ECH-capable clients produce syntactically equivalent ClientHello messages. In particular, if a client supports ECH but does not have the corresponding ECH configuration, it uses GREASE. Otherwise, it produces a ClientHello with real ECH support. In both cases, the syntax of the ClientHello messages is equivalent.
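
A sketch of the idea follows. The hpke_seal helper is the same stand-in used in the earlier sketch, and the sizes and padding are illustrative rather than the real encoding; the point is that both branches produce an extension with the same shape:

import os

def hpke_seal(public_key: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    # Stand-in for HPKE's single-shot seal; NOT real encryption.
    return os.urandom(32), plaintext[::-1]

def ech_extension(ech_config=None, payload_len: int = 128) -> dict:
    # Build the ECH extension: real when a configuration is available,
    # GREASE otherwise.
    if ech_config is None:
        # GREASE: no ECH configuration, so fill the extension with random
        # bytes shaped like a real one.
        return {"config_id": os.urandom(1)[0],
                "enc": os.urandom(32),
                "payload": os.urandom(payload_len)}
    # Real ECH: encrypt the inner ClientHello and pad the payload so its
    # length matches what GREASE would produce.
    enc, ciphertext = hpke_seal(ech_config.public_key, b"...inner ClientHello...")
    return {"config_id": ech_config.config_id,
            "enc": enc,
            "payload": ciphertext.ljust(payload_len, b"\x00")}

# Either way, the extension has the same fields and lengths, so real and
# GREASE ClientHello messages are syntactically equivalent on the wire.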

This hopefully avoids network bugs that would otherwise trigger on real or fake ECH. In other words, it helps ensure that all ECH-capable client connections are treated similarly in the presence of benign network bugs or otherwise passive attackers. Interestingly, active attackers can, with some probability, distinguish between real and fake ECH. With GREASE, the ClientHello carries an ECH extension whose contents are effectively random, whereas a real ECH ClientHello carries information that matches what is published in DNS. This means an active attacker can simply compare the ClientHello against what's in the DNS. Indeed, anyone can query DNS and use it to determine whether a ClientHello is real or fake:

$ dig +short crypto.cloudflare.com TYPE65
\# 134 0001000001000302683200040008A29F874FA29F884F000500480046 FE0D0042D500200020E3541EC94A36DCBF823454BA591D815C240815 77FD00CAC9DC16C884DF80565F0004000100010013636C6F7564666C 6172652D65736E692E636F6D00000006002026064700000700000000 0000A29F874F260647000007000000000000A29F884F

Despite this obvious distinguisher, the end result isn’t that interesting. If a server is capable of ECH and a client is capable of ECH, then the connection most likely used ECH, and whether clients and servers are capable of ECH is assumed public information. Thus, GREASE is primarily intended to ease deployment against benign network bugs and otherwise passive attackers.
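
For completeness, here is the distinguishing check in sketch form: compare the ECH extension observed in a ClientHello against the configuration advertised in DNS. The helper and the config_id comparison below are illustrative simplifications of what an active attacker could do, not a precise description of the wire format:

def dns_advertised_config_id(client_facing_name: str) -> int:
    # Stub standing in for fetching and parsing the ech parameter of the
    # HTTPS record (as in the dig output above); the value is illustrative.
    return 0xD5

def looks_like_real_ech(observed_extension: dict, client_facing_name: str) -> bool:
    # A GREASE extension carries effectively random contents, so with high
    # probability it will not match what is published in DNS.
    return observed_extension["config_id"] == dns_advertised_config_id(client_facing_name)

print(looks_like_real_ech({"config_id": 0xD5}, "crypto.cloudflare.com"))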

Note, importantly, that GREASE (or fake) ECH ClientHello messages are semantically different from real ECH ClientHello messages. This presents a real problem for networks such as enterprise settings or school environments that otherwise use plaintext TLS information for the purposes of implementing various features like filtering or parental controls. (Encrypted DNS protocols like DoH also encountered similar obstacles in their deployment.) Fundamentally, this problem reduces to the following: How can networks securely disable features like DoH and ECH? Fortunately, there are a number of approaches that might work, with the more promising one centered around DNS discovery. In particular, if clients could securely discover encrypted recursive resolvers that can perform filtering in lieu of it being done at the TLS layer, then TLS-layer filtering might be wholly unnecessary. (Other approaches, such as the use of canary domains to give networks an opportunity to signal that certain features are not permitted, may work, though it’s not clear if these could or would be abused to disable ECH.)

We are eager to collaborate with browser vendors, network operators, and other stakeholders to find a feasible deployment model that works well for users without ultimately stifling connection privacy for everyone else.

Next steps

ECH is rolling out for some FREE zones on our network in select geographic regions. We will continue to expand the set of zones and regions that support ECH slowly, monitoring for failures in the process. Ultimately, the goal is to work with the rest of the TLS working group and IETF towards updating the specification based on this experiment in hopes of making it safe, secure, usable, and, ultimately, deployable for the Internet.

ECH is one part of the connection privacy story. As with a leaky boat, it's important to find and plug all the gaps before taking on lots of passengers! Cloudflare Research is committed to these narrow technical problems and their long-term solutions. Stay tuned for more updates on this and related protocols.