All posts by Nick Sullivan

Going Keyless Everywhere

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/going-keyless-everywhere/

Going Keyless Everywhere

Going Keyless Everywhere

Time flies. The Heartbleed vulnerability was discovered just over five and a half years ago. Heartbleed became a household name not only because it was one of the first bugs with its own web page and logo, but because of what it revealed about the fragility of the Internet as a whole. With Heartbleed, one tiny bug in a cryptography library exposed the personal data of the users of almost every website online.

Heartbleed is an example of an underappreciated class of bugs: remote memory disclosure vulnerabilities. High profile examples other than Heartbleed include Cloudbleed and most recently NetSpectre. These vulnerabilities allow attackers to extract secrets from servers by simply sending them specially-crafted packets. Cloudflare recently completed a multi-year project to make our platform more resilient against this category of bug.

For the last five years, the industry has been dealing with the consequences of the design that led to Heartbleed being so impactful. In this blog post we’ll dig into memory safety, and how we re-designed Cloudflare’s main product to protect private keys from the next Heartbleed.

Memory Disclosure

Perfect security is not possible for businesses with an online component. History has shown us that no matter how robust their security program, an unexpected exploit can leave a company exposed. One of the more famous recent incidents of this sort is Heartbleed, a vulnerability in a commonly used cryptography library called OpenSSL that exposed the inner details of millions of web servers to anyone with a connection to the Internet. Heartbleed made international news, caused millions of dollars of damage, and still hasn’t been fully resolved.

Typical web services only return data via well-defined public-facing interfaces called APIs. Clients don’t typically get to see what’s going on under the hood inside the server, that would be a huge privacy and security risk. Heartbleed broke that paradigm: it enabled anyone on the Internet to get access to take a peek at the operating memory used by web servers, revealing privileged data usually not exposed via the API. Heartbleed could be used to extract the result of previous data sent to the server, including passwords and credit cards. It could also reveal the inner workings and cryptographic secrets used inside the server, including TLS certificate private keys.

Heartbleed let attackers peek behind the curtain, but not too far. Sensitive data could be extracted, but not everything on the server was at risk. For example, Heartbleed did not enable attackers to steal the content of databases held on the server. You may ask: why was some data at risk but not others? The reason has to do with how modern operating systems are built.

A simplified view of process isolation

Most modern operating systems are split into multiple layers. These layers are analogous to security clearance levels. So-called user-space applications (like your browser) typically live in a low-security layer called user space. They only have access to computing resources (memory, CPU, networking) if the lower, more credentialed layers let them.

User-space applications need resources to function. For example, they need memory to store their code and working memory to do computations. However, it would be risky to give an application direct access to the physical RAM of the computer they’re running on. Instead, the raw computing elements are restricted to a lower layer called the operating system kernel. The kernel only runs specially-designed applications designed to safely manage these resources and mediate access to them for user-space applications.

When a new user space application process is launched, the kernel gives it a virtual memory space. This virtual memory space acts like real memory to the application but is actually a safely guarded translation layer the kernel uses to protect the real memory. Each application’s virtual memory space is like a parallel universe dedicated to that application. This makes it impossible for one process to view or modify another’s, the other applications are simply not addressable.

Going Keyless Everywhere

Heartbleed, Cloudbleed and the process boundary

Heartbleed was a vulnerability in the OpenSSL library, which was part of many web server applications. These web servers run in user space, like any common applications. This vulnerability caused the web server to return up to 2 kilobytes of its memory in response to a specially-crafted inbound request.

Cloudbleed was also a memory disclosure bug, albeit one specific to Cloudflare, that got its name because it was so similar to Heartbleed. With Cloudbleed, the vulnerability was not in OpenSSL, but instead in a secondary web server application used for HTML parsing. When this code parsed a certain sequence of HTML, it ended up inserting some process memory into the web page it was serving.

Going Keyless Everywhere

It’s important to note that both of these bugs occurred in applications running in user space, not kernel space. This means that the memory exposed by the bug was necessarily part of the virtual memory of the application. Even if the bug were to expose megabytes of data, it would only expose data specific to that application, not other applications on the system.

In order for a web server to serve traffic over the encrypted HTTPS protocol, it needs access to the certificate’s private key, which is typically kept in the application’s memory. These keys were exposed to the Internet by Heartbleed. The Cloudbleed vulnerability affected a different process, the HTML parser, which doesn’t do HTTPS and therefore doesn’t keep the private key in memory. This meant that HTTPS keys were safe, even if other data in the HTML parser’s memory space wasn’t.

Going Keyless Everywhere

The fact that the HTML parser and the web server were different applications saved us from having to revoke and re-issue our customers’ TLS certificates. However, if another memory disclosure vulnerability is discovered in the web server, these keys are again at risk.

Moving keys out of Internet-facing processes

Not all web servers keep private keys in memory. In some deployments, private keys are held in a separate machine called a Hardware Security Module (HSM). HSMs are built to withstand physical intrusion and tampering and are often built to comply with stringent compliance requirements. They can often be bulky and expensive. Web servers designed to take advantage of keys in an HSM connect to them over a physical cable and communicate with a specialized protocol called PKCS#11. This allows the web server to serve encrypted content while being physically separated from the private key.

Going Keyless Everywhere

At Cloudflare, we built our own way to separate a web server from a private key: Keyless SSL. Rather than keeping the keys in a separate physical machine connected to the server with a cable, the keys are kept in a key server operated by the customer in their own infrastructure (this can also be backed by an HSM).

Going Keyless Everywhere

More recently, we launched Geo Key Manager, a service that allows users to store private keys in only select Cloudflare locations. Connections to locations that do not have access to the private key use Keyless SSL with a key server hosted in a datacenter that does have access.

In both Keyless SSL and Geo Key Manager, private keys are not only not part of the web server’s memory space, they’re often not even in the same country! This extreme degree of separation is not necessary to protect against the next Heartbleed. All that is needed is for the web server and the key server to not be part of the same application. So that’s what we did. We call this Keyless Everywhere.

Going Keyless Everywhere

Keyless SSL is coming from inside the house

Repurposing Keyless SSL for Cloudflare-held private keys was easy to conceptualize, but the path from ideation to live in production wasn’t so straightforward. The core functionality of Keyless SSL comes from the open source gokeyless which customers run on their infrastructure, but internally we use it as a library and have replaced the main package with an implementation suited to our requirements (we’ve creatively dubbed it gokeyless-internal).

As with all major architecture changes, it’s prudent to start with testing out the model with something new and low risk. In our case, the test bed was our experimental TLS 1.3 implementation. In order to quickly iterate through draft versions of the TLS specification and push releases without affecting the majority of Cloudflare customers, we re-wrote our custom nginx web server in Go and deployed it in parallel to our existing infrastructure. This server was designed to never hold private keys from the start and only leverage gokeyless-internal. At this time there was only a small amount of TLS 1.3 traffic and it was all coming from the beta versions of browsers, which allowed us to work through the initial kinks of gokeyless-internal without exposing the majority of visitors to security risks or outages due to gokeyless-internal.

The first step towards making TLS 1.3 fully keyless was identifying and implementing the new functionality we needed to add to gokeyless-internal. Keyless SSL was designed to run on customer infrastructure, with the expectation of supporting only a handful of private keys. But our edge must simultaneously support millions of private keys, so we implemented the same lazy loading logic we use in our web server, nginx. Furthermore, a typical customer deployment would put key servers behind a network load balancer, so they could be taken out of service for upgrades or other maintenance. Contrast this with our edge, where it’s important to maximize our resources by serving traffic during software upgrades. This problem is solved by the excellent tableflip package we use elsewhere at Cloudflare.

The next project to go Keyless was Spectrum, which launched with default support for gokeyless-internal. With these small victories in hand, we had the confidence necessary to attempt the big challenge, which was porting our existing nginx infrastructure to a fully keyless model. After implementing the new functionality, and being satisfied with our integration tests, all that’s left is to turn this on in production and call it a day, right? Anyone with experience with large distributed systems knows how far “working in dev” is from “done,” and this story is no different. Thankfully we were anticipating problems, and built a fallback into nginx to complete the handshake itself if any problems were encountered with the gokeyless-internal path. This allowed us to expose gokeyless-internal to production traffic without risking downtime in the event that our reimplementation of the nginx logic was not 100% bug-free.

When rolling back the code doesn’t roll back the problem

Our deployment plan was to enable Keyless Everywhere, find the most common causes of fallbacks, and then fix them. We could then repeat this process until all sources of fallbacks had been eliminated, after which we could remove access to private keys (and therefore the fallback) from nginx. One of the early causes of fallbacks was gokeyless-internal returning ErrKeyNotFound, indicating that it couldn’t find the requested private key in storage. This should not have been possible, since nginx only makes a request to gokeyless-internal after first finding the certificate and key pair in storage, and we always write the private key and certificate together. It turned out that in addition to returning the error for the intended case of the key truly not found, we were also returning it when transient errors like timeouts were encountered. To resolve this, we updated those transient error conditions to return ErrInternal, and deployed to our canary datacenters. Strangely, we found that a handful of instances in a single datacenter started encountering high rates of fallbacks, and the logs from nginx indicated it was due to a timeout between nginx and gokeyless-internal. The timeouts didn’t occur right away, but once a system started logging some timeouts it never stopped. Even after we rolled back the release, the fallbacks continued with the old version of the software! Furthermore, while nginx was complaining about timeouts, gokeyless-internal seemed perfectly healthy and was reporting reasonable performance metrics (sub-millisecond median request latency).

Going Keyless Everywhere

To debug the issue, we added detailed logging to both nginx and gokeyless, and followed the chain of events backwards once timeouts were encountered.

➜ ~ grep 'timed out' nginx.log | grep Keyless | head -5
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015157 Keyless SSL request/response timed out while reading Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015231 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015271 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:49.000 29m41 2018/07/25 05:30:49 [error] 4525#0: *1015280 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1
2018-07-25T05:30:50.000 29m41 2018/07/25 05:30:50 [error] 4525#0: *1015289 Keyless SSL request/response timed out while waiting for Keyless SSL response, keyserver: 127.0.0.1

You can see the first request to log a timeout had id 1015157. Also interesting that the first log line was “timed out while reading,” but all the others are “timed out while waiting,” and this latter message is the one that continues forever. Here is the matching request in the gokeyless log:

➜ ~ grep 'id=1015157 ' gokeyless.log | head -1
2018-07-25T05:30:39.000 29m41 2018/07/25 05:30:39 [DEBUG] connection 127.0.0.1:30520: worker=ecdsa-29 opcode=OpECDSASignSHA256 id=1015157 sni=announce.php?info_hash=%a8%9e%9dc%cc%3b1%c8%23%e4%93%21r%0f%92mc%0c%15%89&peer_id=-ut353s-%ce%ad%5e%b1%99%06%24e%d5d%9a%08&port=42596&uploaded=65536&downloaded=0&left=0&corrupt=0&key=04a184b7&event=started&numwant=200&compact=1&no_peer_id=1 ip=104.20.33.147

Aha! That SNI value is clearly invalid (SNIs are like Host headers, i.e. they are domains, not URL paths), and it’s also quite long. Our storage system indexes certificates based on two indices: which SNI they correspond to, and which IP addresses they correspond to (for older clients that don’t support SNI). Our storage interface uses the memcached protocol, and the client library that gokeyless-internal uses rejects requests for keys longer than 250 characters (memcached’s maximum key length), whereas the nginx logic is to simply ignore the invalid SNI and treat the request as if only had an IP. The change in our new release had shifted this condition from ErrKeyNotFound to ErrInternal, which triggered cascading problems in nginx. The “timeouts” it encountered were actually a result of throwing away all in-flight requests multiplexed on a connection which happened to return ErrInternalfor a single request. These requests were retried, but once this condition triggered, nginx became overloaded by the number of retried requests plus the continuous stream of new requests coming in with bad SNI, and was unable to recover. This explains why rolling back gokeyless-internal didn’t fix the problem.

This discovery finally brought our attention to nginx, which thus far had escaped blame since it had been working reliably with customer key servers for years. However, communicating over localhost to a multitenant key server is fundamentally different than reaching out over the public Internet to communicate with a customer’s key server, and we had to make the following changes:

  • Instead of a long connection timeout and a relatively short response timeout for customer key servers, extremely short connection timeouts and longer request timeouts are appropriate for a localhost key server.
  • Similarly, it’s reasonable to retry (with backoff) if we timeout waiting on a customer key server response, since we can’t trust the network. But over localhost, a timeout would only occur if gokeyless-internal were overloaded and the request were still queued for processing. In this case a retry would only lead to more total work being requested of gokeyless-internal, making the situation worse.
  • Most significantly, nginx must not throw away all requests multiplexed on a connection if any single one of them encounters an error, since a single connection no longer represents a single customer.

Implementations matter

CPU at the edge is one of our most precious assets, and it’s closely guarded by our performance team (aka CPU police). Soon after turning on Keyless Everywhere in one of our canary datacenters, they noticed gokeyless using ~50% of a core per instance. We were shifting the sign operations from nginx to gokeyless, so of course it would be using more CPU now. But nginx should have seen a commensurate reduction in CPU usage, right?

Going Keyless Everywhere

Wrong. Elliptic curve operations are very fast in Go, but it’s known that RSA operations are much slower than their BoringSSL counterparts.

Although Go 1.11 includes optimizations for RSA math operations, we needed more speed. Well-tuned assembly code is required to match the performance of BoringSSL, so Armando Faz from our Crypto team helped claw back some of the lost CPU by reimplementing parts of the math/big package with platform-dependent assembly in an internal fork of Go. The recent assembly policy of Go prefers the use of Go portable code instead of assembly, so these optimizations were not upstreamed. There is still room for more optimizations, and for that reason we’re still evaluating moving to cgo + BoringSSL for sign operations, despite cgo’s many downsides.

Changing our tooling

Process isolation is a powerful tool for protecting secrets in memory. Our move to Keyless Everywhere demonstrates that this is not a simple tool to leverage. Re-architecting an existing system such as nginx to use process isolation to protect secrets was time-consuming and difficult. Another approach to memory safety is to use a memory-safe language such as Rust.

Rust was originally developed by Mozilla but is starting to be used much more widely. The main advantage that Rust has over C/C++ is that it has memory safety features without a garbage collector.

Re-writing an existing application in a new language such as Rust is a daunting task. That said, many new Cloudflare features, from the powerful Firewall Rules feature to our 1.1.1.1 with WARP app, have been written in Rust to take advantage of its powerful memory-safety properties. We’re really happy with Rust so far and plan on using it even more in the future.

Conclusion

The harrowing aftermath of Heartbleed taught the industry a lesson that should have been obvious in retrospect: keeping important secrets in applications that can be accessed remotely via the Internet is a risky security practice. In the following years, with a lot of work, we leveraged process separation and Keyless SSL to ensure that the next Heartbleed wouldn’t put customer keys at risk.

However, this is not the end of the road. Recently memory disclosure vulnerabilities such as NetSpectre have been discovered which are able to bypass application process boundaries, so we continue to actively explore new ways to keep keys secure.

Going Keyless Everywhere

Delegated Credentials for TLS

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/keyless-delegation/

Delegated Credentials for TLS

Delegated Credentials for TLS

Today we’re happy to announce support for a new cryptographic protocol that helps make it possible to deploy encrypted services in a global network while still maintaining fast performance and tight control of private keys: Delegated Credentials for TLS. We have been working with partners from Facebook, Mozilla, and the broader IETF community to define this emerging standard. We’re excited to share the gory details today in this blog post.

Deploying TLS globally

Many of the technical problems we face at Cloudflare are widely shared problems across the Internet industry. As gratifying as it can be to solve a problem for ourselves and our customers, it can be even more gratifying to solve a problem for the entire Internet. For the past three years, we have been working with peers in the industry to solve a specific shared problem in the TLS infrastructure space: How do you terminate TLS connections while storing keys remotely and maintaining performance and availability? Today we’re announcing that Cloudflare now supports Delegated Credentials, the result of this work.

Cloudflare’s TLS/SSL features are among the top reasons customers use our service. Configuring TLS is hard to do without internal expertise. By automating TLS, web site and web service operators gain the latest TLS features and the most secure configurations by default. It also reduces the risk of outages or bad press due to misconfigured or insecure encryption settings. Customers also gain early access to unique features like TLS 1.3, post-quantum cryptography, and OCSP stapling as they become available.

Unfortunately, for web services to authorize a service to terminate TLS for them, they have to trust the service with their private keys, which demands a high level of trust. For services with a global footprint, there is an additional level of nuance. They may operate multiple data centers located in places with varying levels of physical security, and each of these needs to be trusted to terminate TLS.

To tackle these problems of trust, Cloudflare has invested in two technologies: Keyless SSL, which allows customers to use Cloudflare without sharing their private key with Cloudflare; and Geo Key Manager, which allows customers to choose the datacenters in which Cloudflare should keep their keys. Both of these technologies are able to be deployed without any changes to browsers or other clients. They also come with some downsides in the form of availability and performance degradation.

Keyless SSL introduces extra latency at the start of a connection. In order for a server without access to a private key to establish a connection with a client, that servers needs to reach out to a key server, or a remote point of presence, and ask them to do a private key operation. This not only adds additional latency to the connection, causing the content to load slower, but it also introduces some troublesome operational constraints on the customer. Specifically, the server with access to the key needs to be highly available or the connection can fail. Sites often use Cloudflare to improve their site’s availability, so having to run a high-availability key server is an unwelcome requirement.

Turning a pull into a push

The reason services like Keyless SSL that rely on remote keys are so brittle is their architecture: they are pull-based rather than push-based. Every time a client attempts a handshake with a server that doesn’t have the key, it needs to pull the authorization from the key server. An alternative way to build this sort of system is to periodically push a short-lived authorization key to the server and use that for handshakes. Switching from a pull-based model to a push-based model eliminates the additional latency, but it comes with additional requirements, including the need to change the client.

Enter the new TLS feature of Delegated Credentials (DCs). A delegated credential is a short-lasting key that the certificate’s owner has delegated for use in TLS. They work like a power of attorney: your server authorizes our server to terminate TLS for a limited time. When a browser that supports this protocol connects to our edge servers we can show it this “power of attorney”, instead of needing to reach back to a customer’s server to get it to authorize the TLS connection. This reduces latency and improves performance and reliability.

Delegated Credentials for TLS
The pull model

Delegated Credentials for TLS
The push model

A fresh delegated credential can be created and pushed out to TLS servers long before the previous credential expires. Momentary blips in availability will not lead to broken handshakes for clients that support delegated credentials. Furthermore, a Delegated Credentials-enabled TLS connection is just as fast as a standard TLS connection: there’s no need to connect to the key server for every handshake. This removes the main drawback of Keyless SSL for DC-enabled clients.

Delegated credentials are intended to be an Internet Standard RFC that anyone can implement and use, not a replacement for Keyless SSL. Since browsers will need to be updated to support the standard, proprietary mechanisms like Keyless SSL and Geo Key Manager will continue to be useful. Delegated credentials aren’t just useful in our context, which is why we’ve developed it openly and with contributions from across industry and academia. Facebook has integrated them into their own TLS implementation, and you can read more about how they view the security benefits here.  When it comes to improving the security of the Internet, we’re all on the same team.

"We believe delegated credentials provide an effective way to boost security by reducing certificate lifetimes without sacrificing reliability. This will soon become an Internet standard and we hope others in the industry adopt delegated credentials to help make the Internet ecosystem more secure."

Subodh Iyengar, software engineer at Facebook

Extensibility beyond the PKI

At Cloudflare, we’re interested in pushing the state of the art forward by experimenting with new algorithms. In TLS, there are three main areas of experimentation: ciphers, key exchange algorithms, and authentication algorithms. Ciphers and key exchange algorithms are only dependent on two parties: the client and the server. This freedom allows us to deploy exciting new choices like ChaCha20-Poly1305 or post-quantum key agreement in lockstep with browsers. On the other hand, the authentication algorithms used in TLS are dependent on certificates, which introduces certificate authorities and the entire public key infrastructure into the mix.

Unfortunately, the public key infrastructure is very conservative in its choice of algorithms, making it harder to adopt newer cryptography for authentication algorithms in TLS. For instance, EdDSA, a highly-regarded signature scheme, is not supported by certificate authorities, and root programs limit the certificates that will be signed. With the emergence of quantum computing, experimenting with new algorithms is essential to determine which solutions are deployable and functional on the Internet.

Since delegated credentials introduce the ability to use new authentication key types without requiring changes to certificates themselves, this opens up a new area of experimentation. Delegated credentials can be used to provide a level of flexibility in the transition to post-quantum cryptography, by enabling new algorithms and modes of operation to coexist with the existing PKI infrastructure. It also enables tiny victories, like the ability to use smaller, faster Ed25519 signatures in TLS.

Inside DCs

A delegated credential contains a public key and an expiry time. This bundle is then signed by a certificate along with the certificate itself, binding the delegated credential to the certificate for which it is acting as “power of attorney”. A supporting client indicates its support for delegated credentials by including an extension in its Client Hello.

A server that supports delegated credentials composes the TLS Certificate Verify and Certificate messages as usual, but instead of signing with the certificate’s private key, it includes the certificate along with the DC, and signs with the DC’s private key. Therefore, the private key of the certificate only needs to be used for the signing of the DC.

Certificates used for signing delegated credentials require a special X.509 certificate extension. This requirement exists to avoid breaking assumptions people may have about the impact of temporary access to their keys on security, particularly in cases involving HSMs and the still unfixed Bleichbacher oracles in older TLS versions.  Temporary access to a key can enable signing lots of delegated credentials which start far in the future, and as a result support was made opt-in. Early versions of QUIC had similar issues, and ended up adopting TLS to fix them. Protocol evolution on the Internet requires working well with already existing protocols and their flaws.

Delegated Credentials at Cloudflare and Beyond

Currently we use delegated credentials as a performance optimization for Geo Key Manager and Keyless SSL. Customers can update their certificates to include the special extension for delegated credentials, and we will automatically create delegated credentials and distribute them to the edge through the Keyless SSL or Geo Key Manager. For more information, see the documentation. It also enables us to be more conservative about where we keep keys for customers, improving our security posture.

Delegated Credentials would be useless if it wasn’t also supported by browsers and other HTTP clients. Christopher Patton, a former intern at Cloudflare, implemented support in Firefox and its underlying NSS security library. This feature is now in the Nightly versions of Firefox. You can turn it on by activating the configuration option security.tls.enable_delegated_credentials at about:config. Studies are ongoing on how effective this will be in a wider deployment. There also is support for Delegated Credentials in BoringSSL.

"At Mozilla we welcome ideas that help to make the Web PKI more robust. The Delegated Credentials feature can help to provide secure and performant TLS connections for our users, and we’re happy to work with Cloudflare to help validate this feature."

Thyla van der Merwe, Cryptography Engineering Manager at Mozilla

One open issue is the question of client clock accuracy. Until we have a wide-scale study we won’t know how many connections using delegated credentials will break because of the 24 hour time limit that is imposed.  Some clients, in particular mobile clients, may have inaccurately set clocks, the root cause of one third of all certificate errors in Chrome. Part of the way that we’re aiming to solve this problem is through standardizing and improving Roughtime, so web browsers and other services that need to validate certificates can do so independent of the client clock.

Cloudflare’s global scale means that we see connections from every corner of the world, and from many different kinds of connection and device. That reach enables us to find rare problems with the deployability of protocols. For example, our early deployment helped inform the development of the TLS 1.3 standard. As we enable developing protocols like delegated credentials, we learn about obstacles that inform and affect their future development.

Conclusion

As new protocols emerge, we’ll continue to play a role in their development and bring their benefits to our customers. Today’s announcement of a technology that overcomes some limitations of Keyless SSL is just one example of how Cloudflare takes part in improving the Internet not just for our customers, but for everyone. During the standardization process of turning the draft into an RFC, we’ll continue to maintain our implementation and come up with new ways to apply delegated credentials.

Tales from the Crypt(o team)

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/tales-from-the-crypt-o-team/

Tales from the Crypt(o team)

Tales from the Crypt(o team)

Halloween season is upon us. This week we’re sharing a series of blog posts about work being done at Cloudflare involving cryptography, one of the spookiest technologies around. So bookmark this page and come back every day for tricks, treats, and deep technical content.

A long-term mission

Cryptography is one of the most powerful technological tools we have, and Cloudflare has been at the forefront of using cryptography to help build a better Internet. Of course, we haven’t been alone on this journey. Making meaningful changes to the way the Internet works requires time, effort, experimentation, momentum, and willing partners. Cloudflare has been involved with several multi-year efforts to leverage cryptography to help make the Internet better.

Here are some highlights to expect this week:

  • We’re renewing Cloudflare’s commitment to privacy-enhancing technologies by sharing some of the recent work being done on Privacy Pass
  • We’re helping forge a path to a quantum-safe Internet by sharing some of the results of the Post-quantum Cryptography experiment
  • We’re sharing the rust-based software we use to power time.cloudflare.com
  • We’re doing a deep dive into the technical details of Encrypted DNS
  • We’re announcing support for a new technique we developed with industry partners to help keep TLS private keys more secure

The milestones we’re sharing this week would not be possible without partnerships with companies, universities, and individuals working in good faith to help build a better Internet together. Hopefully, this week provides a fun peek into the future of the Internet.

Cloudflare’s Approach to Research

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/cloudflares-approach-to-research/

Cloudflare’s Approach to Research

Cloudflare’s Approach to Research

Cloudflare’s mission is to help build a better Internet. One of the tools used in pursuit of this goal is computer science research. We’ve learned that some of the difficult problems to solve are best approached through research and experimentation to understand the solution before engineering it at scale. This research-focused approach to solving the big problems of the Internet is exemplified by the work of the Cryptography Research team, which leverages research to help build a safer, more secure and more performant Internet. Over the years, the team has worked on more than just cryptography, so we’re taking the model we’ve developed and expanding the scope of the team to include more areas of computer science research. Cryptography Research at Cloudflare is now Cloudflare Research. I am excited to share some of the insights we’ve learned over the years in this blog post.

Cloudflare’s research model

PrincipleDescription
Team structureHybrid approach. We have a program that allows research engineers to be embedded into product and operations teams for temporary assignments. This gives people direct exposure to practical problems.
Problem philosophyImpact-focused. We use our expertise and the expertise of partners in industry and academia to select projects that have the potential to make a big impact, and for which existing solutions are insufficient or not yet popularized.
Promoting solutionsOpen collaboration. Popularizing winning ideas through public outreach, working with industry partners to promote standardization, and implementing ideas at scale to show they’re effective.

The hybrid approach to research

“Super-ambitious goals tend to be unifying and energizing to people; but only if they believe there’s a chance of success.” – Peter Diamandis

Given the scale and reach of Cloudflare, research problems (and opportunities) present themselves all the time. Our approach to research is a practical one. We choose to tackle projects that have the potential to make a big impact, and for which existing solutions are insufficient. This stems from a belief that the interconnected systems that make up the Internet can be changed and improved in a fundamental way. While some research problems are solvable in a few months, some may take years. We don’t shy away from long-term projects, but the Internet moves fast, so it’s important to break down long-term projects into smaller, independently-valuable pieces in order to continually provide value while pursuing a bigger vision.

Successful technological innovation is not purely about technical accomplishments. New creations need the social and political scaffolding to support it while being built, and the momentum and support to gain popularity. We are better able to innovate if grounded in a deep understanding of the current day-to-day. To stay grounded, our research team members spend part of their time solving practical problems that affect Cloudflare and our customers right now.

Cloudflare employs a hybrid research model similar to the model pioneered by Google. Innovation can come from everywhere in a company, so teams are encouraged to find the right balance between research and engineering activities. The research team works with the same tools, systems, and constraints as the rest of the engineering organization.

Research engineers are expected to write production-quality code and contribute to engineering activities. This enables researchers to leverage the rich data provided by Cloudflare’s production environment for experiments. To further break down silos, we have a program that allows research engineers to be embedded into product and operations teams for temporary assignments. This gives people direct exposure to practical problems.

Continuing a successful tradition (our tradition)

“Skate to where the puck is going, not where it has been.” – Wayne Gretzky

The output of the research team is both new knowledge and technology that can lead to innovative products. Research works hand-in-hand with both product and engineering to help drive long-term positive outcomes for both Cloudflare and the Internet at large.

An example of a long-term project that requires both research and engineering is helping the Internet migrate from insecure to secure network protocols. To tackle the problem, we pursued several smaller projects with discrete and measurable outcomes. This included:

and many other smaller projects. Each step along the way contributed something concrete to help make the Internet more secure.

This year’s Crypto Week is a great example of the type of impact an effective hybrid research organization can make. Every day that week, a new announcement was made that helped take research results and realize their practical impact. From the League of Entropy, which is based on fundamental work by researchers at EPFL, to Cloudflare Time Services, which helps address time security issues raised in papers by former Cloudflare intern Aanchal Malhotra, to our own (currently running) post-quantum experiment with Google Chrome, engineers at Cloudflare combined research with building large-scale production systems to help solve some unsolved problems on the Internet.

Open collaboration, open standards, and open source

“We reject kings, presidents and voting. We believe in rough consensus and running code.” – Dave Clark

Effective research requires:

  • Choosing interesting problems to solve
  • Popularizing the ideas discovered while studying the solution space
  • Implementing the ideas at scale to show they’re effective

Cloudflare’s massive popularity puts us in a very privileged position. We can research, implement and deploy experiments at a scale that simply can’t be done by most organizations. This makes Cloudflare an attractive research partner for universities and other research institutions who have domain knowledge but not data. We rely on our own expertise along with that of peers in both academia and industry to decide which problems to tackle in order to achieve common goals and make new scientific progress. Our middlebox detection project, proposed by researchers at the University of Michigan, is an example of such a problem.

We’re not purists who are only interested in pursuing our own ideas. Some interesting problems have already been solved, but the solution isn’t widely known or implemented. In this situation, we contribute our efforts to help elevate the best ideas and make them available to the public in an accessible way. Our early work popularizing elliptic curves on the Internet is such an example.

Popularizing an idea and implementing the idea at scale are two different things. Along with popularizing winning ideas, we want to ensure these ideas stick and provide benefits to Internet users. To promote the widespread deployment of useful ideas, we work on standards and deploy newly emerging standards early on. Doing so helps the industry easily adopt innovations and supports interoperability. For example, the work done for Crypto Week 2019 has helped the development of international technical standards. Aspects of the League of Entropy are now being standardized at the CFRG, Roughtime is now being considered for adoption as an IETF standard, and we are presenting our post-quantum results as part of NIST’s post-quantum cryptography standardization effort.

Open source software is another key aspect of scaling the implementation of an idea. We open source associated code whenever possible. The research team collaborates with the wider research world as well as internally with other teams at Cloudflare.

Focus areas going forward

Doing research, sharing it in an accessible way, working with top experts to validate it, and working on standardization has several benefits. It provides an opportunity to educate the public, further scientific understanding, and improve the state of the art; but it’s also a great way to attract candidates. Great engineers want to work on interesting projects and great researchers want to see their work have an impact. This hybrid research approach is attractive to both types of candidates.

Computer science is a vast arena, so the areas we’re currently focusing on are:

  • Security and privacy
  • Cryptography
  • Internet measurement
  • Low-level networking and operating systems
  • Emerging networking paradigms

Here are some highlights of publications we’ve co-authored over the last few years in these areas. We’ll be building on this tradition going forward.

And by the way, we’re hiring!

Product Management
Help the research team explore the future of peer-to-peer systems by building and managing projects like the Distributed Web Gateway.

Engineering
Engineering Manager (San Francisco, London)
Systems Engineer – Cryptography Research (San Francisco)
Cryptography Research Engineer Internship (San Francisco, London)

If none of these fit you perfectly, but you still want to reach out, send us an email at: [email protected].

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/how-cloudflare-and-wall-street-are-helping-encrypt-the-internet-today/

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

Today has been a big day for Cloudflare, as we became a public company on the New York Stock Exchange (NYSE: NET). To mark the occasion, we decided to bring our favorite entropy machines to the floor of the NYSE. Footage of these lava lamps is being used as an additional seed to our entropy-generation system LavaRand — bolstering Internet encryption for over 20 million Internet properties worldwide.

(This is mostly for fun. But when’s the last time you saw a lava lamp on the trading floor of the New York Stock Exchange?)

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

A little context: generating truly random numbers using computers is impossible, because code is inherently deterministic (i.e. predictable). To compensate for this, engineers draw from pools of randomness created by entropy generators, which is a fancy term for “things that are truly unpredictable”.

It turns out that lava lamps are fantastic sources of entropy, as was first shown by Silicon Graphics in the 1990s. It’s a torch we’ve been proud to carry forward: today, Cloudflare uses lava lamps to generate entropy that helps make millions of Internet properties more secure.

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

Housed in our San Francisco headquarters is a wall filled with dozens of lava lamps, undulating with mesmerizing randomness. We capture these lava lamps on video via a camera mounted across the room, and feed the resulting footage into an algorithm — called LavaRand — that amplifies the pure randomness of these lava lamps to dizzying extremes (computers can’t create seeds of pure randomness, but they can massively amplify them).

Shortly before we rang the opening bell this morning, we recorded footage of our lava lamps in operation on the trading room floor of the New York Stock Exchange, and we’re ingesting the footage into our LavaRand system. The resulting entropy is mixed with the myriad additional sources of entropy that we leverage every day, creating a cryptographically-secure source of randomness — fortified by Wall Street.

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

We recently took our enthusiasm for randomness a step further by facilitating the League of Entropy, a consortium of global organizations and individual contributors, generating verifiable randomness via a globally distributed network. As one of the founding members of the League, LavaRand (pictured above) plays a key role in empowering developers worldwide with a pool of randomness with extreme entropy and high reliability.

And today, she’s enjoying the view from the podium!


One caveat: the lava lamps we run in our San Francisco headquarters are recorded in real-time, 24/7, giving us an ongoing stream of entropy. For reasons that are understandable, the NYSE doesn’t allow for live video feeds from the exchange floor while it is in operation. But this morning they did let us record footage of the lava lamps operating shortly before the opening bell. The video was recorded and we’re ingesting it into our LavaRand system (alongside many other entropy generators, including the lava lamps back in San Francisco).

Welcome to Crypto Week 2019

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/welcome-to-crypto-week-2019/

Welcome to Crypto Week 2019

Welcome to Crypto Week 2019

The Internet is an extraordinarily complex and evolving ecosystem. Its constituent protocols range from the ancient and archaic (hello FTP) to the modern and sleek (meet WireGuard), with a fair bit of everything in between. This evolution is ongoing, and as one of the most connected networks on the Internet, Cloudflare has a duty to be a good steward of this ecosystem. We take this responsibility to heart: Cloudflare’s mission is to help build a better Internet. In this spirit, we are very proud to announce Crypto Week 2019.

Every day this week we’ll announce a new project or service that uses modern cryptography to build a more secure, trustworthy Internet. Everything we release this week will be free and immediately useful. This blog is a fun exploration of the themes of the week.

  • Monday: Coming Soon
  • Tuesday: Coming Soon
  • Wednesday: Coming Soon
  • Thursday: Coming Soon
  • Friday: Coming Soon

The Internet of the Future

Many pieces of the Internet in use today were designed in a different era with different assumptions. The Internet’s success is based on strong foundations that support constant reassessment and improvement. Sometimes these improvements require deploying new protocols.

Performing an upgrade on a system as large and decentralized as the Internet can’t be done by decree;

  • There are too many economic, cultural, political, and technological factors at play.
  • Changes must be compatible with existing systems and protocols to even be considered for adoption.
  • To gain traction, new protocols must provide tangible improvements for users. Nobody wants to install an update that doesn’t improve their experience!

The last time the Internet had a complete reboot and upgrade was during TCP/IP flag day in 1983. Back then, the Internet (called ARPANET) had fewer than ten thousand hosts! To have an Internet-wide flag day today to switch over to a core new protocol is inconceivable; the scale and diversity of the components involved is way too massive. Too much would break. It’s challenging enough to deprecate outmoded functionality. In some ways, the open Internet is a victim of its own success. The bigger a system grows and the longer it stays the same, the harder it is to change. The Internet is like a massive barge: it takes forever to steer in a different direction and it’s carrying a lot of garbage.

Welcome to Crypto Week 2019
ARPANET, 1983 (Computer History Museum)

As you would expect, many of the warts of the early Internet still remain. Both academic security researchers and real-life adversaries are still finding and exploiting vulnerabilities in the system. Many vulnerabilities are due to the fact that most of the protocols in use on the Internet have a weak notion of trust inherited from the early days. With 50 hosts online, it’s relatively easy to trust everyone, but in a world-scale system, that trust breaks down in fascinating ways. The primary tool to scale trust is cryptography, which helps provide some measure of accountability, though it has its own complexities.

In an ideal world, the Internet would provide a trustworthy substrate for human communication and commerce. Some people naïvely assume that this is the natural direction the evolution of the Internet will follow. However, constant improvement is not a given. It’s possible that the Internet of the future will actually be worse than the Internet today: less open, less secure, less private, less trustworthy. There are strong incentives to weaken the Internet on a fundamental level by Governments, by businesses such as ISPs, and even by the financial institutions entrusted with our personal data.

In a system with as many stakeholders as the Internet, real change requires principled commitment from all invested parties. At Cloudflare, we believe everyone is entitled to an Internet built on a solid foundation of trust. Crypto Week is our way of helping nudge the Internet’s evolution in a more trust-oriented direction. Each announcement this week helps bring the Internet of the future to the present in a tangible way.

Ongoing Internet Upgrades

Before we explore the Internet of the future, let’s explore some of the previous and ongoing attempts to upgrade the Internet’s fundamental protocols.

Routing Security

As we highlighted in last year’s Crypto Week one of the weak links on the Internet is routing. Not all networks are directly connected.

To send data from one place to another, you might have to rely on intermediary networks to pass your data along. A packet sent from one host to another may have to be passed through up to a dozen of these intermediary networks. No single network knows the full path the data will have to take to get to its destination, it only knows which network to pass it to next.  The protocol that determines how packets are routed is called the Border Gateway Protocol (BGP.) Generally speaking, networks use BGP to announce to each other which addresses they know how to route packets for and (dependent on a set of complex rules) these networks share what they learn with their neighbors.

Unfortunately, BGP is completely insecure:

  • Any network can announce any set of addresses to any other network, even addresses they don’t control. This leads to a phenomenon called BGP hijacking, where networks are tricked into sending data to the wrong network.
  • A BGP hijack is most often caused by accidental misconfiguration, but can also be the result of malice on the network operator’s part.
  • During a BGP hijack, a network inappropriately announces a set of addresses to other networks, which results in packets destined for the announced addresses to be routed through the illegitimate network.

Understanding the risk

If the packets represent unencrypted data, this can be a big problem as it allows the hijacker to read or even change the data:

Mitigating the risk

The Resource Public Key Infrastructure (RPKI) system helps bring some trust to BGP by enabling networks to utilize cryptography to digitally sign network routes with certificates, making BGP hijacking much more difficult.

  • This enables participants of the network to gain assurances about the authenticity of route advertisements. Certificate Transparency (CT) is a tool that enables additional trust for certificate-based systems. Cloudflare operates the Cirrus CT log to support RPKI.

Since we announced our support of RPKI last year, routing security has made big strides. More routes are signed, more networks validate RPKI, and the software ecosystem has matured, but this work is not complete. Most networks are still vulnerable to BGP hijacking. For example, Pakistan knocked YouTube offline with a BGP hijack back in 2008, and could likely do the same today. Adoption here is driven less by providing a benefit to users, but rather by reducing systemic risk, which is not the strongest motivating factor for adopting a complex new technology. Full routing security on the Internet could take decades.

DNS Security

The Domain Name System (DNS) is the phone book of the Internet. Or, for anyone under 25 who doesn’t remember phone books, it’s the system that takes hostnames (like cloudflare.com or facebook.com) and returns the Internet address where that host can be found. For example, as of this publication, www.cloudflare.com is 104.17.209.9 and 104.17.210.9 (IPv4) and 2606:4700::c629:d7a2, 2606:4700::c629:d6a2 (IPv6). Like BGP, DNS is completely insecure. Queries and responses sent unencrypted over the Internet are modifiable by anyone on the path.

There are many ongoing attempts to add security to DNS, such as:

  • DNSSEC that adds a chain of digital signatures to DNS responses
  • DoT/DoH that wraps DNS queries in the TLS encryption protocol (more on that later)

Both technologies are slowly gaining adoption, but have a long way to go.

Welcome to Crypto Week 2019
DNSSEC-signed responses served by Cloudflare

Welcome to Crypto Week 2019
Cloudflare’s 1.1.1.1 resolver queries are already over 5% DoT/DoH

Just like RPKI, securing DNS comes with a performance cost, making it less attractive to users. However,

The Web

Transport Layer Security (TLS) is a cryptographic protocol that gives two parties the ability to communicate over an encrypted and authenticated channel. TLS protects communications from eavesdroppers even in the event of a BGP hijack. TLS is what puts the “S” in HTTPS. TLS protects web browsing against multiple types of network adversaries.

Welcome to Crypto Week 2019
Requests hop from network to network over the Internet

Welcome to Crypto Week 2019
For unauthenticated protocols, an attacker on the path can impersonate the server

Welcome to Crypto Week 2019
Attackers can use BGP hijacking to change the path so that communication can be intercepted

Welcome to Crypto Week 2019
Authenticated protocols are protected from interception attacks

The adoption of TLS on the web is partially driven by the fact that:

  • It’s easy and free for websites to get an authentication certificate (via Let’s Encrypt, Universal SSL, etc.)
  • Browsers make TLS adoption appealing to website operators by only supporting new web features such as HTTP/2 over HTTPS.

This has led to the rapid adoption of HTTPS over the last five years.

Welcome to Crypto Week 2019
HTTPS adoption curve (from Google Chrome)‌‌

To further that adoption, TLS recently got an upgrade in TLS 1.3, making it faster and more secure (a combination we love). It’s taking over the Internet!

Welcome to Crypto Week 2019
TLS 1.3 adoption over the last 12 months (from Cloudflare’s perspective)

Despite this fantastic progress in the adoption of security for routing, DNS, and the web, there are still gaps in the trust model of the Internet. There are other things needed to help build the Internet of the future. To find and identify these gaps, we lean on research experts.

Research Farm to Table

Cryptographic security on the Internet is a hot topic and there have been many flaws and issues recently pointed out in academic journals. Researchers often study the vulnerabilities of the past and ask:

  • What other critical components of the Internet have the same flaws?
  • What underlying assumptions can subvert trust in these existing systems?

The answers to these questions help us decide what to tackle next. Some recent research  topics we’ve learned about include:

  • Quantum Computing
  • Attacks on Time Synchronization
  • DNS attacks affecting Certificate issuance
  • Scaling distributed trust

Cloudflare keeps abreast of these developments and we do what we can to bring these new ideas to the Internet at large. In this respect, we’re truly standing on the shoulders of giants.

Future-proofing Internet Cryptography

The new protocols we are currently deploying (RPKI, DNSSEC, DoT/DoH, TLS 1.3) use relatively modern cryptographic algorithms published in the 1970s and 1980s.

  • The security of these algorithms is based on hard mathematical problems in the field of number theory, such as factoring and the elliptic curve discrete logarithm problem.
  • If you can solve the hard problem, you can crack the code. Using a bigger key makes the problem harder, making it more difficult to break, but also slows performance.

Modern Internet protocols typically pick keys large enough to make it infeasible to break with classical computers, but no larger. The sweet spot is around 128-bits of security; meaning a computer has to do approximately 2¹²⁸ operations to break it.

Arjen Lenstra and others created a useful measure of security levels by comparing the amount of energy it takes to break a key to the amount of water you can boil using that much energy. You can think of this as the electric bill you’d get if you run a computer long enough to crack the key.

  • 35-bit security is “Teaspoon security” — It takes about the same amount of energy to break a 35-bit key as it does to boil a teaspoon of water (pretty easy).

Welcome to Crypto Week 2019

  • 65 bits gets you up to “Pool security” – The energy needed to boil the average amount of water in a swimming pool.

Welcome to Crypto Week 2019

  • 105 bits is “Sea Security” – The energy needed to boil the Mediterranean Sea.

Welcome to Crypto Week 2019

  • 114-bits is “Global Security” –  The energy needed to boil all water on Earth.

Welcome to Crypto Week 2019

  • 128-bit security is safely beyond that of Global Security – Anything larger is overkill.
  • 256-bit security corresponds to “Universal Security” – The estimated mass-energy of the observable universe. So, if you ever hear someone suggest 256-bit AES, you know they mean business.

Welcome to Crypto Week 2019

Post-Quantum of Solace

As far as we know, the algorithms we use for cryptography are functionally uncrackable with all known algorithms that classical computers can run. Quantum computers change this calculus. Instead of transistors and bits, a quantum computer uses the effects of quantum mechanics to perform calculations that just aren’t possible with classical computers. As you can imagine, quantum computers are very difficult to build. However, despite large-scale quantum computers not existing quite yet, computer scientists have already developed algorithms that can only run efficiently on quantum computers. Surprisingly, it turns out that with a sufficiently powerful quantum computer, most of the hard mathematical problems we rely on for Internet security become easy!

Although there are still quantum-skeptics out there, some experts estimate that within 15-30 years these large quantum computers will exist, which poses a risk to every security protocol online. Progress is moving quickly; every few months a more powerful quantum computer is announced.

Welcome to Crypto Week 2019

Luckily, there are cryptography algorithms that rely on different hard math problems that seem to be resistant to attack from quantum computers. These math problems form the basis of so-called quantum-resistant (or post-quantum) cryptography algorithms that can run on classical computers. These algorithms can be used as substitutes for most of our current quantum-vulnerable algorithms.

  • Some quantum-resistant algorithms (such as McEliece and Lamport Signatures) were invented decades ago, but there’s a reason they aren’t in common use: they lack some of the nice properties of the algorithms we’re currently using, such as key size and efficiency.
  • Some quantum-resistant algorithms require much larger keys to provide 128-bit security
  • Some are very CPU intensive,
  • And some just haven’t been studied enough to know if they’re secure.

It is possible to swap our current set of quantum-vulnerable algorithms with new quantum-resistant algorithms, but it’s a daunting engineering task. With widely deployed protocols, it is hard to make the transition from something fast and small to something slower, bigger or more complicated without providing concrete user benefits. When exploring new quantum-resistant algorithms, minimizing user impact is of utmost importance to encourage adoption. This is a big deal, because almost all the protocols we use to protect the Internet are vulnerable to quantum computers.

Cryptography-breaking quantum computing is still in the distant future, but we must start the transition to ensure that today’s secure communications are safe from tomorrow’s quantum-powered onlookers; however, that’s not the most timely problem with the Internet. We haven’t addressed that…yet.

Attacking time

Just like DNS, BGP, and HTTP, the Network Time Protocol (NTP) is fundamental to how the Internet works. And like these other protocols, it is completely insecure.

  • Last year, Cloudflare introduced Roughtime as a mechanism for computers to access the current time from a trusted server in an authenticated way.
  • Roughtime is powerful because it provides a way to distribute trust among multiple time servers so that if one server attempts to lie about the time, it will be caught.

However, Roughtime is not exactly a secure drop-in replacement for NTP.

  • Roughtime lacks the complex mechanisms of NTP that allow it to compensate for network latency and yet maintain precise time, especially if the time servers are remote. This leads to imprecise time.
  • Roughtime also involves expensive cryptography that can further reduce precision. This lack of precision makes Roughtime useful for browsers and other systems that need coarse time to validate certificates (most certificates are valid for 3 months or more), but some systems (such as those used for financial trading) require precision to the millisecond or below.

With Roughtime we supported the time protocol of the future, but there are things we can do to help improve the health of security online today.

Welcome to Crypto Week 2019

Some academic researchers, including Aanchal Malhotra of Boston University, have demonstrated a variety of attacks against NTP, including BGP hijacking and off-path User Datagram Protocol (UDP) attacks.

  • Some of these attacks can be avoided by connecting to an NTP server that is close to you on the Internet.
  • However, to bring cryptographic trust to time while maintaining precision, we need something in between NTP and Roughtime.
  • To solve this, it’s natural to turn to the same system of trust that enabled us to patch HTTP and DNS: Web PKI.

Attacking the Web PKI

The Web PKI is similar to the RPKI, but is more widely visible since it relates to websites rather than routing tables.

  • If you’ve ever clicked the lock icon on your browser’s address bar, you’ve interacted with it.
  • The PKI relies on a set of trusted organizations called Certificate Authorities (CAs) to issue certificates to websites and web services.
  • Websites use these certificates to authenticate themselves to clients as part of the TLS protocol in HTTPS.

Welcome to Crypto Week 2019
TLS provides encryption and integrity from the client to the server with the help of a digital certificate 

Welcome to Crypto Week 2019
TLS connections are safe against MITM, because the client doesn’t trust the attacker’s certificate

While we were all patting ourselves on the back for moving the web to HTTPS, some researchers managed to find and exploit a weakness in the system: the process for getting HTTPS certificates.

Certificate Authorities (CAs) use a process called domain control validation (DCV) to ensure that they only issue certificates to websites owners who legitimately request them.

  • Some CAs do this validation manually, which is secure, but can’t scale to the total number of websites deployed today.
  • More progressive CAs have automated this validation process, but rely on insecure methods (HTTP and DNS) to validate domain ownership.

Without ubiquitous cryptography in place (DNSSEC may never reach 100% deployment), there is no completely secure way to bootstrap this system. So, let’s look at how to distribute trust using other methods.

One tool at our disposal is the distributed nature of the Cloudflare network.

Cloudflare is global. We have locations all over the world connected to dozens of networks. That means we have different vantage points, resulting in different ways to traverse networks. This diversity can prove an advantage when dealing with BGP hijacking, since an attacker would have to hijack multiple routes from multiple locations to affect all the traffic between Cloudflare and other distributed parts of the Internet. The natural diversity of the network raises the cost of the attacks.

A distributed set of connections to the Internet and using them as a quorum is a mighty paradigm to distribute trust, with or without cryptography.

Distributed Trust

This idea of distributing the source of trust is powerful. Last year we announced the Distributed Web Gateway that

  • Enables users to access content on the InterPlanetary File System (IPFS), a network structured to reduce the trust placed in any single party.
  • Even if a participant of the network is compromised, it can’t be used to distribute compromised content because the network is content-addressed.
  • However, using content-based addressing is not the only way to distribute trust between multiple independent parties.

Another way to distribute trust is to literally split authority between multiple independent parties. We’ve explored this topic before. In the context of Internet services, this means ensuring that no single server can authenticate itself to a client on its own. For example,

  • In HTTPS the server’s private key is the lynchpin of its security. Compromising the owner of the private key (by hook or by crook) gives an attacker the ability to impersonate (spoof) that service. This single point of failure puts services at risk. You can mitigate this risk by distributing the authority to authenticate the service between multiple independently-operated services.

Welcome to Crypto Week 2019
TLS doesn’t protect against server compromise

Welcome to Crypto Week 2019
With distributed trust, multiple parties combine to protect the connection

Welcome to Crypto Week 2019
An attacker that has compromised one of the servers cannot break the security of the system‌‌

The Internet barge is old and slow, and we’ve only been able to improve it through the meticulous process of patching it piece by piece. Another option is to build new secure systems on top of this insecure foundation. IPFS is doing this, and IPFS is not alone in its design. There has been more research into secure systems with decentralized trust in the last ten years than ever before.

The result is radical new protocols and designs that use exotic new algorithms. These protocols do not supplant those at the core of the Internet (like TCP/IP), but instead, they sit on top of the existing Internet infrastructure, enabling new applications, much like HTTP did for the web.

Gaining Traction

Some of the most innovative technical projects were considered failures because they couldn’t attract users. New technology has to bring tangible benefits to users to sustain it: useful functionality, content, and a decent user experience. Distributed projects, such as IPFS and others, are gaining popularity, but have not found mass adoption. This is a chicken-and-egg problem. New protocols have a high barrier to entryusers have to install new softwareand because of the small audience, there is less incentive to create compelling content. Decentralization and distributed trust are nice security features to have, but they are not products. Users still need to get some benefit out of using the platform.

An example of a system to break this cycle is the web. In 1992 the web was hardly a cornucopia of awesomeness. What helped drive the dominance of the web was its users.

  • The growth of the user base meant more incentive for people to build services, and the availability of more services attracted more users. It was a virtuous cycle.
  • It’s hard for a platform to gain momentum, but once the cycle starts, a flywheel effect kicks in to help the platform grow.

The Distributed Web Gateway project Cloudflare launched last year in Crypto Week is our way of exploring what happens if we try to kickstart that flywheel. By providing a secure, reliable, and fast interface from the classic web with its two billion users to the content on the distributed web, we give the fledgling ecosystem an audience.

  • If the advantages provided by building on the distributed web are appealing to users, then the larger audience will help these services grow in popularity.
  • This is somewhat reminiscent of how IPv6 gained adoption. It started as a niche technology only accessible using IPv4-to-IPv6 translation services.
  • IPv6 adoption has now grown so much that it is becoming a requirement for new services. For example, Apple is requiring that all apps work in IPv6-only contexts.

Eventually, as user-side implementations of distributed web technologies improve, people may move to using the distributed web natively rather than through an HTTP gateway. Or they may not! By leveraging Cloudflare’s global network to give users access to new technologies based on distributed trust, we give these technologies a better chance at gaining adoption.

Happy Crypto Week

At Cloudflare, we always support new technologies that help make the Internet better. Part of helping make a better Internet is scaling the systems of trust that underpin web browsing and protect them from attack. We provide the tools to create better systems of assurance with fewer points of vulnerability. We work with academic researchers of security to get a vision of the future and engineer away vulnerabilities before they can become widespread. It’s a constant journey.

Cloudflare knows that none of this is possible without the work of researchers. From award-winning researcher publishing papers in top journals to the blog posts of clever hobbyists, dedicated and curious people are moving the state of knowledge of the world forward. However, the push to publish new and novel research sometimes holds researchers back from committing enough time and resources to fully realize their ideas. Great research can be powerful on its own, but it can have an even broader impact when combined with practical applications. We relish the opportunity to stand on the shoulders of these giants and use our engineering know-how and global reach to expand on their work to help build a better Internet.

So, to all of you dedicated researchers, thank you for your work! Crypto Week is yours as much as ours. If you’re working on something interesting and you want help to bring the results of your research to the broader Internet, please contact us at [email protected]. We want to help you realize your dream of making the Internet safe and trustworthy.