Tag Archives: TLS

Cloudflare for SaaS for All, now Generally Available!

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/cloudflare-for-saas-for-all-now-generally-available/

During Developer Week a few months ago, we opened up the Beta for Cloudflare for SaaS: a one-stop shop for SaaS providers looking to provide fast load times, unparalleled redundancy, and the strongest security to their customers.

Since then, we’ve seen numerous developers integrate with our technology, allowing them to spend their time building out their solution instead of focusing on the burdens of running a fast, secure, and scalable infrastructure — after all, that’s what we’re here for.

Today, we are very excited to announce that Cloudflare for SaaS is generally available, so that every customer, big and small, can use Cloudflare for SaaS to continue scaling and building their SaaS business.

What is Cloudflare for SaaS?

If you’re running a SaaS company, you have customers that are fully reliant on you for your service. That means you’re responsible for keeping their domain fast, secure, and protected. But this isn’t simple. There’s a long checklist you need to get through to put a solution in your customers’ hands:

  • Set up an origin server
  • Encrypt your customers’ traffic
  • Keep your customers online
  • Boost the performance of global customers
  • Support vanity domains
  • Protect against attacks and bots
  • Scale for growth
  • Provide insights and analytics

And on top of that, you need to also focus on building out your solution and your business. As a developer or startup with limited resources, this can delay your product launch by weeks or months.

That’s what we’re here to help with! We have numerous engineering teams whose sole focus is to work on products that take care of each one of these tasks, so you don’t have to!

The Cloudflare solution:

  • Set up an origin server  → Workers
  • Encrypt your customers’ traffic →  SSL for SaaS
  • Keep your customers online → Cloudflare’s global Anycast network
  • Boost the performance of global customers → Argo Smart Routing/Cache
  • Support vanity domains → Custom Hostnames
  • Protect against attacks and bots → WAF and Bot Management
  • Scale for growth → Workers
  • Provide insights and analytics → Custom Hostname Analytics

Pricing, Made for Developers

Starting today, Cloudflare for SaaS is available to purchase on Free, Pro, and Business plans. We wanted to make sure that the pricing made sense for developers. At the time of building, you don’t know how many customers you’ll have, so we wanted to offer flexibility by keeping the pricing as simple as possible: only pay for the customers you use.

Each customer domain using the service is called a Custom Hostname. For each Custom Hostname, we automatically provision a TLS certificate. But not just that!  Beyond the TLS certificate, each of your Custom Hostnames inherits the full suite of Cloudflare products that you set up on your SaaS zone. From Bot Management to Argo Smart Routing, you can extend these add-ons that protect and accelerate your domain to your customers.
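
As a concrete illustration (not an excerpt from Cloudflare’s docs), here is a minimal Go sketch of provisioning a Custom Hostname through the public Custom Hostnames API. The zone ID, API token, and hostname below are placeholders, and the exact request shape should be checked against the current API reference before you rely on it.

```go
// Hedged sketch: create a Custom Hostname on a SaaS zone via the Cloudflare API.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	zoneID := os.Getenv("CLOUDFLARE_ZONE_ID")     // placeholder: your SaaS zone
	apiToken := os.Getenv("CLOUDFLARE_API_TOKEN") // placeholder: an API token with the right scope

	body, _ := json.Marshal(map[string]interface{}{
		"hostname": "app.customer-domain.example", // placeholder: your customer's vanity domain
		"ssl": map[string]string{
			"method": "http", // domain control validation method
			"type":   "dv",
		},
	})

	url := fmt.Sprintf("https://api.cloudflare.com/client/v4/zones/%s/custom_hostnames", zoneID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+apiToken)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// The TLS certificate for the hostname is provisioned asynchronously after creation.
	fmt.Println("status:", resp.Status)
}
```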

Each Custom Hostname costs $2 per month, and billing is prorated: we only charge for a Custom Hostname once it has been onboarded, starting from the day you created it. That means that if you created 10 Custom Hostnames at the start of the month ($2 each) and 10 Custom Hostnames halfway through ($1 each, prorated), at the end of the month you will be billed $30.

This way, you’re only charged for the Custom Hostnames that you provision. It’s also a great incentive to make sure you clean up after your churned customers.

If you’re an Enterprise customer and want to learn more about the benefits that you can get from Cloudflare for SaaS, make sure you check out our blog post about the latest developments.

Show us what you’re building!

During the beta alone, we’ve seen incredible projects built out on the platform. We wanted to showcase these developers to show you what’s possible. And even better, some of these have been built on our Workers platform! We’d love to see what you’re working on. Join our Discord channel and showcase your work! Have feature requests for us? Let us know!

mmm.page: Simple Personal Websites

mmm.page is a drag-and-drop website builder that makes it dead simple to create auto-responsive, collage-like websites: websites with overlapping text, images, GIFs, YouTube videos, Spotify embeds, and (a lot) more. To make it easier, all the standard website tedium — uptime, usability, performance, reliability, responsiveness, SEO, etc. — is handled under the hood, so all you have to worry about is adding content and arranging it how you want.

Under their hood is Cloudflare. Cloudflare’s CDN allows both the flexibility of server-side pages as well as the instant loading times of static pages — not to mention an 80% reduction in server costs. Custom Hostnames alone saved months of development time by handling domain names and SSL management (which are otherwise tricky to get perfect and reliable).

They’ve used Workers for increasingly more tasks that would’ve otherwise taken an order of magnitude more time if implemented with their current backend monolith — the ease of deployment and comparatively low cost of Workers is something that keeps them coming back.

The longer-term hope is for pages to be used as a sort of beacon signal, an easy-to-make yet unbounded way to express to others the things you’re interested in, especially for things that aren’t so easily describable or captured in words. They look forward to a world of a ton more DIY micro-sites. Cloudflare has been crucial in taking care of much of the difficult technical plumbing and giving them more time to work on designs and features that get them closer to this hope.

Lightfunnels

Lightfunnels is a performance driven e-commerce and lead generation platform. It focuses on delivering fast, reliable, and highly converting sales funnels to its users and their customers.

With Cloudflare for SaaS, Lightfunnels allows users to preserve their brand by easily connecting their own domain names with SSL to use on their funnels.

The platform handles large e-commerce traffic volume through Cloudflare Workers. This helps Lightfunnels serve pages from the closest edge to the customer, wherever they are in the world, allowing for blazing fast page load speeds.

Workers also come with a powerful caching API that eliminates a great percentage of back-end trips and reduces the stress on their servers.

“Our aim is to build the best performing e-commerce and lead generation platform on the market. Page load speeds play a significant role in performance. Using Cloudflare for SaaS along with Cloudflare Workers made building a reliable, secure, and fast infrastructure a breeze.”
Yassir Ennazk, Co-founder & CEO at Lightfunnels

Ventrata

Ventrata is a SaaS multi-channel booking platform for large attractions and tour operators. They power booking sites and B2B booking portals for clients that run on other domains. Cloudflare for SaaS has allowed them to leverage all of Cloudflare’s tools, including Firewall, image caching, Workers, and free TLS certificates on Custom Hostnames, while allowing their clients to keep full control of their brand. Their implementation involved just 4 lines of code without any infrastructure/DevOps help required, which would have been impossible before.

Currently a part of the Beta?

If you were accepted as a part of the Cloudflare for SaaS Beta, you will get a notice next week about migrating to the paid version.  

Help build a better Internet

Want to be a part of the Cloudflare team and work on the products that power Cloudflare for SaaS? We’re hiring!

Revisiting BetterTLS: Certificate Path Building

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/revisiting-bettertls-certificate-path-building-4c978b79843f

By Ian Haken

Last year the AddTrust root certificate expired and lots of clients had a bad time. Some Roku devices weren’t working right, Heroku had problems, and some folks couldn’t even curl. In the aftermath Ryan Sleevi wrote a really great blog post not just about the issue of this one certificate’s expiry, but the problem that so many TLS implementations have in general with certificate path building. If you haven’t read that blog post, you should. This post is probably going to make a lot more sense if you’ve read that one first, so go ahead and read it now.

To recap that previous AddTrust root certificate expiry, there was a certificate graph that looked like this:

The AddTrust certificate graph

This is a real example, and you can see the five certificates in the above graph here:

  1. www.agwa.name (leaf certificate)
  2. Sectigo RSA Domain Validation Secure Server CA (intermediate CA)
  3. USERTrust RSA Certification Authority (intermediate CA)
  4. USERTrust RSA Certification Authority (self-signed)
  5. AddTrust External CA Root (self-signed)

The important thing to understand about a certificate graph is that the boxes represent entities (meaning an X.500 Distinguished Name and public key). Entities are things you trust (or don’t, as the case may be). The arrows between entities represent certificates: a way to extend trust from one entity to another. This means that if you trust either the “USERTrust RSA Certification Authority” entity or the “AddTrust External CA Root” entity, you should be able to discover a chain of trust from that trusted entity (the “trust anchor”) down to “www.agwa.name”, the “end-entity”.

(Note that the self-signed certificates (4 and 5) are often useful for defining trusted entities, but aren’t going to be important in the context of path building.)

The problem that occurred last summer started because certificate 3 expired. The “USERTrust RSA Certification Authority” entity was relatively new and not directly trusted by many clients, and so most servers would send certificates 1, 2, and 3 to clients. If a client only trusted “AddTrust External CA Root” then this would be fine; that client can build a certificate chain 1 ← 2 ← 3 and see that they should trust www.agwa.name’s public key. On the other hand, if the client trusts “USERTrust RSA Certification Authority” then that’s also fine; it only needs to build a chain 1 ← 2.

The problem that arose was that some clients weren’t good at certificate path building (even though this is a fairly simple case of path building compared to the next example below). Those clients didn’t realize that they could stop building a chain at 2 if they trusted “USERTrust RSA Certification Authority”. So when certificate 3 expired on May 30, 2020, these clients treated the entire collection of certificates sent by the server as invalid and would no longer establish trust in the end-entity.

Even though that story is a year old and was well covered then, I’m retelling it here because a couple of weeks ago something kind of similar happened: a certificate for the Let’s Encrypt R3 CA expired (certificate 2 below) on September 30, 2021. This should have been fine; the Let’s Encrypt R3 entity also has a certificate signed by the ISRG Root X1 CA (3) which nowadays is trusted by most clients.

The Let’s Encrypt R3 Certificate Graph
  1. src.agwa.name (leaf certificate)
  2. Let’s Encrypt R3 (signed by DST Root CA X3)
  3. Let’s Encrypt R3 (signed by ISRG Root X1)
  4. DST Root CA X3 (self-signed)
  5. ISRG Root X1 (self-signed)

But predictably, even though it’s been a year since Ryan’s post, lots of services and clients had issues. You should read Scott Helme’s full post-mortem on the event to understand some of the contributing factors, but one big problem is that most TLS implementations still aren’t very good at path building. As a result, servers generally can’t send a complete collection of certificates down to clients (containing different possible paths to different trust anchors) which makes it hard to host a service that both old and new devices can talk to.

Maybe it’s because I saw history repeating or maybe it’s because I had just listened to Ryan Sleevi talk about the history of web PKI, but the whole episode really made me want to get back to something I had been wanting to do for a while. So over the last couple of weeks I set some time aside, started reading some RFCs, had to get more coffee, finished reading some RFCs, and finally started making certificates. The end result is the first major update to BetterTLS since its first release: a new suite of tests to exercise TLS implementations’ certificate path building. As a bonus, it also checks whether TLS implementations apply certain validity checks. Some of the checks are part of RFCs, like Basic Constraints, while others are not fully standardized, like distrusting deprecated signature algorithms and enforcing EKU constraints on CAs.

I found the results of applying these tests to various TLS implementations pretty interesting, but before I get into those results let me give you a few more details about why TLS implementations should be doing good path building and why we care about it.

What is Certificate Path Building?

If you want the really detailed answer to “What is Certificate Path Building” you can take a look at RFC 4158. But in short, certificate path building is the process of building a chain of trust from some end entity (like the blue boxes in the examples above) back to a trust anchor (like the ISRG Root X1 CA) given a collection of certificates. In the context of TLS, that collection of certificates is sent from the server to the client as part of the TLS handshake. A lot of the time, that collection of certificates is actually already an ordered sequence of certificates from end-entity to trust anchor, such as in the first example where servers would send certificates 1, 2, 3. This happens to already be a chain from “www.agwa.name” to “AddTrust External CA Root”.

But what happens if we can’t be sure what trust anchor the client has, such as the second example above where the server doesn’t know if the client will trust DST Root CA X3 or ISRG Root X1? In this case the server could send all the certificates (1, 2, and 3) and let the client figure out which path makes sense (either 1 ← 2, or 1 ← 3). But if the client expects the server’s certificates to simply be a chain already, the sequence 1 ← 2 ← 3 is going to fail to validate.
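
To make this concrete, here is a minimal sketch (Go standard library, hypothetical file names, not the BetterTLS harness itself) of what it looks like to hand path building off to a verifier: everything the server sent beyond the leaf goes into an intermediates pool, and the library searches for any chain that reaches a root the client trusts.

```go
// Sketch: delegate certificate path building to Go's crypto/x509 verifier.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"log"
	"os"
)

func loadCert(path string) *x509.Certificate {
	pemBytes, err := os.ReadFile(path)
	if err != nil {
		log.Fatal(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		log.Fatalf("no PEM data in %s", path)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}
	return cert
}

func main() {
	// Hypothetical file names standing in for the certificates in the example above:
	// the leaf plus every intermediate the server sent, in no particular order.
	leaf := loadCert("leaf.pem")

	intermediates := x509.NewCertPool()
	for _, path := range []string{"r3-signed-by-dst.pem", "r3-signed-by-isrg.pem"} {
		intermediates.AddCert(loadCert(path))
	}

	roots := x509.NewCertPool()
	roots.AddCert(loadCert("isrg-root-x1.pem")) // whichever root(s) this client trusts

	// Verify performs path building: it may return several valid chains, and it
	// succeeds as long as at least one path reaches a trusted root.
	chains, err := leaf.Verify(x509.VerifyOptions{
		Intermediates: intermediates,
		Roots:         roots,
	})
	if err != nil {
		log.Fatalf("no valid path found: %v", err)
	}
	for _, chain := range chains {
		fmt.Printf("found a chain of length %d\n", len(chain))
	}
}
```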

Why Does This Matter?

The most important reason for clients to support robust path building is that it allows for agility in the web PKI ecosystem. For example, we can add additional certificates that conform to new requirements such as SHA-1 deprecation, validity length restrictions, or trust anchor removal, all while leaving existing certificates in place to preserve legacy client functionality. This allows static, infrequently updated, or intentionally end-of-lifed clients to continue working while browsers (which frequently enforce new constraints like the ones mentioned above) can take advantage of the additional certificates in the handshake that conform to the new requirements.

In particular, Netflix runs on a lot of devices. Millions of them. The reality though is that the above description applies to many of them. Some devices only run older versions of Android or iOS. Some are embedded devices that no longer receive updates. Regardless of the specifics, the update cadence (if one exists) for those devices is outside of our control. But ideally we’d love it if every device that used to work just kept working. To that end, it’s helpful to know what trade-offs we can make in terms of agility versus retaining support for every device. Are those devices stuck using certain signature algorithms or cipher suites? Will those devices accept a certificate set that includes extra certificates with other alternate signature algorithms?

As service owners, having test suites that can answer these questions can guide decision making. On the other hand, TLS library implementers using these test suites can ensure that applications built with their libraries operate reliably throughout churn in the web PKI ecosystem.

An Aside About Agility

More than 4 years passed between publication of the first draft of the TLS 1.3 specification and the final version. An impressive amount of consideration went into the design of all of the versions of the TLS and SSL protocols and it speaks to the designers’ foresight and diligence that a single server can support clients speaking SSL 3.0 (final specification released 1996) all the way up to TLS 1.3 (final specification released 2018).

(Although I should say that in practice, supporting such a broad set of protocol versions on a single server is probably not a good idea.)

The reason the TLS protocol can support this is that agility has been designed into the system. The client advertises the TLS versions, cipher suites, and extensions it supports and the server can make decisions about the best supported version of those options and negotiate the details in its response. Over time this has allowed the ecosystem to evolve gracefully, supporting new cryptographic primitives (like elliptic curve cryptography) and deprecating old ones (like the MD5 hash algorithm).

Unfortunately the TLS specification has not enabled the same agility with the certificates that it relies on in practice. While there are great specifications like RFC 4158 for how to think about certificate path building, TLS specifications up to 1.2 only allowed the server to present “the chain”:

This is a sequence (chain) of certificates. The sender’s certificate MUST come first in the list. Each following certificate MUST directly certify the one preceding it.

Only in TLS 1.3 did the specification allow for greater flexibility:

The sender’s certificate MUST come in the first CertificateEntry in the list. Each following certificate SHOULD directly certify the one immediately preceding it.

Note: Prior to TLS 1.3, “certificate_list” ordering required each certificate to certify the one immediately preceding it; however, some implementations allowed some flexibility. Servers sometimes send both a current and deprecated intermediate for transitional purposes, and others are simply configured incorrectly, but these cases can nonetheless be validated properly. For maximum compatibility, all implementations SHOULD be prepared to handle potentially extraneous certificates and arbitrary orderings from any TLS version, with the exception of the end-entity certificate which MUST be first.

This change to the specification is hugely significant because it’s the first formalization that TLS implementations should be doing robust path building. Implementations which conform to this are far more likely to continue operating in a PKI ecosystem undergoing frequent changes. If more TLS implementations can tolerate changes, then the web PKI ecosystem will be in a place where it is able to undergo those changes. And ultimately this means we will be able to update best practices and retain trust agility as time goes on, making the web a more secure place.
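
On the server side, taking advantage of that flexibility mostly means handing clients more than one possible issuer. Here is a hedged sketch (Go standard library, hypothetical file names) of a server configured to send the leaf plus both intermediates, leaving it to each client’s path builder to pick whichever issuer chains to a root it trusts.

```go
// Sketch: a TLS server that sends the leaf plus both intermediates.
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	// cert-with-both-intermediates.pem is assumed to contain the leaf certificate
	// first, followed by both intermediates (e.g. the current one and the
	// cross-signed one). Go sends every certificate in this file to clients.
	cert, err := tls.LoadX509KeyPair("cert-with-both-intermediates.pem", "key.pem")
	if err != nil {
		log.Fatal(err)
	}

	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: &tls.Config{Certificates: []tls.Certificate{cert}},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("hello\n"))
		}),
	}
	// Empty cert/key paths: the certificates come from TLSConfig above.
	log.Fatal(server.ListenAndServeTLS("", ""))
}
```

Older, strictly chain-order clients may still reject the extra certificate, which is exactly the tension between legacy support and agility described above.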

It’s hard to imagine a world where SSL and TLS were so inflexible that we wouldn’t have been able to gracefully transition off of MD5 or transition to PFS cipher suites. I’m hopeful that this update to the TLS specification will help bring the same agility that has existed in the TLS protocol itself to the web PKI ecosystem.

Test Results

So what does the new test suite in BetterTLS tell us about the state of certificate path building in TLS implementations? The good news is that there has been some improvement in the state of the world since Ryan’s roundup last year. The bad news is that that improvement isn’t everywhere.

The test suite both identifies what relevant features a TLS implementation supports (like default distrust of deprecated signing algorithms) and evaluates correctness. Here’s a quick enumeration of what features this test suite looks for:

  • Branching Path Building: Implementations that support “branching” path building can handle cases like the Let’s Encrypt R3 example above where an entity has multiple issuing certificates and the client needs to check multiple possible paths to find a route to a trust anchor. Importantly, as invalid certificates are found during path building (for all of the reasons listed below) the implementation should be able to pick an alternate issuer certificate to continue building a path. This is the primary feature of interest in this test suite.
  • Certificate expiration: Implementations should treat expired certificates as invalid during path building. This is a pretty straightforward expectation and fortunately all the tested implementations were properly verifying this.
  • Name constraints: Implementations should treat certificates with a name constraint extension in conflict with the end entity’s identity as invalid. Check out BetterTLS’s name constraints test suite for a more thorough evaluation of this behavior. All of the implementations tested below correctly evaluated the simple name constraints check in this test suite.
  • Bad Extended Key Usage (EKU) on CAs: This check tests whether an implementation rejects CA certificates with an Extended Key Usage extension that is incompatible with the end-entity’s use of the certificate for TLS server authentication. The Mozilla Certificate Policy FAQ states:

Inclusion of EKU in CA certificates is generally allowed. NSS and CryptoAPI both treat the EKU extension in intermediate certificates as a constraint on the permitted EKU OIDs in end-entity certificates. Browsers and certificate client software have been using EKU in intermediate certificates, and it has been common for enterprise subordinate CAs in Windows environments to use EKU in their intermediate certificates to constrain certificate issuance.

While many implementations support the semantics of an incompatible EKU in CAs as a reason to treat a certificate as invalid, RFCs do not require this behavior so we do see several implementations below not applying this verification.

  • Missing Basic Constraints Extension: This check tests whether the implementation rejects paths where a CA is missing the Basic Constraints extension. RFC 5280 requires that all CAs have this extension, so all implementations should support this.
  • Not a CA: This check tests whether the implementation rejects paths where a CA has a Basic Constraints extension, but that extension does not identify the certificate as being a CA. Similarly to the above, all implementations should support this and fortunately all of the implementations tested applied this check correctly.
  • Deprecated Signing Algorithm: This check tests whether the implementation rejects certificates that have been signed with an algorithm that is considered deprecated (in particular, with an algorithm using SHA-1). Enforcement of SHA-1 deprecation is not universally present in all TLS implementations at this point, so we see a mix of implementations below that do and do not apply it.
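
To make the per-certificate checks in the list above concrete, here is a sketch, not taken from any of the tested libraries, of how several of them might be expressed against Go’s x509.Certificate fields (name constraints are omitted for brevity).

```go
// Sketch of the kinds of per-certificate checks a path builder applies to a
// candidate CA certificate before accepting it into a chain.
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

func caLooksUsable(ca *x509.Certificate, now time.Time) error {
	// Certificate expiration.
	if now.Before(ca.NotBefore) || now.After(ca.NotAfter) {
		return fmt.Errorf("certificate expired or not yet valid")
	}
	// Missing Basic Constraints extension.
	if !ca.BasicConstraintsValid {
		return fmt.Errorf("missing Basic Constraints extension")
	}
	// Not a CA.
	if !ca.IsCA {
		return fmt.Errorf("Basic Constraints does not mark this certificate as a CA")
	}
	// Deprecated (SHA-1 based) signing algorithm.
	switch ca.SignatureAlgorithm {
	case x509.SHA1WithRSA, x509.ECDSAWithSHA1, x509.DSAWithSHA1:
		return fmt.Errorf("signed with a deprecated SHA-1 algorithm")
	}
	// EKU on a CA, interpreted as a constraint: it must permit TLS server auth.
	if len(ca.ExtKeyUsage) > 0 {
		allowed := false
		for _, eku := range ca.ExtKeyUsage {
			if eku == x509.ExtKeyUsageServerAuth || eku == x509.ExtKeyUsageAny {
				allowed = true
			}
		}
		if !allowed {
			return fmt.Errorf("EKU on CA does not permit TLS server authentication")
		}
	}
	return nil
}

func main() {
	fmt.Println("see caLooksUsable for the individual checks")
}
```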

For more information about these checks, check out the repository’s README. Now on to the results!

webpki

webpki is a Rust library for validating web PKI certificates. It’s the underlying validation mechanism for the rustls library that I actually tested. webpki shows up as the hero of the non-browser TLS implementations, supporting all of the features and having a 100% test pass rate. webpki is primarily maintained by Brian Smith who also worked on the mozilla::pkix codebase that’s used by Firefox.

Go

Go didn’t distrust deprecated signature algorithms by default (although looking at the issue tracker, an update was merged to change this long before I started working on this test suite; it should land in Go 1.18), but otherwise supported all the features in the test suite. However, while it supported EKU constraints on CAs, the test suite discovered a bug that causes path building to fail under certain conditions when only a subset of paths have an EKU constraint.

Upon inspection, the Go x509 library validates most certificate constraints (like expiration and name constraints) as it builds paths, but EKU constraints are only applied after candidate paths are found. This looks to be a violation of Sleevi’s Rule, which probably explains why the EKU corner case causes Go to have a bad time:

Even if a library supports path building, doing some form of depth-first search in the PKI graph, the next most common mistake is still treating path building and path verification as separable, independent steps. That is, the path builder finds “a chain” that is rooted in a trusted CA, and then completes. The completed chain is then handed to a path verifier, which asks “Does this chain meet all the caller’s/application’s requirements”, and returns a “Yes/No” answer. If the answer is “No”, you want the path builder to consider those other paths in the graph, to see if there are any “Yes” paths. Yet if the path building and verification steps are different, you’re bound to have a bad time.

Java

I didn’t evaluate JDKs other than OpenJDK, but the latest version of OpenJDK 11 actually performed quite well. This JDK didn’t enforce EKU constraints on CAs or distrust certificates signed with SHA-1 algorithms. Otherwise, the JDK did a good job of building certificate paths.

PKI.js

The PKI.js library is a JavaScript library that can perform a number of PKI-related operations including certificate verification. It’s unclear if the “certificate chain validator” is meant to support complex certificate sets or if it was only meant to handle pre-validated paths, but the implementation fared poorly against the test suite. It didn’t support EKU constraints, didn’t distrust deprecated signature algorithms, didn’t perform any branching path building, and failed to validate even a simple “chain” when a parent certificate had expired but the intermediate was already trusted (this is the same issue OpenSSL ran into with the expired AddTrust certificate last year).

Even worse, when the certificate pool had a cycle (like in RFC 4158 figure 7), the validator got stuck in an infinite loop.

OpenSSL

In short, OpenSSL doesn’t appear to have changed significantly since Ryan’s roundup last year. OpenSSL does support the less ubiquitous validation checks (such as EKU constraints on CAs and distrusting deprecated signing algorithms), but it still doesn’t support branching path building (only non-branching chains).

LibreSSL

LibreSSL showed significant improvement over last year’s evaluation, which appears to be largely attributable to Bob Beck’s work on a new x509 verifier in LibreSSL 3.2.2 based on Go’s verifier. It supported path building and passed all of the non-skipped tests. As with other implementations it didn’t distrust deprecated algorithms by default. The one big surprise though is that it also didn’t distrust certificates missing the Basic Constraints extension, which as we described above is strictly required by the RFC 5280 spec:

If the basic constraints extension is not present in a version 3 certificate, or the extension is present but the cA boolean is not asserted, then the certified public key MUST NOT be used to verify certificate signatures.

BoringSSL

BoringSSL performed similarly to OpenSSL. Not only did it not support branching path building (only non-branching chains), but it also didn’t distrust deprecated signature algorithms.

GnuTLS

GnuTLS looked just like OpenSSL in its results. It also supported all the validation checks in the test suite (EKU constraints, deprecated signature algorithms) but didn’t support branching path building.

Browsers

By and large, browsers (or the operating system libraries they utilize) do a good job of path building.

Firefox (all platforms)

Firefox didn’t distrust deprecated signature algorithms, but otherwise supported path building and passed all tests.

Chrome (all platforms)

Chrome supported all validation cases and passed all tests.

Microsoft Edge (Windows)

Edge supported all validation cases and passed all tests.

Safari (MacOS)

Safari didn’t support EKU constraints on CAs but did pass simple branching path building test cases. However, it failed most of the more complicated path building test cases (such as cases with cycles).

Summary

+-----------+-----------+-------------------+--------------+--------------------------------+
|           | Supports  | Distrusts SHA-1   | Enforces EKU | Has other errors?              |
|           | branching?| signing algs?     | on CAs?      |                                |
+-----------+-----------+-------------------+--------------+--------------------------------+
| webpki    | ✓         | ✓                 | ✓            |                                |
| Go        | ✓         | ✖ (Fixed in 1.18) | ✓            | EKU bug                        |
| Java      | ✓         | ✖                 | ✖            |                                |
| PKI.js    | ✖         | ✖                 | ✖            | Fails even non-branching path  |
|           |           |                   |              | building cases, has infinite   |
|           |           |                   |              | loop                           |
| OpenSSL   | ✖         | ✓                 | ✓            |                                |
| LibreSSL  | ✓         | ✖                 | ✓            | Doesn't require Basic          |
|           |           |                   |              | Constraints                    |
| BoringSSL | ✖         | ✖                 | ✓            |                                |
| GnuTLS    | ✖         | ✓                 | ✓            |                                |
| Firefox   | ✓         | ✖                 | ✓            |                                |
| Chrome    | ✓         | ✓                 | ✓            |                                |
| Edge      | ✓         | ✓                 | ✓            |                                |
| Safari    | Kind of?  | ✓                 | ✖            | Failed complex path finding    |
|           |           |                   |              | cases                          |
+-----------+-----------+-------------------+--------------+--------------------------------+

Closing Thoughts

For most of the history of TLS, implementations have been pretty poor at certificate path building (if they supported it at all). In fairness, until recently the TLS specifications asserted that servers MUST behave in such a way that didn’t require clients to implement certificate path building.

However the evolution of the web PKI ecosystem has necessitated flexibility and this has been more directly codified in the TLS 1.3 specification. If you work on a TLS implementation, you really really ought to take heed of these new expectations in the TLS 1.3 specification. We’re going to have a lot more stability on the web if implementations can do good path building.

To be clear, it doesn’t need to be every implementation’s goal to pass every test in this suite. I’ll be the first to admit that the test suite contains some more pathological test cases than you’re likely to see in web PKI in practice. But at a minimum you should look at the changes that have occurred in the web PKI ecosystem in the past decade and be confident that your library supports enough path building to easily handle transitions (such as servers sending multiple possible paths to a trust anchor). And passing all of the tests in the BetterTLS test suite is a useful metric for establishing that confidence.

It’s important to make sure clients are forward-compatible with changes to the web PKI, because it’s not a matter of “if” but “when.” In Scott’s own words:

One thing that’s certain is that this event is coming again. Over the next few years we’re going to see a wide selection of Root Certificates expiring for all of the major CAs and we’re likely to keep experiencing the exact same issues unless something changes in the wider ecosystem.

If you are in a position to choose between different client-side TLS libraries, you can use these test results as a point of consideration for which libraries are most likely to weather those changes.

And if you are a service owner, it is important to know your customers. Will they be able to handle a transition from RSA to ECDSA? Will they be able to handle a transition from ECDSA to a post-quantum signature algorithm? Will they be able to handle having multiple certificates in a handshake when an old trust is expiring or no longer trusted by new clients? Knowing your clients can help you be resilient and set up appropriate configurations and endpoints to support them.

Standards, security baselines, and best practices in web PKI have been rapidly changing over the last few years and are only going to keep changing. Whether you implement TLS or just consume it, whether it’s a distrusted CA, a broken signature algorithm, or just the expiry of a certificate in good standing, it’s important to make sure that your application will be able to handle the next big change. We hope that BetterTLS can play a part in making that easier!


Exported Authenticators: The long road to RFC

Post Syndicated from Jonathan Hoyland original https://blog.cloudflare.com/exported-authenticators-the-long-road-to-rfc/

Our earlier blog post talked in general terms about how we work with the IETF. In this post we’re going to talk about a particular IETF project we’ve been working on, Exported Authenticators (EAs). Exported Authenticators is a new extension to TLS that we think will prove really exciting. It unlocks all sorts of fancy new authentication possibilities, from TLS connections with multiple certificates attached, to logging in to a website without ever revealing your password.

Now, you might have thought that, given the innumerable hours that went into the design of TLS 1.3, it couldn’t possibly be improved, but it turns out that there are a number of places where the design falls a little short. TLS allows us to establish a secure connection between a client and a server. During the TLS handshake the server presents a certificate to the browser, which proves the server is authorised to use the name written on the certificate, for example blog.cloudflare.com. One of the most common things we use that ability for is delivering webpages. In fact, if you’re reading this, your browser has already done this for you. The Cloudflare Blog is delivered over TLS, and by presenting a certificate for blog.cloudflare.com the server proves that it’s allowed to deliver Cloudflare’s blog.

When your browser requests blog.cloudflare.com you receive a big blob of HTML that your browser then starts to render. In the dim and distant past, this might have been the end of the story. Your browser would render the HTML, and display it. Nowadays, the web has become more complex, and the HTML your browser receives often tells it to go and load lots of other resources. For example, when I loaded the Cloudflare blog just now, my browser made 73 subrequests.

As we mentioned in our connection coalescing blog post, sometimes those resources are also served by Cloudflare, but on a different domain. In our connection coalescing experiment, we acquired certificates with a special extension, called a Subject Alternative Name (SAN), that tells the browser that the owner of the certificate can act as two different websites. Along with some further shenanigans that you can read about in our blog post, this lets us serve the resources for both the domains over a single TLS connection.

Cloudflare, however, services millions of domains, and we have millions of certificates. It’s possible to generate certificates that cover lots of domains, and in fact this is what Cloudflare used to do. We used to use so-called “cruise-liner” certificates, with dozens of names on them. But for connection coalescing this quickly becomes impractical, as we would need to know what sub-resources each webpage might request, and acquire certificates to match. We switched away from this model because issues with individual domains could affect other customers.

What we’d like to be able to do is serve as much content as possible down a single connection. When a user requests a resource from a different domain they need to perform a new TLS handshake, costing valuable time and resources. Our connection coalescing experiment showed the benefits when we know in advance what resources are likely to be requested, but most of the time we don’t know what subresources are going to be requested until the requests actually arrive. What we’d rather do is attach extra identities to a connection after it’s been established, and we know what extra domains the client actually wants. Because the TLS connection is just a transport mechanism and doesn’t understand the information being sent across it, it doesn’t actually know what domains might subsequently be requested. This is only available to higher-layer protocols such as HTTP. However, we don’t want any website to be able to impersonate another, so we still need to have strong authentication.

Exported Authenticators

Enter Exported Authenticators. They give us even more than we asked for. They allow us to do application layer authentication that’s just as strong as the authentication you get from TLS, and then tie it to the TLS channel. Now that’s a pretty complicated idea, so let’s break it down.

To understand application layer authentication we first need to explain what the application layer is. The application layer is a reference to the OSI model. The OSI model describes the various layers of abstraction we use, to make things work across the Internet. When you’re developing your latest web application you don’t want to have to worry about how light is flickered down a fibre optic cable, or even how the TLS handshake is encoded (although that’s a fascinating topic in its own right, let’s leave that for another time.)

All you want to care about is having your content delivered to your end-user, and using TLS gives you a guaranteed in-order, reliable, authenticated channel over which you can communicate. You just shove bits in one end of the pipe, and after lots of blinky lights, fancy routing, maybe a touch of congestion control, and a little decoding, *poof*, your data arrives at the end-user.

The application layer is the top of the OSI stack, and contains things like HTTP. Because the TLS handshake is lower in the stack, the application is oblivious to this process. So, what Exported Authenticators give us is the ability for the very top of the stack to reliably authenticate their partner.

The seven-layered OSI model

Now let’s jump back a bit, and discuss what we mean when we say that EAs give us authentication that’s as strong as TLS authentication. TLS, as we know, is used to create a secure connection between two endpoints, but lots of us are hazy when we try and pin down exactly what we mean by “secure”. The TLS standard makes eight specific promises, but rather than get buried in that particular ocean of weeds, let’s just pick out the one guarantee that we care about most: Peer Authentication.

Peer authentication: The client's view of the peer identity should reflect the server's identity. [...]

In other words, if the client thinks that it’s talking to example.com then it should, in fact, be talking to example.com.

What we want from EAs is that if I receive an EA then I have cryptographic proof that the person I’m talking to is the person I think I’m talking to. Now at this point you might be wondering what an EA actually looks like, and what it has to do with certificates. Well, an EA is actually a trio of messages, the first of which is a Certificate. The second is a CertificateVerify, a cryptographic proof that the sender knows the private key for the certificate. Finally there is a Finished message, which acts as a MAC, and proves the first two parts of the message haven’t been tampered with. If this structure sounds familiar to you, it’s because it’s the same structure as used by the server in the TLS handshake to prove it is the owner of the certificate.
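
To picture that structure, the sketch below is purely conceptual: Go’s standard library has no Exported Authenticator type, and the real message encoding is defined by the EA specification. It simply mirrors the three parts described above.

```go
// Conceptual only: not a real library API or the actual wire format.
package ea

// ExportedAuthenticator mirrors the three-part structure described above.
type ExportedAuthenticator struct {
	Certificate       [][]byte // certificate chain for the additional identity
	CertificateVerify []byte   // signature proving possession of the certificate's private key
	Finished          []byte   // MAC binding the two fields above to this TLS connection
}
```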

The final piece of unpacking we need to do is explaining what we mean by tying the authentication to the TLS channel. Because EAs are an application layer construct they don’t provide any transport mechanism. So, whilst I know that the EA was created by the server I want to talk to, without binding the EA to a TLS connection I can’t be sure that I’m talking directly to the server I want.

Without protection, a malicious server can move Exported Authenticators from one connection to another.

For all I know, the TLS server I’m talking to is creating a new TLS connection to the EA Server, and relaying my request, and then returning the response. This would be very bad, because it would allow a malicious server to impersonate any server that supports EAs.

Because EAs are bound to a single TLS connection, if a malicious server copies an EA from one connection to another it will fail to verify.

EAs therefore have an extra security feature. They use the fact that every TLS connection is guaranteed to produce a unique set of keys. EAs take one of these keys and use it to construct the EA. This means that if some malicious third-party copies an EA from one TLS session to another, the recipient wouldn’t be able to validate it. This technique is called channel binding, and is another fascinating topic, but this post is already getting a bit long, so we’ll have to revisit channel binding in a future blog post.
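
As a small illustration of the underlying mechanism, Go exposes the TLS exporter interface directly. The label below is a placeholder, not the one the EA specification actually uses, but it shows how both ends of a connection can derive the same connection-unique secret; material copied to a different connection would derive a different value, which is what makes the binding work.

```go
// Sketch: derive a connection-unique secret from a TLS session's exporter.
package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

func main() {
	conn, err := tls.Dial("tcp", "blog.cloudflare.com:443", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	state := conn.ConnectionState()
	// Placeholder label; the labels used by Exported Authenticators are
	// defined in the EA specification.
	secret, err := state.ExportKeyingMaterial("EXAMPLE-placeholder-label", nil, 32)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("connection-bound secret: %x\n", secret)
}
```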

How the sausage is made

OK, now we know what EAs do, let’s talk about how they were designed and built. EAs are going through the IETF standardisation process. Draft standards move through the IETF process starting as Internet Drafts (I-Ds), and ending up as published Requests For Comment (RFCs). RFCs are voluntary standards that underpin much of the global Internet plumbing, and not just for security protocols like TLS. RFCs define DNS, UDP, TCP, and many, many more.

The first step in producing a new IETF standard is coming up with a proposal. Designing security protocols is a very conservative business, firstly because it’s very easy to introduce really subtle bugs, and secondly, because if you do introduce a security issue, things can go very wrong, very quickly. A flaw in the design of a protocol can be especially problematic as it can be replicated across multiple independent implementations — for example the TLS renegotiation vulnerabilities reported in 2009 and the custom EC(DH) parameters vulnerability from 2012. To minimise the risks of design issues, EAs hew closely to the design of the TLS 1.3 handshake.

Security and Assurance

Before making a big change to how authentication works on the Internet, we want as much assurance as possible that we’re not going to break anything. To give us more confidence that EAs are secure, they reuse parts of the design of TLS 1.3. The TLS 1.3 design was carefully examined by dozens of experts, and underwent multiple rounds of formal analysis — more on that in a moment. Using well understood design patterns is a super important part of security protocols. Making something secure is incredibly difficult, because security issues can be introduced in thousands of ways, and an attacker only needs to find one. By starting from a well understood design we can leverage the years of expertise that went into it.

Another vital step in catching design errors early is baked into the IETF process: achieving rough consensus. Although the ins and outs of the IETF process are worthy of their own blog post, suffice it to say the IETF works to ensure that all technical objections get addressed, and even if they aren’t solved they are given due care and attention. Exported Authenticators were proposed way back in 2016, and after many rounds of comments, feedback, and analysis the TLS Working Group (WG) at the IETF has finally reached consensus on the protocol. All that’s left before the EA I-D becomes an RFC is for a final revision of the text to be submitted and sent to the RFC Editors, leading hopefully to a published standard very soon.

As we just mentioned, the WG has to come to a consensus on the design of the protocol. One thing that can hold up achieving consensus is concern about security. After the Snowden revelations there was a barrage of attacks on TLS 1.2, not to mention some even earlier attacks from academia. Changing how trust works on the Internet can be pretty scary, and the TLS WG didn’t want to be caught flat-footed. Luckily this coincided with the maturation of some tools and techniques we can use to get mathematical guarantees that a protocol is secure. This class of techniques is known as formal methods. To help ensure that people are confident in the security of EAs I performed a formal analysis.

Formal Analysis

Formal analysis is a special technique that can be used to examine security protocols. It creates a mathematical description of the protocol, the security properties we want it to have, and a model attacker. Then, aided by some sophisticated software, we create a proof that the protocol has the properties we want even in the presence of our model attacker. This approach is able to catch incredibly subtle edge cases, which, if not addressed, could lead to attacks, as has happened before. Trotting out a formal analysis gives us strong assurances that we haven’t missed any horrible issues. By sticking as closely as possible to the design of TLS 1.3 we were able to repurpose much of the original analysis for EAs, giving us a big leg up in our ability to prove their security. Our EA model is available in Bitbucket, along with the proofs. You can check it out using Tamarin, a theorem prover for security protocols.

Formal analysis, and formal methods in general, give very strong guarantees that rule out entire classes of attack. However, they are not a panacea. TLS 1.3 was subject to a number of rounds of formal analysis, and yet an attack was still found. However, this attack in many ways confirms our faith in formal methods. The attack was found in a blind spot of the proof, showing that attackers have been pushed to the very edges of the protocol. As our formal analyses get more and more rigorous, attackers will have fewer and fewer places to search for attacks. As formal analysis has become more and more practical, more and more groups at the IETF have been asking to see proofs of security before standardising new protocols. This hopefully will mean that future attacks on protocol design will become rarer and rarer.

Once the EA I-D becomes an RFC, then all sorts of cool stuff gets unlocked — for example OPAQUE-EAs, which will allow us to do password-based login on the web without the server ever seeing the password! Watch this space.

Introducing SSL/TLS Recommender

Post Syndicated from Suleman Ahmad original https://blog.cloudflare.com/ssl-tls-recommender/

Seven years ago, Cloudflare made HTTPS availability for any Internet property easy and free with Universal SSL. At the time, few websites — other than those that processed sensitive data like passwords and credit card information — were using HTTPS because of how difficult it was to set up.

However, as we all started using the Internet for more and more private purposes (communication with loved ones, financial transactions, shopping, healthcare, etc.) the need for encryption became apparent. Tools like Firesheep demonstrated how easily attackers could snoop on people using public Wi-Fi networks at coffee shops and airports. The Snowden revelations showed the ease with which governments could listen in on unencrypted communications at scale. We have seen attempts by browser vendors to increase HTTPS adoption, such as the recent announcement by Chromium to load websites over HTTPS by default. Encryption has become a vital part of the modern Internet, not just to keep your information safe, but to keep you safe.

When it was launched, Universal SSL doubled the number of sites on the Internet using HTTPS. We are building on that with SSL/TLS Recommender, a tool that guides you to stronger configurations for the backend connection from Cloudflare to origin servers. Recommender has been available in the SSL/TLS tab of the Cloudflare dashboard since August 2020 for self-serve customers. Over 500,000 zones are currently signed up. As of today, it is available for all customers!

How Cloudflare connects to origin servers

Cloudflare operates as a reverse proxy between clients (“visitors”) and customers’ web servers (“origins”), so that Cloudflare can protect origin sites from attacks and improve site performance. This happens, in part, because visitor requests to websites proxied by Cloudflare are processed by an “edge” server located in a data center close to the client. The edge server either responds directly back to the visitor, if the requested content is cached, or creates a new request to the origin server to retrieve the content.

The backend connection to the origin can be made with an unencrypted HTTP connection or with an HTTPS connection where requests and responses are encrypted using the TLS protocol (historically known as SSL). HTTPS is the secured form of HTTP and should be used whenever possible to avoid leaking information or allowing content tampering by third-party entities. The origin server can further authenticate itself by presenting a valid TLS certificate to prevent active monster-in-the-middle attacks. Such a certificate can be obtained from a certificate authority such as Let’s Encrypt or Cloudflare’s Origin CA. Origins can also set up authenticated origin pull, which ensures that any HTTPS requests outside of Cloudflare will not receive a response from your origin.

Cloudflare Tunnel provides an even more secure option for the connection between Cloudflare and origins. With Tunnel, users run a lightweight daemon on their origin servers that proactively establishes secure and private tunnels to the nearest Cloudflare data centers. With this configuration, users can completely lock down their origin servers to only receive requests routed through Cloudflare. While we encourage customers to set up tunnels if feasible, it’s important to encourage origins with more traditional configurations to adopt the strongest possible security posture.

Detecting HTTPS support

You might wonder, why doesn’t Cloudflare always connect to origin servers with a secure TLS connection? To start, some origin servers have no TLS support at all (for example, certain shared hosting providers and even government sites have been slow adopters) and rely on Cloudflare to ensure that the client request is at least encrypted over the Internet from the browser to Cloudflare’s edge.

Then why don’t we simply probe the origin to determine if TLS is supported? It turns out that many sites only partially support HTTPS, making the problem non-trivial. A single customer site can be served from multiple separate origin servers with differing levels of TLS support. For instance, some sites support HTTPS on their landing page but serve certain resources only over unencrypted HTTP. Further, site content can differ when accessed over HTTP versus HTTPS (for example, http://example.com and https://example.com can return different results).

Such content differences can arise due to misconfiguration on the origin server, accidental mistakes by developers when migrating their servers to HTTPS, or can even be intentional depending on the use case.

A study by researchers at Northeastern University, the Max Planck Institute for Informatics, and the University of Maryland highlights reasons for some of these inconsistencies. They found that 1.5% of surveyed sites had at least one page that was unavailable over HTTPS — despite the protocol being supported on other pages — and 3.7% of sites served different content over HTTP versus HTTPS for at least one page. Thus, always using the most secure TLS setting detected on a particular resource could result in unforeseen side effects and usability issues for the entire site.

We wanted to tackle all such issues and maximize the number of TLS connections to origin servers, but without compromising a website’s functionality and performance.

Content differences on sites when loaded over HTTPS vs HTTP; images taken from https://www.cs.umd.edu/~dml/papers/https_tma20.pdf with author permission
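
As a toy illustration of why such detection is non-trivial, and emphatically not how the Recommender itself works, the Go sketch below fetches the same page over HTTP and HTTPS and compares the responses. Real sites with dynamic content would defeat a naive byte-for-byte comparison like this, which is part of what makes the problem hard.

```go
// Toy sketch: compare a page served over HTTP and HTTPS.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"log"
	"net/http"
)

func fetch(url string) (int, [32]byte) {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatalf("fetching %s: %v", url, err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	return resp.StatusCode, sha256.Sum256(body)
}

func main() {
	host := "example.com" // placeholder origin
	httpStatus, httpHash := fetch("http://" + host + "/")
	httpsStatus, httpsHash := fetch("https://" + host + "/")

	if httpStatus == httpsStatus && httpHash == httpsHash {
		fmt.Println("HTTP and HTTPS responses match for this page")
	} else {
		fmt.Println("HTTP and HTTPS differ; a naive upgrade could change site behaviour")
	}
}
```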

Configuring the SSL/TLS encryption mode

Cloudflare relies on customers to indicate the level of TLS support at their origins via the zone’s SSL/TLS encryption mode. The following SSL/TLS encryption modes can be configured from the Cloudflare dashboard:

  • Off indicates that client requests reaching Cloudflare as well as Cloudflare’s requests to the origin server should only use unencrypted HTTP. This option is never recommended, but is still in use by a handful of customers for legacy reasons or testing.
  • Flexible allows clients to connect to Cloudflare’s edge via HTTPS, but requests to the origin are over HTTP only. This is the most common option for origins that do not support TLS. However, we encourage customers to upgrade their origins to support TLS whenever possible and only use Flexible as a last resort.
  • Full enables encryption for requests to the origin when clients connect via HTTPS, but Cloudflare does not attempt to validate the certificate. This is useful for origins that have a self-signed or otherwise invalid certificate at the origin, but leaves open the possibility for an active attacker to impersonate the origin server with a fake certificate. Client HTTP requests result in HTTP requests to the origin.
  • Full (strict) indicates that Cloudflare should validate the origin certificate to fully secure the connection. The origin certificate can either be issued by a public CA or by Cloudflare Origin CA. HTTP requests from clients result in HTTP requests to the origin, exactly the same as in Full mode. We strongly recommend Full (strict) over weaker options if supported by the origin.
  • Strict (SSL-Only Origin Pull) causes all traffic to the origin to go over HTTPS, even if the client request was HTTP. This differs from Full (strict) in that HTTP client requests will result in an HTTPS request to the origin, not HTTP. Most customers do not need to use this option, and it is available only to Enterprise customers. The preferred way to ensure that no HTTP requests reach your origin is to enable Always Use HTTPS in conjunction with Full or Full (strict) to redirect visitor HTTP requests to the HTTPS version of the content.
SSL/TLS encryption modes determine how Cloudflare connects to origins

The SSL/TLS encryption mode is a zone-wide setting, meaning that Cloudflare applies the same policy to all subdomains and resources. If required, you can configure this setting more granularly via Page Rules. Misconfiguring this setting can make site resources unavailable. For instance, suppose your website loads certain assets from an HTTP-only subdomain. If you set your zone to Full or Full (strict), you might make these assets unavailable for visitors that request the content over HTTPS, since the HTTP-only subdomain lacks HTTPS support.
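
If you do decide to change the zone-wide mode, it can also be set programmatically. The sketch below (Go, placeholder zone ID and token) uses the zone settings endpoint; valid values include "off", "flexible", "full", and "strict", and the current API documentation should be treated as authoritative.

```go
// Hedged sketch: set a zone's SSL/TLS encryption mode via the Cloudflare API.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	zoneID := os.Getenv("CLOUDFLARE_ZONE_ID")   // placeholder
	token := os.Getenv("CLOUDFLARE_API_TOKEN")  // placeholder

	url := fmt.Sprintf("https://api.cloudflare.com/client/v4/zones/%s/settings/ssl", zoneID)
	payload := []byte(`{"value":"full"}`) // upgrade the zone-wide mode to Full

	req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```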

Importance of secure origin connections

When an end-user visits a site proxied by Cloudflare, there are two connections to consider: the front-end connection between the visitor and Cloudflare and the back-end connection between Cloudflare and the customer origin server. The front-end connection typically presents the largest attack surface (for example, think of the classic example of an attacker snooping on a coffee shop’s Wi-Fi network), but securing the back-end connection is equally important. While all SSL/TLS encryption modes (except Off) secure the front-end connection, less secure modes leave open the possibility of malicious activity on the backend.

Consider a zone set to Flexible where the origin is connected to the Internet via an untrustworthy ISP. In this case, spyware deployed by the customer’s ISP in an on-path middlebox could inspect the plaintext traffic from Cloudflare to the origin server, potentially resulting in privacy violations or leaks of confidential information. Upgrading the zone to Full or a stronger mode to encrypt traffic to the ISP would help prevent this basic form of snooping.

Similarly, consider a zone set to Full where the origin server is hosted in a shared hosting provider facility. An attacker colocated in the same facility could generate a fake certificate for the origin (since the certificate isn’t validated for Full) and deploy an attack technique such as ARP spoofing to direct traffic intended for the origin server to an attacker-owned machine instead. The attacker could then leverage this setup to inspect and filter traffic intended for the origin, resulting in site breakage or content unavailability. The attacker could even inject malicious JavaScript into the response served to the visitor to carry out other nefarious goals. Deploying a valid Cloudflare-trusted certificate on the origin and configuring the zone to use Full (strict) would prevent Cloudflare from trusting the attacker’s fake certificate in this scenario, preventing the hijack.

Since a secure backend only improves your website security, we strongly encourage setting your zone to the highest possible SSL/TLS encryption mode whenever possible.

Balancing functionality and security

When Universal SSL was launched, Cloudflare’s goal was to get as many sites away from the status quo of HTTP as possible. To accomplish this, Cloudflare provisioned TLS certificates for all customer domains to secure the connection between the browser and the edge. Customer sites that did not already have TLS support were defaulted to Flexible, to preserve existing site functionality. Although Flexible is not recommended for most zones, we continue to support this option as some Cloudflare customers still rely on it for origins that do not yet support TLS. Disabling this option would make these sites unavailable. Currently, the default option for newly onboarded zones is Full if we detect a TLS certificate on the origin zone, and Flexible otherwise.

Further, the SSL/TLS encryption mode configured at the time of zone sign-up can become suboptimal as a site evolves. For example, a zone might switch to a hosting provider that supports origin certificate installation. An origin server that is able to serve all content over TLS should at least be on Full. An origin server that has a valid TLS certificate installed should use Full (strict) to ensure that communication between Cloudflare and the origin server is not susceptible to monster-in-the-middle attacks.

The Research team combined lessons from academia and our engineering efforts to make encryption easy, while ensuring the highest level of security possible for our customers. In pursuit of that goal, we’re proud to introduce the SSL/TLS Recommender.

SSL/TLS Recommender

Cloudflare’s mission is to help build a better Internet, and that includes ensuring that requests from visitors to our customers’ sites are as secure as possible. To that end, we began by asking ourselves the following question: how can we detect when a customer is able to use a more secure SSL/TLS encryption mode without impacting site functionality?

To answer this question, we built the SSL/TLS Recommender. Customers can enable Recommender for a zone via the SSL/TLS tab of the Cloudflare dashboard. Using a zone’s currently configured SSL/TLS option as the baseline for expected site functionality, the Recommender performs a series of checks to determine if an upgrade is possible. If so, we email the zone owner with the recommendation. If a zone is currently misconfigured — for example, an HTTP-only origin configured on Full — Recommender will not recommend a downgrade.

The checks that Recommender runs are determined by the site’s currently configured SSL/TLS option.

The simplest check is to determine if a customer can upgrade from Full to Full (strict). In this case, all site resources are already served over HTTPS, so the check comprises a few simple tests of the validity of the TLS certificate for the domain and all subdomains (which can be on separate origin servers).
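As an illustration of what such a validity test involves (not Recommender’s actual implementation), the sketch below attempts a fully verified TLS handshake against each hostname using Python’s standard library. The hostnames are placeholders, and a real check would target the origin servers directly rather than the Cloudflare-proxied hostnames.

import socket
import ssl

def origin_cert_is_valid(hostname, port=443, timeout=5.0):
    """Return True if a verified TLS handshake succeeds: the certificate must
    chain to a trusted CA, be unexpired, and match the hostname -- roughly the
    prerequisites for moving from Full to Full (strict)."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except (ssl.SSLError, ssl.CertificateError, OSError):
        return False

for host in ["example.com", "www.example.com", "assets.example.com"]:  # placeholders
    print(host, "valid" if origin_cert_is_valid(host) else "certificate problem")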

The check to determine if a customer can upgrade from Off or Flexible to Full is more complex. A site can be upgraded if all resources on the site are available over HTTPS and the content matches when served over HTTP versus HTTPS. Recommender carries out this check as follows (a simplified sketch follows the list):

  • Crawl customer sites to collect links. For large sites where it is impractical to scan every link, Recommender tests only a subset of links (up to some threshold), leading to a trade-off between performance and potential false positives. Similarly, for sites where the crawl turns up an insufficient number of links, we augment our results with a sample of links from recent visitor requests to the zone to provide a high-confidence recommendation. The crawler uses the user agent Cloudflare-SSLDetector and has been added to Cloudflare’s list of known good bots. Similar to other Cloudflare crawlers, Recommender ignores robots.txt (except for rules explicitly targeting the crawler’s user agent) to avoid negatively impacting the accuracy of the recommendation.
  • Download the content of each link over both HTTP and HTTPS. Recommender makes only idempotent GET requests when scanning origin servers to avoid modifying server resource state.
  • Run a content similarity algorithm to determine if the content matches. The algorithm is adapted from a research paper called “A Deeper Look at Web Content Availability and Consistency over HTTP/S” (TMA Conference 2020) and is designed to provide an accurate similarity score even for sites with dynamic content.
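The sketch below illustrates the shape of that HTTP-versus-HTTPS comparison under simplified assumptions: it uses a generic similarity ratio from Python’s difflib rather than the algorithm from the paper, an illustrative threshold, and a placeholder user agent and link list.

import difflib
import requests

UA = {"User-Agent": "example-ssl-check"}  # placeholder user agent

def fetch(url):
    # Idempotent GET only, mirroring the scanning constraint described above.
    return requests.get(url, headers=UA, timeout=10).text

def content_matches(link, threshold=0.90):
    """Fetch one resource over HTTP and HTTPS and compare the bodies.
    difflib's ratio stands in for the paper's similarity algorithm, and the
    threshold is illustrative, not the value Recommender uses."""
    score = difflib.SequenceMatcher(
        None, fetch("http://" + link), fetch("https://" + link)
    ).ratio()
    return score >= threshold

links = ["example.com/", "example.com/about"]  # placeholder crawl results
print("safe to recommend Full:", all(content_matches(link) for link in links))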

Recommender is conservative with recommendations, erring on the side of maintaining current site functionality rather than risking breakage and usability issues. If a zone is non-functional, if the zone owner blocks all types of bots, or if misconfigured SSL-specific Page Rules are applied to the zone, then Recommender will not be able to complete its scans and provide a recommendation. It is therefore not intended to resolve issues with website or domain functionality, but rather to maximize your zone’s security when possible.

Please send questions and feedback to [email protected]. We’re excited to continue this line of work to improve the security of customer origins!

Mentions

While this work is led by the Research team, we have been extremely privileged to get support from all across the company!

Special thanks to the incredible team of interns that contributed to SSL/TLS Recommender. Suleman Ahmad (now full-time), Talha Paracha, and Ananya Ghose built the current iteration of the project and Matthew Bernhard helped to lay the groundwork in a previous iteration of the project.

Handshake Encryption: Endgame (an ECH update)

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/handshake-encryption-endgame-an-ech-update/

Handshake Encryption: Endgame (an ECH update)

Handshake Encryption: Endgame (an ECH update)

Privacy and security are fundamental to Cloudflare, and we believe in and champion the use of cryptography to help provide these fundamentals for customers, end-users, and the Internet at large. In the past, we helped specify, implement, and ship TLS 1.3, the latest version of the transport security protocol underlying the web, to all of our users. TLS 1.3 vastly improved upon prior versions of the protocol with respect to security, privacy, and performance: simpler cryptographic algorithms, more handshake encryption, and fewer round trips are just a few of the many great features of this protocol.

TLS 1.3 was a tremendous improvement over TLS 1.2, but there is still room for improvement. Sensitive metadata relating to application or user intent is still visible in plaintext on the wire. In particular, all client parameters, including the name of the target server the client is connecting to, are visible in plaintext. For obvious reasons, this is problematic from a privacy perspective: Even if your application traffic to crypto.cloudflare.com is encrypted, the fact you’re visiting crypto.cloudflare.com can be quite revealing.

And so, in collaboration with other participants in the standardization community and members of industry, we embarked towards a solution for encrypting all sensitive TLS metadata in transit. The result: TLS Encrypted ClientHello (ECH), an extension to protect this sensitive metadata during connection establishment.

Last year, we described the current status of this standard and its relation to the TLS 1.3 standardization effort, as well as ECH’s predecessor, Encrypted SNI (ESNI). The protocol has come a long way since then, but when will we know when it’s ready? There are many ways by which one can measure a protocol. Is it implementable? Is it easy to enable? Does it seamlessly integrate with existing protocols or applications? In order to assess these questions and see if the Internet is ready for ECH, the community needs deployment experience. Hence, for the past year, we’ve been focused on making the protocol stable, interoperable, and, ultimately, deployable. And today, we’re pleased to announce that we’ve begun our initial deployment of TLS ECH.

What does ECH mean for connection security and privacy on the network? How does it relate to similar technologies and concepts such as domain fronting? In this post, we’ll dig into ECH details and describe what this protocol does to move the needle to help build a better Internet.

Connection privacy

For most Internet users, connections are made to perform some type of task, such as loading a web page, sending a message to a friend, purchasing some items online, or accessing bank account information. Each of these connections reveals some limited information about user behavior. For example, a connection to a messaging platform reveals that one might be trying to send or receive a message. Similarly, a connection to a bank or financial institution reveals when the user typically makes financial transactions. Individually, this metadata might seem harmless. But consider what happens when it accumulates: does the set of websites you visit on a regular basis uniquely identify you as a user? The safe answer is: yes.

This type of metadata is privacy-sensitive, and ultimately something that should only be known by two entities: the user who initiates the connection, and the service which accepts the connection. However, the reality today is that this metadata is known to more than those two entities.

Making this information private is no easy feat. The nature or intent of a connection, i.e., the name of the service such as crypto.cloudflare.com, is revealed in multiple places during the course of connection establishment: during DNS resolution, wherein clients map service names to IP addresses; and during connection establishment, wherein clients indicate the service name to the target server. (Note: there are other small leaks, though DNS and TLS are the primary problems on the Internet today.)

As is common in recent years, the solution to this problem is encryption. DNS-over-HTTPS (DoH) is a protocol for encrypting DNS queries and responses to hide this information from on-path observers. Encrypted Client Hello (ECH) is the complementary protocol for TLS.

The TLS handshake begins when the client sends a ClientHello message to the server over a TCP connection (or, in the context of QUIC, over UDP) with relevant parameters, including those that are sensitive. The server responds with a ServerHello, encrypted parameters, and all that’s needed to finish the handshake.

The goal of ECH is as simple as its name suggests: to encrypt the ClientHello so that privacy-sensitive parameters, such as the service name, are unintelligible to anyone listening on the network. The client encrypts this message using a public key it learns by making a DNS query for a special record known as the HTTPS resource record. This record advertises the server’s various TLS and HTTPS capabilities, including ECH support. The server decrypts the encrypted ClientHello using the corresponding secret key.
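As a small illustration, the HTTPS record (and the ECH configuration it may carry) can be retrieved with any DNS library that understands type 65. The sketch below assumes dnspython 2.1 or later; depending on the library version, the ECH configuration appears as an ech=... or echconfig=... parameter in the record’s text form.

import dns.resolver  # dnspython >= 2.1 assumed, for HTTPS (type 65) support

def https_records(name):
    """Fetch the HTTPS resource records advertising the server's TLS/ECH capabilities."""
    return [rdata.to_text() for rdata in dns.resolver.resolve(name, "HTTPS")]

for record in https_records("crypto.cloudflare.com"):
    print(record)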

Conceptually, DoH and ECH are somewhat similar. With DoH, clients establish an encrypted connection (HTTPS) to a DNS recursive resolver such as 1.1.1.1 and, within that connection, perform DNS transactions.

With ECH, clients establish an encrypted connection to a TLS-terminating server such as crypto.cloudflare.com, and within that connection, request resources for an authorized domain such as cloudflareresearch.com.

There is one very important difference between DoH and ECH that is worth highlighting. Whereas a DoH recursive resolver is specifically designed to allow queries for any domain, a TLS server is configured to allow connections for a select set of authorized domains. Typically, the set of authorized domains for a TLS server is the set of names that appear on its certificate, as these are the names for which the server is authorized to terminate a connection.

Basically, this means the DNS resolver is open, whereas the ECH client-facing server is closed. And this closed set of authorized domains is informally referred to as the anonymity set. (This will become important later on in this post.) Moreover, the anonymity set is assumed to be public information. Anyone can query DNS to discover what domains map to the same client-facing server.

Why is this distinction important? It means that one cannot use ECH for the purposes of connecting to an authorized domain and then interacting with a different domain, a practice commonly referred to as domain fronting. When a client connects to a server using an authorized domain but then tries to interact with a different domain within that connection, e.g., by sending HTTP requests for an origin that does not match the domain of the connection, the request will fail.

From a high level, encrypting names in DNS and TLS may seem like a simple feat. However, as we’ll show, ECH demands a different look at security and an updated threat model.

A changing threat model and design confidence

The typical threat model for TLS is known as the Dolev-Yao model, in which an active network attacker can read, write, and delete packets from the network. This attacker’s goal is to derive the shared session key. There has been a tremendous amount of research analyzing the security of TLS to gain confidence that the protocol achieves this goal.

The threat model for ECH is somewhat stronger than considered in previous work. Not only should it be hard to derive the session key, it should also be hard for the attacker to determine the identity of the server from a known anonymity set. That is, ideally, it should have no more advantage in identifying the server than if it simply guessed from the set of servers in the anonymity set. And recall that the attacker is free to read, write, and modify any packet as part of the TLS connection. This means, for example, that an attacker can replay a ClientHello and observe the server’s response. It can also extract pieces of the ClientHello — including the ECH extension — and use them in its own modified ClientHello.

The design of ECH ensures that this sort of attack is virtually impossible by ensuring the server certificate can only be decrypted by either the client or client-facing server.

Something else an attacker might try is to masquerade as the server and actively interfere with the client to observe its behavior. If the client reacted differently based on whether the server-provided certificate was correct, this would allow the attacker to test whether a given connection using ECH was for a particular name.

ECH also defends against this attack by ensuring that an attacker without access to the private ECH key material cannot actively inject anything into the connection.

The attacker can also be entirely passive and try to infer encrypted information from other visible metadata, such as packet sizes and timing. (Indeed, traffic analysis is an open problem for ECH and in general for TLS and related protocols.) Passive attackers simply sit and listen to TLS connections, and use what they see and, importantly, what they know to make determinations about the connection contents. For example, if a passive attacker knows that the name of the client-facing server is crypto.cloudflare.com, and it sees a ClientHello with ECH to crypto.cloudflare.com, it can conclude, with reasonable certainty, that the connection is to some domain in the anonymity set of crypto.cloudflare.com.

The number of potential attack vectors is astonishing, and something that the TLS working group has tripped over in prior iterations of the ECH design. Before any sort of real world deployment and experiment, we needed confidence in the design of this protocol. To that end, we are working closely with external researchers on a formal analysis of the ECH design which captures the following security goals:

  1. Use of ECH does not weaken the security properties of TLS without ECH.
  2. TLS connection establishment to a host in the client-facing server’s anonymity set is indistinguishable from a connection to any other host in that anonymity set.

We’ll write more about the model and analysis when they’re ready. Stay tuned!

There are plenty of other subtle security properties we desire for ECH, and some of these drill right into the most important question for a privacy-enhancing technology: Is this deployable?

Focusing on deployability

With confidence in the security and privacy properties of the protocol, we then turned our attention towards deployability. In the past, significant protocol changes to fundamental Internet protocols such as TCP or TLS have been complicated by some form of benign interference. Network software, like any software, is prone to bugs, and sometimes these bugs manifest in ways that we only detect when there’s a change elsewhere in the protocol. For example, TLS 1.3 unveiled middlebox ossification bugs that ultimately led to the middlebox compatibility mode for TLS 1.3.

While ECH is itself just an extension, the risk of it exposing (or introducing!) similar bugs is real. To combat this problem, ECH supports a variant of GREASE whose goal is to ensure that all ECH-capable clients produce syntactically equivalent ClientHello messages. In particular, if a client supports ECH but does not have the corresponding ECH configuration, it uses GREASE. Otherwise, it produces a ClientHello with real ECH support. In both cases, the syntax of the ClientHello messages is equivalent.

This hopefully avoids network bugs that would otherwise trigger upon real or fake ECH. Or, in other words, it helps ensure that all ECH-capable client connections are treated similarly in the presence of benign network bugs or otherwise passive attackers. Interestingly, active attackers can easily distinguish — with some probability — between real and fake ECH. Using GREASE, the ClientHello carries an ECH extension, though its contents are effectively randomized, whereas a real ClientHello using ECH has information that will match what is contained in DNS. This means an active attacker can simply compare the ClientHello against what’s in the DNS. Indeed, anyone can query DNS and use it to determine if a ClientHello is real or fake:

$ dig +short crypto.cloudflare.com TYPE65
\# 134 0001000001000302683200040008A29F874FA29F884F000500480046 FE0D0042D500200020E3541EC94A36DCBF823454BA591D815C240815 77FD00CAC9DC16C884DF80565F0004000100010013636C6F7564666C 6172652D65736E692E636F6D00000006002026064700000700000000 0000A29F874F260647000007000000000000A29F884F

Despite this obvious distinguisher, the end result isn’t that interesting. If a server is capable of ECH and a client is capable of ECH, then the connection most likely used ECH, and whether clients and servers are capable of ECH is assumed public information. Thus, GREASE is primarily intended to ease deployment against benign network bugs and otherwise passive attackers.

Note, importantly, that GREASE (or fake) ECH ClientHello messages are semantically different from real ECH ClientHello messages. This presents a real problem for networks such as enterprise settings or school environments that otherwise use plaintext TLS information for the purposes of implementing various features like filtering or parental controls. (Encrypted DNS protocols like DoH also encountered similar obstacles in their deployment.) Fundamentally, this problem reduces to the following: How can networks securely disable features like DoH and ECH? Fortunately, there are a number of approaches that might work, with the more promising one centered around DNS discovery. In particular, if clients could securely discover encrypted recursive resolvers that can perform filtering in lieu of it being done at the TLS layer, then TLS-layer filtering might be wholly unnecessary. (Other approaches, such as the use of canary domains to give networks an opportunity to signal that certain features are not permitted, may work, though it’s not clear if these could or would be abused to disable ECH.)

We are eager to collaborate with browser vendors, network operators, and other stakeholders to find a feasible deployment model that works well for users without ultimately stifling connection privacy for everyone else.

Next steps

ECH is rolling out for some Free zones on our network in select geographic regions. We will continue to expand the set of zones and regions that support ECH slowly, monitoring for failures in the process. The goal is to work with the rest of the TLS working group and the IETF to update the specification based on this experiment, in hopes of making it safe, secure, usable, and, ultimately, deployable for the Internet.

ECH is one part of the connection privacy story. Like a leaky boat, it’s important to look for and plug all the gaps before taking on lots of passengers! Cloudflare Research is committed to these narrow technical problems and their long-term solutions. Stay tuned for more updates on this and related protocols.

Staging TLS Certificate: Make every deployment a safe deployment

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/staging-tls-certificate-every-deployment-safe-deployment/

Staging TLS Certificate: Make every deployment a safe deployment

Staging TLS Certificate: Make every deployment a safe deployment

We are excited to announce that Enterprise customers now have the ability to test custom uploaded certificates in a staging environment before pushing them to production.

With great power comes great responsibility

If you’re running a website or an API that’s behind a popular app, you know your users have high expectations: it can’t just be up and running; it also has to be fast and secure. One of the easiest and most standardized ways to secure connections is with the TLS protocol. To do that, you need to acquire a TLS certificate for your domain.

One way to get a certificate is by using a CDN provider, like Cloudflare. We make the process really easy by issuing certificates on your behalf. Not just that, but when your certificate is getting closer to its expiration date, we are responsible for re-issuing it. But, if you don’t want Cloudflare to issue the certificate on your behalf and want to obtain the certificate yourself, you can do so. You can either keep control of your private key, or generate a Certificate Signing Request (CSR) through Cloudflare, so we maintain the private key, but you can still use the certificate authority (CA) of your choice for the certificate. Once you get your certificate, you upload it to Cloudflare and voilà — your site is secure.

One of the downsides of obtaining your own certificate is that you’re in charge of its renewal, which comes with a great deal of responsibility. You have to ensure that any renewed certificate will not cause TLS errors that make your property unreachable for your users.

Now, you’re probably wondering, why is this so risky? What could go wrong?

It’s possible that when you renew a certificate, you forget to include one of your subdomains in the certificate. Another reason why your clients might see failures when you upload the new certificate is that some of those clients are pinning the old certificate.

The problem with pinning certificates

Certificate pinning allows Internet application owners to say “trust this certificate and no other” by pinning the certificate or public key to their service, or even to client devices. While this was originally intended to tighten security, certificate pinning has created many problems. We have seen customers shoot themselves in the foot with certificate pinning numerous times. There are a lot of things that could go wrong: you could pin the wrong certificate, or configure the wrong settings, which will bring your service down. It’s possible that you rely on a CDN for your certificate renewal and forgot to update your pin — again, blocking access to your service in the process. Alternatively, you can pin certificates to clients so that they refuse connections from servers that don’t have the same certificate. But when it comes to renewing that certificate, you have to make sure that your clients have the updated certificate, or they won’t be able to access your server. Instead, you can use tools like Cloudflare’s Certificate Transparency Monitoring which notifies you when certificates are being issued for your domain.
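To make concrete what a pin actually is, the sketch below computes the value most public-key pinning implementations compare: the base64-encoded SHA-256 hash of the certificate’s SubjectPublicKeyInfo. It uses the third-party cryptography package, and the hostname is a placeholder; the point is that renewing onto a certificate with a different key changes this value and breaks existing pins.

import base64
import hashlib
import socket
import ssl

from cryptography import x509
from cryptography.hazmat.primitives import serialization

def spki_pin(hostname, port=443):
    """Return the base64 SHA-256 digest of the server's SubjectPublicKeyInfo."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            der_cert = tls.getpeercert(binary_form=True)
    cert = x509.load_der_x509_certificate(der_cert)
    spki = cert.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return base64.b64encode(hashlib.sha256(spki).digest()).decode()

print(spki_pin("example.com"))  # compare against the pin baked into a client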

But, some of our customers do rely on this behavior. So for them, switching to a new certificate is a big change — one that can bring down their whole application. For this reason, they need to be able to test the new certificate to identify any problems that may occur before their customers see it.

A staging environment for certificate deployments

When it comes to your traffic, there are things you can’t predict. You can’t control if your customer’s browser is having a bad day, or if some ISP is having a network outage. When it comes to certificate deployment, the last thing you want is to be surprised. Imagine pushing out a new certificate, and realizing you’ve uploaded the wrong one only after customers complain they’ve lost access; or that you never updated your pin to your new certificate to begin with. Mistakes happen, but this type of outage can cause a lot of pain and lose you a lot of money and trust. The good news is that this is preventable.

We are giving customers the ability to find these problems before they’re encountered by real users. We’re doing this by giving customers the ability to test certificate deployment changes against a pair of staging IPs.

Imagine uploading a new certificate, finding any related problems, and fixing them in a way that doesn’t impact production traffic. That’s exactly what Staging Certificates allows you to do!

You can go to the Staging Certificates section under the SSL/TLS tab in the Cloudflare dashboard and upload a new certificate to our staging network. The staging network will replicate your production environment but will only be accessible through the Staging IPs.

Once you upload your certificate, you can make `curl` requests against the staging IPs to verify that it’s being served and that it covers all the hostnames that you need it to. If you’ve decided to pin your previous certificate, the staging environment helps you verify that your pins have been updated correctly and that TLS termination is successful.
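For example, the following sketch performs the same kind of check programmatically: it opens a TLS connection to a staging IP while sending your hostname via SNI (much like curl’s --resolve option) and prints the names the served certificate covers. The staging IP and hostnames are placeholders; a handshake failure here is exactly the signal you want to see before the certificate reaches production.

import socket
import ssl

def cert_served_at(ip, hostname, port=443):
    """Handshake with a specific IP while presenting `hostname` via SNI and
    return the certificate the server serves for it."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()

STAGING_IP = "203.0.113.10"  # placeholder staging IP
for host in ["www.example.com", "api.example.com"]:  # hostnames the cert must cover
    cert = cert_served_at(STAGING_IP, host)
    print(host, "->", cert.get("subjectAltName"))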

Once you’re confident in your tests, you can push the certificate out to production.

Oh no, something went wrong

Mistakes happen. It’s possible that you found one client device whose pin you forgot to update. In that case, you can disable the certificate to quickly mitigate the problem by rolling back to the certificate that you were last serving.

What if you realize that this is a bigger issue, and you need more time to test? You can push the disabled certificate back to the staging environment, and when you’ve verified everything works correctly, push it back to production.

Staging changes: We’re just getting started

We want to de-risk every change you make. Staging your custom uploaded certificates is a start, but it doesn’t end there. In the future, we’ll allow you to stage certificate renewals for certificates issued through Cloudflare. We’re also planning to give customers the ability to test TLS configuration changes like minimum TLS or cipher suite settings.

If you’re interested in staying aware of these future developments, or you have some settings that you want to test in staging, reach out to your Account team and let them know!

Ready to test your next certificate deployment?

This feature is currently in Beta and available to Enterprise customers. If you want to use it, reach out to your Account team to get it enabled.

How to use ACM Private CA for enabling mTLS in AWS App Mesh

Post Syndicated from Raj Jain original https://aws.amazon.com/blogs/security/how-to-use-acm-private-ca-for-enabling-mtls-in-aws-app-mesh/

Securing east-west traffic in service meshes, such as AWS App Mesh, by using mutual Transport Layer Security (mTLS) adds an additional layer of defense beyond perimeter control. mTLS adds bidirectional peer-to-peer authentication on top of the one-way authentication in normal TLS. This is done by adding a client-side certificate during the TLS handshake, through which a client proves possession of the corresponding private key to the server, and as a result the server trusts the client. This prevents an arbitrary client from connecting to an App Mesh service, because the client wouldn’t possess a valid certificate.

In this blog post, you’ll learn how to enable mTLS in App Mesh by using certificates derived from AWS Certificate Manager Private Certificate Authority (ACM Private CA). You’ll also learn how to reuse AWS CloudFormation templates, which we make available through a companion open-source project, for configuring App Mesh and ACM Private CA.

You’ll first see how server-side certificates are pulled from ACM Private CA into App Mesh internally by using the native integration between the two services. You’ll then see a method and code for installing client-side certificates issued by ACM Private CA into App Mesh; this method is needed because client-side certificates aren’t integrated natively.

You’ll learn how to use AWS Lambda to export a client-side certificate from ACM Private CA and store it in AWS Secrets Manager. You’ll then see Envoy proxies in App Mesh retrieve the certificate from Secrets Manager and use it in an mTLS handshake. The solution is designed to ensure confidentiality of the private key of a client-side certificate, in transit and at rest, as it moves from ACM to Envoy.

The solution described in this blog post simplifies and helps automate the configuration and operation of mTLS-enabled App Mesh deployments, because all of the certificates are derived from a single managed private public key infrastructure (PKI) service—ACM Private CA—eliminating the need to run your own private PKI. The solution uses Amazon Elastic Container Service (Amazon ECS) with AWS Fargate as the App Mesh hosting environment, although the design presented here can be applied to any compute environment that is supported by App Mesh.

Solution overview

ACM Private CA provides a highly available managed private PKI service that enables creation of private CA hierarchies—including root and subordinate CAs—without the investment and maintenance costs of operating your own private PKI service. The service allows you to choose among several CA key algorithms and key sizes and makes it easier for you to export and deploy private certificates anywhere by using API-based automation.

App Mesh is a service mesh that provides application-level networking across multiple types of compute infrastructure. It standardizes how your microservices communicate, giving you end-to-end visibility and helping to ensure transport security and high availability for your applications. In order to communicate securely between mesh endpoints, App Mesh directs the Envoy proxy instances that are running within the mesh to use one-way or mutual TLS.

TLS provides authentication, privacy, and data integrity between two communicating endpoints. The authentication in TLS communications is governed by the PKI system. The PKI system allows certificate authorities to issue certificates that are used by clients and servers to prove their identity. The authentication process in TLS happens by exchanging certificates via the TLS handshake protocol. By default, the TLS handshake protocol proves the identity of the server to the client by using X.509 certificates, while the authentication of the client to the server is left to the application layer. This is called one-way TLS. TLS also supports two-way authentication through mTLS. In mTLS, in addition to the one-way TLS server authentication with a certificate, a client presents its certificate and proves possession of the corresponding private key to a server during the TLS handshake.

Example application

The following sections describe one-way and mutual TLS integrations between App Mesh and ACM Private CA in the context of an example application. This example application exposes an API to external clients that returns a text string name of a color—for example, “yellow”. It’s an extension of the Color App that’s used to demonstrate several existing App Mesh examples.

The example application comprises two services running in App Mesh—ColorGateway and ColorTeller. An external client request enters the mesh through the ColorGateway service and is proxied to the ColorTeller service. The ColorTeller service responds back to the ColorGateway service with the name of a color. The ColorGateway service proxies the response to the external client. Figure 1 shows the basic design of the application.
 

Figure 1: App Mesh services in the Color App example application

The two services are mapped onto the following constructs in App Mesh:

  • ColorGateway is mapped as a Virtual gateway. A virtual gateway in App Mesh allows resources that are outside of a mesh to communicate to resources that are inside the mesh. A virtual gateway represents Envoy deployed by itself. In this example, the virtual gateway represents an Envoy proxy that is running as an Amazon ECS service. This Envoy proxy instance acts as a TLS client, since it initiates TLS connections to the Envoy proxy that is running in the ColorTeller service.
  • ColorTeller is mapped as a Virtual node. A virtual node in App Mesh acts as a logical pointer to a particular task group. In this example, the virtual node—ColorTeller—runs as another Amazon ECS service. The service runs two tasks—an Envoy proxy instance and a ColorTeller application instance. The Envoy proxy instance acts as a TLS server, receiving inbound TLS connections from ColorGateway.

Let’s review running the example application in one-way TLS mode. Although optional, starting with one-way TLS allows you to compare the two methods and establish how to look at certain Envoy proxy statistics to distinguish and verify one-way TLS versus mTLS connections.

For practice, you can deploy the example application project in your own AWS account and perform the steps described in your own test environment.

Note: In both the one-way TLS and mTLS descriptions in the following sections, we’re using a flat certificate hierarchy for demonstration purposes. The root CAs are issuing end-entity certificates. The AWS ACM Private CA best practices recommend that root CAs should only be used to issue certificates for intermediate CAs. When intermediate CAs are involved, your certificate chain has multiple certificates concatenated in it, but the mechanisms are the same as those described here.

One-way TLS in App Mesh using ACM Private CA

Because this is a one-way TLS authentication scenario, you need only one Private CA—ColorTeller—and issue one end-entity certificate from it that’s used as the server-side certificate for the ColorTeller virtual node.

Figure 2, following, shows the architecture for this setup, including notations and color codes for certificates and a step-by-step process that shows how the system is configured and functions. Because this architecture uses a server-side certificate only, you use the native integration between App Mesh and ACM Private CA and don’t need an external mechanism for certificate integration.
 

Figure 2: One-way TLS in App Mesh integrated with ACM Private CA

The steps in Figure 2 are:

Step 1: A Private CA instance—ColorTeller—is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA. This certificate is used as the server-side certificate in ColorTeller.

Step 2: The CloudFormation templates configure the ColorGateway to validate server certificates against the ColorTeller private CA certificate chain. As the App Mesh endpoints are starting up, the ColorTeller CA certificate trust chain is ingested into the ColorGateway Envoy instance. The TLS configuration for ColorGateway in App Mesh is shown in Figure 3.
 

Figure 3: One-way TLS configuration in the client policy of ColorGateway

Figure 3 shows that the client policy attributes for outbound transport connections for ColorGateway have been configured as follows:

  • Enforce TLS is set to Enforced. This enforces use of TLS while communicating with backends.
  • TLS validation method is set to AWS Certificate Manager Private Certificate Authority (ACM-PCA hosting). This instructs App Mesh to derive the certificate trust chain from ACM PCA.
  • Certificate is set to the Amazon Resource Name (ARN) of the ColorTeller Private CA, which is the identifier of the certificate trust chain in ACM PCA.

This configuration ensures that ColorGateway makes outbound TLS-only connections towards ColorTeller, extracts the CA trust chain from ACM-PCA, and validates the server certificate presented by the ColorTeller virtual node against the configured CA ARN.

Step 3: The CloudFormation templates configure the ColorTeller virtual node with the ColorTeller end-entity certificate ARN in ACM Private CA. While the App Mesh endpoints are started, the ColorTeller end-entity certificate is ingested into the ColorTeller Envoy instance.

The TLS configuration for the ColorTeller virtual node in App Mesh is shown in Figure 4.
 

Figure 4: One-way TLS configuration in the listener configuration of ColorTeller

Figure 4 shows that various TLS-related attributes are configured as follows:

  • Enable TLS termination is on.
  • Mode is set to Strict to limit connections to TLS only.
  • TLS Certificate method is set to ACM Certificate Manager (ACM) hosting as the source of the end-entity certificate.
  • Certificate is set to ARN of the ColorTeller end-entity certificate.

Note: Figure 4 shows an annotation where the certificate ARN has been superimposed by the cert icon in green color. This icon follows the color convention from Figure 2 and can help you relate how the individual resources are configured to construct the architecture shown in Figure 2. The cert shown (and the associated private key that is not shown) in the diagram is necessary for ColorTeller to run the TLS stack and serve the certificate. The exchange of this material happens over the internal communications between App Mesh and ACM Private CA.

Step 4: The ColorGateway service receives a request from an external client.

Step 5: This step includes multiple sub-steps:

  • The ColorGateway Envoy initiates a one-way TLS handshake towards the ColorTeller Envoy.
  • The ColorTeller Envoy presents its server-side certificate to the ColorGateway Envoy.
  • The ColorGateway Envoy validates the certificate against its configured CA trust chain—the ColorTeller CA trust chain—and the TLS handshake succeeds.

Verifying one-way TLS

To verify that a TLS connection was established and that it is one-way TLS authenticated, run the following command on your bastion host:

$ curl -s http://colorteller.mtls-ec2.svc.cluster.local:9901/stats |grep -E 'ssl.handshake|ssl.no_certificate'

listener.0.0.0.0_15000.ssl.handshake: 1
listener.0.0.0.0_15000.ssl.no_certificate: 1

This command queries the runtime statistics that are maintained in ColorTeller Envoy and filters the output for certain SSL-related counts. The count for ssl.handshake should be one. If the ssl.handshake count is more than one, that means there’s been more than one TLS handshake. The count for ssl.no_certificate should also be one, or equal to the count for ssl.handshake. The ssl.no_certificate count tracks the total successful TLS connections with no client certificate. Since this is a one-way TLS handshake that doesn’t involve client certificates, this count is the same as the count of ssl.handshake.

The preceding statistics verify that a TLS handshake was completed and the authentication was one-way, where the ColorGateway authenticated the ColorTeller but not vice-versa. You’ll see in the next section how the ssl.no_certificate count differs when mTLS is enabled.

Mutual TLS in App Mesh using ACM Private CA

In the one-way TLS discussion in the previous section, you saw that App Mesh and ACM Private CA integration works without needing external enhancements. You also saw that App Mesh retrieved the server-side end-entity certificate in ColorTeller and the root CA trust chain in ColorGateway from ACM Private CA internally, by using the native integration between the two services.

However, a native integration between App Mesh and ACM Private CA isn’t currently available for client-side certificates. Client-side certificates are necessary for mTLS. In this section, you’ll see how you can issue and export client-side certificates from ACM Private CA and ingest them into App Mesh.

The solution uses Lambda to export the client-side certificate from ACM Private CA and store it in Secrets Manager. The solution includes an enhanced startup script embedded in the Envoy image to retrieve the certificate from Secrets Manager and place it on the Envoy file system before the Envoy process is started. The Envoy process reads the certificate, loads it in memory, and uses it in the TLS stack for the client-side certificate exchange of the mTLS handshake.

The choice of Lambda is based on this being an ephemeral workflow that needs to run only during system configuration. You need a short-lived, runtime compute context that lets you run the logic for exporting certificates from ACM Private CA and store them in Secrets Manager. Because this compute doesn’t need to run beyond this step, Lambda is an ideal choice for hosting this logic, for cost and operational effectiveness.

The choice of Secrets Manager for storing certificates is based on the confidentiality requirements of the passphrase that is used for encrypting the private key (PKCS #8) of the certificate. You also need a higher throughput data store that can support secrets retrieval from large meshes. Secrets Manager supports a higher API rate limit than the API for exporting certificates from ACM Private CA, and thus serves as a high-throughput front end for ACM Private CA for serving certificates without compromising data confidentiality.

The resulting architecture is shown in Figure 5. The figure includes notations and color codes for certificates—such as root certificates, endpoint certificates, and private keys—and a step-by-step process showing how the system is configured, started, and functions at runtime. The example uses two CA hierarchies for ColorGateway and ColorTeller to demonstrate an mTLS setup where the client and server belong to separate CA hierarchies but trust each other’s CAs.
 

Figure 5: mTLS in App Mesh integrated with ACM Private CA

The numbered steps in Figure 5 are:

Step 1: A Private CA instance representing the ColorGateway trust hierarchy is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA, which is used as the client-side certificate in ColorGateway.

Step 2: Another Private CA instance representing the ColorTeller trust hierarchy is created in ACM Private CA. Next, an end-entity certificate is created and signed by the CA, which is used as the server-side certificate in ColorTeller.

Step 3: As part of running CloudFormation, the Lambda function is invoked. This Lambda function is responsible for exporting the client-side certificate from ACM Private CA and storing it in Secrets Manager. The function begins by requesting a random password from Secrets Manager. This random password is used as the passphrase for encrypting the private key inside ACM Private CA before it’s returned to the function. Generating the password with Secrets Manager lets you specify its complexity.

Step 4: The Lambda function issues an export certificate request to ACM, requesting the ColorGateway end-entity certificate. The request conveys the private key passphrase retrieved from Secrets Manager in the previous step so that ACM Private CA can use it to encrypt the private key that’s sent in the response.

Step 5: The ACM Private CA responds to the Lambda function. The response carries the following elements of the ColorGateway end-entity certificate.

{
  'Certificate': '..',
  'CertificateChain': '..',
  'PrivateKey': '..'
}   

Step 6: The Lambda function processes the response that is returned from ACM. It extracts individual fields in the JSON-formatted response and stores them in Secrets Manager. The Lambda function stores the following four values in Secrets Manager (a sketch of this export-and-store flow follows the list):

  • The ColorGateway endpoint certificate
  • The ColorGateway certificate trust chain, which contains the ColorGateway Private CA root certificate
  • The encrypted private key for the ColorGateway end-entity certificate
  • The passphrase that was used to encrypt the private key
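Putting steps 3 through 6 together, a Lambda handler along these lines could perform the export-and-store flow. This is a hedged sketch rather than the companion project’s code: the certificate ARN and secret name are placeholders, the secret is assumed to already exist (for example, created by CloudFormation), and error handling is omitted.

import json
import boto3

acm = boto3.client("acm")
secrets = boto3.client("secretsmanager")

CERT_ARN = "arn:aws:acm:us-east-1:111122223333:certificate/EXAMPLE"  # placeholder
SECRET_ID = "colorgateway/client-cert"  # placeholder; assumed to exist already

def handler(event, context):
    # Step 3: request a random passphrase for encrypting the private key.
    passphrase = secrets.get_random_password(
        PasswordLength=32, ExcludePunctuation=True
    )["RandomPassword"]

    # Steps 4 and 5: export the end-entity certificate; ACM Private CA encrypts
    # the PKCS #8 private key with the passphrase before returning it.
    export = acm.export_certificate(
        CertificateArn=CERT_ARN, Passphrase=passphrase.encode()
    )

    # Step 6: store the certificate, chain, encrypted key, and passphrase.
    secrets.put_secret_value(
        SecretId=SECRET_ID,
        SecretString=json.dumps({
            "certificate": export["Certificate"],
            "certificate_chain": export["CertificateChain"],
            "encrypted_private_key": export["PrivateKey"],
            "passphrase": passphrase,
        }),
    )
    return {"status": "stored"}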

Step 7: The App Mesh services—ColorGateway and ColorTeller—are started, which then start their Envoy proxy containers. A custom startup script embedded in the Envoy docker image fetches a certificate from Secrets Manager and places it on the Envoy file system.

Note: App Mesh publishes its own custom Envoy proxy Docker container image that ensures it is fully tested and patched with the latest vulnerability and performance patches. You’ll notice in the example source code that a custom Envoy image is built on top of the base image published by App Mesh. In this solution, we add an Envoy startup script and certain utilities such as AWS Command Line Interface (AWS CLI) and jq to help retrieve the certificate from Secrets Manager and place it on the Envoy file system during Envoy startup.
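The startup script in the companion project is shell-based (AWS CLI plus jq), but its retrieval logic amounts to roughly the following Python sketch, with a placeholder secret name and output directory; the encrypted private key still has to be decrypted with the stored passphrase before Envoy can load it.

import json
import os
import boto3

def fetch_client_cert(secret_id, out_dir="/etc/envoy/tls"):
    """Pull the exported certificate material from Secrets Manager and write it
    to the Envoy file system before the Envoy process starts."""
    secret = boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)
    material = json.loads(secret["SecretString"])

    os.makedirs(out_dir, exist_ok=True)
    for filename, key in [
        ("tls.crt", "certificate"),
        ("ca-chain.crt", "certificate_chain"),
        ("tls.key.enc", "encrypted_private_key"),
    ]:
        with open(os.path.join(out_dir, filename), "w") as f:
            f.write(material[key])

fetch_client_cert("colorgateway/client-cert")  # placeholder secret name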

Step 8: The CloudFormation scripts configure the client policy for mTLS in ColorGateway in App Mesh, as shown in Figure 6. The following attributes are configured:

  • Provide client certificate is enabled. This ensures that the client certificate is exchanged as part of the mTLS handshake.
  • Certificate method is set to Local file hosting so that the certificate is read from the local file system.
  • Certificate chain is set to the path for the file that contains the ColorGateway certificate chain.
  • Private key is set to the path for the file that contains the private key for the ColorGateway certificate.
Figure 6: Client-side mTLS configuration in ColorGateway

At the end of the custom Envoy startup script described in step 7, the core Envoy process in ColorGateway service is started. It retrieves the ColorTeller CA root certificate from ACM Private CA and configures it internally as a trusted CA. This retrieval happens due to native integration between App Mesh and ACM Private CA. This allows ColorGateway Envoy to validate the certificate presented by ColorTeller Envoy during the TLS handshake.

Step 9: The CloudFormation scripts configure the listener configuration for mTLS in ColorTeller in App Mesh, as shown in Figure 7. The following attributes are configured:

  • Require client certificate is enabled, which enforces mTLS.
  • Validation Method is set to Local file hosting, which causes Envoy to read the certificate from the local file system.
  • Certificate chain is set to the path for the file that contains the ColorGateway certificate chain.
Figure 7: Server-side mTLS configuration in ColorTeller

At the end of the Envoy startup script described in step 7, the core Envoy process in ColorTeller service is started. It retrieves its own server-side end-entity certificate and corresponding private key from ACM Private CA. This retrieval happens internally, driven by the native integration between App Mesh and ACM Private CA. This allows ColorTeller Envoy to present its server-side certificate to ColorGateway Envoy during the TLS handshake.

The system startup concludes with this step, and the application is ready to process external client requests.

Step 10: The ColorGateway service receives a request from an external client.

Step 11: The ColorGateway Envoy initiates a TLS handshake with the ColorTeller Envoy. During the first half of the TLS handshake protocol, the ColorTeller Envoy presents its server-side certificate to the ColorGateway Envoy. The ColorGateway Envoy validates the certificate. Because the ColorGateway Envoy has been configured with the ColorTeller CA trust chain in step 8, the validation succeeds.

Step 12: During the second half of the TLS handshake, the ColorTeller Envoy requests the ColorGateway Envoy to provide its client-side certificate. This step is what distinguishes an mTLS exchange from a one-way TLS exchange.

The ColorGateway Envoy responds with its end-entity certificate that had been placed on its file system in step 7. The ColorTeller Envoy validates the received certificate with its CA trust chain, which contains the ColorGateway root CA that was placed on its file system (in step 7). The validation succeeds, and so an mTLS session is established.

Verifying mTLS

You can now verify that an mTLS exchange happened by running the following command on your bastion host.

$ curl -s http://colorteller.mtls-ec2.svc.cluster.local:9901/stats |grep -E 'ssl.handshake|ssl.no_certificate'

listener.0.0.0.0_15000.ssl.handshake: 1
listener.0.0.0.0_15000.ssl.no_certificate: 0

The count for ssl.handshake should be one. If the ssl.handshake count is more than one, that means that you’ve gone through more than one TLS handshake. It’s important to note that the count for ssl.no_certificate—the total successful TLS connections with no client certificate—is zero. This shows that mTLS configuration is working as expected. Recall that this count was one or higher—equal to the ssl.handshake count—in the previous section that described one-way TLS. The ssl.no_certificate count being zero indicates that this was an mTLS authenticated connection, where the ColorGateway authenticated the ColorTeller and vice-versa.

Certificate renewal

The ACM Private CA certificates that are imported into App Mesh are not eligible for managed renewal, so an external certificate renewal method is needed. This example solution uses an external renewal method, as recommended in Renewing certificates in a private PKI, that you can adapt for your own implementations.

The certificate renewal mechanism can be broken down into six steps, which are outlined in Figure 8.
 

Figure 8: Certificate renewal process in ACM Private CA and App Mesh on ECS integration

Here are the steps illustrated in Figure 8:

Step 1: ACM generates an Amazon CloudWatch Events event when a certificate is close to expiring.

Step 2: CloudWatch triggers a Lambda function that is responsible for certificate renewal.

Step 3: The Lambda function renews the certificate in ACM and exports the new certificate by calling ACM APIs.

Step 4: The Lambda function writes the certificate to Secrets Manager.

Step 5: The Lambda function triggers a new service deployment in an Amazon ECS cluster. This will cause the ECS services to go through a graceful update process to acquire a renewed certificate.

Step 6: The Envoy processes in App Mesh fetch the client-side certificate from Secrets Manager using external integration, and the server-side certificate from ACM using native integration.
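A renewal Lambda covering steps 3 through 5 might look roughly like the sketch below. This is an assumption-laden outline, not the solution’s actual code: the ARNs, secret name, cluster, and service names are placeholders, and because RenewCertificate is asynchronous, a production implementation would wait for the renewal to complete before exporting.

import json
import boto3

acm = boto3.client("acm")
secrets = boto3.client("secretsmanager")
ecs = boto3.client("ecs")

CERT_ARN = "arn:aws:acm:us-east-1:111122223333:certificate/EXAMPLE"  # placeholder
SECRET_ID = "colorgateway/client-cert"                               # placeholder
CLUSTER, SERVICE = "color-app", "colorgateway"                       # placeholders

def handler(event, context):
    # Step 3: renew the private certificate, then export the renewed material.
    acm.renew_certificate(CertificateArn=CERT_ARN)  # asynchronous; poll before exporting
    passphrase = secrets.get_random_password(
        PasswordLength=32, ExcludePunctuation=True
    )["RandomPassword"]
    export = acm.export_certificate(
        CertificateArn=CERT_ARN, Passphrase=passphrase.encode()
    )

    # Step 4: write the renewed certificate material to Secrets Manager.
    secrets.put_secret_value(
        SecretId=SECRET_ID,
        SecretString=json.dumps({
            "certificate": export["Certificate"],
            "certificate_chain": export["CertificateChain"],
            "encrypted_private_key": export["PrivateKey"],
            "passphrase": passphrase,
        }),
    )

    # Step 5: force a rolling redeployment so Envoy picks up the new secret.
    ecs.update_service(cluster=CLUSTER, service=SERVICE, forceNewDeployment=True)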

Conclusion

In this post, you learned a method for enabling mTLS authentication between App Mesh endpoints based on certificates issued by ACM Private CA. mTLS enhances security of App Mesh deployments due to its bidirectional authentication capability. While server-side certificates are integrated natively, you saw how to use Lambda and Secrets Manager to integrate client-side certificates externally. Because ACM Private CA certificates aren’t eligible for managed renewal, you also learned how to implement an external certificate renewal process.

This solution enhances your App Mesh security posture by simplifying configuration of mTLS-enabled App Mesh deployments. It achieves this because all mTLS certificate requirements are met by a single, managed private PKI service—ACM Private CA—which allows you to centrally manage certificates and eliminates the need to run your own private PKI.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Certificate Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Raj Jain

Raj is an engineering leader at Amazon in the FinTech space. He is passionate about building SaaS applications for Amazon internal and external customers using AWS. He is currently working on an AI/ML application in the governance, risk and compliance domain. Raj is a published author in the Bell Labs Technical Journal, has authored 3 IETF standards, and holds 12 patents in internet telephony and applied cryptography. In his spare time, he enjoys the outdoors, cooking, reading, and travel.

Author

Nagmesh Kumar

Nagmesh is a Cloud Architect with the Worldwide Public Sector Professional Services team. He enjoys working with customers to design and implement well-architected solutions in the cloud. He was a researcher who stumbled into IT operations as a database administrator. After spending all day in the cloud, you can spot him in the wild with his family, reading, or gaming.

TLS 1.2 will be required for all AWS FIPS endpoints beginning March 31, 2021

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/tls-1-2-required-for-aws-fips-endpoints/

To help you meet your compliance needs, we’re updating all AWS Federal Information Processing Standard (FIPS) endpoints to a minimum of Transport Layer Security (TLS) 1.2. We have already updated over 40 services to require TLS 1.2, removing support for TLS 1.0 and TLS 1.1. Beginning March 31, 2021, if your client application cannot support TLS 1.2, its connections to FIPS endpoints will fail. In order to avoid an interruption in service, we encourage you to act now to ensure that you connect to AWS FIPS endpoints at TLS version 1.2. This change does not affect non-FIPS AWS endpoints.

Amazon Web Services (AWS) continues to notify impacted customers directly via their Personal Health Dashboard and email. However, if you’re connecting anonymously to AWS shared resources, such as through a public Amazon Simple Storage Service (Amazon S3) bucket, then you would not have received a notification, as we cannot identify anonymous connections.

Why are you removing TLS 1.0 and TLS 1.1 support from FIPS endpoints?

At AWS, we’re continually expanding the scope of our compliance programs to meet the needs of customers who want to use our services for sensitive and regulated workloads. Compliance programs, including FedRAMP, require a minimum level of TLS 1.2. To help you meet compliance requirements, we’re updating all AWS FIPS endpoints to a minimum of TLS version 1.2 across all AWS Regions. Following this update, you will not be able to use TLS 1.0 and TLS 1.1 for connections to FIPS endpoints.

How can I detect if I am using TLS 1.0 or TLS 1.1?

To detect the use of TLS 1.0 or 1.1, we recommend that you perform code, network, or log analysis. If you are using an AWS Software Developer Kit (AWS SDK) or Command Line Interface (CLI), we have provided hyperlinks to detailed guidance in our previous TLS blog post about how to examine your client application code and properly configure the TLS version used.

When the application source code is unavailable, you can use a network tool, such as TCPDump (Linux) or Wireshark (Linux or Windows), to analyze your network traffic and find the TLS versions you’re using when connecting to AWS endpoints. For a detailed walkthrough of using these tools, see the example below.

If you’re using Amazon S3, you can also use your access logs to view the TLS connection information for these services and identify client connections that are not at TLS 1.2.
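
As an additional spot check, here is a minimal Go sketch that completes a handshake against an endpoint and reports the negotiated TLS version. Note that it reflects what this particular Go client negotiates, not what your legacy application would negotiate; the endpoint shown is the CloudWatch FIPS endpoint used in the example later in this post, so substitute the endpoint your application uses.

package main

import (
    "crypto/tls"
    "fmt"
    "log"
)

func main() {
    // Substitute the FIPS endpoint that your application connects to.
    endpoint := "monitoring.us-gov-west-1.amazonaws.com:443"

    conn, err := tls.Dial("tcp", endpoint, &tls.Config{})
    if err != nil {
        log.Fatalf("handshake failed: %v", err)
    }
    defer conn.Close()

    state := conn.ConnectionState()
    fmt.Printf("negotiated TLS version: 0x%04x\n", state.Version)
    if state.Version < tls.VersionTLS12 {
        fmt.Println("WARNING: this connection negotiated a version below TLS 1.2")
    }
}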

What is the most common use of TLS 1.0 or TLS 1.1?

The most common client applications that use TLS 1.0 or 1.1 are Microsoft .NET Framework versions earlier than 4.6.2. If you use the .NET Framework, please confirm you are using version 4.6.2 or later. For information on how to update and configure .NET Framework to support TLS 1.2, see How to enable TLS 1.2 on clients.

How do I know if I am using an AWS FIPS endpoint?

All AWS services offer TLS 1.2 encrypted endpoints that you can use for all API calls. Some AWS services also offer FIPS 140-2 endpoints for customers who need to use FIPS-validated cryptographic libraries to connect to AWS services. You can check our list of all AWS FIPS endpoints and compare the list to your application code, configuration repositories, DNS logs, or other network logs.

EXAMPLE: TLS version detection using a packet capture

To capture the packets, multiple online sources, such as this article, provide guidance for setting up TCPDump on a Linux operating system. On a Windows operating system, the Wireshark tool provides packet analysis capabilities and can be used either to analyze packets captured with TCPDump or to capture packets directly.

In this example, we assume there is a client application with the local IP address 10.25.35.243 that is making API calls to the CloudWatch FIPS API endpoint in the AWS GovCloud (US-West) Region. To analyze the traffic, first we look up the endpoint URL in the AWS FIPS endpoint list. In our example, the endpoint URL is monitoring.us-gov-west-1.amazonaws.com. Then we use NSLookup to find the IP addresses used by this FIPS endpoint.

Figure 1: Use NSLookup to find the IP addresses used by this FIPS endpoint
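
If you prefer a programmatic alternative to NSLookup, the following Go sketch resolves the same endpoint so you know which IP addresses to filter on in the packet capture.

package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    // The CloudWatch FIPS endpoint from the example; substitute your own.
    ips, err := net.LookupIP("monitoring.us-gov-west-1.amazonaws.com")
    if err != nil {
        log.Fatalf("lookup failed: %v", err)
    }
    for _, ip := range ips {
        fmt.Println(ip.String())
    }
}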

Wireshark is then used to open the captured packets, and filter to just the packets with the relevant IP address. This can be done automatically by selecting one of the packets in the upper section, and then right-clicking to use the Conversation filter/IPv4 option.

After the results are filtered to only the relevant IP addresses, the next step is to find the packet whose description in the Info column is Client Hello. In the lower packet details area, expand the Transport Layer Security section to find the version, which in this example is set to TLS 1.0 (0x0301). This indicates that the client only supports TLS 1.0 and must be modified to support a TLS 1.2 connection.

Figure 2: After the conversation filter has been applied, select the Client Hello packet in the top pane. Expand the Transport Layer Security section in the lower pane to view the packet details and the TLS version.

Figure 3 shows what it looks like after the client has been updated to support TLS 1.2. This second packet capture confirms we are sending TLS 1.2 (0x0303) in the Client Hello packet.

Figure 3: The client TLS has been updated to support TLS 1.2

Is there more assistance available?

If you have any questions or issues, you can start a new thread on one of the AWS forums, or contact AWS Support or your technical account manager (TAM). The AWS support tiers cover development and production issues for AWS products and services, along with other key stack components. AWS Support doesn’t include code development for client applications.

Additionally, you can use AWS IQ to find, securely collaborate with, and pay AWS-certified third-party experts for on-demand assistance to update your TLS client components. Visit the AWS IQ page for information about how to submit a request, get responses from experts, and choose the expert with the right skills and experience. Log in to your console and select Get Started with AWS IQ to start a request.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Janelle Hopper

Janelle is a Senior Technical Program Manager in AWS Security with over 15 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve AWS’ security posture.

Author

Daniel Salzedo

Daniel is a Senior Specialist Technical Account Manager – Security. He has over 25 years of professional experience in IT in industries as diverse as video game development, manufacturing, banking and used car sales. He loves working with our wonderful AWS customers to help them solve their complex security challenges at scale.

Over 40 services require TLS 1.2 minimum for AWS FIPS endpoints

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/over-40-services-require-tls-1-2-minimum-for-aws-fips-endpoints/

In a March 2020 blog post, we told you about work Amazon Web Services (AWS) was undertaking to update all of our AWS Federal Information Processing Standard (FIPS) endpoints to a minimum of Transport Layer Security (TLS) 1.2 across all AWS Regions. Today, we’re happy to announce that over 40 services have been updated and now require TLS 1.2 (the full list is available on the AWS FIPS webpage).

These services no longer support using TLS 1.0 or TLS 1.1 on their FIPS endpoints. To help you meet your compliance needs, we are updating all AWS FIPS endpoints to a minimum of TLS 1.2 across all Regions. We will continue to update our services to support only TLS 1.2 or later on AWS FIPS endpoints, which you can check on the AWS FIPS webpage. This change doesn’t affect non-FIPS AWS endpoints.

When you make a connection from your client application to an AWS service endpoint, the client provides its TLS minimum and TLS maximum versions. The AWS service endpoint will always select the maximum version offered.
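
As an illustration of those client settings (not AWS SDK-specific code; many AWS SDKs let you supply a custom HTTP client such as this one), here is a minimal Go sketch that offers TLS 1.2 through TLS 1.3 and prints the version the endpoint selected. The endpoint URL is only a placeholder taken from the packet-capture example in the previous post.

package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
)

func main() {
    // Offer TLS 1.2 through TLS 1.3; the endpoint selects the highest
    // version that both sides support.
    client := &http.Client{
        Transport: &http.Transport{
            TLSClientConfig: &tls.Config{
                MinVersion: tls.VersionTLS12,
                MaxVersion: tls.VersionTLS13,
            },
        },
    }

    // Placeholder request; substitute the FIPS endpoint your application uses.
    resp, err := client.Get("https://monitoring.us-gov-west-1.amazonaws.com")
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Printf("negotiated TLS version: 0x%04x, status: %s\n", resp.TLS.Version, resp.Status)
}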

What is TLS?

TLS is a cryptographic protocol designed to provide secure communication across a computer network. API calls to AWS services are secured using TLS.

What is FIPS 140-2?

FIPS 140-2 is a US and Canadian government standard that specifies the security requirements for cryptographic modules that protect sensitive information.

What are AWS FIPS endpoints?

All AWS services offer TLS 1.2 encrypted endpoints that can be used for all API calls. Some AWS services also offer FIPS 140-2 endpoints for customers who need to use FIPS validated cryptographic libraries to connect to AWS services.

Why are we upgrading to TLS 1.2?

Our upgrade to TLS 1.2 across all Regions reflects our ongoing commitment to help customers meet their compliance needs.

Is there more assistance available to help verify or update client applications?

If you’re using an AWS software development kit (AWS SDK), you can find information about how to properly configure the minimum and maximum TLS versions for your clients in the documentation for your SDK.

You can also visit Tools to Build on AWS and browse by programming language to find the relevant SDK. AWS Support tiers cover development and production issues for AWS products and services, along with other key stack components. AWS Support doesn’t include code development for client applications.

If you have any questions or issues, you can start a new thread on one of the AWS forums, or contact AWS Support or your technical account manager (TAM).

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Janelle Hopper

Janelle Hopper is a Senior Technical Program Manager in AWS Security with over 15 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve AWS’ security posture.

Author

Marta Taggart

Marta is a Seattle-native and Senior Program Manager in AWS Security, where she focuses on privacy, content development, and educational programs. Her interest in education stems from two years she spent in the education sector while serving in the Peace Corps in Romania. In her free time, she’s on a global hunt for the perfect cup of coffee.

KEMTLS: Post-quantum TLS without signatures

Post Syndicated from Sofía Celi original https://blog.cloudflare.com/kemtls-post-quantum-tls-without-signatures/

The Transport Layer Security protocol (TLS), which secures most Internet connections, has mainly been a protocol consisting of a key exchange authenticated by digital signatures used to encrypt data at transport[1]. Even though it has undergone major changes since 1994, when SSL 1.0 was introduced by Netscape, its main mechanism has remained the same. The key exchange was first based on RSA, and later on traditional Diffie-Hellman (DH) and Elliptic-curve Diffie-Hellman (ECDH). The signatures used for authentication have almost always been RSA-based, though in recent years other kinds of signatures have been adopted, mainly ECDSA and Ed25519. This recent change to elliptic curve cryptography at both the key exchange and the signature level has resulted in considerable speed and bandwidth benefits in comparison to traditional Diffie-Hellman and RSA.

TLS is the main protocol that protects the connections we use everyday. It’s everywhere: we use it when we buy products online, when we register for a newsletter — when we access any kind of website, IoT device, API for mobile apps and more, really. But with the imminent threat of the arrival of quantum computers (a threat that seems to be getting closer and closer), we need to reconsider the future of TLS once again. A wide-scale post-quantum experiment was carried out by Cloudflare and Google: two post-quantum key exchanges were integrated into our TLS stack and deployed at our edge servers as well as in Chrome Canary clients. The goal of that experiment was to evaluate the performance and feasibility of deployment of two post-quantum key exchanges in TLS.

Similar experiments have been proposed for introducing post-quantum algorithms into the TLS handshake itself. Unfortunately, it seems infeasible to replace both the key exchange and signature with post-quantum primitives, because post-quantum cryptographic primitives are bigger, or slower (or both), than their predecessors. The proposed algorithms under consideration in the NIST post-quantum standardization process use mathematical objects that are larger than the ones used for elliptic curves, traditional Diffie-Hellman, or RSA. As a result, the overall size of public keys, signatures and key exchange material is much bigger than those from elliptic curves, Diffie-Hellman, or RSA.

How can we solve this problem? How can we use post-quantum algorithms as part of the TLS handshake without making the material too big to be transmitted? In this blogpost, we will introduce a new mechanism for making this happen. We’ll explain how it can be integrated into the handshake and we’ll cover implementation details. The key observation in this mechanism is that, while post-quantum algorithms have bigger communication size than their predecessors, post-quantum key exchanges have somewhat smaller sizes than post-quantum signatures, so we can try to replace signatures with key exchanges in some places to save space.  We will only focus on the TLS 1.3 handshake as it is the TLS version that should be currently used.

Past experiments: making the TLS 1.3 handshake post-quantum

TLS 1.3 was introduced in August 2018, and it brought many security and performance improvements (notably, having only one round-trip to complete the handshake). But TLS 1.3 is designed for a world with classical computers, and some of its functionality will be broken by quantum computers when they do arrive.

The primary goal of TLS 1.3 is to provide authentication (the server side of the channel is always authenticated, the client side is optionally authenticated), confidentiality, and integrity by using a handshake protocol and a record protocol. The handshake protocol, the one of interest for us today, establishes the cryptographic parameters for securing and authenticating a connection. It can be thought of as having three main phases, as defined in RFC8446:

–  The Parameter Negotiation phase (referred to as ‘Server Parameters’ in RFC8446), which establishes other handshake parameters (whether the client is authenticated, application-layer protocol support, etc).

–  The Key Exchange phase, which establishes shared keying material and selects the cryptographic parameters to be used. Everything after this phase will be encrypted.

–  The Authentication phase, which authenticates the server (and, optionally, the client) and provides key confirmation and handshake integrity.

The main idea of past experiments that introduced post-quantum algorithms into the handshake of TLS 1.3 was to use them in place of classical algorithms by advertising them as part of the supported groups[2] and key share[3] extensions, and, therefore, establishing with them the negotiated connection parameters. Key encapsulation mechanisms (KEMs) are an abstraction of the basic key exchange primitive, and were used to generate the shared secrets. When using a pre-shared key, its symmetric algorithms can be easily replaced by post-quantum KEMs as well; and, in the case of password-authenticated TLS, some ideas have been proposed on how to use post-quantum algorithms with them.
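
To make the KEM abstraction concrete, here is a minimal Go sketch of the interface such a primitive exposes. The names are purely illustrative and do not correspond to the API of any particular library.

package kem

// KEM captures the three operations of a key encapsulation mechanism.
// A KEM replaces the classic Diffie-Hellman exchange: one side encapsulates
// a fresh shared secret to the other side's public key.
type KEM interface {
    // GenerateKeyPair creates a key pair for the recipient.
    GenerateKeyPair() (publicKey, privateKey []byte, err error)

    // Encapsulate uses the recipient's public key to produce a ciphertext
    // and the shared secret that the ciphertext encapsulates.
    Encapsulate(publicKey []byte) (ciphertext, sharedSecret []byte, err error)

    // Decapsulate recovers the same shared secret from the ciphertext
    // using the recipient's private key.
    Decapsulate(privateKey, ciphertext []byte) (sharedSecret []byte, err error)
}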

Most of the above ideas only provide what is often defined as ‘transitional security’, because its main focus is to provide quantum-resistant confidentiality, and not to take quantum-resistant authentication into account. It is possible to use post-quantum signatures for TLS authentication, but the post-quantum signatures are larger than traditional ones. Furthermore, it is worth noting that using post-quantum signatures is much more expensive than using post-quantum KEMs.

We can estimate the impact of such a replacement on network traffic by simply looking at the sum of the cryptographic objects that are transmitted during the handshake. A typical TLS 1.3 handshake using elliptic curve X25519 and RSA-2048 would transmit 1,376 bytes, which would correspond to the public keys for key exchange, the certificate, the signature of the handshake, and the certificate chain. If we were to replace X25519 by the post-quantum KEM Kyber512 and RSA by the post-quantum signature Dilithium II, two of the more efficient proposals, the size of the transmitted data would increase to 10,036 bytes[4]. The increase is mostly due to the size of the post-quantum signature algorithm.

The question then is: how can we achieve full post-quantum security while keeping the handshake efficient to use?

A more efficient proposal: KEMTLS

There is a long history of other mechanisms, besides signatures, being used for authentication. Modern protocols, such as the Signal protocol, the Noise framework, or WireGuard, rely on key exchange mechanisms for authentication; but they are unsuitable for the TLS 1.3 case as they expect the long-term key material to be known in advance by the interested parties.

The OPTLS proposal by Krawczyk and Wee authenticates the TLS handshake without signatures by using a non-interactive key exchange (NIKE). However, the only somewhat efficient construction for a post-quantum NIKE is CSIDH, the security of which is the subject of an ongoing debate. But we can build on this idea, and use KEMs for authentication. KEMTLS, the current proposed experiment, replaces the handshake signature by a post-quantum KEM key exchange. It was designed and introduced by Peter Schwabe, Douglas Stebila and Thom Wiggers in the publication ‘Post-Quantum TLS Without Handshake Signatures’.

KEMTLS, therefore, achieves the same goals as TLS 1.3 (authentication, confidentiality and integrity) in the face of quantum computers. But there’s one small difference compared to the TLS 1.3 handshake. KEMTLS allows the client to send encrypted application data in the second client-to-server TLS message flow when client authentication is not required, and in the third client-to-server TLS message flow when mutual authentication is required. Note that with TLS 1.3, the server is able to send encrypted and authenticated application data in its first response message (although, in most uses of TLS 1.3, this feature is not actually used). With KEMTLS, when client authentication is not required, the client is able to send its first encrypted application data after the same number of handshake round trips as in TLS 1.3.

Intuitively, the handshake signature in TLS 1.3 proves possession of the private key corresponding to the public key certified in the TLS 1.3 server certificate. For these signature schemes, this is the straightforward way to prove possession; another way to prove possession is through key exchanges. By carefully considering the key derivation sequence, a server can decrypt any messages sent by the client only if it holds the private key corresponding to the certified public key. Therefore, implicit authentication is fulfilled. It is worth noting that KEMTLS still relies on signatures by certificate authorities to authenticate the long-term KEM keys.

With KEMTLS, application data transmitted during the handshake is implicitly authenticated rather than explicitly (as in TLS 1.3), and has slightly weaker downgrade resilience and forward secrecy; but full downgrade resilience and forward secrecy are achieved once the KEMTLS handshake completes.

By replacing the handshake signature by a KEM key exchange, we reduce the size of the data transmitted in the example handshake to 8,344 bytes, using Kyber512 and Dilithium II — a significant reduction. We can reduce the handshake size even for algorithms such as the NTRU-assumption based KEM NTRU and signature algorithm Falcon, which have a less-pronounced size gap. Typically, KEM operations are computationally much lighter than signing operations, which makes the reduction even more significant.

KEMTLS was presented at ACM CCS 2020. You can read more about its details in the paper. It was initially implemented in the RustTLS library by Thom Wiggers using optimized C/assembly implementations of the post-quantum algorithms provided by the PQClean and Open Quantum Safe projects.

Cloudflare and KEMTLS: the implementation

As part of our effort to show that TLS can be completely post-quantum safe, we implemented the full KEMTLS handshake in Golang’s TLS 1.3 suite. The implementation was done in several steps:

  1. We first needed to clone our own version of Golang, so we could add different post-quantum algorithms to it. You can find our own version here. This code gets constantly updated with every release of Golang, following these steps.
  2. We needed to implement post-quantum algorithms in Golang, which we did on our own cryptographic library, CIRCL.
  3. As we cannot force certificate authorities to use certificates with long-term post-quantum KEM keys, we decided to use Delegated Credentials. A delegated credential is a short-lasting key that the certificate’s owner has delegated for use in TLS. Therefore, they can be used for post-quantum KEM keys. See its implementation in our Golang code here.
  4. We implemented mutual auth (client and server authentication) KEMTLS by using Delegated Credentials for the authentication process. See its implementation in our Golang code here. You can also check its test for an overview of how it works.

Implementing KEMTLS was a straightforward process, although it did require changes to the way Golang handles a TLS 1.3 handshake and how the key schedule works.

A “regular” TLS 1.3 handshake in Golang (from the server perspective) looks like this:

func (hs *serverHandshakeStateTLS13) handshake() error {
    c := hs.c

    // For an overview of the TLS 1.3 handshake, see RFC 8446, Section 2.
    if err := hs.processClientHello(); err != nil {
        return err
    }
    if err := hs.checkForResumption(); err != nil {
        return err
    }
    if err := hs.pickCertificate(); err != nil {
        return err
    }
    c.buffering = true
    if err := hs.sendServerParameters(); err != nil {
        return err
    }
    if err := hs.sendServerCertificate(); err != nil {
        return err
    }
    if err := hs.sendServerFinished(); err != nil {
        return err
    }
    // Note that at this point we could start sending application data without
    // waiting for the client's second flight, but the application might not
    // expect the lack of replay protection of the ClientHello parameters.
    if _, err := c.flush(); err != nil {
        return err
    }
    if err := hs.readClientCertificate(); err != nil {
        return err
    }
    if err := hs.readClientFinished(); err != nil {
        return err
    }

    atomic.StoreUint32(&c.handshakeStatus, 1)

    return nil
}

We had to interrupt the process when the server sends the Certificate (sendServerCertificate()) in order to send the KEMTLS-specific messages. In the same way, we had to add the appropriate KEMTLS messages to the client’s handshake. And, as we didn’t want to change the way Golang handles TLS 1.3 too much, we only added one new constant to the configuration that a server can use to ask for the client’s certificate (the constant is serverConfig.ClientAuth = RequestClientKEMCert).

The implementation is easy to work with: if a delegated credential or a certificate has a public key of a supported post-quantum KEM algorithm, the handshake will proceed with KEMTLS. If the server requests a Client KEMTLS Certificate, the handshake will use client KEMTLS authentication.
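
As a rough sketch of what this looks like from an application’s point of view, the snippet below configures a server using the forked crypto/tls package. Only the RequestClientKEMCert constant is specific to our fork; the certificate file names are placeholders, and in a real KEMTLS deployment the KEM public key would be carried in a delegated credential rather than a plain certificate.

package main

import (
    "crypto/tls"
    "log"
)

func main() {
    // Placeholder server certificate chain; in a real KEMTLS deployment the
    // KEM public key would arrive via a delegated credential instead.
    cert, err := tls.LoadX509KeyPair("server.crt", "server.key")
    if err != nil {
        log.Fatal(err)
    }

    serverConfig := &tls.Config{
        Certificates: []tls.Certificate{cert},
        MinVersion:   tls.VersionTLS13,
        // Ask the client for a KEMTLS client certificate; this constant
        // exists only in the fork described above.
        ClientAuth: tls.RequestClientKEMCert,
    }

    ln, err := tls.Listen("tcp", ":8443", serverConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer ln.Close()
    log.Println("KEMTLS-capable server listening on :8443")
}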

Running the Experiment

So, what’s next? We’ll take the code we have produced and run it on actual Cloudflare infrastructure to measure how efficiently it works.

Thanks

Many thanks to everyone involved in the project: Chris Wood, Armando Faz-Hernández, Thom Wiggers, Bas Westerbaan, Peter Wu, Peter Schwabe, Goutam Tamvada, Douglas Stebila, Thibault Meunier, and the whole Cloudflare Research team.

[1] It is worth noting that the RSA key transport in TLS ≤1.2 has the server only authenticated by RSA public key encryption, although the server’s RSA public key is certified using RSA signatures by Certificate Authorities.
[2] An extension used by the client to indicate which named groups (Elliptic Curve Groups, Finite Field Groups) it supports for key exchange.
[3] An extension which contains the endpoint’s cryptographic parameters.
[4] These numbers, as it is noted in the paper, are based on the round-2 submissions.

Good-bye ESNI, hello ECH!

Post Syndicated from Christopher Patton original https://blog.cloudflare.com/encrypted-client-hello/

Most communication on the modern Internet is encrypted to ensure that its content is intelligible only to the endpoints, i.e., client and server. Encryption, however, requires a key and so the endpoints must agree on an encryption key without revealing the key to would-be attackers. The most widely used cryptographic protocol for this task, called key exchange, is the Transport Layer Security (TLS) handshake.

In this post we’ll dive into Encrypted Client Hello (ECH), a new extension for TLS that promises to significantly enhance the privacy of this critical Internet protocol. Today, a number of privacy-sensitive parameters of the TLS connection are negotiated in the clear. This leaves a trove of metadata available to network observers, including the endpoints’ identities, how they use the connection, and so on.

ECH encrypts the full handshake so that this metadata is kept secret. Crucially, this closes a long-standing privacy leak by protecting the Server Name Indication (SNI) from eavesdroppers on the network. Encrypting the SNI secret is important because it is the clearest signal of which server a given client is communicating with. However, and perhaps more significantly, ECH also lays the groundwork for adding future security features and performance enhancements to TLS while minimizing their impact on the privacy of end users.

ECH is the product of close collaboration, facilitated by the IETF, between academics and tech industry leaders, including Cloudflare, our friends at Fastly and Mozilla (both of whom are the affiliations of co-authors of the standard), and many others. This feature represents a significant upgrade to the TLS protocol, one that builds on bleeding edge technologies, like DNS-over-HTTPS, that are only now coming into their own. As such, the protocol is not yet ready for Internet-scale deployment. This article is intended as a signpost on the road to full handshake encryption.

Background

The story of TLS is the story of the Internet. As our reliance on the Internet has grown, so the protocol has evolved to address ever-changing operational requirements, use cases, and threat models. The client and server don’t just exchange a key: they negotiate a wide variety of features and parameters: the exact method of key exchange; the encryption algorithm; who is authenticated and how; which application layer protocol to use after the handshake; and much, much more. All of these parameters impact the security properties of the communication channel in one way or another.

SNI is a prime example of a parameter that impacts the channel’s security. The SNI extension is used by the client to indicate to the server the website it wants to reach. This is essential for the modern Internet, as it’s common nowadays for many origin servers to sit behind a single TLS operator. In this setting, the operator uses the SNI to determine who will authenticate the connection: without it, there would be no way of knowing which TLS certificate to present to the client. The problem is that SNI leaks to the network the identity of the origin server the client wants to connect to, potentially allowing eavesdroppers to infer a lot of information about their communication. (Of course, there are other ways for a network observer to identify the origin — the origin’s IP address, for example. But co-locating with other origins on the same IP address makes it much harder to use this metric to determine the origin than it is to simply inspect the SNI.)

Although protecting SNI is the impetus for ECH, it is by no means the only privacy-sensitive handshake parameter that the client and server negotiate. Another is the ALPN extension, which is used to decide which application-layer protocol to use once the TLS connection is established. The client sends the list of applications it supports — whether it’s HTTPS, email, instant messaging, or the myriad other applications that use TLS for transport security — and the server selects one from this list, and sends its selection to the client. By doing so, the client and server leak to the network a clear signal of their capabilities and what the connection might be used for.

Some features are so privacy-sensitive that their inclusion in the handshake is a non-starter. One idea that has been floated is to replace the key exchange at the heart of TLS with password-authenticated key-exchange (PAKE). This would allow password-based authentication to be used alongside (or in lieu of) certificate-based authentication, making TLS more robust and suitable for a wider range of applications. The privacy issue here is analogous to SNI: servers typically associate a unique identifier to each client (e.g., a username or email address) that is used to retrieve the client’s credentials; and the client must, somehow, convey this identity to the server during the course of the handshake. If sent in the clear, then this personally identifiable information would be easily accessible to any network observer.

A necessary ingredient for addressing all of these privacy leaks is handshake encryption, i.e., the encryption of handshake messages in addition to application data. Sounds simple enough, but this solution presents another problem: how do the client and server pick an encryption key if, after all, the handshake is itself a means of exchanging a key? Some parameters must be sent in the clear, of course, so the goal of ECH is to encrypt all handshake parameters except those that are essential to completing the key exchange.

In order to understand ECH and the design decisions underpinning it, it helps to understand a little bit about the history of handshake encryption in TLS.

Handshake encryption in TLS

TLS had no handshake encryption at all prior to the latest version, TLS 1.3. In the wake of the Snowden revelations in 2013, the IETF community began to consider ways of countering the threat that mass surveillance posed to the open Internet. When the process of standardizing TLS 1.3 began in 2014, one of its design goals was to encrypt as much of the handshake as possible. Unfortunately, the final standard falls short of full handshake encryption, and several parameters, including SNI, are still sent in the clear. Let’s take a closer look to see why.

The TLS 1.3 protocol flow is illustrated in Figure 1. Handshake encryption begins as soon as the client and server compute a fresh shared secret. To do this, the client sends a key share in its ClientHello message, and the server responds in its ServerHello with its own key share. Having exchanged these shares, the client and server can derive a shared secret. Each subsequent handshake message is encrypted using the handshake traffic key derived from the shared secret. Application data is encrypted using a different key, called the application traffic key, which is also derived from the shared secret. These derived keys have different security properties: to emphasize this, they are illustrated with different colors.

The first handshake message that is encrypted is the server’s EncryptedExtensions. The purpose of this message is to protect the server’s sensitive handshake parameters, including the server’s ALPN extension, which contains the application selected from the client’s ALPN list. Key-exchange parameters are sent unencrypted in the ClientHello and ServerHello.

Figure 1: The TLS 1.3 handshake.

All of the client’s handshake parameters, sensitive or not, are sent in the ClientHello. Looking at Figure 1, you might be able to think of ways of reworking the handshake so that some of them can be encrypted, perhaps at the cost of additional latency (i.e., more round trips over the network). However, extensions like SNI create a kind of “chicken-and-egg” problem.

The client doesn’t encrypt anything until it has verified the server’s identity (this is the job of the Certificate and CertificateVerify messages) and the server has confirmed that it knows the shared secret (the job of the Finished message). These measures ensure the key exchange is authenticated, thereby preventing monster-in-the-middle (MITM) attacks in which the adversary impersonates the server to the client in a way that allows it to decrypt messages sent by the client.  Because SNI is needed by the server to select the certificate, it needs to be transmitted before the key exchange is authenticated.

In general, ensuring confidentiality of handshake parameters used for authentication is only possible if the client and server already share an encryption key. But where might this key come from?

Full handshake encryption in the early days of TLS 1.3. Interestingly, full handshake encryption was once proposed as a core feature of TLS 1.3. In early versions of the protocol (draft-10, circa 2015), the server would offer the client a long-lived public key during the handshake, which the client would use for encryption in subsequent handshakes. (This design came from a protocol called OPTLS, which in turn was borrowed from the original QUIC proposal.) Called “0-RTT”, the primary purpose of this mode was to allow the client to begin sending application data prior to completing a handshake. In addition, it would have allowed the client to encrypt its first flight of handshake messages following the ClientHello, including its own EncryptedExtensions, which might have been used to protect the client’s sensitive handshake parameters.

Ultimately this feature was not included in the final standard (RFC 8446, published in 2018), mainly because its usefulness was outweighed by its added complexity. In particular, it does nothing to protect the initial handshake in which the client learns the server’s public key. Parameters that are required for server authentication of the initial handshake, like SNI, would still be transmitted in the clear.

Nevertheless, this scheme is notable as the forerunner of other handshake encryption mechanisms, like ECH, that use public key encryption to protect sensitive ClientHello parameters. The main problem these mechanisms must solve is key distribution.

Before ECH there was (and is!) ESNI

The immediate predecessor of ECH was the Encrypted SNI (ESNI) extension. As its name implies, the goal of ESNI was to provide confidentiality of the SNI. To do so, the client would encrypt its SNI extension under the server’s public key and send the ciphertext to the server. The server would attempt to decrypt the ciphertext using the secret key corresponding to its public key. If decryption were to succeed, then the server would proceed with the connection using the decrypted SNI. Otherwise, it would simply abort the handshake. The high-level flow of this simple protocol is illustrated in Figure 2.

Figure 2: The TLS 1.3 handshake with the ESNI extension. It is identical to the TLS 1.3 handshake, except the SNI extension has been replaced with ESNI.

For key distribution, ESNI relied on another critical protocol: the Domain Name System (DNS). In order to use ESNI to connect to a website, the client would piggy-back a request for a TXT record containing the ESNI public key onto its standard A/AAAA queries. For example, to get the key for crypto.dance, the client would request the TXT record of _esni.crypto.dance:

$ dig _esni.crypto.dance TXT +short
"/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EAAhMBAQQAAAAAX67XsAAAAABftsCwAAA="

The base64-encoded blob contains an ESNI public key and related parameters such as the encryption algorithm.
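
Programmatically, the same lookup is a single TXT query. Here is a small Go sketch using the system resolver; in practice the query should go over DoH, for the reasons discussed below.

package main

import (
    "fmt"
    "log"
    "net"
)

func main() {
    // Fetch the ESNI public key record for crypto.dance.
    records, err := net.LookupTXT("_esni.crypto.dance")
    if err != nil {
        log.Fatalf("TXT lookup failed: %v", err)
    }
    for _, r := range records {
        // Each record is a base64-encoded ESNI keys structure.
        fmt.Println(r)
    }
}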

But what’s the point of encrypting SNI if we’re just going to leak the server name to network observers via a plaintext DNS query? Deploying ESNI this way became feasible with the introduction of DNS-over-HTTPS (DoH), which enables encryption of DNS queries to resolvers that provide the DoH service (1.1.1.1 is an example of such a service). Another crucial feature of DoH is that it provides an authenticated channel for transmitting the ESNI public key from the DoH server to the client. This prevents cache-poisoning attacks that originate from the client’s local network: in the absence of DoH, a local attacker could prevent the client from offering the ESNI extension by returning an empty TXT record, or coerce the client into using ESNI with a key it controls.

While ESNI took a significant step forward, it falls short of our goal of achieving full handshake encryption. Apart from being incomplete — it only protects SNI — it is vulnerable to a handful of sophisticated attacks, which, while hard to pull off, point to theoretical weaknesses in the protocol’s design that need to be addressed.

ESNI was deployed by Cloudflare and enabled by Firefox, on an opt-in basis, in 2018, an  experience that laid bare some of the challenges with relying on DNS for key distribution. Cloudflare rotates its ESNI key every hour in order to minimize the collateral damage in case a key ever gets compromised. DNS artifacts are sometimes cached for much longer, the result of which is that there is a decent chance of a client having a stale public key. While Cloudflare’s ESNI service tolerates this to a degree, every key must eventually expire. The question that the ESNI protocol left open is how the client should proceed if decryption fails and it can’t access the current public key, via DNS or otherwise.

Another problem with relying on DNS for key distribution is that several endpoints might be authoritative for the same origin server, but have different capabilities. For example, a request for the A record of “example.com” might return one of two different IP addresses, each operated by a different CDN. The TXT record for “_esni.example.com” would contain the public key for one of these CDNs, but certainly not both. The DNS protocol does not provide a way of atomically tying together resource records that correspond to the same endpoint. In particular, it’s possible for a client to inadvertently offer the ESNI extension to an endpoint that doesn’t support it, causing the handshake to fail. Fixing this problem requires changes to the DNS protocol. (More on this below.)

The future of ESNI. In the next section, we’ll describe the ECH specification and how it addresses the shortcomings of ESNI. Despite its limitations, however, the practical privacy benefit that ESNI provides is significant. Cloudflare intends to continue its support for ESNI until ECH is production-ready.

The ins and outs of ECH

The goal of ECH is to encrypt the entire ClientHello, thereby closing the gap left in TLS 1.3 and ESNI by protecting all privacy-sensitive handshake-parameters. Similar to ESNI, the protocol uses a public key, distributed via DNS and obtained using DoH, for encryption during the client’s first flight. But ECH has improvements to key distribution that make the protocol more robust to DNS cache inconsistencies. Whereas the ESNI server aborts the connection if decryption fails, the ECH server attempts to complete the handshake and supply the client with a public key it can use to retry the connection.

But how can the server complete the handshake if it’s unable to decrypt the ClientHello? As illustrated in Figure 3, the ECH protocol actually involves two ClientHello messages: the ClientHelloOuter, which is sent in the clear, as usual; and the ClientHelloInner, which is encrypted and sent as an extension of the ClientHelloOuter. The server completes the handshake with just one of these ClientHellos: if decryption succeeds, then it proceeds with the ClientHelloInner; otherwise, it proceeds with the ClientHelloOuter.

Figure 3: The TLS 1.3 handshake with the ECH extension.

The ClientHelloInner is composed of the handshake parameters the client wants to use for the connection. This includes sensitive values, like the SNI of the origin server it wants to reach (called the backend server in ECH parlance), the ALPN list, and so on. The ClientHelloOuter, while also a fully-fledged ClientHello message, is not used for the intended connection. Instead, the handshake is completed by the ECH service provider itself (called the client-facing server), signaling to the client that its intended destination couldn’t be reached due to decryption failure. In this case, the service provider also sends along the correct ECH public key with which the client can retry handshake, thereby “correcting” the client’s configuration. (This mechanism is similar to how the server distributed its public key for 0-RTT mode in the early days of TLS 1.3.)

At a minimum, both ClientHellos must contain the handshake parameters that are required for a server-authenticated key-exchange. In particular, while the ClientHelloInner contains the real SNI, the ClientHelloOuter also contains an SNI value, which the client expects to verify in case of ECH decryption failure (i.e., the client-facing server). If the connection is established using the ClientHelloOuter, then the client is expected to immediately abort the connection and retry the handshake with the public key provided by the server. It’s not necessary that the client specify an ALPN list in the ClientHelloOuter, nor any other extension used to guide post-handshake behavior. All of these parameters are encapsulated by the encrypted ClientHelloInner.
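
To summarize the split, here is a purely conceptual Go sketch of the two ClientHello messages. The field names are illustrative and are not the wire format defined by the ECH draft.

package ech

// ClientHelloInner carries the parameters the client actually wants to use
// for the connection. It is serialized, encrypted under the ECH public key,
// and carried inside an extension of the outer ClientHello.
type ClientHelloInner struct {
    ServerName string   // real SNI: the backend origin the client wants
    ALPN       []string // real application protocols
    KeyShares  [][]byte // key-exchange parameters for the inner handshake
}

// ClientHelloOuter is sent in the clear. It only needs enough to complete a
// server-authenticated handshake with the client-facing server if the inner
// ClientHello cannot be decrypted.
type ClientHelloOuter struct {
    ServerName   string   // SNI of the client-facing server (the ECH provider)
    KeyShares    [][]byte // key-exchange parameters for the outer handshake
    EncryptedECH []byte   // HPKE ciphertext of the serialized ClientHelloInner
}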

This design resolves — quite elegantly, I think — most of the challenges for securely deploying handshake encryption encountered by earlier mechanisms. Importantly, the design of ECH was not conceived in a vacuum. The protocol reflects the diverse perspectives of the IETF community, and its development dovetails with other IETF standards that are crucial to the success of ECH.

The first is an important new DNS feature known as the HTTPS resource record type. At a high level, this record type is intended to allow multiple HTTPS endpoints that are authoritative for the same domain name to advertise different capabilities for TLS. This makes it possible to rely on DNS for key distribution, resolving one of the deployment challenges uncovered by the initial ESNI deployment. For a deep dive into this new record type and what it means for the Internet more broadly, check out Alessandro Ghedini’s recent blog post on the subject.

The second is the CFRG’s Hybrid Public Key Encryption (HPKE) standard, which specifies an extensible framework for building public key encryption schemes suitable for a wide variety of applications. In particular, ECH delegates all of the details of its handshake encryption mechanism to HPKE, resulting in a much simpler and easier-to-analyze specification. (Incidentally, HPKE is also one of the main ingredients of Oblivious DNS-over-HTTPS.)

The road ahead

The current ECH specification is the culmination of a multi-year collaboration. At this point, the overall design of the protocol is fairly stable. In fact, the next draft of the specification will be the first to be targeted for interop testing among implementations. Still, there remain a number of details that need to be sorted out. Let’s end this post with a brief overview of the road ahead.

Resistance to traffic analysis

Ultimately, the goal of ECH is to ensure that TLS connections made to different origin servers behind the same ECH service provider are indistinguishable from one another. In other words, when you connect to an origin behind, say, Cloudflare, no one on the network between you and Cloudflare should be able to discern which origin you reached, or which privacy-sensitive handshake-parameters you and the origin negotiated. Apart from an immediate privacy boost, this property, if achieved, paves the way for the deployment of new features for TLS without compromising privacy.

Encrypting the ClientHello is an important step towards achieving this goal, but we need to do a bit more. An important attack vector we haven’t discussed yet is traffic analysis. This refers to the collection and analysis of properties of the communication channel that betray part of the ciphertext’s contents, but without cracking the underlying encryption scheme. For example, the length of the encrypted ClientHello might leak enough information about the SNI for the adversary to make an educated guess as to its value (this risk is especially high for domain names that are either particularly short or particularly long). It is therefore crucial that the length of each ciphertext is independent of the values of privacy-sensitive parameters. The current ECH specification provides some mitigations, but their coverage is incomplete. Thus, improving ECH’s resistance to traffic analysis is an important direction for future work.

The spectre of ossification

An important open question for ECH is the impact it will have on network operations.

One of the lessons learned from the deployment of TLS 1.3 is that upgrading a core Internet protocol can trigger unexpected network behavior. Cloudflare was one of the first major TLS operators to deploy TLS 1.3 at scale; when browsers like Firefox and Chrome began to enable it on an experimental basis, they observed a significantly higher rate of connection failures compared to TLS 1.2. The root cause of these failures was network ossification, i.e., the tendency of middleboxes — network appliances between clients and servers that monitor and sometimes intercept traffic — to write software that expects traffic to look and behave a certain way. Changing the protocol before middleboxes had the chance to update their software led to middleboxes trying to parse packets they didn’t recognize, triggering software bugs that, in some instances, caused connections to be dropped completely.

This problem was so widespread that, instead of waiting for network operators to update their software, the design of TLS 1.3 was altered in order to mitigate the impact of network ossification. The ingenious solution was to make TLS 1.3 “look like” another protocol that middleboxes are known to tolerate. Specifically, the wire format and even the contents of handshake messages were made to resemble TLS 1.2. These two protocols aren’t identical, of course — a curious network observer can still distinguish between them — but they look and behave similar enough to ensure that the majority of existing middleboxes don’t treat them differently. Empirically, it was found that this strategy significantly reduced the connection failure rate enough to make deployment of TLS 1.3 viable.

Once again, ECH represents a significant upgrade for TLS for which the spectre of network ossification looms large. The ClientHello contains parameters, like SNI, that have existed in the handshake for a long time, and we don’t yet know what the impact will be of encrypting them. In anticipation of the deployment issues ossification might cause, the ECH protocol has been designed to look as much like a standard TLS 1.3 handshake as possible. The most notable difference is the ECH extension itself: if middleboxes ignore it — as they should, if they are compliant with the TLS 1.3 standard — then the rest of the handshake will look and behave very much as usual.

It remains to be seen whether this strategy will be enough to ensure the wide-scale deployment of ECH. If so, it is notable that this new feature will help to mitigate the impact of future TLS upgrades on network operations. Encrypting the full handshake reduces the risk of ossification since it means that there are fewer visible protocol features for software to ossify on. We believe this will be good for the health of the Internet overall.

Conclusion

The old TLS handshake is (unintentionally) leaky. Operational requirements of both the client and server have led to privacy-sensitive parameters, like SNI, being negotiated completely in the clear and available to network observers. The ECH extension aims to close this gap by enabling encryption of the full handshake. This represents a significant upgrade to TLS, one that will help preserve end-user privacy as the protocol continues to evolve.

The ECH standard is a work-in-progress. As this work continues, Cloudflare is committed to doing its part to ensure this important upgrade for TLS reaches Internet-scale deployment.

Helping build the next generation of privacy-preserving protocols

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/next-generation-privacy-protocols/

Over the last ten years, Cloudflare has become an important part of Internet infrastructure, powering websites, APIs, and web services to help make them more secure and efficient. The Internet is growing in terms of its capacity and the number of people using it and evolving in terms of its design and functionality. As a player in the Internet ecosystem, Cloudflare has a responsibility to help the Internet grow in a way that respects and provides value for its users. Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy.

These initiatives are:

Each of these projects impacts an aspect of the Internet that influences our online lives and digital footprints. Whether we know it or not, there is a lot of private information about us and our lives floating around online. This is something we can help fix.

For over a year, we have been working through standards bodies like the IETF and partnering with the biggest names in Internet technology (including Mozilla, Google, Equinix, and more) to design, deploy, and test these new privacy-preserving protocols at Internet scale. Each of these three protocols touches on a critical aspect of our online lives, and we expect them to help make real improvements to privacy online as they gain adoption.

A continuing tradition at Cloudflare

One of Cloudflare’s core missions is to support and develop technology that helps build a better Internet. As an industry, we’ve made exceptional progress in making the Internet more secure and robust. Cloudflare is proud to have played a part in this progress through multiple initiatives over the years.

Here are a few highlights:

  • Universal SSL™. We’ve been one of the driving forces for encrypting the web. We launched Universal SSL in 2014 to give website encryption to our customers for free and have actively been working along with certificate authorities like Let’s Encrypt, web browsers, and website operators to help remove mixed content. Before Universal SSL launched to give all Cloudflare customers HTTPS for free, only 30% of connections to websites were encrypted. Through the industry’s efforts, that number is now 80% — and a much more significant proportion of overall Internet traffic. Along with doing our part to encrypt the web, we have supported the Certificate Transparency project via Nimbus and Merkle Town, which has improved accountability for the certificate ecosystem HTTPS relies on for trust.
  • TLS 1.3 and QUIC. We’ve also been a proponent of upgrading existing security protocols. Take Transport Layer Security (TLS), the underlying protocol that secures HTTPS. Cloudflare engineers helped contribute to the design of TLS 1.3, the latest version of the standard, and in 2016 we launched support for an early version of the protocol. This early deployment helped lead to improvements to the final version of the protocol. TLS 1.3 is now the most widely used encryption protocol on the web and a vital component of the emerging QUIC standard, of which we were also early adopters.
  • Securing Routing, Naming, and Time. We’ve made major efforts to help secure other critical components of the Internet. Our efforts to help secure Internet routing through our RPKI toolkit, measurement studies, and “Is BGP Safe Yet” tool have significantly improved the Internet’s resilience against disruptive route leaks. Our time service (time.cloudflare.com) has helped keep people’s clocks in sync with more secure protocols like NTS and Roughtime. We’ve also made DNS more secure by supporting DNS-over-HTTPS and DNS-over-TLS in 1.1.1.1 at launch, along with one-click DNSSEC in our authoritative DNS service and registrar.

Continuing to improve the security of the systems of trust online is critical to the Internet’s growth. However, there is a more fundamental principle at play: respect. The infrastructure underlying the Internet should be designed to respect its users.

Building an Internet that respects users

When you sign in to a specific website or service with a privacy policy, you know what that site is expected to do with your data. It’s explicit. There is no such visibility to the users when it comes to the operators of the Internet itself. You may have an agreement with your Internet Service Provider (ISP) and the site you’re visiting, but it’s doubtful that you even know which networks your data is traversing. Most people don’t have a concept of the Internet beyond what they see on their screen, so it’s hard to imagine that people would accept or even understand what a privacy policy from a transit wholesaler or an inspection middlebox would even mean.

Without encryption, Internet browsing information is implicitly shared with countless third parties online as information passes between networks. Without secure routing, users’ traffic can be hijacked and disrupted. Without privacy-preserving protocols, users’ online life is not as private as they would think or expect. The infrastructure of the Internet wasn’t built in a way that reflects their expectations.

Normal network flow

Network flow with malicious route leak

The good news is that the Internet is continuously evolving. One of the groups that help guide that evolution is the Internet Architecture Board (IAB). The IAB provides architectural oversight to the Internet Engineering Task Force (IETF), the Internet’s main standard-setting body. The IAB recently published RFC 8890, which states that individual end-users should be prioritized when designing Internet protocols. It says that if there’s a conflict between the interests of end-users and the interest of service providers, corporations, or governments, IETF decisions should favor end users. One of the prime interests of end-users is the right to privacy, and the IAB published RFC 6973 to indicate how Internet protocols should take privacy into account.

Today’s technical blog posts are about improvements to the Internet designed to respect user privacy. Privacy is a complex topic that spans multiple disciplines, so it’s essential to clarify what we mean by “improving privacy.” We are specifically talking about changing the protocols that handle privacy-sensitive information exposed “on-the-wire” and modifying them so that this data is exposed to fewer parties. This data continues to exist. It’s just no longer available or visible to third parties without building a mechanism to collect it at a higher layer of the Internet stack, the application layer. These changes go beyond website encryption; they go deep into the design of the systems that are foundational to making the Internet what it is.

The toolbox: cryptography and secure proxies

Two tools for making sure data can be used without being seen are cryptography and secure proxies.

Cryptography allows information to be transformed into a format that a very limited number of people (those with the key) can understand. Some describe cryptography as a tool that transforms data security problems into key management problems. This is a humorous but fair description. Cryptography makes it easier to reason about privacy because only key holders can view data.

Another tool for protecting access to data is isolation/segmentation. By physically limiting which parties have access to information, you effectively build privacy walls. A popular architecture is to rely on policy-aware proxies to pass data from one place to another. Such proxies can be configured to strip sensitive data or block data transfers between parties according to what the privacy policy says.

Both these tools are useful individually, but they can be even more effective if combined. Onion routing (the cryptographic technique underlying Tor) is one example of how proxies and encryption can be used in tandem to enforce strong privacy. Broadly, if party A wants to send data to party B, they can encrypt the data with party B’s key and encrypt the metadata with a proxy’s key and send it to the proxy.
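
As a toy illustration of that layering (and not Tor’s actual construction), the following Go sketch encrypts a payload to party B, then wraps that ciphertext together with routing metadata in a second layer encrypted to the proxy. The proxy can read only the metadata it needs to forward the opaque inner ciphertext.

package main

import (
    "crypto/rand"
    "fmt"
    "log"

    "golang.org/x/crypto/nacl/box"
)

func newNonce() *[24]byte {
    var n [24]byte
    if _, err := rand.Read(n[:]); err != nil {
        log.Fatal(err)
    }
    return &n
}

func main() {
    // Key pairs for the sender (A), the recipient (B), and the proxy.
    aPub, aPriv, _ := box.GenerateKey(rand.Reader)
    bPub, bPriv, _ := box.GenerateKey(rand.Reader)
    pPub, pPriv, _ := box.GenerateKey(rand.Reader)

    // Inner layer: only B can read the payload. The nonce is prepended.
    innerNonce := newNonce()
    inner := box.Seal(innerNonce[:], []byte("hello B"), innerNonce, bPub, aPriv)

    // Outer layer: only the proxy can read the routing metadata, which tells
    // it where to forward the opaque inner ciphertext.
    outerNonce := newNonce()
    outer := box.Seal(outerNonce[:], append([]byte("to:B|"), inner...), outerNonce, pPub, aPriv)

    // The proxy peels off its layer and sees only "to:B|" plus ciphertext.
    var on [24]byte
    copy(on[:], outer[:24])
    routed, ok := box.Open(nil, outer[24:], &on, aPub, pPriv)
    if !ok {
        log.Fatal("proxy could not open outer layer")
    }
    fmt.Printf("proxy sees: %q plus %d bytes of ciphertext\n", routed[:5], len(routed)-5)

    // B removes the routing prefix and decrypts the payload.
    innerCT := routed[5:]
    var in [24]byte
    copy(in[:], innerCT[:24])
    msg, ok := box.Open(nil, innerCT[24:], &in, aPub, bPriv)
    if !ok {
        log.Fatal("B could not open inner layer")
    }
    fmt.Printf("B reads: %s\n", msg)
}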

Platforms and services built on top of the Internet can build in consent systems, like privacy policies presented through user interfaces. The infrastructure of the Internet relies on layers of underlying protocols. Because these layers of the Internet are so far below where the user interacts with them, it’s almost impossible to build a concept of user consent. In order to respect users and protect them from privacy issues, the protocols that glue the Internet together should be designed with privacy enabled by default.

Data vs. metadata

The transition from a mostly unencrypted web to an encrypted web has done a lot for end-user privacy. For example, the “coffeeshop stalker” is no longer an issue for most sites. When accessing the majority of sites online, users are no longer broadcasting every aspect of their web browsing experience (search queries, browser versions, authentication cookies, etc.) over the Internet for any participant on the path to see. Suppose a site is configured correctly to use HTTPS. In that case, users can be confident their data is secure from onlookers and reaches only the intended party because their connections are both encrypted and authenticated.

However, HTTPS only protects the content of web requests. Even if you only browse sites over HTTPS, that doesn’t mean that your browsing patterns are private. This is because HTTPS fails to encrypt a critical aspect of the exchange: the metadata. When you make a phone call, the metadata is the phone number, not the call’s contents. Metadata is the data about the data.

To illustrate the difference and why it matters, here’s a diagram of what happens when you visit a website like an imageboard. Say you’re going to a specific page on that board (https://<imageboard>.com/room101/) that has specific embedded images hosted on <embarassing>.com.

Page load for an imageboard, returning an HTML page with an image from an embarassing site
Subresource fetch for the image from an embarassing site

The space inside the dotted line here represents the part of the Internet that your data needs to transit. This includes your local area network or coffee shop, your ISP, an Internet transit provider, and it could include the network portion of the cloud provider that hosts the server. Users often don’t have a relationship with these entities or a contract to prevent these parties from doing anything with the user’s data. And even if those entities don’t look at the data, a well-placed observer intercepting Internet traffic could see anything sent unencrypted. It would be best if they just didn’t see it at all. In this example, the fact that the user visited <imageboard>.com can be seen by an observer, which is expected. However, even though the page content is encrypted, it’s possible to learn which specific page you’ve visited, since <embarassing>.com is also visible.

It’s a general rule that if data is available to on-path parties on the Internet, some of these on-path parties will use this data. It’s also true that these on-path parties need some metadata in order to facilitate the transport of this data. This balance is explored in RFC 8558, which explains how protocols should be designed thoughtfully with respect to the balance between too much metadata (bad for privacy) and too little metadata (bad for operations).

In an ideal world, Internet protocols would be designed with the principle of least privilege. They would provide the minimum amount of information needed for the on-path parties (the pipes) to do the job of transporting the data to the right place and keep everything else confidential by default. Current protocols, including TLS 1.3 and QUIC, are important steps towards this ideal but fall short with respect to metadata privacy.

Knowing both who you are and what you do online can lead to profiling

Today’s announcements reflect two metadata protection levels: the first involves limiting the amount of metadata available to third-party observers (like ISPs). The second involves restricting the amount of metadata that users share with service providers themselves.

Hostnames are an example of metadata that needs to be protected from third-party observers, which DoH and ECH intend to do. However, it doesn’t make sense to hide the hostname from the site you’re visiting. It also doesn’t make sense to hide it from a directory service like DNS. A DNS server needs to know which hostname you’re resolving to resolve it for you!

A privacy issue arises when a service provider knows about both what sites you’re visiting and who you are. Individual websites do not have this dangerous combination of information (except in the case of third party cookies, which are going away soon in browsers), but DNS providers do. Thankfully, it’s not actually necessary for a DNS resolver to know *both* the hostname of the service you’re going to and which IP you’re coming from. Disentangling the two, which is the goal of ODoH, is good for privacy.

The Internet is part of ‘our’ Infrastructure

Roads should be well-paved, well lit, have accurate signage, and be optimally connected. They aren’t designed to stop a car based on who’s inside it. Nor should they be! Like transportation infrastructure, Internet infrastructure is responsible for getting data where it needs to go, not looking inside packets, and making judgments. But the Internet is made of computers and software, and software tends to be written to make decisions based on the data it has available to it.

Privacy-preserving protocols attempt to eliminate the temptation for infrastructure providers and others to peek inside and make decisions based on personal data. A non-privacy preserving protocol like HTTP keeps data and metadata, like passwords, IP addresses, and hostnames, as explicit parts of the data sent over the wire. The fact that they are explicit means that they are available to any observer to collect and act on. A protocol like HTTPS improves upon this by making some of the data (such as passwords and site content) invisible on the wire using encryption.

The three protocols we are exploring today extend this concept.

  • ECH takes most of the unencrypted metadata in TLS (including the hostname) and encrypts it with a key that was fetched ahead of time.
  • ODoH (a new variant of DoH co-designed by Apple, Cloudflare, and Fastly engineers) uses proxies and onion-like encryption to make the source of a DNS query invisible to the DNS resolver. This protects the user’s IP address when resolving hostnames.
  • OPAQUE uses a new cryptographic technique to keep passwords hidden even from the server. Utilizing a construction called an Oblivious Pseudo-Random Function (as seen in Privacy Pass), the server does not learn the password; it only learns whether or not the user knows the password.

By making sure Internet infrastructure acts more like physical infrastructure, user privacy is more easily protected. The Internet is more private if private data can only be collected where the user has a chance to consent to its collection.

Doing it together

As much as we’re excited about working on new ways to make the Internet more private, innovation at a global scale doesn’t happen in a vacuum. Each of these projects is the output of a collaborative group of individuals working out in the open in organizations like the IETF and the IRTF. Protocols must come about through a consensus process that involves all the parties that make up the interconnected set of systems that power the Internet. From browser builders to cryptographers, from DNS operators to website administrators, this is truly a global team effort.

We also recognize that sweeping technical changes to the Internet will inevitably also impact the technical community. Adopting these new protocols may have legal and policy implications. We are actively working with governments and civil society groups to help educate them about the impact of these potential changes.

We’re looking forward to sharing our work today and hope that more interested parties join in developing these protocols. The projects we are announcing today were designed by experts from academia, industry, and hobbyists together and were built by engineers from Cloudflare Research (including the work of interns, which we will highlight) with everyone’s support Cloudflare.

If you’re interested in this type of work, we’re hiring!

Round 2 post-quantum TLS is now supported in AWS KMS

Post Syndicated from Alex Weibel original https://aws.amazon.com/blogs/security/round-2-post-quantum-tls-is-now-supported-in-aws-kms/

AWS Key Management Service (AWS KMS) now supports three new hybrid post-quantum key exchange algorithms for the Transport Layer Security (TLS) 1.2 encryption protocol that’s used when connecting to AWS KMS API endpoints. These new hybrid post-quantum algorithms combine the proven security of a classical key exchange with the potential quantum-safe properties of new post-quantum key exchanges undergoing evaluation for standardization. The fastest of these algorithms adds approximately 0.3 milliseconds of overhead compared to a classical TLS handshake. The new post-quantum key exchange algorithms added are Round 2 versions of Kyber, Bit Flipping Key Encapsulation (BIKE), and Supersingular Isogeny Key Encapsulation (SIKE). Each organization has submitted their algorithms to the National Institute of Standards and Technology (NIST) as part of NIST’s post-quantum cryptography standardization process. This process spans several rounds of evaluation over multiple years, and is likely to continue beyond 2021.

In our previous hybrid post-quantum TLS blog post, we announced that AWS KMS had launched hybrid post-quantum TLS 1.2 with Round 1 versions of BIKE and SIKE. The Round 1 post-quantum algorithms are still supported by AWS KMS, but at a lower priority than the Round 2 algorithms. You can choose to upgrade your client to enable negotiation of Round 2 algorithms.

Why post-quantum TLS is important

A large-scale quantum computer would be able to break the current public-key cryptography that’s used for key exchange in classical TLS connections. While a large-scale quantum computer isn’t available today, it’s still important to think about and plan for your long-term security needs. TLS traffic using classical algorithms recorded today could be decrypted by a large-scale quantum computer in the future. If you’re developing applications that rely on the long-term confidentiality of data passed over a TLS connection, you should consider a plan to migrate to post-quantum cryptography before the lifespan of the sensitivity of your data would be susceptible to an unauthorized user with a large-scale quantum computer. As an example, this means that if you believe that a large-scale quantum computer is 25 years away, and your data must be secure for 20 years, you should migrate to post-quantum schemes within the next 5 years. AWS is working to prepare for this future, and we want you to be prepared too.
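
As a back-of-the-envelope sketch of that arithmetic (sometimes framed as Mosca’s inequality), the snippet below uses the illustrative numbers from the paragraph above; they are assumptions for the example, not predictions.

public class PqMigrationDeadline {
    public static void main(String[] args) {
        int yearsUntilQuantumComputer = 25; // assumption from the example above
        int dataConfidentialityYears = 20;  // how long recorded traffic must stay secret

        // Latest point by which migration must be complete so that traffic recorded
        // on the last pre-migration day is still safe when a quantum computer arrives.
        int yearsLeftToMigrate = yearsUntilQuantumComputer - dataConfidentialityYears;
        System.out.println("Years left to migrate: " + yearsLeftToMigrate); // prints 5
    }
}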

We’re offering this feature now instead of waiting for standardization efforts to be complete so you have a way to measure the potential performance impact to your applications. Offering this feature now also gives you the protection afforded by the proposed post-quantum schemes today. While we believe that the use of this feature raises the already high security bar for connecting to AWS KMS endpoints, these new cipher suites will impact bandwidth utilization and latency. However, using these new algorithms could also create connection failures for intermediate systems that proxy TLS connections. We’d like to get feedback from you on the effectiveness of our implementation or any issues found so we can improve it over time.

Hybrid post-quantum TLS 1.2

Hybrid post-quantum TLS is a feature that provides the security protections of both the classical and post-quantum key exchange algorithms in a single TLS handshake. Figure 1 shows the differences in the connection secret derivation process between classical and hybrid post-quantum TLS 1.2. Hybrid post-quantum TLS 1.2 has three major differences from classical TLS 1.2 (a sketch of the combined derivation follows Figure 1):

  • The negotiated post-quantum key is appended to the ECDHE key before being used as the hash-based message authentication code (HMAC) key.
  • The text “hybrid” in its ASCII representation is prepended to the beginning of the HMAC message.
  • The entire client key exchange message from the TLS handshake is appended to the end of the HMAC message.
Figure 1: Differences in the connection secret derivation process between classical and hybrid post-quantum TLS 1.2
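
To make the combination concrete, here is a minimal sketch of the idea in Java using the JDK’s HMAC-SHA-384. It is not s2n’s actual TLS 1.2 PRF or key schedule; the single-HMAC simplification and all variable names are illustrative only.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class HybridSecretSketch {
    // Combine a classical ECDHE secret with a post-quantum KEM secret, roughly following
    // the three differences listed above: the concatenated keys form the HMAC key, the ASCII
    // text "hybrid" is prepended to the message, and the client key exchange message is appended.
    static byte[] combine(byte[] ecdheSecret, byte[] pqSecret,
                          byte[] prfSeed, byte[] clientKeyExchangeMsg) throws Exception {
        byte[] hmacKey = concat(ecdheSecret, pqSecret);
        byte[] message = concat("hybrid".getBytes(StandardCharsets.US_ASCII),
                                concat(prfSeed, clientKeyExchangeMsg));
        Mac mac = Mac.getInstance("HmacSHA384");
        mac.init(new SecretKeySpec(hmacKey, "HmacSHA384"));
        return mac.doFinal(message); // would feed the rest of the (real) key schedule
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }
}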

Some background on post-quantum TLS

Today, all requests to AWS KMS use TLS with key exchange algorithms that provide perfect forward secrecy, using classical schemes such as Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) or Finite Field Diffie-Hellman Ephemeral (FFDHE).

While existing FFDHE and ECDHE schemes use perfect forward secrecy to protect against the compromise of the server’s long-term secret key, these schemes don’t protect against large-scale quantum computers. In the future, a sufficiently capable large-scale quantum computer could run Shor’s Algorithm to recover the TLS session key of a recorded classical session, and thereby gain access to the data inside. Using a post-quantum key exchange algorithm during the TLS handshake protects against attacks from a large-scale quantum computer.

The possibility of large-scale quantum computing has spurred the development of new quantum-resistant cryptographic algorithms. NIST has started the process of standardizing post-quantum key encapsulation mechanisms (KEMs). A KEM is a type of key exchange that’s used to establish a shared symmetric key. AWS has chosen three NIST KEM submissions to adopt in our post-quantum efforts: Kyber, BIKE, and SIKE.

Hybrid mode ensures that the negotiated key is as strong as the strongest of the key agreement schemes in use: if one of the schemes is broken, the communications remain confidential. The Internet Engineering Task Force (IETF) Hybrid Post-Quantum Key Encapsulation Methods for Transport Layer Security 1.2 draft describes how to combine post-quantum KEMs with ECDHE to create new cipher suites for TLS 1.2.

These cipher suites use a hybrid key exchange that performs two independent key exchanges during the TLS handshake. The key exchange then cryptographically combines the keys from each into a single TLS session key. This strategy combines the proven security of a classical key exchange with the potential quantum-safe properties of new post-quantum key exchanges being analyzed by NIST.

The effect of hybrid post-quantum TLS on performance

Post-quantum cipher suites have a different performance profile and bandwidth usage from traditional cipher suites. AWS has measured bandwidth and latency across 2,000 TLS handshakes between an Amazon Elastic Compute Cloud (Amazon EC2) C5n.4xlarge client and the public AWS KMS endpoint, which were both in the us-west-2 Region. Your own performance characteristics might differ, and will depend on your environment, including your:

  • Hardware–CPU speed and number of cores.
  • Existing workloads–how often you call AWS KMS and what other work your application performs.
  • Network–location and capacity.

The following graphs and table show latency measurements performed by AWS for all newly supported Round 2 post-quantum algorithms, in addition to the classical ECDHE key exchange algorithm currently used by most customers.

Figure 2 shows the latency differences of all hybrid post-quantum algorithms compared with classical ECDHE alone, and shows that compared to ECDHE alone, SIKE adds approximately 101 milliseconds of overhead, BIKE adds approximately 9.5 milliseconds of overhead, and Kyber adds approximately 0.3 milliseconds of overhead.
 

Figure 2: TLS handshake latency at varying percentiles for four key exchange algorithms

Figure 3 shows the latency differences between ECDHE with Kyber, and ECDHE alone. The addition of Kyber adds approximately 0.3 milliseconds of overhead.
 

Figure 3: TLS handshake latency at varying percentiles, with only top two performing key exchange algorithms

The following table shows the total amount of data (in bytes) needed to complete the TLS handshake for each cipher suite, the average latency, and latency at varying percentiles. All measurements were gathered from 2,000 TLS handshakes. The time was measured on the client from the start of the handshake until the handshake was completed, and includes all network transfer time. All connections used RSA authentication with a 2048-bit key, and ECDHE used the secp256r1 curve. All hybrid post-quantum tests used the NIST Round 2 versions. The Kyber test used the Kyber-512 parameter, the BIKE test used the BIKE-1 Level 1 parameter, and the SIKE test used the SIKEp434 parameter.

Item             | Bandwidth (bytes) | Total handshakes | Average (ms) | p0 (ms) | p50 (ms) | p90 (ms) | p99 (ms)
ECDHE (classic)  | 3,574             | 2,000            | 3.08         | 2.07    | 3.02     | 3.95     | 4.71
ECDHE + Kyber R2 | 5,898             | 2,000            | 3.36         | 2.38    | 3.17     | 4.28     | 5.35
ECDHE + BIKE R2  | 12,456            | 2,000            | 14.91        | 11.59   | 14.16    | 18.27    | 23.58
ECDHE + SIKE R2  | 4,628             | 2,000            | 112.40       | 103.22  | 108.87   | 126.80   | 146.56

By default, the AWS SDK client performs a TLS handshake once to set up a new TLS connection, and then reuses that TLS connection for multiple requests. This means that the increased cost of a hybrid post-quantum TLS handshake is amortized over multiple requests sent over the TLS connection. You should take the amortization into account when evaluating the overall additional cost of using post-quantum algorithms; otherwise performance data could be skewed.
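
A rough sketch of that amortization, using the average handshake latencies from the table above (the request counts are arbitrary illustrations):

public class HandshakeAmortization {
    public static void main(String[] args) {
        double classicalHandshakeMs = 3.08; // average from the table above
        double kyberHandshakeMs = 3.36;     // average from the table above
        double extraPerConnection = kyberHandshakeMs - classicalHandshakeMs;

        // The one-time handshake overhead is spread across every request on the connection.
        for (int requests : new int[] {1, 10, 100, 1000}) {
            System.out.printf("requests=%d extra-ms-per-request=%.4f%n",
                    requests, extraPerConnection / requests);
        }
    }
}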

AWS KMS has chosen Kyber Round 2 to be KMS’s highest prioritized post-quantum algorithm, with BIKE Round 2, and SIKE Round 2 next in priority order for post-quantum algorithms. This is because Kyber’s performance is closest to the classical ECDHE performance that most AWS KMS customers are using today and are accustomed to.

How to use hybrid post-quantum cipher suites

To use the post-quantum cipher suites with AWS KMS, you need the preview release of the AWS Common Runtime (CRT) HTTP client for the AWS SDK for Java 2.x. Also, you will need to configure the AWS CRT HTTP client to use the s2n post-quantum hybrid cipher suites. Post-quantum TLS for AWS KMS is available in all AWS Regions except for AWS GovCloud (US-East), AWS GovCloud (US-West), AWS China (Beijing) Region operated by Beijing Sinnet Technology Co. Ltd (“Sinnet”), and AWS China (Ningxia) Region operated by Ningxia Western Cloud Data Technology Co. Ltd. (“NWCD”). Since NIST has not yet standardized post-quantum cryptography, connections that require Federal Information Processing Standards (FIPS) compliance cannot use the hybrid key exchange. For example, kms.<region>.amazonaws.com supports the use of post-quantum cipher suites, while kms-fips.<region>.amazonaws.com does not.

  1. If you’re using the AWS SDK for Java 2.x, you must add the preview release of the AWS Common Runtime client to your Maven dependencies.
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>aws-crt-client</artifactId>
        <version>2.14.13-PREVIEW</version>
    </dependency>
    

  2. You then must configure the new SDK and cipher suite in the existing initialization code of your application:
    // Fail fast if the current platform cannot negotiate the hybrid post-quantum cipher preference.
    if(!TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07.isSupported()){
        throw new RuntimeException("Post Quantum Ciphers not supported on this Platform");
    }

    // Build an AWS CRT-based async HTTP client that prefers the hybrid post-quantum cipher suites.
    SdkAsyncHttpClient awsCrtHttpClient = AwsCrtAsyncHttpClient.builder()
              .tlsCipherPreference(TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07)
              .build();

    // Hand that HTTP client to the KMS async client; its connections now use hybrid post-quantum TLS.
    KmsAsyncClient kms = KmsAsyncClient.builder()
             .httpClient(awsCrtHttpClient)
             .build();

    // Any KMS call made through this client exercises the new handshake; listKeys() is just an example.
    ListKeysResponse response = kms.listKeys().get();
    

Now, all connections made to AWS KMS in supported Regions will use the new hybrid post-quantum cipher suites! To see a complete example of everything set up, check out the example application here.

Things to try

Here are some ideas about how to use this post-quantum-enabled client:

  • Run load tests and benchmarks. These new cipher suites perform differently than traditional key exchange algorithms. You might need to adjust your connection timeouts to allow for the longer handshake times or, if you’re running inside an AWS Lambda function, extend the execution timeout setting.
  • Try connecting from different locations. Depending on the network path your request takes, you might discover that intermediate hosts, proxies, or firewalls with deep packet inspection (DPI) block the request. This could be due to the new cipher suites in the ClientHello or the larger key exchange messages. If this is the case, you might need to work with your security team or IT administrators to update the relevant configuration to unblock the new TLS cipher suites. We’d like to hear from you about how your infrastructure interacts with this new variant of TLS traffic. If you have questions or feedback, please start a new thread on the AWS KMS discussion forum.

Conclusion

In this blog post, I announced support for Round 2 hybrid post-quantum algorithms in AWS KMS, and showed you how to begin experimenting with hybrid post-quantum key exchange algorithms for TLS when connecting to AWS KMS endpoints.

More info

If you’d like to learn more about post-quantum cryptography check out:

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Alex Weibel

Alex is a Senior Software Engineer on the AWS Crypto Algorithms team. He’s one of the maintainers for Amazon’s TLS Library s2n. Previously, Alex worked on TLS termination and request proxying for S3 and the Elastic Load Balancing Service, developing new features for customers. Alex holds a Bachelor of Science degree in Computer Science from the University of Texas at Austin.

Automated Origin CA for Kubernetes

Post Syndicated from Terin Stock original https://blog.cloudflare.com/automated-origin-ca-for-kubernetes/


In 2016, we launched the Cloudflare Origin CA, a certificate authority optimized for making it easy to secure the connection between Cloudflare and an origin server. Running our own CA has allowed us to support fast issuance and renewal, simple and effective revocation, and wildcard certificates for our users.

Out of the box, managing TLS certificates and keys within Kubernetes can be challenging and error prone. The secret resources have to be constructed correctly, as components expect secrets with specific fields. Some forms of domain verification require manually rotating secrets to pass. Once you’re successful, don’t forget to renew before the certificate expires!

cert-manager is a project to fill this operational gap, providing Kubernetes resources that manage the lifecycle of a certificate. Today we’re releasing origin-ca-issuer, an extension to cert-manager integrating with Cloudflare Origin CA to easily create and renew certificates for your account’s domains.

Origin CA Integration

Creating an Issuer

After installing cert-manager and origin-ca-issuer, you can create an OriginIssuer resource. This resource creates a binding between cert-manager and the Cloudflare API for an account. Different issuers may be connected to different Cloudflare accounts in the same Kubernetes cluster.

apiVersion: cert-manager.k8s.cloudflare.com/v1
kind: OriginIssuer
metadata:
  name: prod-issuer
  namespace: default
spec:
  signatureType: OriginECC
  auth:
    serviceKeyRef:
      name: service-key
      key: key

This creates a new OriginIssuer named “prod-issuer” that issues certificates using ECDSA signatures, and the secret “service-key” in the same namespace is used to authenticate to the Cloudflare API.

Signing an Origin CA Certificate

After creating an OriginIssuer, we can now create a Certificate with cert-manager. This defines the domains, including wildcards, that the certificate should be issued for, how long the certificate should be valid, and when cert-manager should renew the certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  # The secret name where cert-manager
  # should store the signed certificate.
  secretName: example-com-tls
  dnsNames:
    - example.com
  # Duration of the certificate.
  duration: 168h
  # Renew a day before the certificate expiration.
  renewBefore: 24h
  # Reference the Origin CA Issuer you created above,
  # which must be in the same namespace.
  issuerRef:
    group: cert-manager.k8s.cloudflare.com
    kind: OriginIssuer
    name: prod-issuer

Once created, cert-manager begins managing the lifecycle of this certificate, including creating the key material, crafting a certificate signing request (CSR), and constructing a certificate request that will be processed by the origin-ca-issuer.

When signed by the Cloudflare API, the certificate will be made available, along with the private key, in the Kubernetes secret specified within the secretName field. You’ll be able to use this certificate on servers proxied behind Cloudflare.

Extra: Ingress Support

If you’re using an Ingress controller, you can use cert-manager’s Ingress support to automatically manage Certificate resources based on your Ingress resource.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/issuer: prod-issuer
    cert-manager.io/issuer-kind: OriginIssuer
    cert-manager.io/issuer-group: cert-manager.k8s.cloudflare.com
  name: example
  namespace: default
spec:
  rules:
    - host: example.com
      http:
        paths:
          - backend:
              serviceName: examplesvc
              servicePort: 80
            path: /
  tls:
    # specifying a host in the TLS section will tell cert-manager 
    # what DNS SANs should be on the created certificate.
    - hosts:
        - example.com
      # cert-manager will create this secret
      secretName: example-tls

Building an External cert-manager Issuer

An external cert-manager issuer is a specialized Kubernetes controller. There’s no direct communication between cert-manager and external issuers at all; this means that you can use any existing tools and best practices for developing controllers to develop an external issuer.

We’ve decided to use the excellent controller-runtime project to build origin-ca-issuer, running two reconciliation controllers.


OriginIssuer Controller

The OriginIssuer controller watches for creation and modification of OriginIssuer custom resources. The controller creates a Cloudflare API client using the details and credentials referenced in the resource. This API client instance will later be used to sign certificates through the API. The controller will periodically retry creating the API client; once it is successful, it updates the OriginIssuer’s status to ready.

CertificateRequest Controller

The CertificateRequest controller watches for the creation and modification of cert-manager’s CertificateRequest resources. These resources are created automatically by cert-manager as needed during a certificate’s lifecycle.

The controller looks for CertificateRequests that reference a known OriginIssuer (this reference is copied by cert-manager from the originating Certificate resource) and ignores all resources that do not match. The controller then verifies that the OriginIssuer is in the ready state before transforming the certificate request into an API request using the previously created clients.

On a successful response, the signed certificate is added to the certificate request, which cert-manager will use to create or update the secret resource. On an unsuccessful request, the controller will periodically retry.

Learn More

Up-to-date documentation and complete installation instructions can be found in our GitHub repository. Feedback and contributions are greatly appreciated. If you’re interested in Kubernetes at Cloudflare, including building controllers like these, we’re hiring.

The importance of encryption and how AWS can help

Post Syndicated from Ken Beer original https://aws.amazon.com/blogs/security/importance-of-encryption-and-how-aws-can-help/

Encryption is a critical component of a defense-in-depth strategy, which is a security approach with a series of defensive mechanisms designed so that if one security mechanism fails, there’s at least one more still operating. As more organizations look to operate faster and at scale, they need ways to meet critical compliance requirements and improve data security. Encryption, when used correctly, can provide an additional layer of protection above basic access control.

How and why does encryption work?

Encryption works by using an algorithm with a key to convert data into unreadable data (ciphertext) that can only become readable again with the right key. For example, a simple phrase like “Hello World!” may look like “1c28df2b595b4e30b7b07500963dc7c” when encrypted. There are several different types of encryption algorithms, all using different types of keys. A strong encryption algorithm relies on mathematical properties to produce ciphertext that can’t be decrypted using any practically available amount of computing power without also having the necessary key. Therefore, protecting and managing the keys becomes a critical part of any encryption solution.
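
As a simplified illustration of “algorithm plus key,” the following sketch encrypts and decrypts a short phrase with AES-256-GCM using the JDK’s built-in crypto APIs; the class name is illustrative, and a real system would also need careful key and nonce management.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class EncryptionBasics {
    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        SecretKey key = gen.generateKey();   // only holders of this key can decrypt

        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce); // must be unique per encryption under a given key

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        byte[] ciphertext = enc.doFinal("Hello World!".getBytes(StandardCharsets.UTF_8));
        System.out.println("ciphertext: " + Base64.getEncoder().encodeToString(ciphertext));

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        System.out.println("plaintext:  " + new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}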

Encryption as part of your security strategy

An effective security strategy begins with stringent access control and continuous work to define the least privilege necessary for persons or systems accessing data. AWS requires that you manage your own access control policies, and also supports defense in depth to achieve the best possible data protection.

Encryption is a critical component of a defense-in-depth strategy because it can mitigate weaknesses in your primary access control mechanism. What if an access control mechanism fails and allows access to the raw data on disk or traveling along a network link? If the data is encrypted using a strong key, as long as the decryption key is not on the same system as your data, it is computationally infeasible for an attacker to decrypt your data. To show how infeasible it is, let’s consider the Advanced Encryption Standard (AES) with 256-bit keys (AES-256). It’s the strongest industry-adopted and government-approved algorithm for encrypting data. AES-256 is the technology we use to encrypt data in AWS, including Amazon Simple Storage Service (S3) server-side encryption. It would take at least a trillion years to break using current computing technology. Current research suggests that even the future availability of quantum-based computing won’t sufficiently reduce the time it would take to break AES encryption.

But what if you mistakenly create overly permissive access policies on your data? A well-designed encryption and key management system can also prevent this from becoming an issue, because it separates access to the decryption key from access to your data.

Requirements for an encryption solution

To get the most from an encryption solution, you need to think about two things:

  1. Protecting keys at rest: Are the systems using encryption keys secured so the keys can never be used outside the system? In addition, do these systems implement encryption algorithms correctly to produce strong ciphertexts that cannot be decrypted without access to the right keys?
  2. Independent key management: Is the authorization to use encryption independent from how access to the underlying data is controlled?

There are third-party solutions that you can bring to AWS to meet these requirements. However, these systems can be difficult and expensive to operate at scale. AWS offers a range of options to simplify encryption and key management.

Protecting keys at rest

When you use third-party key management solutions, it can be difficult to gauge the risk of your plaintext keys leaking and being used outside the solution. The keys have to be stored somewhere, and you can’t always know or audit all the ways those storage systems are secured from unauthorized access. The combination of technical complexity and the necessity of making the encryption usable without degrading performance or availability means that choosing and operating a key management solution can present difficult tradeoffs. The best practice to maximize key security is using a hardware security module (HSM). This is a specialized computing device that has several security controls built into it to prevent encryption keys from leaving the device in a way that could allow an adversary to access and use those keys.

One such control in modern HSMs is tamper response, in which the device detects physical or logical attempts to access plaintext keys without authorization, and destroys the keys before the attack succeeds. Because you can’t install and operate your own hardware in AWS datacenters, AWS offers two services using HSMs with tamper response to protect customers’ keys: AWS Key Management Service (KMS), which manages a fleet of HSMs on the customer’s behalf, and AWS CloudHSM, which gives customers the ability to manage their own HSMs. Each service can create keys on your behalf, or you can import keys from your on-premises systems to be used by each service.

The keys in AWS KMS or AWS CloudHSM can be used to encrypt data directly, or to protect other keys that are distributed to applications that directly encrypt data. The technique of encrypting encryption keys is called envelope encryption, and it enables encryption and decryption to happen on the computer where the plaintext customer data exists, rather than sending the data to the HSM each time. For very large data sets (e.g., a database), it’s not practical to move gigabytes of data between the data set and the HSM for every read/write operation. Instead, envelope encryption allows a data encryption key to be distributed to the application when it’s needed. The “master” keys in the HSM are used to encrypt a copy of the data key so the application can store the encrypted key alongside the data encrypted under that key. Once the application encrypts the data, the plaintext copy of data key can be deleted from its memory. The only way for the data to be decrypted is if the encrypted data key, which is only a few hundred bytes in size, is sent back to the HSM and decrypted.
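
Here is a minimal sketch of envelope encryption using only the JDK, with a locally generated key standing in for the HSM-protected master key; in practice, the wrap and unwrap of the data key would be calls to AWS KMS or AWS CloudHSM rather than local code, and all names here are illustrative.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class EnvelopeEncryptionSketch {
    static final SecureRandom RNG = new SecureRandom();

    static byte[] gcm(int mode, SecretKey key, byte[] nonce, byte[] input) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, key, new GCMParameterSpec(128, nonce));
        return c.doFinal(input);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        SecretKey masterKey = gen.generateKey(); // stand-in for the key held inside the HSM
        SecretKey dataKey = gen.generateKey();   // plaintext data key, used locally and then discarded

        // 1. Encrypt the bulk data locally with the data key.
        byte[] dataNonce = new byte[12]; RNG.nextBytes(dataNonce);
        byte[] encryptedData = gcm(Cipher.ENCRYPT_MODE, dataKey, dataNonce,
                "customer data".getBytes(StandardCharsets.UTF_8));

        // 2. Encrypt ("wrap") the data key under the master key; store this small blob next to the data.
        byte[] keyNonce = new byte[12]; RNG.nextBytes(keyNonce);
        byte[] wrappedDataKey = gcm(Cipher.ENCRYPT_MODE, masterKey, keyNonce, dataKey.getEncoded());
        // ... at this point the plaintext dataKey can be dropped from memory ...

        // 3. To decrypt later: unwrap the data key (an HSM/KMS call in practice), then decrypt the data.
        SecretKey recovered = new SecretKeySpec(
                gcm(Cipher.DECRYPT_MODE, masterKey, keyNonce, wrappedDataKey), "AES");
        byte[] plaintext = gcm(Cipher.DECRYPT_MODE, recovered, dataNonce, encryptedData);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}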

The process of envelope encryption is used in all AWS services in which data is encrypted on a customer’s behalf (which is known as server-side encryption) to minimize performance degradation. If you want to encrypt data in your own applications (client-side encryption), you’re encouraged to use envelope encryption with AWS KMS or AWS CloudHSM. Both services offer client libraries and SDKs to add encryption functionality to their application code and use the cryptographic functionality of each service. The AWS Encryption SDK is an example of a tool that can be used anywhere, not just in applications running in AWS.

Because implementing encryption algorithms and HSMs is critical to get right, all vendors of HSMs should have their products validated by a trusted third party. HSMs in both AWS KMS and AWS CloudHSM are validated under the National Institute of Standards and Technology’s FIPS 140-2 program, the standard for evaluating cryptographic modules. This validates the secure design and implementation of cryptographic modules, including functions related to ports and interfaces, authentication mechanisms, physical security and tamper response, operational environments, cryptographic key management, and electromagnetic interference/electromagnetic compatibility (EMI/EMC). Encryption using a FIPS 140-2 validated cryptographic module is often a requirement for other security-related compliance schemes like FedRamp and HIPAA-HITECH in the U.S., or the international payment card industry standard (PCI-DSS).

Independent key management

While AWS KMS and AWS CloudHSM can protect plaintext master keys on your behalf, you are still responsible for managing access controls to determine who can cause which encryption keys to be used under which conditions. One advantage of using AWS KMS is that the policy language you use to define access controls on keys is the same one you use to define access to all other AWS resources. Note that the language is the same, not the actual authorization controls. You need a mechanism for managing access to keys that is different from the one you use for managing access to your data. AWS KMS provides that mechanism by allowing you to assign one set of administrators who can only manage keys and a different set of administrators who can only manage access to the underlying encrypted data. Configuring your key management process in this way helps provide separation of duties you need to avoid accidentally escalating privilege to decrypt data to unauthorized users. For even further separation of control, AWS CloudHSM offers an independent policy mechanism to define access to keys.

Even with the ability to separate key management from data management, you can still verify that you have configured access to encryption keys correctly. AWS KMS is integrated with AWS CloudTrail so you can audit who used which keys, for which resources, and when. This provides granular visibility into your encryption management processes, which is typically much more in-depth than on-premises audit mechanisms. Audit events from AWS CloudHSM can be sent to Amazon CloudWatch, the AWS service for monitoring and alarming, just as you would for third-party solutions you operate in AWS.

Encrypting data at rest and in motion

All AWS services that handle customer data encrypt data in motion and provide options to encrypt data at rest. All AWS services that offer encryption at rest using AWS KMS or AWS CloudHSM use AES-256. None of these services store plaintext encryption keys at rest — that’s a function that only AWS KMS and AWS CloudHSM may perform using their FIPS 140-2 validated HSMs. This architecture helps minimize the unauthorized use of keys.

When encrypting data in motion, AWS services use the Transport Layer Security (TLS) protocol to provide encryption between your application and the AWS service. Most commercial solutions use an open source project called OpenSSL for their TLS needs. OpenSSL has roughly 500,000 lines of code with at least 70,000 of those implementing TLS. The code base is large, complex, and difficult to audit. Moreover, when OpenSSL has bugs, the global developer community is challenged to not only fix and test the changes, but also to ensure that the resulting fixes themselves do not introduce new flaws.

AWS’s response to challenges with the TLS implementation in OpenSSL was to develop our own implementation of TLS, known as s2n, or signal to noise. We released s2n in June 2015, which we designed to be small and fast. The goal of s2n is to provide you with network encryption that is easier to understand and that is fully auditable. We released and licensed it under the Apache 2.0 license and hosted it on GitHub.

We also designed s2n to be analyzed using automated reasoning to test for safety and correctness using mathematical logic. Through this process, known as formal methods, we verify the correctness of the s2n code base every time we change the code. We also automated these mathematical proofs, which we regularly re-run to ensure the desired security properties are unchanged with new releases of the code. Automated mathematical proofs of correctness are an emerging trend in the security industry, and AWS uses this approach for a wide variety of our mission-critical software.

Implementing TLS requires using encryption keys and digital certificates that assert the ownership of those keys. AWS Certificate Manager and AWS Private Certificate Authority are two services that can simplify the issuance and rotation of digital certificates across your infrastructure that needs to offer TLS endpoints. Both services use a combination of AWS KMS and AWS CloudHSM to generate and/or protect the keys used in the digital certificates they issue.

Summary

At AWS, security is our top priority and we aim to make it as easy as possible for you to use encryption to protect your data above and beyond basic access control. By building and supporting encryption tools that work both on and off the cloud, we help you secure your data and ensure compliance across your entire environment. We put security at the center of everything we do to make sure that you can protect your data using best-of-breed security technology in a cost-effective way.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS KMS forum or the AWS CloudHSM forum, or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ken Beer

Ken is the General Manager of the AWS Key Management Service. Ken has worked in identity and access management, encryption, and key management for over 7 years at AWS. Before joining AWS, Ken was in charge of the network security business at Trend Micro. Before Trend Micro, he was at Tumbleweed Communications. Ken has spoken on a variety of security topics at events such as the RSA Conference, the DoD PKI User’s Forum, and AWS re:Invent.

How Netflix brings safer and faster streaming experience to the living room on crowded networks…

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/how-netflix-brings-safer-and-faster-streaming-experience-to-the-living-room-on-crowded-networks-78b8de7f758c

How Netflix brings safer and faster streaming experience to the living room on crowded networks using TLS 1.3

By Sekwon Choi

At Netflix, we are obsessed with the best streaming experiences. We want playback to start instantly and to never stop unexpectedly in any network environment. We are also committed to protecting users’ privacy and service security without sacrificing any part of the playback experience.

To achieve that, we are efficiently using ABR (adaptive bitrate streaming) for a better playback experience, DRM (Digital Rights Management) to protect our service, and TLS (Transport Layer Security) to protect customer privacy and to create a safer streaming experience.

Netflix on consumer electronics devices such as TVs, set-top boxes and streaming sticks was until recently using TLS 1.2 for streaming traffic. Now we support TLS 1.3 for safer and faster experiences.

What is TLS?

For two parties to communicate securely, a secure channel is necessary. This needs to have the following three properties.

  • Authentication: Identity of the communicating party is verified.
  • Confidentiality: Data sent over the channel is only visible to the endpoints.
  • Integrity: Data sent over the channel cannot be modified by attackers without detection.

The TLS protocol is designed to provide a secure channel between two peers by providing tools and methods to achieve the above properties.

TLS 1.3

TLS 1.3 is the latest version of the Transport Layer Security protocol. It is simpler, more secure and more efficient than its predecessor.

Perfect Forward Secrecy

One thing we believe is very important at Netflix is providing PFS (Perfect Forward Secrecy).

PFS is a feature of the key exchange algorithm that assures that session keys will not be compromised, even if the server’s private key is compromised. By generating new keys for each session, PFS protects past sessions against the future compromise of secret keys.

TLS 1.2 supports key exchange algorithms with PFS, but it also allows key exchange algorithms that do not support PFS. Even with the previous version of TLS 1.2, Netflix has always selected a key exchange algorithm that provides PFS such as ECDHE (Elliptic Curve Diffie Hellman Ephemeral). TLS 1.3, however, enforces this concept even more by removing all the key exchange algorithms that do not provide PFS, such as static RSA.
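
To show what “ephemeral” buys you, here is a minimal sketch of a raw ECDH key agreement using the JDK: both sides generate fresh key pairs that exist only for this session, so a later compromise of any long-term key reveals nothing about this session’s shared secret. This is the bare primitive, not the TLS handshake itself, and the class name is illustrative.

import javax.crypto.KeyAgreement;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.spec.ECGenParameterSpec;
import java.util.Arrays;

public class EphemeralEcdhSketch {
    static KeyPair freshKeyPair() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(new ECGenParameterSpec("secp256r1")); // a new key pair per session
        return kpg.generateKeyPair();
    }

    public static void main(String[] args) throws Exception {
        KeyPair client = freshKeyPair();
        KeyPair server = freshKeyPair();

        // Each side combines its own ephemeral private key with the peer's ephemeral public key.
        KeyAgreement clientKa = KeyAgreement.getInstance("ECDH");
        clientKa.init(client.getPrivate());
        clientKa.doPhase(server.getPublic(), true);

        KeyAgreement serverKa = KeyAgreement.getInstance("ECDH");
        serverKa.init(server.getPrivate());
        serverKa.doPhase(client.getPublic(), true);

        // Both sides derive the same shared secret; it never travels over the wire
        // and is discarded along with the ephemeral keys when the session ends.
        System.out.println("secrets match: " +
                Arrays.equals(clientKa.generateSecret(), serverKa.generateSecret()));
    }
}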

Authenticated Encryption

For encryption, TLS 1.3 removes all weak ciphers and uses only Authenticated Encryption with Associated Data (AEAD). This assures the confidentiality, integrity, and authenticity of the data. We use AES in Galois/Counter Mode (AES-GCM), as it also provides good performance and high throughput.

Secure Handshake

While the above changes are important, the most important change in TLS 1.3 is perhaps its redesign of the handshake protocol.

The TLS 1.2 handshake was not designed to protect the integrity of the entire handshake. It protected only the part of the handshake after the cipher suite negotiation, which opened up the possibility of downgrade attacks that may allow an attacker to force the use of insecure cipher suites.

With TLS 1.3, the server signs the entire handshake including the cipher suite negotiation and thus prevents the attacker from downgrading the cipher suite.

Also in TLS 1.2, extensions were sent in the clear in the ServerHello. Now with TLS 1.3, even extensions are encrypted and all handshake messages after ServerHello are now encrypted.

Reduced Handshake

TLS 1.2 supports numerous key exchange algorithms, cipher suites and digital signatures, including weak and vulnerable ones. Therefore, it requires more messages to perform a handshake and two network round trips.

In contrast, the handshake in TLS 1.3 now requires only one round trip, with a simplified design and with all weak and vulnerable algorithms removed.

In addition, it has a new feature called 0-RTT, or TLS early data, for the resumed handshake. This allows an application to include application data with its initial handshake message, instead of having to wait until the handshake completes.

At Netflix, by the efficient resumption of the TLS session and careful use of 0-RTT for the streaming data, we can reduce the play delay.

A/B Testing Result

We were pretty confident that TLS 1.3 would bring us better security from the analysis of its protocol composition, but we did not know how it would perform in the context of streaming.

Since TLS 1.3’s performance-related feature is the 0-RTT mode with the resumed handshake, our hypothesis is that TLS 1.3 would reduce play delay, as we are no longer required to wait for the handshake to finish and we can instead issue the HTTP request for media data and receive the HTTP response for media data earlier.

To see the actual performance of TLS 1.3 in the field, we performed an experiment with

  • User accounts: half-million user accounts per cell.
  • Device type: mid-performance device with Quad ARM core @ 1.7GHz.
  • Control cell: TLS 1.2
  • Treatment cell: TLS 1.3

Play Delay

Play Delay is defined by how long it takes for playback to start. Below are the results of the play delay measured in the experiment. The results imply that on slower or congested networks, which can be represented by the quantiles of at least 0.75, TLS 1.3 achieves the largest gains, with improvements across all network conditions.

Below is the time series median play delay graph for this mid-performance device in the field. It also shows that playback starts earlier with TLS 1.3.

Media Rebuffer

At Netflix, we define a media rebuffer as a non-network originated rebuffer. It typically occurs when media data is not processed quickly enough by the device due to the high load on the CPU. Comparing the control cell with TLS 1.2, the experiment cell with TLS 1.3 showed about a 7.4% improvement in media rebuffers. This result implies that using TLS 1.3 with 0-RTT is more efficient and can reduce the CPU load.

Conclusion

From the security analysis, we are confident that TLS 1.3 improves communication security over TLS 1.2. From the field test, we are confident that TLS 1.3 provides us a better streaming experience.

At the time of writing this article, the Internet is experiencing higher than usual traffic and congestion. We believe saving even small amounts of data and round trips can be meaningful and even better if it also provides a more secure and efficient streaming experience.

Therefore, we have started deploying TLS 1.3 on newer consumer electronics devices and we are expecting even more devices to be deployed with TLS 1.3 capability in the near future.


How Netflix brings safer and faster streaming experience to the living room on crowded networks… was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Round 2 Hybrid Post-Quantum TLS Benchmarks

Post Syndicated from Alex Weibel original https://aws.amazon.com/blogs/security/round-2-hybrid-post-quantum-tls-benchmarks/

AWS Cryptography has completed benchmarks of Round 2 Versions of the Bit Flipping Key Encapsulation (BIKE) and Supersingular Isogeny Key Encapsulation (SIKE) hybrid post-quantum Transport Layer Security (TLS) Algorithms. Both of these algorithms have been submitted to the National Institute of Standards and Technology (NIST) as part of NIST’s Post-Quantum Cryptography standardization process.

In the first hybrid post-quantum TLS blog, we announced that AWS Key Management Service (KMS) had launched support for hybrid post-quantum TLS 1.2 using Round 1 versions of BIKE and SIKE. In this blog, we are announcing AWS Cryptography’s benchmark results of using Round 2 versions of BIKE and SIKE with hybrid post-quantum TLS 1.2 against an HTTP webservice. Round 2 versions of BIKE and SIKE include performance improvements, parameter tuning, and algorithm updates in response to NIST’s comments on Round 1 versions. I’ll give a refresher on hybrid post-quantum TLS 1.2, go over our Round 2 hybrid post-quantum TLS 1.2 benchmark results, and then describe our benchmarking methodology.

This blog post is intended to inform software developers, AWS customers, and cryptographic researchers about the potential upcoming performance differences between classical and hybrid post-quantum TLS.

Refresher on Hybrid Post-Quantum TLS 1.2

Some of this section is repeated from the previous hybrid post-quantum TLS 1.2 launch announcement for KMS. If you are already familiar with hybrid post-quantum TLS, feel free to skip to the Benchmark Results section.

What is Hybrid Post-Quantum TLS 1.2?

Hybrid post-quantum TLS 1.2 is a proposed extension to the TLS 1.2 Protocol implemented by Amazon’s open source TLS library s2n that provides the security protections of both the classical and post-quantum schemes. It does this by performing two independent key exchanges (one classical and one post-quantum), and then cryptographically combining both keys into a single TLS master secret.

Why is Post-Quantum TLS Important?

Hybrid post-quantum TLS allows connections to remain secure even if one of the key exchanges (either classical or post-quantum) performed during the TLS Handshake is compromised in the future. For example, if a sufficiently large-scale quantum computer were to be built, it could break the current classical public-key cryptography that is used for key exchange in every TLS connection today. Encrypted TLS traffic recorded today could be decrypted in the future with a large-scale quantum computer if post-quantum TLS is not used to protect it.

Round 2 Hybrid Post-Quantum TLS Benchmark Results

Figure 2: Latency in relation to HTTP request count for four key exchange algorithms

Key Exchange Algorithm  | Server PQ Implementation | TLS Handshake + 1 HTTP Request | TLS Handshake + 2 HTTP Requests | TLS Handshake + 10 HTTP Requests | TLS Handshake + 25 HTTP Requests
ECDHE Only              | N/A                      | 10.8 ms  | 15.1 ms  | 52.6 ms  | 124.2 ms
ECDHE + BIKE1‑CCA‑L1‑R2 | C                        | 19.9 ms  | 24.4 ms  | 61.4 ms  | 133.2 ms
ECDHE + SIKE‑P434‑R2    | C                        | 169.6 ms | 180.3 ms | 219.1 ms | 288.1 ms
ECDHE + SIKE‑P434‑R2    | x86-64 Assembly          | 20.1 ms  | 24.5 ms  | 62.0 ms  | 133.3 ms

Table 1 shows the time (in milliseconds) that a client and server in the same region take to complete a TCP Handshake, a TLS Handshake, and complete varying numbers of HTTP Requests sent to an HTTP web service running on an i3en.12xlarge host.

Key Exchange Algorithm  | Client Hello | Server Key Exchange | Client Key Exchange | Other TLS Handshake | Total
ECDHE Only              | 218          | 338                 | 75                  | 2430                | 3061
ECDHE + BIKE1‑CCA‑L1‑R2 | 220          | 3288                | 3023                | 2430                | 8961
ECDHE + SIKE‑P434‑R2    | 214          | 672                 | 423                 | 2430                | 3739

Table 2 shows the amount of data (in bytes) used by different messages in the TLS Handshake for each Key Exchange algorithm.

Item                | 1 HTTP Request | 2 HTTP Requests | 10 HTTP Requests | 25 HTTP Requests
HTTP Request Bytes  | 878            | 1,761           | 8,825            | 22,070
HTTP Response Bytes | 698            | 1,377           | 6,809            | 16,994
Total HTTP Bytes    | 1,576          | 3,138           | 15,634           | 39,064

Table 3 shows the amount of data (in bytes) sent and received through each TLS connection for varying numbers of HTTP requests.

Benchmark Results Analysis

In general, we find that the major trade off between BIKE and SIKE is data usage versus processing time, with BIKE needing to send more bytes but requiring less time processing them, and SIKE making the opposite trade off of needing to send fewer bytes but requiring more time processing them. At the time of integration for our benchmarks, an x86-64 assembly optimized implementation of BIKE1-CCA-L1-R2 was not available in s2n.

Our results show that when only a single HTTP request is sent, completing a BIKE1-CCA-L1-R2 hybrid TLS 1.2 handshake takes approximately 84% more time compared to a non-hybrid TLS connection, and completing an x86-64 assembly optimized SIKE-P434-R2 hybrid TLS 1.2 handshake takes approximately 86% more time than non-hybrid. However, at 25 HTTP Requests per TLS connection, when using the fastest available implementation for both BIKE and SIKE, the increased TLS Handshake latency is amortized, and only 7% more total time is needed for both BIKE and SIKE compared to a classical TLS connection.
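
The percentages above follow directly from Table 1; a small sketch of the arithmetic:

public class HandshakeOverhead {
    static double percentSlower(double hybridMs, double classicalMs) {
        return (hybridMs / classicalMs - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Values from Table 1 (TLS handshake + 1 HTTP request, and + 25 HTTP requests).
        System.out.printf("BIKE, 1 request:   +%.0f%%%n", percentSlower(19.9, 10.8));    // ~84%
        System.out.printf("SIKE asm, 1 req:   +%.0f%%%n", percentSlower(20.1, 10.8));    // ~86%
        System.out.printf("BIKE, 25 requests: +%.0f%%%n", percentSlower(133.2, 124.2));  // ~7%
        System.out.printf("SIKE asm, 25 req:  +%.0f%%%n", percentSlower(133.3, 124.2));  // ~7%
    }
}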

Our results also show that BIKE1-CCA-L1-R2 hybrid TLS Handshakes used 5900 more bytes than a classical TLS Handshake, while SIKE-P434-R2 hybrid TLS Handshakes used 678 more bytes than classical TLS.

In the AWS EC2 network, using modern x86-64 CPUs with the fastest available algorithm implementations, we found that BIKE and SIKE performed similarly, with their maximum latency difference being only 0.6 milliseconds and BIKE being the faster of the two in every benchmark. However, when compared to SIKE’s C implementation, which would be used on hosts without the ADX and MULX x86-64 instructions used by SIKE’s assembly implementation, BIKE performed significantly better, seeing a maximum improvement of 157 milliseconds over SIKE.

Hybrid Post-Quantum TLS Benchmark Details and Methodology

Hybrid Post-Quantum TLS Client

Figure 3: Architecture diagram of the AWS SDK Java Client using Java Native Interface (JNI) to communicate with the native AWS Common Runtime (CRT)

Our post-quantum TLS Client is using the aws-crt-dev-preview branch of the AWS SDK Java v2 Client, that has Java Native Interface Bindings to the AWS Common Runtime (AWS CRT) written in C. The AWS Common Runtime uses s2n for TLS negotiation on Linux platforms.

Our client was a single EC2 i3en.6xlarge host, using v0.5.1 of the AWS Common Runtime (AWS CRT) Java Bindings, with commit f3abfaba of s2n and used the x86-64 Assembly implementation for all SIKE-P434-R2 benchmarks.

Hybrid Post-Quantum TLS Server

Our server was a single EC2 i3en.12xlarge host running a RESTful HTTP web service which used s2n to terminate TLS connections. In order to measure the latency of the SIKE-P434-R2 C implementation on these hosts, we used an s2n compile-time flag to build a second version of s2n with SIKE’s x86-64 assembly optimization disabled, and reran our benchmarks with that version.

We chose i3en.12xlarge as our host type because it is optimized for high IO usage, provides high levels of network bandwidth, has a high number of vCPUs that is typical for many web service endpoints, and has a modern x86-64 CPU with the ADX and MULX instructions necessary to use the high-performance Round 2 SIKE x86-64 assembly implementation. Additional TLS handshake benchmarks performed on other modern EC2 host types, such as the C5 and M5 families of EC2 instances, also showed latency results similar to those generated on the i3en family of EC2 instances.

Post-Quantum Algorithm Implementation Details

The implementations of the post-quantum algorithms used in these benchmarks can be found in the pq-crypto directory of the s2n GitHub Repository. Our Round 2 BIKE implementation uses portable optimized C code, and our Round 2 SIKE implementation uses an optimized implementation in x86-64 assembly when available, and falls back to a portable optimized C implementation otherwise.

| Key Exchange Algorithm | s2n Client Cipher Preference | s2n Server Cipher Preference | Negotiated Cipher |
|---|---|---|---|
| ECDHE Only | ELBSecurityPolicy-TLS-1-1-2017-01 | KMS-PQ-TLS-1-0-2020-02 | ECDHE-RSA-AES256-GCM-SHA384 |
| ECDHE + BIKE1-CCA-L1-R2 | KMS-PQ-TLS-1-0-2020-02 | KMS-PQ-TLS-1-0-2020-02 | ECDHE-BIKE-RSA-AES256-GCM-SHA384 |
| ECDHE + SIKE-P434-R2 | PQ-SIKE-TEST-TLS-1-0-2020-02 | KMS-PQ-TLS-1-0-2020-02 | ECDHE-SIKE-RSA-AES256-GCM-SHA384 |

Table 4 shows the client and server TLS cipher configuration names used to negotiate each Key Exchange Algorithm.

Hybrid Post-Quantum TLS Benchmark Methodology

Figure 4: Benchmarking Methodology Client/Server Architecture Diagram

Our benchmarks were run with a single client host connecting to a single host running an HTTP web service in a different availability zone within the same AWS Region (us-east-1), through a TCP Load Balancer.

We chose to include varying numbers of HTTP requests in our latency benchmarks, rather than TLS Handshakes alone, because customers are unlikely to establish a secure TLS connection and let the connection sit idle performing no work. Customers use TLS connections in order to send and receive data securely, and HTTP web services are one of the most common types of data being secured by TLS. We also chose to place our EC2 server behind a TCP Load Balancer to more closely approximate how an HTTP web service would be deployed in a typical setup.

Latency was measured at the client in Java, starting from before a TCP connection was established until after the final HTTP response was received, and includes all network transfer time. All connections used RSA certificate authentication with a 2048-bit key, and ECDHE key exchange used the secp256r1 curve. All latency values listed in Table 1 above were calculated from the median value (50th percentile) of 60 minutes of continuous single-threaded measurements between the EC2 client and server.
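
For readers who want to reproduce a rough version of this measurement approach against their own endpoints, the sketch below (in Python, purely illustrative; the actual benchmark client was the AWS SDK for Java with s2n) times the TCP connect, TLS handshake, and a configurable number of HTTP requests on a single connection, then reports the median over repeated trials. The hostname, request count, and trial count are placeholders.

```python
import ssl
import time
import statistics
import http.client

HOST = "example.com"          # placeholder: an HTTPS endpoint you control
REQUESTS_PER_CONNECTION = 25  # vary this to observe handshake amortization
TRIALS = 60

def one_trial(host: str, n_requests: int) -> float:
    """Time TCP connect + TLS handshake + n HTTP requests on one connection."""
    ctx = ssl.create_default_context()
    start = time.perf_counter()
    conn = http.client.HTTPSConnection(host, 443, context=ctx)
    conn.connect()                    # TCP connect and TLS handshake
    for _ in range(n_requests):
        conn.request("GET", "/")
        conn.getresponse().read()     # include full response transfer time
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

samples = [one_trial(HOST, REQUESTS_PER_CONNECTION) for _ in range(TRIALS)]
print(f"median total time: {statistics.median(samples) * 1000:.1f} ms")
```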

More Info

If you’re interested in learning more about post-quantum cryptography, check out the following links:

Conclusion

In this blog post, I gave a refresher on hybrid post-quantum TLS, went over our hybrid post-quantum TLS 1.2 benchmark results, and described our benchmarking methodology. Our benchmarks found that BIKE and SIKE performed similarly when using s2n's fastest available implementations on modern CPUs, but that BIKE performed better than SIKE when both used their portable C implementations.

If you have feedback about this blog post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Alex Weibel

Alex is a Senior Software Development Engineer on the AWS Crypto Algorithms team. He's one of the maintainers of Amazon's TLS library, s2n. Previously, Alex worked on TLS termination and request proxying for S3 and the Elastic Load Balancing Service, developing new features for customers. Alex holds a Bachelor of Science degree in Computer Science from the University of Texas at Austin.

Internship Experience: Cryptography Engineer

Post Syndicated from Watson Ladd original https://blog.cloudflare.com/internship-experience-cryptography-engineer/

Internship Experience: Cryptography Engineer

Back in the summer of 2017 I was an intern at Cloudflare. During the scholastic year I was a graduate student working on automorphic forms and computational Langlands at Berkeley: a part of number theory with deep connections to representation theory, aimed at uncovering some of the deepest facts about number fields. I had also gotten involved in Internet standardization and security research, but much more on the applied side.

While I had published papers in computer security and had coded for my dissertation, building and deploying new protocols to production systems was going to be new. Going from the academic environment of little day to day supervision to the industrial one of more direction; from greenfield code that would only ever be run by one person to large projects that had to be understandable by a team; from goals measured in years or even decades, to goals measured in days, weeks, or quarters; these transitions would present some challenges.

Cloudflare at that stage was a very different company from what it is now. Entire products and offices simply did not exist. Argo, now a mainstay of our offering for sophisticated companies, was slowly emerging. Access, which has been helping safeguard employees working from home these past weeks, was then experiencing teething issues. Workers was being extensively developed for launch that autumn. Quicksilver was still in the slow stages of replacing KyotoTycoon. Lisbon wasn’t on the map, and Austin was very new.

Day 1

My first job was to get my laptop working. Quickly I discovered that despite the promise of using either Mac or Linux, only Mac was supported as a local development environment. Most Linux users would take a good part of a month to tweak all the settings and get the local development environment up. I didn’t have months. After three days, I broke down and got a Mac.

Needless to say, I asked for some help. Like a drowning man in quicksand, I managed to attract three engineers to this near-insoluble problem of the edge dev stack, and after days of hacking on it, fixing problems that had long been ignored, we got it working well enough to test a few things. That development environment is now gone, replaced with one built on Kubernetes VMs, and it works much better that way. When things work on your machine, you can now send everyone your machine.

Speeding up

With setup complete enough, it was on to the problem we needed to solve. Our goal was to implement a set of three interrelated Internet drafts, one defining secondary certificates, one defining external authentication with TLS certificates, and a third permitting servers to advertise the websites they could serve.

External authentication is a TLS feature that permits a server or a client on an already opened connection to prove its possession of the private key of another certificate. This proof of possession is tied to the TLS connection, avoiding attacks on bearer tokens caused by the lack of this binding.

Secondary certificates is an HTTP/2 feature enabling clients and servers to send certificates together with proof that they actually know the private key. This feature has many applications such as certificate-based authentication, but also enables us to prove that we are permitted to serve the websites we claim to serve.

The last draft was the HTTP/2 ORIGIN frame. The ORIGIN frame enables a website to advertise other sites that it could serve, permitting more connection reuse than allowed under the traditional rules. Connection reuse is an important part of browser performance as it avoids much of the setup of a connection.
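
For a concrete sense of what an ORIGIN frame looks like on the wire, here is a hedged Python sketch that serializes one according to RFC 8336 (this is illustrative only, not Cloudflare's implementation; the origins listed are placeholders). The frame type is 0x0c, it is sent on stream 0, and its payload is a sequence of length-prefixed ASCII origins.

```python
import struct

ORIGIN_FRAME_TYPE = 0x0C  # HTTP/2 ORIGIN frame type per RFC 8336

def serialize_origin_frame(origins):
    """Build an HTTP/2 ORIGIN frame advertising the given ASCII origins."""
    # Payload: repeated Origin-Entry = 16-bit Origin-Len followed by the origin bytes
    payload = b"".join(
        struct.pack("!H", len(o)) + o.encode("ascii") for o in origins
    )
    length = len(payload)
    # Frame header: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id (stream 0)
    header = struct.pack(
        "!BBBBB",
        (length >> 16) & 0xFF, (length >> 8) & 0xFF, length & 0xFF,
        ORIGIN_FRAME_TYPE,
        0,  # no flags are defined for ORIGIN
    ) + struct.pack("!I", 0)
    return header + payload

# Hypothetical example: advertise two extra origins this connection can serve
frame = serialize_origin_frame(["https://assets.example", "https://images.example"])
print(frame.hex())
```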

These drafts solved an important problem for Cloudflare. Many resources such as JavaScript, CSS, and images hosted by one website can be used by others. Because Cloudflare proxies so many different websites, our servers have often cached these resources as well. Browsers, though, do not know that these different websites are made faster by Cloudflare, and as a result they repeat all the steps to request the subresources again. This takes unnecessary time, since there is already an established, perfectly good connection that could be used. If the browser could know this, it could use the connection again.

We could only solve this problem by getting browsers and the broader community of TLS implementers on board. Some of these drafts such as external authentication and secondary certificates had a broader set of motivations, such as getting certificate based authentication to work with HTTP/2 and TLS 1.3. All of these needs had to be addressed in the drafts, even if we were only implementing a subset of the uses.

Successful standards cover the use cases that are needed while being simple enough to implement and achieve interoperability. Implementation experience is essential to achieving this success: a standard with no implementations fails to incorporate hard won lessons. Computers are hard.

Prototype

My first goal was to set up a simple prototype to test the much more complex production implementation, as well as to share it outside of Cloudflare so that others could have confidence in their implementations. But the drafts that had to be implemented in the prototype were incremental improvements to an already massive stack of TLS and HTTP standards.

I decided it would be easiest to build on top of an already existing implementation of TLS and HTTP. I picked the Go standard library as my base: it’s simple, readable, and in a language I was already familiar with. There was already a basic demo showcasing support in Firefox for the ORIGIN frame, and it would be up to me to extend it.

Using that as my starting point, I was able to set up a demonstration server and a client in three weeks. This showed good progress, and that nothing in the specification was blocking implementation. But without integrating it into our servers for further experimentation, we might not discover the rare issues that could be showstoppers. This was a bitter lesson learned from TLS 1.3, where it took months to track down a single brand of printer that was incompatible with the standard and forced a change.

From Prototype to Production

We also wanted to understand the benefits with some real world data, to convince others that this approach was worthwhile. Our position as a provider to many websites globally gives us diverse, real world data on performance that we use to make our products better, and perhaps more important, to learn lessons that help everyone make the Internet better. As a result we had to implement this in production: the experimental framework for TLS 1.3 development had been removed and we didn’t have an environment for experimentation.

At the time everything at Cloudflare was based on variants of NGINX. We had extended it with modules to implement features like Keyless and customized certificate handling to meet our needs, but much of the business logic was and is carried out in Lua via OpenResty.

Lua has many virtues, but at the time both the TLS termination and the core business logic lived in the same repo despite being different processes at runtime. This made it very difficult to understand what code was running when, and changes to basic libraries could create problems for both. The build system for this creation had the significant disadvantage of building the same targets with different settings. Lua also is a very dynamic language, but unlike the dynamic languages I was used to, there was no way to interact with the system as it was running on requests.

The first step was implementing the ORIGIN frame. In implementing this, we had to figure out which sites hosted the subresources used by the page we were serving. Luckily, we already had this logic to enable server push support driven by Link headers. Building on this let me quickly get ORIGIN working.

This work wasn’t the only thing I was up to as an intern. I was also participating in weekly team meetings, attending our engineering presentations, and getting a sense of what life was like at Cloudflare. We had an excursion for interns to the Computer History Museum in Mountain View and Moffett Field, where we saw the base museum.

The next challenge was getting the CERTIFICATE frame to work. This was a much deeper problem. NGINX processes a request in phases, and some of the phases, like the header processing phase, do not permit network I/O without locking up the event loop. Since we are parsing the headers to determine what to send, the frame is created in the header processing phase. But finding a certificate and telling Keyless to sign it required network I/O.

The standard solution to this problem is to have Lua execute a timer callback, in which network I/O is possible. But this context doesn’t have any data from the request: some serious refactoring was needed to create a way to get the keyless module to function outside the context of a request.

Once the signature was created, the battle was half over. Formatting the CERTIFICATE frame was simple, but it had to be stuck into the connection associated with the request that had demanded it be created. And there was no reason to expect the request was still alive, and no way to know what state it was in when the request was handled by the Keyless module.

To handle this issue, I made a shared btree, indexed by a request number, containing space for the data to be passed back and forth. This enabled the request to record that it was ready to send the CERTIFICATE frame, and Keyless to record that it was ready with a frame to send. Whichever of these happened second would do the work to enqueue the frame to send out.
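
The actual mechanism lived in NGINX shared memory and Lua, but the rendezvous idea is easy to sketch in isolation. The following Python fragment is a conceptual illustration only (names and structure are my own, not the production code): each side records its readiness under the request's number, and whichever side arrives second performs the enqueue.

```python
import threading

class FrameRendezvous:
    """Conceptual sketch: whichever of (request ready, frame ready) arrives
    second enqueues the CERTIFICATE frame for sending."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slots = {}  # request number -> {"request_ready": bool, "frame": bytes | None}

    def request_ready(self, req_id, enqueue):
        with self._lock:
            slot = self._slots.setdefault(req_id, {"request_ready": False, "frame": None})
            slot["request_ready"] = True
            frame = slot["frame"]
        if frame is not None:      # the signer got here first, so we do the send
            enqueue(req_id, frame)

    def frame_ready(self, req_id, frame, enqueue):
        with self._lock:
            slot = self._slots.setdefault(req_id, {"request_ready": False, "frame": None})
            slot["frame"] = frame
            ready = slot["request_ready"]
        if ready:                  # the request got here first, so we do the send
            enqueue(req_id, frame)
```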

This was not an easy solution: the Keyless module had been written years before and was largely unmodified. It fundamentally assumed it could access data from the request, and changing this assumption opened the door to difficult-to-diagnose bugs. It integrates into BoringSSL callbacks through some pretty tricky mechanisms.

However, I was able to test it using the client from the prototype and it worked. Unfortunately when I pushed the commit in which it worked upstream, the CI system could not find the git repo where the client prototype was due to a setting I forgot to change. The CI system unfortunately didn’t associate this failure with the branch, but attempted to check it out whenever it checked out any other branch people were working on. Murphy ensured my accomplishment had happened on a Friday afternoon Pacific time, and the team that manages the SSL server was then exclusively in London…

Monday morning the issue was quickly fixed, and whatever tempers had frayed were smoothed over when we discovered the deficiency in the CI system that had enabled a single branch to break every build. It’s always tricky to work in a global team. Later Alessandro flew to San Francisco for a number of projects with the team here and we worked side by side trying to get a demonstration working on a test site. Unfortunately there was some difficulty tracking down a bug that prevented it working in production. We had run out of time, and my internship was over.  Alessandro flew back to London, and I flew to Idaho to see the eclipse.

The End

Ultimately we weren’t able to integrate this feature into the software at our edge: the risks of such intrusive changes for a very experimental feature outweighed the benefits. With not much prospect of support by clients, it would be difficult to get the real savings in performance promised. There also were nontechnical issues in standardization that have made this approach more difficult to implement: any form of traffic direction that doesn’t obey DNS creates issues for network debugging, and there were concerns about the impact of certificate misissuance.

While the project was less successful than I hoped it would be, I learned a lot of important skills: collaborating on large software projects, working with git, and communicating with other implementers about issues we found. I also got a taste of what it would be like to be on the Research team at Cloudflare, turning research from idea into practical reality, and this ultimately confirmed my choice to go into industrial research.

I’ve now returned to Cloudflare full-time, working on extensions for TLS as well as time synchronization. These drafts have continued to progress through the standardization process, and we’ve contributed some of the code I wrote as a starting point for other implementers to use.  If we knew all our projects would work out, they wouldn’t be ambitious enough to be research worth doing.

If this sort of research experience appeals to you, we’re hiring.

TLS 1.2 to become the minimum for all AWS FIPS endpoints

Post Syndicated from Janelle Hopper original https://aws.amazon.com/blogs/security/tls-1-2-to-become-the-minimum-for-all-aws-fips-endpoints/

To improve security for data in transit, AWS will update all of our AWS Federal Information Processing Standard (FIPS) endpoints to a minimum Transport Layer Security (TLS) version of TLS 1.2 over the next year. This update will deprecate the ability to use TLS 1.0 and TLS 1.1 on all FIPS endpoints across all AWS Regions by March 31, 2021. No other AWS endpoints are affected by this change.

As outlined in the AWS Shared Responsibility Model, security and compliance is a shared responsibility between AWS and our customers. When a customer makes a connection from their client application to an AWS service endpoint, the client provides its TLS minimum and TLS maximum version. The AWS service endpoint selects the maximum version offered.

What should customers do to prepare for this update?

Customers should confirm that their client applications support TLS 1.2 by verifying that TLS 1.2 falls within the client's configured minimum and maximum TLS versions. We encourage customers to be proactive with security standards in order to avoid any impact to availability and to protect the integrity of their data in transit. We also recommend that configuration changes be tested in a staging environment before they are introduced into production workloads.
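
One simple way to check a client environment is to open a connection with the minimum version pinned to TLS 1.2 and inspect what gets negotiated. The sketch below uses Python's standard ssl module; the endpoint shown is an example FIPS endpoint and should be replaced with the one your application actually calls, and SDK-specific configuration should follow the documentation referenced later in this post.

```python
import socket
import ssl

# Example FIPS endpoint; substitute the endpoint your application uses.
HOST = "kms-fips.us-east-1.amazonaws.com"

ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse TLS 1.0 and 1.1 on the client side

with socket.create_connection((HOST, 443), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        # Expect 'TLSv1.2' or 'TLSv1.3'; a handshake failure here means the
        # client environment cannot meet the new minimum.
        print("negotiated protocol:", tls.version())
```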

When will these changes happen?

To minimize the impact to our customers who use TLS 1.0 and TLS 1.1, AWS is rolling out changes on a service-by-service basis between now and the end of March 2021. For each service, after a 30-day period during which no connections are detected, AWS will deploy a configuration change to remove support for them. After March 31, 2021, AWS may update the endpoint configuration to remove TLS 1.0 and 1.1, even if we detect customer connections. Additional reminders will be provided before these updates are final.

What are AWS FIPS endpoints?

All AWS services offer Transport Layer Security (TLS) 1.2 encrypted endpoints that can be used for all API calls. Some AWS services also offer FIPS 140-2 endpoints for customers that require use of FIPS validated cryptographic libraries.

What is Transport Layer Security (TLS)?

Transport Layer Security (TLS) is a cryptographic protocol designed to provide secure communication across a computer network. API calls to AWS services are secured using TLS.

Is there more assistance available to help verify or update client applications?

Customers using an AWS Software Development Kit (AWS SDK) can find information about how to properly configure their client’s minimum and maximum TLS versions on the following topics in the AWS SDKs:

Or see Tools to Build on AWS, and browse by programming language to find the relevant SDK.

Additionally, AWS IQ enables customers to find, securely collaborate with, and pay AWS Certified third-party experts for on-demand project work. Visit the AWS IQ page for information about how to submit a request, get responses from experts, and choose the expert with the right skills and experience. Log into your console and select Get Started with AWS IQ to start a request.

The AWS Technical Support tiers cover development and production issues for AWS products and services, along with other key stack components. AWS Support does not include code development for client applications.

If you have any questions or issues, please start a new thread on one of the AWS Forums, or contact AWS Support or your Technical Account Manager (TAM). If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Sincerely,
Amazon Web Services

Author

Janelle Hopper

Janelle Hopper is a Senior Technical Program Manager in AWS Security with over 15 years of experience in the IT security field. She works with AWS services, infrastructure, and administrative teams to identify and drive innovative solutions that improve AWS’ security posture.

How to improve LDAP security in AWS Directory Service with client-side LDAPS

Post Syndicated from Dave Martinez original https://aws.amazon.com/blogs/security/how-to-improve-ldap-security-in-aws-directory-service-with-client-side-ldaps/

You can now better protect your organization’s identity data by encrypting Lightweight Directory Access Protocol (LDAP) communications between AWS Directory Service products (AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, and AD Connector) and self-managed Active Directory. Client-side secure LDAP (LDAPS) support enables applications that integrate with AWS Directory Service, such as Amazon WorkSpaces and AWS Single Sign-On, to connect to AD using Secure Sockets Layer/Transport Layer Security (SSL/TLS).

Note: In 2017, AWS Directory Service released server-side LDAPS support in AWS Managed Microsoft AD. This update adds client-side LDAPS support to both AWS Managed Microsoft AD and AD Connector.

In this post, I’ll step through configuring client-side LDAPS to enable encrypted communications between Amazon WorkSpaces and an Amazon Elastic Compute Cloud (Amazon EC2)-based self-managed AD.

Solution architecture

When you have completed the steps outlined in this post, your solution will look like Figure 1:

Figure 1: Solution architecture

To build the solution, you will follow a three-step process:

  1. Prepare all prerequisites, including the setup of certificate-based security in the self-managed AD environment.
  2. Register your certificate authority (CA) certificate into AWS Directory Service and enable client-side LDAPS (purple arrow in diagram above).
  3. Test client-side LDAPS using Amazon WorkSpaces and AWS Directory Service (yellow arrows in diagram above).

Step one: Set up prerequisites

To follow the steps described in this blog, you will need:

  1. A self-managed AD deployment to store your user identities. You can find setup guidance in “Step 1: Set Up Your Environment for Trusts” of the Tutorial: Creating a Trust from AWS Managed Microsoft AD to a Self-Managed Active Directory Installation on Amazon EC2.
  2. A server authentication certificate installed on your self-managed AD domain controller. Creating the certificate is typically done one of two ways:
    1. Using Active Directory Certificate Services (AD CS) in Windows Server to deploy an in-house CA for issuing server certificates. For help with setting up an AD CS deployment that supports LDAPS, see Microsoft’s LDAP over SSL (LDAPS) Certificate.
    2. Purchasing SSL certificates from a commercial CA like Verisign or AWS Certificate Manager. For help using commercial certificates with AD, see How to enable LDAP over SSL with a third-party certification authority.
  3. An AWS Directory Service directory, either AWS Managed Microsoft AD or AD Connector, to act as a bridge from AWS to your self-managed AD. See the documentation for AWS Managed Microsoft AD or AD Connector for detailed steps and tutorials. If you’re using AWS Managed Microsoft AD, also set up a two-way trust with your self-managed AD using Tutorial: Creating a Trust from AWS Managed Microsoft AD to a Self-Managed Active Directory Installation on Amazon EC2.
  4. Amazon WorkSpaces connected to your AWS Directory Service directory to look up and authenticate users. See the WorkSpaces documentation for detailed steps on using AWS Managed Microsoft AD with a Trusted Domain or AD Connector.

The remainder of this post assumes you have:

  1. Created an AWS Managed Microsoft AD instance called corp.example.com
  2. Connected corp.example.com via two-way trust to an EC2-based self-managed AD called example.local
  3. Deployed an AD CS enterprise root certificate authority in example.local with the common name Example SelfManaged CA.

When you perform the steps described below, you should replace these names with the names you selected.

Step two: Configure client-side LDAPS in AWS Directory Service

Now, you’ll retrieve the CA certificate — which represents the issuing certificate authority — from your self-managed AD and use it to enable client-side LDAPS in AWS Directory Service. To review CA certificate requirements for AWS Directory Service, see the client-side LDAPS documentation for AWS Managed Microsoft AD or AD Connector. If you prefer to script this configuration, a sketch using the AWS SDK follows the console steps below.

  1. Export the CA certificate from the example.local CA:
    1. To open the Certification Authority MMC snap-in, on the example.local server hosting AD CS, right-click the Windows icon, select Run, type certsrv.msc, and select OK.
    2. Right-click the name of the CA (in this case, Example SelfManaged CA) and select Properties.
    3. In the Properties window, on the General tab, under CA certificates, select the CA certificate listed, and then select View Certificate.
       
      Figure 2: View the CA certificate

    4. In the Certificate window, on the Details tab, select Copy to File.
    5. In the Certificate Export Wizard, select Next.
    6. In the Export File Format screen, select Base-64 encoded X.509 (.CER), and then select Next. This saves the file in the format required by AWS.
       
      Figure 3: Select the base-64 encoded export file format

    7. Select Browse, and then select a file name and save location for the CA certificate.
    8. Select Save, and then click Next.
    9. Select Finish, then select OK to complete the export process.
    10. Copy the file to a location accessible by the machine where you will be performing the AWS Directory Service configuration.
  2. Register the example.local CA certificate in AWS Directory Service:
    1. In the AWS Management Console, select Directory Service, and then select the Directory ID link for the AWS Directory Service directory connected to example.local (in this case, corp.example.com).
       
      Figure 4: Select the Directory ID

    2. On the Directory details page, in the Networking & security tab, in the Client-side LDAPS section (shown in Figure 5), select the Actions menu, and then select Register certificate.
       
      Figure 5: Select “Register certificate”

    3. In the Register a CA certificate dialog box, select Browse, navigate to the location where you stored the CA certificate for your AD CS certificate authority, select Open, and then select Register certificate.
       
      Figure 6: Register a CA certificate

  3. Enable client-side LDAPS in AWS Directory Service:
    1. In the Client-side LDAPS section, once the Registration status field for the certificate reads Registered, select the Enable button. Click the Refresh button for updated status.
       
      Figure 7: Check the “Registration status” and then select “Enable”

    2. In the Enable client-side LDAPS dialog box, select Enable.
    3. In the Client-side LDAPS section, under Status, when the status field changes to Enabled, LDAPS is successfully configured. Click the Refresh button for updated status.
       
      Figure 8: LDAPS successfully configured
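
As mentioned above, the same registration and enablement can be scripted. The sketch below uses the AWS SDK for Python (boto3) and the Directory Service RegisterCertificate, DescribeCertificate, and EnableLDAPS operations; the directory ID and certificate file path are placeholders, and error handling is omitted for brevity.

```python
import time
import boto3

ds = boto3.client("ds")

DIRECTORY_ID = "d-1234567890"                 # placeholder: your directory ID
CA_CERT_PATH = "example-selfmanaged-ca.cer"   # the base-64 encoded CA certificate exported above

# Register the CA certificate with the directory
with open(CA_CERT_PATH) as f:
    cert_id = ds.register_certificate(
        DirectoryId=DIRECTORY_ID,
        CertificateData=f.read(),
    )["CertificateId"]

# Wait until the certificate reaches the Registered state
while True:
    state = ds.describe_certificate(
        DirectoryId=DIRECTORY_ID, CertificateId=cert_id
    )["Certificate"]["State"]
    if state == "Registered":
        break
    time.sleep(15)

# Enable client-side LDAPS for the directory
ds.enable_ldaps(DirectoryId=DIRECTORY_ID, Type="Client")
```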

Step three: Test client-side LDAPS with Amazon WorkSpaces

The last step is to test client-side LDAPS with an AWS application. Now that client-side LDAPS has been configured, all LDAP traffic to the self-managed AD will be encrypted and travel over port 636.

Note: Ensure that AWS security group, network firewall, and Windows firewall settings applied to the AWS Directory Service directory (outbound) and self-managed AD (inbound) allow TCP communications on port 636.
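
Before testing from WorkSpaces, you can optionally confirm that the domain controller is reachable on port 636 and presents a certificate. This Python sketch performs only a reachability smoke test (the hostname is a placeholder, and certificate verification is deliberately relaxed here; AWS Directory Service validates the chain against the CA certificate you registered):

```python
import socket
import ssl

DC_HOST = "dc1.example.local"   # placeholder: a self-managed AD domain controller

# Relaxed verification is acceptable only for this connectivity smoke test.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

with socket.create_connection((DC_HOST, 636), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname=DC_HOST) as tls:
        print("TLS established on port 636, protocol:", tls.version())
```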

To test your client-side LDAPS configuration, perform a WorkSpaces user look up:

  1. In the AWS Management Console, choose WorkSpaces, and then click Launch WorkSpaces.
  2. On the Select a Directory screen, pick corp.example.com and then select Next Step.
  3. On the Identify Users screen, in the Select trust from forest menu, select example.local, and then select Show All Users (see Figure 9 for an example). This search will be executed over LDAPS.
     
    Figure 9: Searching users from a trusted domain with client-side LDAPS

Summary

In this post, we’ve explored how client-side LDAPS support in AWS Managed Microsoft AD and AD Connector improves LDAP security for AWS applications and services like Amazon WorkSpaces, AWS Single Sign-On, and Amazon QuickSight by encrypting sensitive network traffic between AWS and Active Directory.

To learn more about using AWS Managed Microsoft AD or AD Connector, visit the AWS Directory Service documentation. For general information and pricing, see the AWS Directory Service home page. If you have comments about this blog post, submit a comment in the Comments section below. If you have implementation or troubleshooting questions, start a new thread on the Directory Service forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Dave Martinez

Dave is a Senior Product Manager working on AWS Directory Service. Outside of work he enjoys Seattle sports and coaching his son’s Little League baseball team.