Tag Archives: DNS

Oblivious DNS-over-HTTPS

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/12/oblivious-dns-over-https.html

This new protocol, called Oblivious DNS-over-HTTPS (ODoH), hides the websites you visit from your ISP.

Here’s how it works: ODoH wraps a layer of encryption around the DNS query and passes it through a proxy server, which acts as a go-between for the internet user and the website they want to visit. Because the DNS query is encrypted, the proxy can’t see what’s inside, but acts as a shield to prevent the DNS resolver from seeing who sent the query to begin with.

IETF memo.

The paper:

Abstract: The Domain Name System (DNS) is the foundation of a human-usable Internet, responding to client queries for host-names with corresponding IP addresses and records. Traditional DNS is also unencrypted, and leaks user information to network operators. Recent efforts to secure DNS using DNS over TLS (DoT) and DNS over HTTPS (DoH) have been gaining traction, ostensibly protecting traffic and hiding content from on-lookers. However, one of the criticisms of DoT and DoH is brought to bear by the small number of large-scale deployments (e.g., Comcast, Google, Cloudflare): DNS resolvers can associate query contents with client identities in the form of IP addresses. Oblivious DNS over HTTPS (ODoH) safeguards against this problem. In this paper we ask what it would take to make ODoH practical? We describe ODoH, a practical DNS protocol aimed at resolving this issue by both protecting the client’s content and identity. We implement and deploy the protocol, and perform measurements to show that ODoH has comparable performance to protocols like DoH and DoT which are gaining widespread adoption, while improving client privacy, making ODoH a practical privacy enhancing replacement for the usage of DNS.

Slashdot thread.

How to protect a self-managed DNS service against DDoS attacks using AWS Global Accelerator and AWS Shield Advanced

Post Syndicated from Chido Chemambo original https://aws.amazon.com/blogs/security/how-to-protect-a-self-managed-dns-service-against-ddos-attacks-using-aws-global-accelerator-and-aws-shield-advanced/

In this blog post, I show you how to improve the distributed denial of service (DDoS) resilience of your self-managed Domain Name System (DNS) service by using AWS Global Accelerator and AWS Shield Advanced. You can use those services to incorporate some of the techniques used by Amazon Route 53 to protect against DDoS attacks.

DNS routes users to your application by quickly translating a human-readable domain name to a machine-readable IP address. When protecting the availability of your application against DDoS attacks, it’s important to consider every part of the stack, including domain name resolution. The recommended best practice is to create hosted zones on Route 53, a scalable, highly available DNS service that’s protected against large DDoS attacks and query floods. Route 53 uses anycast routing to serve DNS queries from more than 150 edge locations around the globe. With anycast routing, DNS queries are served from locations that are closer to your users and the globally distributed DDoS mitigation capacity of Amazon Web Services (AWS) reduces the impact of attacks.

Optionally, you can also build your own DNS service on Amazon Elastic Compute Cloud (Amazon EC2). For example, you can run your own proprietary DNS server to take advantage of custom features that you wrote to integrate with an existing DNS service that isn’t running on AWS. When you register a domain name, you’re usually required to provide at least two name servers that can respond to queries from your users. It’s possible to build a DNS service on only two instances, but that provides limited DDoS resilience.

Solution overview

To protect your self-managed DNS service using this solution, you need a strong understanding of DNS and how to operate a distributed, self-managed DNS service on Amazon EC2. This solution improves upon an existing self-managed DNS service by significantly enhancing its ability to withstand DDoS attacks. There are two components that you add to your application:

  • You use Global Accelerator to provide your application with two static IP addresses that act as a fixed entry point to Amazon EC2 instances in multiple AWS Regions. Global Accelerator uses anycast to route your traffic to a point of entry close to the source of the traffic. In addition to providing availability and performance benefits, this gives you access to global DDoS mitigation capacity through AWS.
  • You use Shield Advanced to monitor the availability of your application and automatically engage the AWS Shield Response Team (SRT) if its availability is affected by a DDoS attack. When you associate a Route 53 health check with your protected resources, Shield Advanced uses the health of the application as an input for detection and as a signal to SRT to contact your operations center when needed. You can also engage with SRT to write custom mitigations for your application. For your self-managed DNS service use case, this can include mitigations like DNS packet validation and suspicion scoring that gives a higher priority to queries that are more likely to be legitimate traffic for your application.

As part of this solution, you will build a DNS canary that uses Amazon CloudWatch to update the status of a Route 53 health check if your self-managed DNS service stops responding to queries. An example architecture using Amazon EC2 based DNS behind Global Accelerator and Shield is shown in Figure 1.

Figure 1: Amazon EC2 based DNS behind Global Accelerator and Shield

Create and configure an accelerator

To begin, create an accelerator and add your existing DNS servers as endpoints. The newly created accelerator will receive queries and forward them to your DNS service.

To create and configure an accelerator

Step 1: Create an accelerator

  1. Navigate to the AWS Global Accelerator dashboard.
  2. Choose Create accelerator.
  3. Enter a name for your accelerator.
  4. Choose Next.

Step 2: Add listeners

Since DNS uses both TCP and UDP protocols, you must create separate listeners to handle requests for each protocol.

At the Add Listeners step, enter the following:

  1. Ports: 53
  2. Protocol: TCP
  3. Client affinity: None

Choose Add listener again to add the UDP listener. Enter the following:

  1. Ports: 53
  2. Protocol: UDP
  3. Client affinity: None
  4. Choose Next

To learn more about the different options available in this step, see To create a listener in Getting started with AWS Global Accelerator.

Step 3: Add endpoint groups

Starting with the TCP listener, enter the following settings:

  1. Region: Choose a Region that your DNS instances are located in, for example, us-east-1.
  2. Traffic dial: 100
  3. If you have additional DNS instances in another AWS Region, choose Add endpoint group and repeat steps 1 and 2, entering the appropriate Region.
  4. Repeat steps 1 through 3 to add endpoint groups for the UDP listener, and then choose Next.

To learn more about the different options available in this step, for example, Traffic dial, see Add endpoint groups in Getting started with AWS Global Accelerator.

Step 4: Add endpoints

Starting with the TCP listener, enter the following in the form boxes for each Region specified in the previous step:

  1. Endpoint type: Select EC2 instance from the drop-down list.
  2. Endpoint: Select a DNS instance from the drop-down list.
  3. Weight: 128

If you have additional DNS instances in the Region, choose Add endpoint and repeat the preceding steps, but select a DNS instance that hasn’t been added as an endpoint.

Repeat all of the preceding steps for the UDP listener, then choose Create accelerator.

To learn more about the different options available in this step, see Add endpoints in Getting started with AWS Global Accelerator.
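The console steps above can also be scripted. The following is a minimal sketch using boto3, not part of the original walkthrough: the function names, the accelerator name, and the instance IDs are hypothetical, and it assumes the Global Accelerator API is reached through the us-west-2 endpoint. The request builders are kept separate from the API calls so that the parameters (port 53, TCP and UDP, no client affinity, traffic dial 100, weight 128) can be inspected without AWS credentials.

```python
def build_listener_requests(accelerator_arn):
    """One listener per protocol, because DNS uses both TCP and UDP on port 53."""
    return [
        {
            "AcceleratorArn": accelerator_arn,
            "PortRanges": [{"FromPort": 53, "ToPort": 53}],
            "Protocol": protocol,
            "ClientAffinity": "NONE",
        }
        for protocol in ("TCP", "UDP")
    ]


def build_endpoint_group_request(listener_arn, region, instance_ids):
    """One endpoint group per Region, with every DNS instance weighted equally."""
    return {
        "ListenerArn": listener_arn,
        "EndpointGroupRegion": region,
        "TrafficDialPercentage": 100.0,
        "EndpointConfigurations": [
            {"EndpointId": instance_id, "Weight": 128}
            for instance_id in instance_ids
        ],
    }


def create_dns_accelerator(name, instances_by_region):
    """Create the accelerator, both listeners, and their endpoint groups.

    instances_by_region is a hypothetical mapping such as
    {"us-east-1": ["i-0123456789abcdef0"]}.
    """
    import boto3  # imported lazily so the builders above work without it

    ga = boto3.client("globalaccelerator", region_name="us-west-2")
    accelerator = ga.create_accelerator(Name=name, IpAddressType="IPV4", Enabled=True)
    accelerator_arn = accelerator["Accelerator"]["AcceleratorArn"]
    for listener_request in build_listener_requests(accelerator_arn):
        listener = ga.create_listener(**listener_request)
        listener_arn = listener["Listener"]["ListenerArn"]
        for region, instance_ids in instances_by_region.items():
            ga.create_endpoint_group(
                **build_endpoint_group_request(listener_arn, region, instance_ids)
            )
    return accelerator_arn
```

Calling `create_dns_accelerator("dns-accelerator", {"us-east-1": ["i-0123456789abcdef0"]})` would issue the same requests as steps 1 through 4, assuming valid credentials and instance IDs.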

Step 5: Verification

When you choose the Create accelerator button, you’re redirected to a Global Accelerator console page that lists all the accelerators in your account. On this page, you can view the global IPs and DNS name allocated to your newly created accelerator, in addition to the current status.

Wait until the status of the accelerator changes to Deployed before proceeding with any tests.

Configure Shield Advanced and Shield Advanced proactive engagement

Protect your accelerator with Shield Advanced, monitor the health of your application, and configure proactive engagement. When you turn on proactive engagement, the SRT will directly contact you if an Amazon Route 53 health check associated with your protected resource becomes unhealthy during an event that’s detected by Shield Advanced.

To configure proactive engagement

Step 1: Create a Route 53 health check

If you already have a Route 53 health check that monitors the health of your DNS service, you can proceed to step 2 of this section. If you don’t yet have a health check, you can use this AWS CloudFormation template to create one. The template will:

  1. Create a Lambda function that queries your DNS server through the accelerator global IPs. This function posts metrics to CloudWatch to indicate whether the query was successful or not.
  2. Create a CloudWatch alarm that will detect when DNS queries fail.
  3. Create a Route 53 health check that tracks the CloudWatch alarm and changes status to unhealthy when the alarm changes to the Alarm state.
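To make the canary idea concrete, here is a minimal sketch of what such a probe could look like. It is not the code from the CloudFormation template: the function names, the metric namespace `Custom/DNSCanary`, and the metric name `QuerySuccess` are assumptions made for illustration. The probe builds a DNS query by hand, sends it over UDP to one of the accelerator's IP addresses, and treats a matching, error-free response with at least one answer as success.

```python
import secrets
import socket
import struct


def build_query(name, qtype=1):
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    transaction_id = secrets.randbelow(65536)
    # Flags 0x0100: standard query with recursion desired; one question.
    header = struct.pack("!HHHHHH", transaction_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_successful_response(query, response):
    """True if the response matches the query ID, has RCODE 0, and has answers."""
    if len(response) < 12:
        return False
    response_id, flags, _, answer_count, _, _ = struct.unpack("!HHHHHH", response[:12])
    (query_id,) = struct.unpack("!H", query[:2])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return response_id == query_id and is_response and rcode == 0 and answer_count > 0


def probe(server_ip, name, timeout=2.0):
    """Send one UDP query to server_ip and report whether it was answered."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            response, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts
            return False
    return is_successful_response(query, response)


def publish(success, namespace="Custom/DNSCanary", metric_name="QuerySuccess"):
    """Post the probe result to CloudWatch (namespace and metric are hypothetical)."""
    import boto3  # imported lazily so the probe logic works without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,
        MetricData=[
            {"MetricName": metric_name, "Value": 1.0 if success else 0.0, "Unit": "Count"}
        ],
    )
```

A CloudWatch alarm on the resulting metric could then drive the Route 53 health check, as described in the list above.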

Step 2: Subscribe to Shield Advanced

Please note that with AWS Shield Advanced, you pay a fee of $3,000 per month per organization. In addition, you pay AWS Shield Advanced Data Transfer usage fees for AWS resources enabled for advanced protection.

  1. Navigate to the AWS Shield console.
  2. In the AWS Shield navigation bar, choose Getting started, and then choose Subscribe to Shield Advanced.
  3. On the Subscribe to Shield Advanced page, read the terms of agreement, and then select all of the check boxes to indicate that you accept the terms.
  4. Choose Subscribe to Shield Advanced.

Step 3: Add resources to protect

  1. Do one of the following, depending on whether you were already subscribed to Shield Advanced.
    • If you just subscribed to Shield Advanced by completing Step 2 above, choose Add resources to protect.
    • If you were already subscribed to Shield Advanced, open the Shield console and choose Protected Resources, and then choose Add resources to protect.
  2. In the Choose resources to protect with Shield Advanced page, select the Regions and resource types that you want to protect, then choose Load resources.
  3. Select the resources that you want to protect, and then choose Protect with Shield Advanced.
  4. In the Configure health check based DDoS detection page, under the Protected resources section, select a Route 53 health check to add—either one that you created previously, or a health check created by the AWS CloudFormation template—as the Associated Health Check.
  5. Choose Next until you reach the Review and configure DDoS mitigation and visibility page, and then review the settings and choose Finish configuration.

Step 4: Add contacts

  1. Navigate to the Overview tab of the AWS Shield console.
  2. In the Proactive engagements and contacts section, choose Edit under the Contacts heading.
  3. In the Add contact form, add the contact’s Email, Phone number, and Notes.
  4. Choose Save.

Step 5: Request proactive engagement

  1. Choose Edit proactive engagement feature.
  2. Select Enable.
  3. Choose Save.
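Steps 3 through 5 can also be expressed as API calls. The sketch below assumes boto3's Shield client and an account that is already subscribed to Shield Advanced (the subscription step is left out on purpose because it incurs the monthly fee). The helper names, the protection name, and the contact values are hypothetical, and the health check ARN format is an assumption to verify against the Shield documentation.

```python
def health_check_arn(health_check_id):
    """ARN form assumed for a Route 53 health check."""
    return f"arn:aws:route53:::healthcheck/{health_check_id}"


def build_contact(email, phone, notes):
    """Contact entry in the shape used by the Shield emergency contact list."""
    return {"EmailAddress": email, "PhoneNumber": phone, "ContactNotes": notes}


def protect_accelerator(accelerator_arn, health_check_id, contact):
    """Protect the accelerator, attach the health check, and request proactive engagement."""
    import boto3  # imported lazily so the helpers above work without it

    shield = boto3.client("shield", region_name="us-east-1")
    protection_id = shield.create_protection(
        Name="dns-accelerator",  # hypothetical protection name
        ResourceArn=accelerator_arn,
    )["ProtectionId"]
    shield.associate_health_check(
        ProtectionId=protection_id,
        HealthCheckArn=health_check_arn(health_check_id),
    )
    # Registers the contacts and requests proactive engagement; the request
    # stays pending until the SRT completes its configuration review.
    shield.associate_proactive_engagement_details(EmergencyContactList=[contact])
    return protection_id
```

A call might look like `protect_accelerator(arn, "abcd1234", build_contact("ops@example.com", "+12065550100", "DNS on-call"))`, where every value is a placeholder.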

Step 6: Configuration review with the SRT

After you enable proactive engagement, the state will be Proactive engagement requested and pending.

SRT will contact you to schedule a configuration review. The review will include a review of your Route 53 health check configuration and a consultation about custom mitigations that can be configured to support your DNS use case. Following this review, SRT will complete your request to enable proactive engagement.

Summary

DNS is a foundational part of the user experience for any application that is accessed via a human readable domain name. Your DNS service should be highly available, DDoS resilient, and accessible to your users with minimal latency. If you run your own DNS service on Amazon EC2, you can improve the DDoS resiliency using Global Accelerator and Shield Advanced. This solution provides your users with a low latency path to your DNS service and provides you with some of the DDoS mitigation that protects Route 53. To learn more about DDoS best practices, see AWS Best Practices for DDoS Resiliency.

Author

Chido Chemambo

Chido is a Security Engineer on the AWS Shield Team with 12 years of experience in the telecommunications industry. He specializes in network security and enjoys working with colleagues to improve AWS Shield, and with customers to improve their cloud architectures. Outside of work, Chido enjoys jumping rope, improving his development skills, and watching English Premier League soccer and Formula 1.

Helping build the next generation of privacy-preserving protocols

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/next-generation-privacy-protocols/

Over the last ten years, Cloudflare has become an important part of Internet infrastructure, powering websites, APIs, and web services to help make them more secure and efficient. The Internet is growing in terms of its capacity and the number of people using it and evolving in terms of its design and functionality. As a player in the Internet ecosystem, Cloudflare has a responsibility to help the Internet grow in a way that respects and provides value for its users. Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy.

These initiatives are Encrypted Client Hello (ECH), Oblivious DNS-over-HTTPS (ODoH), and OPAQUE.

Each of these projects impacts an aspect of the Internet that influences our online lives and digital footprints. Whether we know it or not, there is a lot of private information about us and our lives floating around online. This is something we can help fix.

For over a year, we have been working through standards bodies like the IETF and partnering with the biggest names in Internet technology (including Mozilla, Google, Equinix, and more) to design, deploy, and test these new privacy-preserving protocols at Internet scale. Each of these three protocols touches on a critical aspect of our online lives, and we expect them to help make real improvements to privacy online as they gain adoption.

A continuing tradition at Cloudflare

One of Cloudflare’s core missions is to support and develop technology that helps build a better Internet. As an industry, we’ve made exceptional progress in making the Internet more secure and robust. Cloudflare is proud to have played a part in this progress through multiple initiatives over the years.

Here are a few highlights:

  • Universal SSL™. We’ve been one of the driving forces for encrypting the web. We launched Universal SSL in 2014 to give website encryption to our customers for free and have actively been working along with certificate authorities like Let’s Encrypt, web browsers, and website operators to help remove mixed content. Before Universal SSL launched to give all Cloudflare customers HTTPS for free, only 30% of connections to websites were encrypted. Through the industry’s efforts, that number is now 80% — and a much more significant proportion of overall Internet traffic. Along with doing our part to encrypt the web, we have supported the Certificate Transparency project via Nimbus and Merkle Town, which has improved accountability for the certificate ecosystem HTTPS relies on for trust.
  • TLS 1.3 and QUIC. We’ve also been a proponent of upgrading existing security protocols. Take Transport Layer Security (TLS), the underlying protocol that secures HTTPS. Cloudflare engineers helped contribute to the design of TLS 1.3, the latest version of the standard, and in 2016 we launched support for an early version of the protocol. This early deployment helped lead to improvements to the final version of the protocol. TLS 1.3 is now the most widely used encryption protocol on the web and a vital component of the emerging QUIC standard, of which we were also early adopters.
  • Securing Routing, Naming, and Time. We’ve made major efforts to help secure other critical components of the Internet. Our efforts to help secure Internet routing through our RPKI toolkit, measurement studies, and “Is BGP Safe Yet” tool have significantly improved the Internet’s resilience against disruptive route leaks. Our time service (time.cloudflare.com) has helped keep people’s clocks in sync with more secure protocols like NTS and Roughtime. We’ve also made DNS more secure by supporting DNS-over-HTTPS and DNS-over-TLS in 1.1.1.1 at launch, along with one-click DNSSEC in our authoritative DNS service and registrar.

Continuing to improve the security of the systems of trust online is critical to the Internet’s growth. However, there is a more fundamental principle at play: respect. The infrastructure underlying the Internet should be designed to respect its users.

Building an Internet that respects users

When you sign in to a specific website or service with a privacy policy, you know what that site is expected to do with your data. It’s explicit. There is no such visibility to the users when it comes to the operators of the Internet itself. You may have an agreement with your Internet Service Provider (ISP) and the site you’re visiting, but it’s doubtful that you even know which networks your data is traversing. Most people don’t have a concept of the Internet beyond what they see on their screen, so it’s hard to imagine that people would accept or even understand what a privacy policy from a transit wholesaler or an inspection middlebox would even mean.

Without encryption, Internet browsing information is implicitly shared with countless third parties online as information passes between networks. Without secure routing, users’ traffic can be hijacked and disrupted. Without privacy-preserving protocols, users’ online life is not as private as they would think or expect. The infrastructure of the Internet wasn’t built in a way that reflects their expectations.

Figure: Normal network flow
Figure: Network flow with malicious route leak

The good news is that the Internet is continuously evolving. One of the groups that help guide that evolution is the Internet Architecture Board (IAB). The IAB provides architectural oversight to the Internet Engineering Task Force (IETF), the Internet’s main standard-setting body. The IAB recently published RFC 8890, which states that individual end-users should be prioritized when designing Internet protocols. It says that if there’s a conflict between the interests of end-users and the interest of service providers, corporations, or governments, IETF decisions should favor end users. One of the prime interests of end-users is the right to privacy, and the IAB published RFC 6973 to indicate how Internet protocols should take privacy into account.

Today’s technical blog posts are about improvements to the Internet designed to respect user privacy. Privacy is a complex topic that spans multiple disciplines, so it’s essential to clarify what we mean by “improving privacy.” We are specifically talking about changing the protocols that handle privacy-sensitive information exposed “on-the-wire” and modifying them so that this data is exposed to fewer parties. This data continues to exist. It’s just no longer available or visible to third parties without building a mechanism to collect it at a higher layer of the Internet stack, the application layer. These changes go beyond website encryption; they go deep into the design of the systems that are foundational to making the Internet what it is.

The toolbox: cryptography and secure proxies

Two tools for making sure data can be used without being seen are cryptography and secure proxies.

Cryptography allows information to be transformed into a format that a very limited number of people (those with the key) can understand. Some describe cryptography as a tool that transforms data security problems into key management problems. This is a humorous but fair description. Cryptography makes it easier to reason about privacy because only key holders can view data.

Another tool for protecting access to data is isolation/segmentation. By physically limiting which parties have access to information, you effectively build privacy walls. A popular architecture is to rely on policy-aware proxies to pass data from one place to another. Such proxies can be configured to strip sensitive data or block data transfers between parties according to what the privacy policy says.

Both these tools are useful individually, but they can be even more effective if combined. Onion routing (the cryptographic technique underlying Tor) is one example of how proxies and encryption can be used in tandem to enforce strong privacy. Broadly, if party A wants to send data to party B, they can encrypt the data with party B’s key and encrypt the metadata with a proxy’s key and send it to the proxy.

Platforms and services built on top of the Internet can build in consent systems, like privacy policies presented through user interfaces. The infrastructure of the Internet relies on layers of underlying protocols. Because these layers of the Internet are so far below where the user interacts with them, it’s almost impossible to build a concept of user consent. In order to respect users and protect them from privacy issues, the protocols that glue the Internet together should be designed with privacy enabled by default.

Data vs. metadata

The transition from a mostly unencrypted web to an encrypted web has done a lot for end-user privacy. For example, the “coffeeshop stalker” is no longer an issue for most sites. When accessing the majority of sites online, users are no longer broadcasting every aspect of their web browsing experience (search queries, browser versions, authentication cookies, etc.) over the Internet for any participant on the path to see. Suppose a site is configured correctly to use HTTPS. In that case, users can be confident their data is secure from onlookers and reaches only the intended party because their connections are both encrypted and authenticated.

However, HTTPS only protects the content of web requests. Even if you only browse sites over HTTPS, that doesn’t mean that your browsing patterns are private. This is because HTTPS fails to encrypt a critical aspect of the exchange: the metadata. When you make a phone call, the metadata is the phone number, not the call’s contents. Metadata is the data about the data.

To illustrate the difference and why it matters, here’s a diagram of what happens when you visit a website like an imageboard. Say you’re going to a specific page on that board (https://<imageboard>.com/room101/) that has specific embedded images hosted on <embarrassing>.com.

Figure: Page load for an imageboard, returning an HTML page with an image from an embarrassing site
Figure: Subresource fetch for the image from an embarrassing site

The space inside the dotted line here represents the part of the Internet that your data needs to transit. It includes your local area network or coffee shop, your ISP, an Internet transit provider, and possibly the network portion of the cloud provider that hosts the server. Users often don’t have a relationship with these entities or a contract to prevent these parties from doing anything with the user’s data. And even if those entities don’t look at the data, a well-placed observer intercepting Internet traffic could see anything sent unencrypted. It would be best if they just didn’t see it at all. In this example, the fact that the user visited <imageboard>.com can be seen by an observer, which is expected. However, though page content is encrypted, an observer can still learn which specific page you’ve visited, since <embarrassing>.com is also visible.
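The split between what is hidden and what leaks can be sketched in a few lines. This is a simplified model, not a packet capture: it assumes that for an HTTPS request the hostname remains visible to on-path observers (for example through DNS or the TLS handshake) while the path is encrypted, and that plain HTTP exposes both. The function names and the `.example` hostnames are placeholders.

```python
from urllib.parse import urlsplit


def observer_view(url):
    """Split a URL into what an on-path observer can and cannot see."""
    parts = urlsplit(url)
    visible = {"hostname": parts.hostname}
    hidden = {}
    # With HTTPS the path travels inside the encrypted channel;
    # with plain HTTP it is readable on the wire.
    (hidden if parts.scheme == "https" else visible)["path"] = parts.path
    return visible, hidden


def hostnames_seen(urls):
    """Hostnames an observer collects while a page and its subresources load."""
    return [observer_view(url)[0]["hostname"] for url in urls]


page_load = [
    "https://imageboard.example/room101/",     # the page itself
    "https://embarrassing.example/image.png",  # an embedded image
]
```

In this model `hostnames_seen(page_load)` contains both hostnames even though neither path is visible, which is how the combination of hostnames can hint at the specific page.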

It’s a general rule that if data is available to on-path parties on the Internet, some of these on-path parties will use this data. It’s also true that these on-path parties need some metadata in order to facilitate the transport of this data. This balance is explored in RFC 8558, which explains how protocols should be designed thoughtfully with respect to the balance between too much metadata (bad for privacy) and too little metadata (bad for operations).

In an ideal world, Internet protocols would be designed with the principle of least privilege. They would provide the minimum amount of information needed for the on-path parties (the pipes) to do the job of transporting the data to the right place and keep everything else confidential by default. Current protocols, including TLS 1.3 and QUIC, are important steps towards this ideal but fall short with respect to metadata privacy.

Knowing both who you are and what you do online can lead to profiling

Today’s announcements reflect two metadata protection levels: the first involves limiting the amount of metadata available to third-party observers (like ISPs). The second involves restricting the amount of metadata that users share with service providers themselves.

Hostnames are an example of metadata that needs to be protected from third-party observers, which DoH and ECH intend to do. However, it doesn’t make sense to hide the hostname from the site you’re visiting. It also doesn’t make sense to hide it from a directory service like DNS. A DNS server needs to know which hostname you’re resolving to resolve it for you!

A privacy issue arises when a service provider knows about both what sites you’re visiting and who you are. Individual websites do not have this dangerous combination of information (except in the case of third party cookies, which are going away soon in browsers), but DNS providers do. Thankfully, it’s not actually necessary for a DNS resolver to know *both* the hostname of the service you’re going to and which IP you’re coming from. Disentangling the two, which is the goal of ODoH, is good for privacy.
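The separation of "who" from "what" can be illustrated with a toy simulation of the message flow. This is only a sketch under loud assumptions: `toy_cipher` is an insecure placeholder standing in for the public-key encryption a real deployment would use, the class and function names are invented for illustration, and the IP addresses come from documentation ranges. The point is what each party records: the proxy sees the client's address but only ciphertext, and the resolver (the target) sees the hostname but only the proxy's address.

```python
import hashlib

RECORDS = {"example.com": "192.0.2.10"}  # hypothetical zone data


def toy_cipher(key, data):
    """XOR with a SHA-256 keystream. Placeholder only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Target:
    """The resolver: can decrypt queries but never sees the client's IP."""

    def __init__(self, key):
        self.key = key
        self.seen = []

    def resolve(self, source_ip, sealed_query):
        name = toy_cipher(self.key, sealed_query).decode()
        self.seen.append((source_ip, name))
        answer = RECORDS.get(name, "NXDOMAIN")
        return toy_cipher(self.key, answer.encode())


class Proxy:
    """The go-between: sees the client's IP but only opaque bytes."""

    def __init__(self, ip, target):
        self.ip = ip
        self.target = target
        self.seen = []

    def forward(self, client_ip, sealed_query):
        self.seen.append((client_ip, sealed_query))
        return self.target.resolve(self.ip, sealed_query)


def oblivious_lookup(client_ip, name, proxy, target_key):
    """Client side: seal the query for the target and send it via the proxy."""
    sealed_query = toy_cipher(target_key, name.encode())
    sealed_answer = proxy.forward(client_ip, sealed_query)
    return toy_cipher(target_key, sealed_answer).decode()
```

After one lookup, `target.seen` holds the proxy's address next to the hostname, and `proxy.seen` holds the client's address next to ciphertext. The privacy property depends on the proxy and the target not colluding.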

The Internet is part of ‘our’ Infrastructure

Roads should be well-paved, well lit, have accurate signage, and be optimally connected. They aren’t designed to stop a car based on who’s inside it. Nor should they be! Like transportation infrastructure, Internet infrastructure is responsible for getting data where it needs to go, not for looking inside packets and making judgments. But the Internet is made of computers and software, and software tends to be written to make decisions based on the data it has available to it.

Privacy-preserving protocols attempt to eliminate the temptation for infrastructure providers and others to peek inside and make decisions based on personal data. A non-privacy preserving protocol like HTTP keeps data and metadata, like passwords, IP addresses, and hostnames, as explicit parts of the data sent over the wire. The fact that they are explicit means that they are available to any observer to collect and act on. A protocol like HTTPS improves upon this by making some of the data (such as passwords and site content) invisible on the wire using encryption.

The three protocols we are exploring today extend this concept.

  • ECH takes most of the unencrypted metadata in TLS (including the hostname) and encrypts it with a key that was fetched ahead of time.
  • ODoH (a new variant of DoH co-designed by Apple, Cloudflare, and Fastly engineers) uses proxies and onion-like encryption to make the source of a DNS query invisible to the DNS resolver. This protects the user’s IP address when resolving hostnames.
  • OPAQUE uses a new cryptographic technique to keep passwords hidden even from the server. Utilizing a construction called an Oblivious Pseudo-Random Function (as seen in Privacy Pass), the server does not learn the password; it only learns whether or not the user knows the password.
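The "oblivious" idea behind the last item can be sketched with a toy blinded exponentiation. This is not OPAQUE and not a production OPRF: the group below uses a tiny safe prime (2039) chosen only so the arithmetic is easy to follow, and all function names are invented for illustration. The client hides its password-derived value behind a random exponent, the server applies its secret key to the blinded value, and the client removes the blinding to obtain a result that depends on both the password and the server's key.

```python
import hashlib
import secrets

P = 2039  # toy safe prime, P = 2 * Q + 1; far too small for real use
Q = 1019  # prime order of the subgroup of squares modulo P


def hash_to_group(password):
    """Map a password to an element of the order-Q subgroup by squaring."""
    digest = hashlib.sha256(password.encode()).digest()
    x = int.from_bytes(digest, "big") % (P - 1) + 1  # in the range 1..P-1
    return pow(x, 2, P)


def blind(password):
    """Client: hide the password element behind a random exponent r."""
    r = secrets.randbelow(Q - 1) + 1  # in the range 1..Q-1
    return r, pow(hash_to_group(password), r, P)


def evaluate(server_key, blinded):
    """Server: apply its secret key without seeing the password element."""
    return pow(blinded, server_key, P)


def unblind(r, evaluated):
    """Client: strip r to recover hash_to_group(password) ** server_key mod P."""
    r_inverse = pow(r, -1, Q)  # modular inverse, Python 3.8+
    return pow(evaluated, r_inverse, P)
```

Because the blinding exponent is fresh each time, the server sees a different value on every run, yet the unblinded output for the same password and key is always the same.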

By making sure Internet infrastructure acts more like physical infrastructure, user privacy is more easily protected. The Internet is more private if private data can only be collected where the user has a chance to consent to its collection.

Doing it together

As much as we’re excited about working on new ways to make the Internet more private, innovation at a global scale doesn’t happen in a vacuum. Each of these projects is the output of a collaborative group of individuals working out in the open in organizations like the IETF and the IRTF. Protocols must come about through a consensus process that involves all the parties that make up the interconnected set of systems that power the Internet. From browser builders to cryptographers, from DNS operators to website administrators, this is truly a global team effort.

We also recognize that sweeping technical changes to the Internet will inevitably also impact the technical community. Adopting these new protocols may have legal and policy implications. We are actively working with governments and civil society groups to help educate them about the impact of these potential changes.

We’re looking forward to sharing our work today and hope that more interested parties join in developing these protocols. The projects we are announcing today were designed together by experts from academia and industry and by hobbyists, and were built by engineers from Cloudflare Research (including the work of interns, which we will highlight) with the support of everyone at Cloudflare.

If you’re interested in this type of work, we’re hiring!

Good-bye ESNI, hello ECH!

Post Syndicated from Christopher Patton original https://blog.cloudflare.com/encrypted-client-hello/

Most communication on the modern Internet is encrypted to ensure that its content is intelligible only to the endpoints, i.e., client and server. Encryption, however, requires a key and so the endpoints must agree on an encryption key without revealing the key to would-be attackers. The most widely used cryptographic protocol for this task, called key exchange, is the Transport Layer Security (TLS) handshake.

In this post we’ll dive into Encrypted Client Hello (ECH), a new extension for TLS that promises to significantly enhance the privacy of this critical Internet protocol. Today, a number of privacy-sensitive parameters of the TLS connection are negotiated in the clear. This leaves a trove of metadata available to network observers, including the endpoints’ identities, how they use the connection, and so on.

ECH encrypts the full handshake so that this metadata is kept secret. Crucially, this closes a long-standing privacy leak by protecting the Server Name Indication (SNI) from eavesdroppers on the network. Encrypting the SNI is important because it is the clearest signal of which server a given client is communicating with. However, and perhaps more significantly, ECH also lays the groundwork for adding future security features and performance enhancements to TLS while minimizing their impact on the privacy of end users.

ECH is the product of close collaboration, facilitated by the IETF, between academics and tech industry leaders, including Cloudflare, our friends at Fastly and Mozilla (both of which employ co-authors of the standard), and many others. This feature represents a significant upgrade to the TLS protocol, one that builds on bleeding edge technologies, like DNS-over-HTTPS, that are only now coming into their own. As such, the protocol is not yet ready for Internet-scale deployment. This article is intended as a signpost on the road to full handshake encryption.

Background

The story of TLS is the story of the Internet. As our reliance on the Internet has grown, so the protocol has evolved to address ever-changing operational requirements, use cases, and threat models. The client and server don’t just exchange a key: they negotiate a wide variety of features and parameters: the exact method of key exchange; the encryption algorithm; who is authenticated and how; which application layer protocol to use after the handshake; and much, much more. All of these parameters impact the security properties of the communication channel in one way or another.

SNI is a prime example of a parameter that impacts the channel’s security. The SNI extension is used by the client to indicate to the server the website it wants to reach. This is essential for the modern Internet, as it’s common nowadays for many origin servers to sit behind a single TLS operator. In this setting, the operator uses the SNI to determine who will authenticate the connection: without it, there would be no way of knowing which TLS certificate to present to the client. The problem is that SNI leaks to the network the identity of the origin server the client wants to connect to, potentially allowing eavesdroppers to infer a lot of information about their communication. (Of course, there are other ways for a network observer to identify the origin — the origin’s IP address, for example. But co-locating with other origins on the same IP address makes it much harder to use this metric to determine the origin than it is to simply inspect the SNI.)

Although protecting SNI is the impetus for ECH, it is by no means the only privacy-sensitive handshake parameter that the client and server negotiate. Another is the ALPN extension, which is used to decide which application-layer protocol to use once the TLS connection is established. The client sends the list of applications it supports — whether it’s HTTPS, email, instant messaging, or the myriad other applications that use TLS for transport security — and the server selects one from this list, and sends its selection to the client. By doing so, the client and server leak to the network a clear signal of their capabilities and what the connection might be used for.

Some features are so privacy-sensitive that their inclusion in the handshake is a non-starter. One idea that has been floated is to replace the key exchange at the heart of TLS with password-authenticated key-exchange (PAKE). This would allow password-based authentication to be used alongside (or in lieu of) certificate-based authentication, making TLS more robust and suitable for a wider range of applications. The privacy issue here is analogous to SNI: servers typically associate a unique identifier to each client (e.g., a username or email address) that is used to retrieve the client’s credentials; and the client must, somehow, convey this identity to the server during the course of the handshake. If sent in the clear, then this personally identifiable information would be easily accessible to any network observer.

A necessary ingredient for addressing all of these privacy leaks is handshake encryption, i.e., the encryption of handshake messages in addition to application data. Sounds simple enough, but this solution presents another problem: how do the client and server pick an encryption key if, after all, the handshake is itself a means of exchanging a key? Some parameters must be sent in the clear, of course, so the goal of ECH is to encrypt all handshake parameters except those that are essential to completing the key exchange.

In order to understand ECH and the design decisions underpinning it, it helps to understand a little bit about the history of handshake encryption in TLS.

Handshake encryption in TLS

TLS had no handshake encryption at all prior to the latest version, TLS 1.3. In the wake of the Snowden revelations in 2013, the IETF community began to consider ways of countering the threat that mass surveillance posed to the open Internet. When the process of standardizing TLS 1.3 began in 2014, one of its design goals was to encrypt as much of the handshake as possible. Unfortunately, the final standard falls short of full handshake encryption, and several parameters, including SNI, are still sent in the clear. Let’s take a closer look to see why.

The TLS 1.3 protocol flow is illustrated in Figure 1. Handshake encryption begins as soon as the client and server compute a fresh shared secret. To do this, the client sends a key share in its ClientHello message, and the server responds in its ServerHello with its own key share. Having exchanged these shares, the client and server can derive a shared secret. Each subsequent handshake message is encrypted using the handshake traffic key derived from the shared secret. Application data is encrypted using a different key, called the application traffic key, which is also derived from the shared secret. These derived keys have different security properties: to emphasize this, they are illustrated with different colors.
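
To make the idea of deriving separate keys from one shared secret concrete, here is a minimal Python sketch. It implements HKDF (RFC 5869) with the standard library and derives two independent keys from the same secret. The labels and the `derive_traffic_keys` helper are illustrative assumptions; the real TLS 1.3 key schedule in RFC 8446 has more stages and uses its own label format.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a PRK."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch a PRK into `length` bytes bound to `info`."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_traffic_keys(shared_secret: bytes) -> dict:
    """Toy derivation: one shared secret, two keys with different purposes.

    The labels below are hypothetical and are NOT the labels used by TLS 1.3.
    """
    prk = hkdf_extract(b"\x00" * 32, shared_secret)
    return {
        "handshake": hkdf_expand(prk, b"toy handshake traffic", 32),
        "application": hkdf_expand(prk, b"toy application traffic", 32),
    }
```

Because each key is bound to a different label, learning one of them does not reveal the other, even though both come from the same shared secret.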

The first handshake message that is encrypted is the server’s EncryptedExtensions. The purpose of this message is to protect the server’s sensitive handshake parameters, including the server’s ALPN extension, which contains the application selected from the client’s ALPN list. Key-exchange parameters are sent unencrypted in the ClientHello and ServerHello.

Figure 1: The TLS 1.3 handshake.

All of the client’s handshake parameters, sensitive or not, are sent in the ClientHello. Looking at Figure 1, you might be able to think of ways of reworking the handshake so that some of them can be encrypted, perhaps at the cost of additional latency (i.e., more round trips over the network). However, extensions like SNI create a kind of “chicken-and-egg” problem.

The client doesn’t encrypt anything until it has verified the server’s identity (this is the job of the Certificate and CertificateVerify messages) and the server has confirmed that it knows the shared secret (the job of the Finished message). These measures ensure the key exchange is authenticated, thereby preventing monster-in-the-middle (MITM) attacks in which the adversary impersonates the server to the client in a way that allows it to decrypt messages sent by the client. Because SNI is needed by the server to select the certificate, it needs to be transmitted before the key exchange is authenticated.

In general, ensuring confidentiality of handshake parameters used for authentication is only possible if the client and server already share an encryption key. But where might this key come from?

Full handshake encryption in the early days of TLS 1.3. Interestingly, full handshake encryption was once proposed as a core feature of TLS 1.3. In early versions of the protocol (draft-10, circa 2015), the server would offer the client a long-lived public key during the handshake, which the client would use for encryption in subsequent handshakes. (This design came from a protocol called OPTLS, which in turn was borrowed from the original QUIC proposal.) Called “0-RTT”, the primary purpose of this mode was to allow the client to begin sending application data prior to completing a handshake. In addition, it would have allowed the client to encrypt its first flight of handshake messages following the ClientHello, including its own EncryptedExtensions, which might have been used to protect the client’s sensitive handshake parameters.

Ultimately this feature was not included in the final standard (RFC 8446, published in 2018), mainly because its usefulness was outweighed by its added complexity. In particular, it does nothing to protect the initial handshake in which the client learns the server’s public key. Parameters that are required for server authentication of the initial handshake, like SNI, would still be transmitted in the clear.

Nevertheless, this scheme is notable as the forerunner of other handshake encryption mechanisms, like ECH, that use public key encryption to protect sensitive ClientHello parameters. The main problem these mechanisms must solve is key distribution.

Before ECH there was (and is!) ESNI

The immediate predecessor of ECH was the Encrypted SNI (ESNI) extension. As its name implies, the goal of ESNI was to provide confidentiality of the SNI. To do so, the client would encrypt its SNI extension under the server’s public key and send the ciphertext to the server. The server would attempt to decrypt the ciphertext using the secret key corresponding to its public key. If decryption were to succeed, then the server would proceed with the connection using the decrypted SNI. Otherwise, it would simply abort the handshake. The high-level flow of this simple protocol is illustrated in Figure 2.

Figure 2: The TLS 1.3 handshake with the ESNI extension. It is identical to the TLS 1.3 handshake, except the SNI extension has been replaced with ESNI.

For key distribution, ESNI relied on another critical protocol: the Domain Name System (DNS). In order to use ESNI to connect to a website, the client would piggy-back on its standard A/AAAA queries a request for a TXT record with the ESNI public key. For example, to get the key for crypto.dance, the client would request the TXT record of _esni.crypto.dance:

$ dig _esni.crypto.dance TXT +short
"/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EAAhMBAQQAAAAAX67XsAAAAABftsCwAAA="

The base64-encoded blob contains an ESNI public key and related parameters such as the encryption algorithm.
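
As a rough illustration of what is inside that blob, the following Python sketch decodes the TXT value shown above. The field layout (version, checksum, key shares, cipher suites, padded length, validity window, extensions) is an assumption based on the ESNIKeys structure of the early ESNI drafts; `parse_esni_keys` is a hypothetical helper, not part of any library.

```python
import base64
import struct

# TXT record value copied from the dig output above.
ESNI_TXT = (
    "/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EA"
    "AhMBAQQAAAAAX67XsAAAAABftsCwAAA="
)


def parse_esni_keys(txt: str) -> dict:
    """Parse an ESNIKeys blob, assuming the early-draft wire layout."""
    data = base64.b64decode(txt)
    offset = 0
    (version,) = struct.unpack_from("!H", data, offset)
    offset += 2
    checksum = data[offset:offset + 4]
    offset += 4

    # Key shares: a length-prefixed list of (named group, public key) entries.
    (keys_len,) = struct.unpack_from("!H", data, offset)
    offset += 2
    keys_end = offset + keys_len
    keys = []
    while offset < keys_end:
        group, key_len = struct.unpack_from("!HH", data, offset)
        offset += 4
        keys.append((group, data[offset:offset + key_len]))
        offset += key_len

    # Cipher suites: a length-prefixed list of 16-bit identifiers.
    (suites_len,) = struct.unpack_from("!H", data, offset)
    offset += 2
    suites = list(struct.unpack_from("!%dH" % (suites_len // 2), data, offset))
    offset += suites_len

    padded_length, not_before, not_after = struct.unpack_from("!HQQ", data, offset)
    return {
        "version": version,
        "checksum": checksum,
        "keys": keys,
        "cipher_suites": suites,
        "padded_length": padded_length,
        "not_before": not_before,
        "not_after": not_after,
    }
```

Under that assumed layout, the record carries a single 32-byte key share for group 0x001d (X25519), one cipher suite (0x1301, TLS_AES_128_GCM_SHA256), and a validity window expressed as two Unix timestamps.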

But what’s the point of encrypting SNI if we’re just going to leak the server name to network observers via a plaintext DNS query? Deploying ESNI this way became feasible with the introduction of DNS-over-HTTPS (DoH), which enables encryption of DNS queries to resolvers that provide the DoH service (1.1.1.1 is an example of such a service). Another crucial feature of DoH is that it provides an authenticated channel for transmitting the ESNI public key from the DoH server to the client. This prevents cache-poisoning attacks that originate from the client’s local network: in the absence of DoH, a local attacker could prevent the client from offering the ESNI extension by returning an empty TXT record, or coerce the client into using ESNI with a key it controls.

While ESNI took a significant step forward, it falls short of our goal of achieving full handshake encryption. Apart from being incomplete — it only protects SNI — it is vulnerable to a handful of sophisticated attacks, which, while hard to pull off, point to theoretical weaknesses in the protocol’s design that need to be addressed.

ESNI was deployed by Cloudflare and enabled by Firefox, on an opt-in basis, in 2018, an experience that laid bare some of the challenges with relying on DNS for key distribution. Cloudflare rotates its ESNI key every hour in order to minimize the collateral damage in case a key ever gets compromised. DNS artifacts are sometimes cached for much longer, the result of which is that there is a decent chance of a client having a stale public key. While Cloudflare’s ESNI service tolerates this to a degree, every key must eventually expire. The question that the ESNI protocol left open is how the client should proceed if decryption fails and it can’t access the current public key, via DNS or otherwise.

Another problem with relying on DNS for key distribution is that several endpoints might be authoritative for the same origin server, but have different capabilities. For example, a request for the A record of “example.com” might return one of two different IP addresses, each operated by a different CDN. The TXT record for “_esni.example.com” would contain the public key for one of these CDNs, but certainly not both. The DNS protocol does not provide a way of atomically tying together resource records that correspond to the same endpoint. In particular, it’s possible for a client to inadvertently offer the ESNI extension to an endpoint that doesn’t support it, causing the handshake to fail. Fixing this problem requires changes to the DNS protocol. (More on this below.)

The future of ESNI. In the next section, we’ll describe the ECH specification and how it addresses the shortcomings of ESNI. Despite its limitations, however, the practical privacy benefit that ESNI provides is significant. Cloudflare intends to continue its support for ESNI until ECH is production-ready.

The ins and outs of ECH

The goal of ECH is to encrypt the entire ClientHello, thereby closing the gap left in TLS 1.3 and ESNI by protecting all privacy-sensitive handshake-parameters. Similar to ESNI, the protocol uses a public key, distributed via DNS and obtained using DoH, for encryption during the client’s first flight. But ECH has improvements to key distribution that make the protocol more robust to DNS cache inconsistencies. Whereas the ESNI server aborts the connection if decryption fails, the ECH server attempts to complete the handshake and supply the client with a public key it can use to retry the connection.

But how can the server complete the handshake if it’s unable to decrypt the ClientHello? As illustrated in Figure 3, the ECH protocol actually involves two ClientHello messages: the ClientHelloOuter, which is sent in the clear, as usual; and the ClientHelloInner, which is encrypted and sent as an extension of the ClientHelloOuter. The server completes the handshake with just one of these ClientHellos: if decryption succeeds, then it proceeds with the ClientHelloInner; otherwise, it proceeds with the ClientHelloOuter.

Figure 3: The TLS 1.3 handshake with the ECH extension.

The ClientHelloInner is composed of the handshake parameters the client wants to use for the connection. This includes sensitive values, like the SNI of the origin server it wants to reach (called the backend server in ECH parlance), the ALPN list, and so on. The ClientHelloOuter, while also a fully-fledged ClientHello message, is not used for the intended connection. Instead, the handshake is completed by the ECH service provider itself (called the client-facing server), signaling to the client that its intended destination couldn’t be reached due to decryption failure. In this case, the service provider also sends along the correct ECH public key with which the client can retry the handshake, thereby “correcting” the client’s configuration. (This mechanism is similar to how the server distributed its public key for 0-RTT mode in the early days of TLS 1.3.)

At a minimum, both ClientHellos must contain the handshake parameters that are required for a server-authenticated key-exchange. In particular, while the ClientHelloInner contains the real SNI, the ClientHelloOuter also contains an SNI value, which the client expects to verify in case of ECH decryption failure (i.e., the client-facing server). If the connection is established using the ClientHelloOuter, then the client is expected to immediately abort the connection and retry the handshake with the public key provided by the server. It’s not necessary that the client specify an ALPN list in the ClientHelloOuter, nor any other extension used to guide post-handshake behavior. All of these parameters are encapsulated by the encrypted ClientHelloInner.
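
The accept-or-retry logic described above can be sketched as a small Python model. This is only a control-flow illustration: all class and function names are hypothetical, and decryption is simulated by comparing key identifiers rather than by running HPKE.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InnerHello:
    sni: str            # the backend server the client really wants
    alpn: List[str]


@dataclass
class OuterHello:
    public_sni: str           # name of the client-facing server, sent in the clear
    key_id: int               # identifies the ECH public key the client used
    sealed_inner: InnerHello  # stand-in for the encrypted ClientHelloInner


@dataclass
class ServerReply:
    accepted_ech: bool
    served_sni: str
    retry_key_id: Optional[int] = None


class ClientFacingServer:
    def __init__(self, public_name: str, current_key_id: int):
        self.public_name = public_name
        self.current_key_id = current_key_id

    def _try_decrypt(self, outer: OuterHello) -> Optional[InnerHello]:
        # Toy rule: decryption succeeds only if the client used the current key.
        if outer.key_id == self.current_key_id:
            return outer.sealed_inner
        return None

    def handle(self, outer: OuterHello) -> ServerReply:
        inner = self._try_decrypt(outer)
        if inner is not None:
            # Proceed with the ClientHelloInner and reach the backend server.
            return ServerReply(True, inner.sni)
        # Proceed with the ClientHelloOuter and hand back the current key.
        return ServerReply(False, self.public_name, self.current_key_id)


def connect(server: ClientFacingServer, cached_key_id: int,
            backend_sni: str) -> Tuple[str, int]:
    """Return (server name reached, number of handshakes needed)."""
    key_id = cached_key_id
    for attempt in (1, 2):
        outer = OuterHello(server.public_name, key_id, InnerHello(backend_sni, ["h2"]))
        reply = server.handle(outer)
        if reply.accepted_ech:
            return reply.served_sni, attempt
        key_id = reply.retry_key_id  # abort, then retry with the corrected key
    raise ConnectionError("ECH retry failed")
```

In this model, a client holding a stale key needs two handshakes to reach its backend, while a client with the current key needs one.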

This design resolves — quite elegantly, I think — most of the challenges for securely deploying handshake encryption encountered by earlier mechanisms. Importantly, the design of ECH was not conceived in a vacuum. The protocol reflects the diverse perspectives of the IETF community, and its development dovetails with other IETF standards that are crucial to the success of ECH.

The first is an important new DNS feature known as the HTTPS resource record type. At a high level, this record type is intended to allow multiple HTTPS endpoints that are authoritative for the same domain name to advertise different capabilities for TLS. This makes it possible to rely on DNS for key distribution, resolving one of the deployment challenges uncovered by the initial ESNI deployment. For a deep dive into this new record type and what it means for the Internet more broadly, check out Alessandro Ghedini’s recent blog post on the subject.

The second is the CFRG’s Hybrid Public Key Encryption (HPKE) standard, which specifies an extensible framework for building public key encryption schemes suitable for a wide variety of applications. In particular, ECH delegates all of the details of its handshake encryption mechanism to HPKE, resulting in a much simpler and easier-to-analyze specification. (Incidentally, HPKE is also one of the main ingredients of Oblivious DNS-over-HTTPS.)

The road ahead

The current ECH specification is the culmination of a multi-year collaboration. At this point, the overall design of the protocol is fairly stable. In fact, the next draft of the specification will be the first to be targeted for interop testing among implementations. Still, there remain a number of details that need to be sorted out. Let’s end this post with a brief overview of the road ahead.

Resistance to traffic analysis

Ultimately, the goal of ECH is to ensure that TLS connections made to different origin servers behind the same ECH service provider are indistinguishable from one another. In other words, when you connect to an origin behind, say, Cloudflare, no one on the network between you and Cloudflare should be able to discern which origin you reached, or which privacy-sensitive handshake-parameters you and the origin negotiated. Apart from an immediate privacy boost, this property, if achieved, paves the way for the deployment of new features for TLS without compromising privacy.

Encrypting the ClientHello is an important step towards achieving this goal, but we need to do a bit more. An important attack vector we haven’t discussed yet is traffic analysis. This refers to the collection and analysis of properties of the communication channel that betray part of the ciphertext’s contents, but without cracking the underlying encryption scheme. For example, the length of the encrypted ClientHello might leak enough information about the SNI for the adversary to make an educated guess as to its value (this risk is especially high for domain names that are either particularly short or particularly long). It is therefore crucial that the length of each ciphertext is independent of the values of privacy-sensitive parameters. The current ECH specification provides some mitigations, but their coverage is incomplete. Thus, improving ECH’s resistance to traffic analysis is an important direction for future work.
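
One simple way to make ciphertext length independent of the server name is to pad every name to the same fixed size before encryption. The Python sketch below illustrates the idea only; the function names and the 260-byte target are assumptions for illustration and do not reproduce the padding rules of the ECH specification.

```python
def pad_server_name(name: str, padded_length: int = 260) -> bytes:
    """Pad a server name with zero bytes so every name has the same length."""
    raw = name.encode("ascii")
    if len(raw) > padded_length:
        raise ValueError("server name longer than the padded length")
    return raw + b"\x00" * (padded_length - len(raw))


def unpad_server_name(padded: bytes) -> str:
    """Strip the zero padding added by pad_server_name."""
    return padded.rstrip(b"\x00").decode("ascii")
```

With this scheme a very short name and a very long one produce plaintexts of identical length, so the length of the resulting ciphertext no longer hints at which name was sent.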

The spectre of ossification

An important open question for ECH is the impact it will have on network operations.

One of the lessons learned from the deployment of TLS 1.3 is that upgrading a core Internet protocol can trigger unexpected network behavior. Cloudflare was one of the first major TLS operators to deploy TLS 1.3 at scale; when browsers like Firefox and Chrome began to enable it on an experimental basis, they observed a significantly higher rate of connection failures compared to TLS 1.2. The root cause of these failures was network ossification, i.e., the tendency of middleboxes (network appliances between clients and servers that monitor and sometimes intercept traffic) to run software that expects traffic to look and behave a certain way. Changing the protocol before middleboxes had the chance to update their software led to middleboxes trying to parse packets they didn’t recognize, triggering software bugs that, in some instances, caused connections to be dropped completely.

This problem was so widespread that, instead of waiting for network operators to update their software, the design of TLS 1.3 was altered in order to mitigate the impact of network ossification. The ingenious solution was to make TLS 1.3 “look like” another protocol that middleboxes are known to tolerate. Specifically, the wire format and even the contents of handshake messages were made to resemble TLS 1.2. These two protocols aren’t identical, of course (a curious network observer can still distinguish between them), but they look and behave similarly enough to ensure that the majority of existing middleboxes don’t treat them differently. Empirically, it was found that this strategy reduced the connection failure rate enough to make deployment of TLS 1.3 viable.

Once again, ECH represents a significant upgrade for TLS for which the spectre of network ossification looms large. The ClientHello contains parameters, like SNI, that have existed in the handshake for a long time, and we don’t yet know what the impact will be of encrypting them. In anticipation of the deployment issues ossification might cause, the ECH protocol has been designed to look as much like a standard TLS 1.3 handshake as possible. The most notable difference is the ECH extension itself: if middleboxes ignore it — as they should, if they are compliant with the TLS 1.3 standard — then the rest of the handshake will look and behave very much as usual.

It remains to be seen whether this strategy will be enough to ensure the wide-scale deployment of ECH. If so, it is notable that this new feature will help to mitigate the impact of future TLS upgrades on network operations. Encrypting the full handshake reduces the risk of ossification since it means that there are fewer visible protocol features for software to ossify on. We believe this will be good for the health of the Internet overall.

Conclusion

The old TLS handshake is (unintentionally) leaky. Operational requirements of both the client and server have led to privacy-sensitive parameters, like SNI, being negotiated completely in the clear and available to network observers. The ECH extension aims to close this gap by enabling encryption of the full handshake. This represents a significant upgrade to TLS, one that will help preserve end-user privacy as the protocol continues to evolve.

The ECH standard is a work-in-progress. As this work continues, Cloudflare is committed to doing its part to ensure this important upgrade for TLS reaches Internet-scale deployment.

Improving DNS Privacy with Oblivious DoH in 1.1.1.1

Post Syndicated from Tanya Verma original https://blog.cloudflare.com/oblivious-dns/

Today we are announcing support for a new proposed DNS standard — co-authored by engineers from Cloudflare, Apple, and Fastly — that separates IP addresses from queries, so that no single entity can see both at the same time. Even better, we’ve made source code available, so anyone can try out ODoH, or run their own ODoH service!

But first, a bit of context. The Domain Name System (DNS) is the foundation of a human-usable Internet. It maps usable domain names, such as cloudflare.com, to IP addresses and other information needed to connect to that domain. A quick primer about the importance and issues with DNS can be read in a previous blog post. For this post, it’s enough to know that, in the initial design and still dominant usage of DNS, queries are sent in cleartext. This means anyone on the network path between your device and the DNS resolver can see both the query that contains the hostname (or website) you want, as well as the IP address that identifies your device.

To safeguard DNS from onlookers and third parties, the IETF standardized DNS encryption with DNS over HTTPS (DoH) and DNS over TLS (DoT). Both protocols prevent queries from being intercepted, redirected, or modified between the client and resolver. Client support for DoT and DoH is growing, having been implemented in recent versions of Firefox, iOS, and more. Even so, until there is wider deployment among Internet service providers, Cloudflare is one of only a few providers to offer a public DoH/DoT service. This has raised two main concerns. One concern is that the centralization of DNS introduces single points of failure (although, with data centers in more than 100 countries, Cloudflare is designed to always be reachable). The other concern is that the resolver can still link all queries to client IP addresses.

Cloudflare is committed to end-user privacy. Users of our public DNS resolver service are protected by a strong, audited privacy policy. However, for some, trusting Cloudflare with sensitive query information is a barrier to adoption, even with such a strong privacy policy. Instead of relying on privacy policies and audits, what if we could give users an option to remove that barrier with technical guarantees?

Today, Cloudflare and partners are launching support for a protocol that does exactly that: Oblivious DNS over HTTPS, or ODoH for short.

ODoH Partners:

We’re excited to launch ODoH with several leading launch partners who are equally committed to privacy.

A key component of ODoH is a proxy that is disjoint from the target resolver. Today, we’re launching ODoH with several leading proxy partners, including: PCCW, SURF, and Equinix.

“ODoH is a revolutionary new concept designed to keep users’ privacy at the center of everything. Our ODoH partnership with Cloudflare positions us well in the privacy and “Infrastructure of the Internet” space. As well as the enhanced security and performance of the underlying PCCW Global network, which can be accessed on-demand via Console Connect, the performance of the proxies on our network are now improved by Cloudflare’s 1.1.1.1 resolvers. This model for the first time completely decouples client proxy from the resolvers. This partnership strengthens our existing focus on privacy as the world moves to a more remote model and privacy becomes an even more critical feature.” — Michael Glynn, Vice President, Digital Automated Innovation, PCCW Global

“We are partnering with Cloudflare to implement better user privacy via ODoH. The move to ODoH is a true paradigm shift, where the users’ privacy or the IP address is not exposed to any provider, resulting in true privacy. With the launch of ODoH-pilot, we’re joining the power of Cloudflare’s network to meet the challenges of any users around the globe. The move to ODoH is not only a paradigm shift but it emphasizes how privacy is important to any users than ever, especially during 2020. It resonates with our core focus and belief around Privacy.” — Joost van Dijk, Technical Product Manager, SURF

How does Oblivious DNS over HTTPS (ODoH) work?

ODoH works by adding a layer of public key encryption, as well as a network proxy between clients and DoH servers such as 1.1.1.1. The combination of these two added elements guarantees that only the user has access to both the DNS messages and their own IP address at the same time.

There are three players in the ODoH path. Looking at the figure above, let’s begin with the target. The target decrypts queries encrypted by the client, via a proxy. Similarly, the target encrypts responses and returns them to the proxy. The standard says that the target may or may not be the resolver (we’ll touch on this later). The proxy does as a proxy is supposed to do, in that it forwards messages between client and target. The client behaves as it does in DNS and DoH, but differs by encrypting queries for the target, and decrypting the target’s responses. Any client that chooses to do so can specify a proxy and target of choice.

Together, the added encryption and proxying provide the following guarantees:

  1. The target sees only the query and the proxy’s IP address.
  2. The proxy has no visibility into the DNS messages, with no ability to identify, read, or modify either the query being sent by the client or the answer being returned by the target.
  3. Only the intended target can read the content of the query and produce a response.

These three guarantees improve client privacy while maintaining the security and integrity of DNS queries. However, each of these guarantees relies on one fundamental property — that the proxy and the target servers do not collude. So long as there is no collusion, an attacker succeeds only if both the proxy and target are compromised.

One aspect of this system worth highlighting is that the target is logically separate from the upstream recursive resolver that performs DNS resolution. In practice, for performance, we expect the target and the resolver to be the same service. In fact, 1.1.1.1 is now both a recursive resolver and a target! There is no requirement that a target exist separately from a resolver. If they are separated, then the target is free to choose resolvers and simply act as a go-between. The only real requirement, remember, is that the proxy and target never collude.

Also, importantly, clients are in complete control of proxy and target selection. Without any need for TRR-like programs, clients can have privacy for their queries, in addition to security. Since the target only knows about the proxy, the target and any upstream resolver are oblivious to the existence of any client IP addresses. Importantly, this puts clients in greater control over their queries and the ways they might be used. For example, clients could select and alter their proxies and targets any time, for any reason!

ODoH Message Flow

In ODoH, the ‘O’ stands for oblivious, and this property comes from the level of encryption of the DNS messages themselves. This added encryption is end-to-end between client and target, and independent from the connection-level encryption provided by TLS/HTTPS. One might ask why this additional encryption is required at all in the presence of a proxy. This is because two separate TLS connections are required to support proxy functionality. Specifically, the proxy terminates a TLS connection from the client, and initiates another TLS connection to the target. Between those two connections, the DNS message contents would otherwise appear in plaintext! For this reason, ODoH additionally encrypts messages between client and target so the proxy has no access to the message contents.

The whole process begins with clients that encrypt their query for the target using HPKE. Clients obtain the target’s public key via DNS, where it is bundled into an HTTPS resource record and protected by DNSSEC. When the TTL for this key expires, clients request a new copy of the key as needed (just as they would for an A/AAAA record when that record’s TTL expires). The usage of a target’s DNSSEC-validated public key guarantees that only the intended target can decrypt the query and encrypt a response (answer).

Clients transmit these encrypted queries to a proxy over an HTTPS connection. Upon receipt, the proxy forwards the query to the designated target. The target then decrypts the query, produces a response by sending the query to a recursive resolver such as 1.1.1.1, and then encrypts the response to the client. The encrypted query from the client contains encapsulated keying material from which targets derive the response encryption symmetric key.

This response is then sent back to the proxy, and then subsequently forwarded to the client. All communication is authenticated and confidential since these DNS messages are end-to-end encrypted, despite being transmitted over two separate HTTPS connections (client-proxy and proxy-target). The message that otherwise appears to the proxy as plaintext is actually an encrypted garble.
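
The message flow above can be modeled in a few lines of Python to show what each party gets to see. This is a toy sketch under loud assumptions: a symmetric construction built from SHAKE-256 and HMAC stands in for HPKE, the client simply places a fresh response key inside the encrypted query, and the addresses and the `records` table are made-up documentation values. It must not be used as real cryptography.

```python
import hashlib
import hmac
import os


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption (a stand-in for HPKE, not for real use)."""
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on tampering."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = hashlib.shake_256(key + nonce).digest(len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


class Target:
    def __init__(self):
        self.key = os.urandom(32)  # stand-in for the target's HPKE key pair
        self.seen = []             # (peer address, plaintext query name)
        self.records = {"crypto.dance": "192.0.2.1"}  # hypothetical answer

    def handle(self, peer_ip: str, blob: bytes) -> bytes:
        message = open_sealed(self.key, blob)
        response_key, qname = message[:32], message[32:].decode()
        self.seen.append((peer_ip, qname))
        answer = self.records.get(qname, "NXDOMAIN")
        return seal(response_key, answer.encode())


class Proxy:
    def __init__(self, ip: str, target: Target):
        self.ip, self.target = ip, target
        self.seen = []  # (client address, opaque encrypted blob)

    def forward(self, client_ip: str, blob: bytes) -> bytes:
        self.seen.append((client_ip, blob))
        return self.target.handle(self.ip, blob)


def oblivious_query(client_ip: str, qname: str, proxy: Proxy, target_key: bytes) -> str:
    response_key = os.urandom(32)  # lets the target encrypt its answer to us
    blob = seal(target_key, response_key + qname.encode())
    return open_sealed(response_key, proxy.forward(client_ip, blob)).decode()
```

After a query, the proxy’s log holds the client address next to an opaque blob, while the target’s log holds the query name next to the proxy’s address; neither log pairs the client address with the query.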

What about Performance? Do I have to trade performance to get privacy?

We’ve been doing lots of measurements to find out, and will be doing more as ODoH deploys more widely. Our initial set of measurement configurations spanned cities in the USA, Canada, and Brazil. Importantly, our measurements include not just 1.1.1.1, but also 8.8.8.8 and 9.9.9.9. The full set of measurements, so far, is documented for open access.

In those measurements, it was important to isolate the cost of proxying and additional encryption from the cost of TCP and TLS connection setup. This is because the TLS and TCP costs are incurred by DoH, anyway. So, in our setup, we ‘primed’ measurements by establishing connections once and reusing that connection for all measurements. We did this for both DoH and for ODoH, since the same strategy could be used in either case.

The first thing that we can say with confidence is that the cost of the additional encryption is marginal. We know this because we randomly selected 10,000 domains from the Tranco million dataset and measured both encryption of the A record with a different public key, as well as its decryption. The additional cost between a proxied DoH query/response and its ODoH counterpart is consistently less than 1ms at the 99th percentile.

The ODoH request-response pipeline, however, is much more than just encryption. A very useful way of looking at measurements is by looking at the cumulative distribution chart — if you’re familiar with these kinds of charts, skip to the next paragraph. In contrast to most charts where we start along the x-axis, with cumulative distributions we often start with the y-axis.

The chart below shows the cumulative distributions for query/response times in DoH, ODoH, and DoH when transmitted over the Tor Network. The dashed horizontal line that starts on the left from 0.5 is the 50% mark. Along this horizontal line, for any plotted curve, the part of the curve below the dashed line is 50% of the data points. Now look at the x-axis, which is a measure of time. The lines that appear to the left are faster than lines to the right. One last important detail is that the x-axis is plotted on a logarithmic scale. What does this mean? Notice that the labeled markers (10^x) are equally spaced, but the ‘x’ is an exponent and represents orders of magnitude. So, while the time difference between the first two markers is 9ms, the difference between the 3rd and 4th markers is 900ms.
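The arithmetic behind the logarithmic axis, and the way a point on a cumulative distribution is read, can be checked with a few lines of Python. This is a minimal sketch; the helper `cdf_at` and the sample latencies are made up for illustration and are not taken from the measurements.

```python
# Axis markers sit at 10^0, 10^1, 10^2, 10^3 milliseconds.
markers_ms = [10 ** x for x in range(4)]

# Equal spacing on the chart hides very different gaps in milliseconds.
gaps_ms = [b - a for a, b in zip(markers_ms, markers_ms[1:])]


def cdf_at(samples, threshold_ms):
    """Fraction of samples that completed within threshold_ms."""
    return sum(1 for s in samples if s <= threshold_ms) / len(samples)


# Hypothetical latencies, for illustration only.
example_latencies_ms = [100, 150, 200, 250]
half_mark = cdf_at(example_latencies_ms, 150)

print(markers_ms, gaps_ms, half_mark)
```

Reading the chart at the 0.5 line is the same operation in reverse: find the time at which `cdf_at` first reaches one half.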

[Chart: cumulative distributions of query/response times for DoH, ODoH, and DoH over Tor]

In this chart, the middle curve represents ODoH measurements. We also measured the performance of privacy-preserving alternatives, for example, DoH queries transmitted over the Tor network as represented by the right curve in the chart. (Additional privacy-preserving alternatives are captured in the open access technical report.) Compared to other privacy-oriented DNS variants, ODoH cuts query time in half, or better. This point is important since privacy and performance rarely play nicely together, so seeing this kind of improvement is encouraging!

The chart above also tells us that 50% of the time ODoH queries are resolved in fewer than 228ms. Now compare the middle line to the left line that represents ‘straight-line’ (or normal) DoH without any modification. That left plotline says that 50% of the time, DoH queries are resolved in fewer than 146ms. Looking below the 50% mark, the curves also tell us that ½ the time that difference is never greater than 100ms. On the other side, looking at the curves above the 50% mark tells us that ½ ODoH queries are competitive with DoH.

Those curves also hide a lot of information, so it is important to delve further into the measurements. The chart below has three different cumulative distribution curves that describe ODoH performance if we select proxies and targets by their latency. This is also an example of the insights that measurements can reveal, some of which are counterintuitive. For example, looking above 0.5, these curves say that ½ of ODoH query/response times are virtually indistinguishable, no matter the choice of proxy and target. Now shift attention below 0.5 and compare the two solid curves against the dashed curve that represents overall average. This region suggests that selecting the lowest-latency proxy and target offers minimal improvement over the average but, most importantly, it shows that selecting the lowest-latency proxy leads to worse performance!

[Chart: cumulative distributions of ODoH query/response times by proxy and target latency]

Open questions remain, of course. This first set of measurements was executed largely in North America. Does performance change at a global level? How does this affect client performance, in practice? We’re working on finding out, and this release will help us to do that.

Interesting! Can I experiment with ODoH? Is there an open ODoH service?

Yes, and yes! We have open sourced our interoperable ODoH implementations in Rust (odoh-rs) and Go (odoh-go), as well as integrated the target into the Cloudflare DNS Resolver. That’s right, 1.1.1.1 is ready to receive queries via ODoH.

We have also open sourced test clients in Rust, odoh-client-rs, and Go, odoh-client-go, to demo ODoH queries. You can also check out the HPKE configuration used by ODoH for message encryption to 1.1.1.1 by querying the service directly:

$ dig -t type65 +dnssec @ns1.cloudflare.com odoh.cloudflare-dns.com 

; <<>> DiG 9.10.6 <<>> -t type65 +dnssec @ns1.cloudflare.com odoh.cloudflare-dns.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19923
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;odoh.cloudflare-dns.com.	IN	TYPE65

;; ANSWER SECTION:
odoh.cloudflare-dns.com. 300	IN	TYPE65	\# 108 00010000010003026832000400086810F8F96810F9F9000600202606 470000000000000000006810F8F92606470000000000000000006810 F9F98001002E002CFF0200280020000100010020ED82DBE32CCDE189 BC6C643A80B5FAFF82548D21601C613408BACAAE6467B30A
odoh.cloudflare-dns.com. 300	IN	RRSIG	TYPE65 13 3 300 20201119163629 20201117143629 34505 odoh.cloudflare-dns.com. yny5+ApxPSO6Q4aegv09ZnBmPiXxDEnX5Xv21TAchxbxt1VhqlHpb5Oc 8yQPNGXb0fb+NyibmHlvTXjphYjcPA==

;; Query time: 21 msec
;; SERVER: 173.245.58.100#53(173.245.58.100)
;; WHEN: Wed Nov 18 07:36:29 PST 2020
;; MSG SIZE  rcvd: 291
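The opaque TYPE65 answer above can be decoded by hand. The sketch below copies the 108 bytes of RDATA from the output and splits them using the HTTPS/SVCB record layout (priority, target name, then key/length/value parameters). Two things are assumptions rather than facts from the post: that parameter key 0x8001 carries the ODoH configuration, and that the configuration follows the draft layout of a total length, version, length, KEM ID, KDF ID, AEAD ID, and a length-prefixed public key. All function names are illustrative.

```python
import ipaddress
import struct

# RDATA copied from the dig answer above.
RDATA_HEX = (
    "00010000010003026832000400086810F8F96810F9F9000600202606"
    "470000000000000000006810F8F92606470000000000000000006810"
    "F9F98001002E002CFF0200280020000100010020ED82DBE32CCDE189"
    "BC6C643A80B5FAFF82548D21601C613408BACAAE6467B30A"
)

# Parameter keys seen in this record. 0x8001 is ASSUMED to be the ODoH config.
ALPN, IPV4HINT, IPV6HINT, ODOHCONFIG = 1, 4, 6, 0x8001


def parse_rdata(rdata: bytes):
    """Split TYPE65 RDATA into priority, target name, and raw parameters."""
    (priority,) = struct.unpack_from("!H", rdata, 0)
    pos, labels = 2, []
    while rdata[pos] != 0:  # uncompressed name: length-prefixed labels
        size = rdata[pos]
        labels.append(rdata[pos + 1:pos + 1 + size].decode("ascii"))
        pos += 1 + size
    pos += 1  # skip the terminating zero byte
    params = {}
    while pos < len(rdata):
        key, size = struct.unpack_from("!HH", rdata, pos)
        params[key] = rdata[pos + 4:pos + 4 + size]
        pos += 4 + size
    return priority, ".".join(labels) + ".", params


def split_addresses(blob: bytes, width: int):
    """Decode a run of packed IPv4 (width 4) or IPv6 (width 16) addresses."""
    cls = ipaddress.IPv4Address if width == 4 else ipaddress.IPv6Address
    return [str(cls(blob[i:i + width])) for i in range(0, len(blob), width)]


def parse_odoh_config(blob: bytes):
    """Decode the first config, assuming the draft layout described above."""
    _total, version, _length, kem, kdf, aead, pk_len = struct.unpack_from("!7H", blob, 0)
    return {"version": version, "kem_id": kem, "kdf_id": kdf,
            "aead_id": aead, "public_key": blob[14:14 + pk_len]}


priority, target, params = parse_rdata(bytes.fromhex(RDATA_HEX))
ipv4 = split_addresses(params[IPV4HINT], 4)
ipv6 = split_addresses(params[IPV6HINT], 16)
config = parse_odoh_config(params[ODOHCONFIG])
```

Under those assumptions the record advertises `h2`, two IPv4 and two IPv6 address hints, and a 32-byte public key with HPKE identifiers 0x0020 / 0x0001 / 0x0001, which in the HPKE registry correspond to X25519 with HKDF-SHA256, HKDF-SHA256, and AES-128-GCM.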

We are working to add ODoH to existing stub resolvers such as cloudflared. If you’re interested in adding support to a client, or if you encounter bugs with the implementations, please drop us a line at [email protected]! Announcements about the ODoH specification and server will be sent to the IETF DPRIVE mailing list. You can subscribe and follow announcements and discussion about the specification here.

We are committed to moving it forward in the IETF and are already seeing interest from client vendors. Eric Rescorla, CTO of Firefox, says, “Oblivious DoH is a great addition to the secure DNS ecosystem. We’re excited to see it starting to take off and are looking forward to experimenting with it in Firefox.” We hope that more operators join us along the way and provide support for the protocol, by running either proxies or targets, and we hope client support will increase as the available infrastructure increases, too.

The ODoH protocol is a practical approach for improving privacy of users, and aims to improve the overall adoption of encrypted DNS protocols without compromising performance and user experience on the Internet.

Acknowledgements

Marek Vavruša and Anbang Wen were instrumental in getting the 1.1.1.1 resolver to support ODoH. Chris Wood and Peter Wu helped get the ODoH libraries ready and tested.

Improving the Resiliency of Our Infrastructure DNS Zone

Post Syndicated from Ryan Timken original https://blog.cloudflare.com/improving-the-resiliency-of-our-infrastructure-dns-zone/

In this blog post we will discuss how we made our infrastructure DNS zone more reliable by using multiple primary nameservers to leverage our own DNS product running on our edge as well as a third-party DNS provider.

Authoritative Nameservers

You can think of an authoritative nameserver as the source of truth for the records of a given DNS zone. When a recursive resolver wants to look up a record, it will eventually need to talk to the authoritative nameserver(s) for the zone in question. If you’d like to read more on the topic, our learning center provides some additional information.

Here’s an example of our authoritative nameservers (replacing our actual domain with example.com):

~$ dig NS example.com +short
ns1.example.com.
ns2.example.com.
ns3.example.com.

As you can see, there are three nameservers listed. You’ll notice that the nameservers happen to reside in the same zone, but they don’t have to. Those three nameservers point to six anycasted IP addresses (3 x IPv4, 3 x IPv6) announced from our edge, comprising data centers from 200+ cities around the world.

The Problem

We store the hostnames for all of our machines, both the ones at the edge and the ones at core data centers, in a DNS zone we refer to as our infrastructure zone. By using our own DNS product to run our infrastructure zone, we get the reliability and performance of our Global Anycast Network. However, what would happen if those nameservers became unavailable due to an issue on our edge? Eventually DNS lookups would fail, as resolvers would not be able to connect to the only authoritative nameservers configured on the zone: the ones hosted on our edge. This is the main problem we set out to solve.

When an incident occurs, our engineering teams are busy investigating, debugging, and fixing the issue at hand. If they cannot resolve the hostnames of the infrastructure they need to use, it will lead to confusion and delays in resolving the incident.

Imagine you are part of a team tackling an incident. As you quickly begin to investigate, you try to connect to a specific machine to gather some debugging information, but you get a DNS resolution error. What now? You know you are using the right hostname. Is this a result of the incident at hand, some side effect, or is this completely unrelated? You need to keep investigating, so you try another way. Maybe you have memorized the IP address of the machine and you can connect. More likely, you ask another resolver which still has the answer in its cache and resolve the IP that way. One thing is certain, the extra “mini-debug” step costs you precious time and detracts from debugging the real root cause.

Unavailable nameservers and the problems they cause are not unique to Cloudflare, and are not specific to our use case. Even if we hosted our authoritative nameservers with a single external provider, they could also experience their own issues causing the nameservers to fail.

Within the DNS community it is well understood that the more diversity, the better. Some common methods include diversifying network/routing configurations and software choices within a standard primary/secondary setup. However, no matter how resilient the network or software stack is, a single authority is still responsible for the zone. Luckily, there are ways to solve this problem!

Our Solution: Multiple Primary Nameservers

Our solution was to set up our infrastructure zone with multiple primary nameservers. This type of setup is referred to as split authority, multi-primary, or primary/primary. This way, instead of using just one provider for our authoritative nameservers, we use two.

We added three nameservers from our additional provider to our zone at our registrar. Using multiple primaries allows changes to our zone at each provider to be controlled separately and completely by us. Instead of using zone transfers to keep our zone in sync, we use OctoDNS to independently and simultaneously manage the zone at both providers. We’ll talk more about our use of OctoDNS in a bit.

This setup is similar to using a primary and secondary server. The main difference is that the nameservers operate independently from one another, and do not use the usual DNS AXFR/IXFR method to keep the zone up to date. If you’d like to learn more about this type of solution, here is a great blog post by Dina Kozlov, our Product Manager for DNS, about Secondary DNS.

The nameservers in our zone after adding an additional provider would look something like:

~$ dig NS example.com +short
ns1.example.com.
ns2.example.com.
ns3.example.com.
ns1.additional-provider.net.
ns2.additional-provider.net.
ns3.additional-provider.net.

Predictable Query Routing

Currently, we cannot coordinate which provider should be used when querying a record within the zone. Recursive resolvers have different methods of choosing which nameserver should be used when presented with multiple NS records. A popular choice is to use the server with the lowest RTT (round trip time). A great blog post from APNIC on recursive resolver authoritative nameserver selection can be consulted for a detailed explanation.

We needed to enforce specific routing decisions between the two authorities. Our requirement was for a weighted routing policy preferring Cloudflare based on availability checks for requests originating from our infrastructure servers. This reduces RTT, since the queries originate from the same servers hosting our nameservers in the same data center, and do not need to travel further to the external provider when all is well.

Our edge infrastructure servers are configured to use DNSDist as their primary system resolver. DNSDist load balances queries using multiple upstream recursive DNS servers (Unbound) in each data center to provide DNS resolution. This setup is used for internal DNS resolution only, and as such, it is independent from our authoritative DNS and public resolver 1.1.1.1 service.

Additionally, we modified our DNSDist configuration by adding two server pools for our infrastructure zone authoritative servers, and set up active checks with weighted routes to always use the Cloudflare pool when available. With the active checking and weighted routing in place, our queries are always routed internally when available. In the event of a Cloudflare authoritative DNS failure, DNSDist will route all requests for the zone to our external provider.
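The routing decision described here can be sketched in a few lines. This is an illustration in Python, not DNSDist configuration (which is written in Lua), and the `Pool` type, pool names, and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    weight: int      # higher weight is preferred
    healthy: bool    # result of the most recent active check


def choose_pool(pools):
    """Pick the highest-weight pool that currently passes its health check."""
    candidates = [p for p in pools if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy authoritative pool")
    return max(candidates, key=lambda p: p.weight)


# Hypothetical pools: internal authoritative servers are preferred.
pools = [Pool("cloudflare", 100, True), Pool("external", 1, True)]
normal = choose_pool(pools).name

# Simulate a failed health check on the preferred pool.
pools[0].healthy = False
failover = choose_pool(pools).name
```

With both pools healthy the internal pool always wins, and the external provider is only used once the active check for the internal pool fails.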

Maintaining Our Infrastructure DNS Zone

In addition to a more reliable infrastructure zone, we also wanted to further automate the provisioning of DNS records. We had been relying on an older manual tool that became quite slow handling our growing number of DNS records. In a nutshell, this tool queried our provisioning database to gather all of the machine names, and then created, deleted, or updated the required DNS records using the Cloudflare API. We had to run this tool whenever we provisioned or decommissioned machines that serve our customers’ requests.

Part of the procedure for running this tool was to first run it in dry-run mode, and then paste the results for review in our team chat room. This review step ensured the changes the tool found were expected and safe to run, and it is something we want to keep as part of our plan to automate the process.

Here’s what the old tool looked like:

$ cf-provision update-example -l ${USER}@cloudflare.com -u
Running update-example.sh -t /var/tmp/cf-provision -l [email protected] -u
* Loading up possible records …

deleting: {"name":"node44.example.com","ttl":300, "type":"A","content":"192.0.2.2","proxied":false} rec_id:abcdefg
updating: { "name":"build-ts.example.com", "ttl":120, "type":"TXT", "content":"1596722274"} rec_id:123456 previous content: 1596712896

Result    : SUCCESS
Task      : BUILD
Dest dir  :
Started   : Thu, 06 Aug 2020 13:57:40 +0000
Finished  : Thu, 06 Aug 2020 13:58:16 +0000
Elapsed   : 36 secs
Records Changed In API  : 1 record(s) changed
Records Deleted In API  : 1 record(s) deleted

Zone Management with OctoDNS

As we mentioned earlier, when using a multi-primary setup for our infrastructure zone, we are required to maintain the zone data outside of traditional DNS replication. Before rolling out our own solution we wanted to see what was out there, and we stumbled upon OctoDNS, a project from GitHub. OctoDNS provides a set of tools that make it easy to manage your DNS records across multiple providers.

OctoDNS uses a pluggable architecture. This means we can rewrite our old script as a plugin (which is called a ‘source’), and use other existing provider plugins (called ‘providers’) to interact with both the Cloudflare and our other provider’s APIs. Should we decide to sync to more external DNS providers in the future, we would just need to add them to our OctoDNS configuration. This allows our records to stay up to date, and, as an added benefit, records OctoDNS doesn’t know about will be removed during operation (for example, changes made outside of OctoDNS). This ensures that manual changes do not diverge from what is present in the provisioning database.

Our goal was to keep the zone management workflow as simple as possible. At Cloudflare, we use TeamCity for CI/CD, and we realized that it could not only facilitate code builds of our OctoDNS implementation, but it could also be used to deploy the zone.

There are a number of benefits to using our existing TeamCity infrastructure.

  • Our DevTools team can reliably manage it as a service
  • It has granular permissions, which allow us to control who can deploy the changes
  • It provides storage of the logs for auditing
  • It allows easy rollback of zone revisions
  • It integrates easily with our chat ops workflow via Google Chat webhooks

Below is a high-level overview of how we manage our zone through OctoDNS.

[Diagram: overview of zone management through OctoDNS]

There are three steps in the workflow:

  1. Build – evaluate the sources and build the complete zone.
  2. Compare – parse the built zone and compare to the records on the providers. Changes found are sent to SRE for evaluation.
  3. Deploy – deploy the changes to the providers.

Step 1: Build

Sources
OctoDNS consumes data from our internal systems via custom source modules. So far, we have built two source modules for querying data:

  1. ProvAPI queries our internal provisioning API providing data center and node configuration.
  2. NetBox queries our internal NetBox deployment providing hardware metadata.

Additionally, static records are defined in a YAML file. These sources form the infrastructure zone source of truth.

Providers
The build process establishes a staging area per execution. The YAML provider builds a static YAML file containing the complete zone. The hosts file provider is used to generate an emergency hosts file for the zone. Contents from the staging area are revision controlled in CI, which allows us to easily deploy previous versions if required.
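To make the emergency hosts file concrete, here is a minimal sketch of turning zone records into hosts-file lines. It is not OctoDNS's own provider code; the function name, the tuple layout, and the sample records are assumptions for illustration.

```python
def render_hosts(records):
    """Render address records as hosts-file lines; other record types are skipped.

    `records` is an iterable of (name, type, value) tuples (hypothetical layout).
    """
    lines = []
    for name, rtype, value in sorted(records):
        if rtype in ("A", "AAAA"):
            lines.append(f"{value}\t{name}")
    return "\n".join(lines) + "\n"


# Hypothetical zone contents.
zone = [
    ("node44.example.com", "A", "192.0.2.2"),
    ("node44.example.com", "AAAA", "2001:db8::2"),
    ("build-ts.example.com", "TXT", "1596722274"),
]
hosts_text = render_hosts(zone)
```

Only address records are useful in a hosts file, so TXT and other types are dropped rather than translated.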

Step 2: Compare

Following a successful zone build, CI executes the compare build. During this phase, OctoDNS performs an octodns-sync in dry-run mode. The compare build consumes the staging YAML zone configuration, and compares the records against the records on our authoritative providers via their respective APIs. Any identified zone changes are parsed, and a summary line is generated and sent to the SRE Chat room for approval. SRE are automatically linked to the changes and associated CI build for deployment.
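The comparison itself is a set difference between the built zone and what a provider currently serves. The sketch below is not OctoDNS code; the dictionary layout, function names, and summary format are assumptions, and the sample records echo the old tool's output shown earlier.

```python
def plan_changes(desired, existing):
    """Compare two zones given as {(name, type): (ttl, content)} dictionaries."""
    creates = {k: v for k, v in desired.items() if k not in existing}
    updates = {k: v for k, v in desired.items()
               if k in existing and existing[k] != v}
    deletes = {k: v for k, v in existing.items() if k not in desired}
    return creates, updates, deletes


def summary_line(creates, updates, deletes):
    """One-line summary suitable for posting to a chat room for review."""
    return (f"{len(creates)} create(s), {len(updates)} update(s), "
            f"{len(deletes)} delete(s)")


# Hypothetical provider state and freshly built zone.
existing = {
    ("node44.example.com", "A"): (300, "192.0.2.2"),
    ("build-ts.example.com", "TXT"): (120, "1596712896"),
}
desired = {
    ("build-ts.example.com", "TXT"): (120, "1596722274"),
    ("node45.example.com", "A"): (300, "192.0.2.3"),
}
line = summary_line(*plan_changes(desired, existing))
```

Because deletions fall out of the same comparison, records that exist at a provider but not in the sources show up in the plan for review instead of silently persisting.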

Step 3: Deploy

The deployment CI build is access-controlled and scoped to the SRE group using our single sign-on provider. Following successful approval and peer review, an SRE can deploy the changes by executing the deploy build.

The deployment process is simple: consume the YAML zone data from our staging area and deploy the changes to the zone’s authoritative providers via ‘octodns-sync --doit’. The hosts file generated for the zone is packaged and deployed, to be used in the event of complete DNS failure.

Here’s an example of how the message looks. When the deploy is finished, the thread is updated to indicate which user initiated it.

[Screenshot: example chat message]

Future Improvements

In the future, we would like to automate the process further by reducing the need for approvals. There is usually no harm in adding new records, and it is done very often during the provisioning of new machines. Removing the need to approve those records would take out another step in the provisioning process, which is something we are always looking to optimize.

Introducing OctoDNS and an additional provider allowed us to make our infrastructure DNS zone more reliable and easier to manage. We can now easily include new sources of record data, with OctoDNS allowing us to focus more on new and exciting projects and less on managing DNS records.

SAD DNS Explained

Post Syndicated from Marek Vavruša original https://blog.cloudflare.com/sad-dns-explained/

This week, at the ACM CCS 2020 conference, researchers from UC Riverside and Tsinghua University announced a new attack against the Domain Name System (DNS) called SAD DNS (Side channel AttackeD DNS). This attack leverages recent features of the networking stack in modern operating systems (like Linux) to allow attackers to revive a classic attack category: DNS cache poisoning. As part of a coordinated disclosure effort earlier this year, the researchers contacted Cloudflare and other major DNS providers and we are happy to announce that 1.1.1.1 Public Resolver is no longer vulnerable to this attack.

In this post, we’ll explain what the vulnerability was, how it relates to previous attacks of this sort, what mitigation measures we have taken to protect our users, and future directions the industry should consider to prevent this class of attacks from being a problem in the future.

DNS Basics

The Domain Name System (DNS) is what allows users of the Internet to get around without memorizing long sequences of numbers. What’s often called the “phonebook of the Internet” is more like a helpful system of translators that take natural language domain names (like blog.cloudflare.com or gov.uk) and translate them into the native language of the Internet: IP addresses (like 192.0.2.254 or [2001:db8::cf]). This translation happens behind the scenes so that users only need to remember hostnames and don’t have to get bogged down with remembering IP addresses.

DNS is both a system and a protocol. It refers to the hierarchical system of computers that manage the data related to naming on a network and it refers to the language these computers use to speak to each other to communicate answers about naming. The DNS protocol consists of pairs of messages that correspond to questions and responses. Each DNS question (query) and answer (reply) follows a standard format and contains a set of parameters that contain relevant information such as the name of interest (such as blog.cloudflare.com) and the type of response record desired (such as A for IPv4 or AAAA for IPv6).

The DNS Protocol and Spoofing

These DNS messages are exchanged over a network between machines using a transport protocol. Originally, DNS used UDP, a simple stateless protocol in which messages are endowed with a set of metadata indicating a source port and a destination port. More recently, DNS has adapted to use more complex transport protocols such as TCP and even advanced protocols like TLS or HTTPS, which incorporate encryption and strong authentication into the mix (see Peter Wu’s blog post about DNS protocol encryption).

Still, the most common transport protocol for message exchange is UDP, which has the advantages of being fast, ubiquitous and requiring no setup. Because UDP is stateless, the pairing of a response to an outstanding query is based on two main factors: the source address and port pair, and information in the DNS message. Given that UDP is both stateless and unauthenticated, anyone, and not just the recipient, can send a response with a forged source address and port, which opens up a range of potential problems.

[Diagram: the blue portions contribute randomness]

Since the transport layer is inherently unreliable and untrusted, the DNS protocol was designed with additional mechanisms to protect against forged responses. The first two bytes in the message form a message or transaction ID that must be the same in the query and response. When a DNS client sends a query, it will set the ID to a random value and expect the value in the response to match. This unpredictability introduces entropy into the protocol, which makes it less likely that a malicious party will be able to construct a valid DNS reply without first seeing the query. Other fields, like the DNS query name and query type, are also used to pair query and response, but these are trivial to guess and don’t introduce any additional entropy.
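A minimal sketch of this pairing logic, using only the standard library, is shown below. It builds a query with a random 16-bit ID and checks that a candidate response echoes the ID and the question. The function names are illustrative, and the byte-for-byte comparison of the question is a simplification: real resolvers also check the source address and port.

```python
import secrets
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name: str, qtype: int = 1):
    """Return (transaction_id, wire_bytes) for a single-question query."""
    txid = secrets.randbelow(1 << 16)           # the only unpredictable field
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD set, 1 question
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    return txid, header + question


def response_matches(query: bytes, response: bytes) -> bool:
    """Accept a response only if it echoes the query's ID and question."""
    if len(response) < len(query):
        return False
    same_id = response[:2] == query[:2]
    is_response = bool(response[2] & 0x80)       # QR bit must be set
    same_question = response[12:len(query)] == query[12:]
    return same_id and is_response and same_question
```

The name and type in the question are known to anyone who can trigger the lookup, so in this check the random `txid` is the only value a blind attacker has to guess.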

Those paying close attention to the diagram may notice that the amount of entropy introduced by this measure is only around 16 bits, which means that there are fewer than a hundred thousand possibilities to go through to find the matching reply to a given query. More on this later.

The DNS Ecosystem

DNS servers fall into one of a few main categories: recursive resolvers (like 1.1.1.1 or 8.8.8.8) and nameservers (like the DNS root servers or Cloudflare Authoritative DNS). There are also elements of the ecosystem that act as “forwarders” such as dnsmasq. In a typical DNS lookup, these DNS servers work together to complete the task of delivering the IP address for a specified domain to the client (the client is usually a stub resolver, a simple resolver built into an operating system). For more detailed information about the DNS ecosystem, take a look at our learning site. The SAD DNS attack targets the communication between recursive resolvers and nameservers.

Each of the participants in DNS (client, resolver, nameserver) uses the DNS protocol to communicate with each other. Most of the latest innovations in DNS revolve around upgrading the transport between users and recursive resolvers to use encryption. Upgrading the transport protocol between resolvers and authoritative servers is a bit more complicated as it requires a new discovery mechanism to instruct the resolver when to (and when not to) use a more secure channel. Aside from a few examples like our work with Facebook to encrypt recursive-to-authoritative traffic with DNS-over-TLS, most of these exchanges still happen over UDP. This is the core issue that enables this new attack on DNS, and one that we’ve seen before.

Kaminsky’s Attack

Prior to 2008, recursive resolvers typically used a single open port (usually port 53) to send and receive messages to authoritative nameservers. This made guessing the source port trivial, so the only variable an attacker needed to guess to forge a response to a query was the 16-bit message ID. The attack Kaminsky described was relatively simple: whenever a recursive resolver queried the authoritative name server for a given domain, an attacker would flood the resolver with DNS responses for some or all of the 65 thousand or so possible message IDs. If the malicious answer with the right message ID arrived before the response from the authoritative server, then the DNS cache would be effectively poisoned, returning the attacker’s chosen answer instead of the real one for as long as the DNS response was valid (called the TTL, or time-to-live).

For popular domains, resolvers contact authoritative servers once per TTL (which can be as short as 5 minutes), so there are plenty of opportunities to mount this attack. Forwarders that cache DNS responses are also vulnerable to this type of attack.

In response to this attack, DNS resolvers started doing source port randomization and careful checking of the security ranking of cached data. To poison these updated resolvers, forged responses would not only need to guess the message ID, but they would also have to guess the source port, bringing the number of guesses from the tens of thousands to over four billion. This made the attack effectively infeasible. Furthermore, the IETF published RFC 5452 on how to harden DNS from guessing attacks.
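The effect of source port randomization on a blind guesser can be worked out directly. The sketch below treats both the message ID and the source port as full 16-bit values, which is an upper bound since not every port number is available for outgoing queries; the figure of 1,000 forged packets per race is arbitrary and chosen only for illustration.

```python
ID_SPACE = 2 ** 16     # possible message IDs
PORT_SPACE = 2 ** 16   # possible source ports (upper bound)


def success_probability(forged_packets: int, search_space: int) -> float:
    """Chance that one of `forged_packets` distinct guesses hits the secret."""
    return min(forged_packets, search_space) / search_space


# Fixed source port: only the ID must be guessed.
fixed_port = success_probability(1000, ID_SPACE)

# Randomized source port: ID and port must both be guessed.
random_port = success_probability(1000, ID_SPACE * PORT_SPACE)

print(ID_SPACE, ID_SPACE * PORT_SPACE, fixed_port / random_port)
```

Each additional independent 16-bit secret multiplies the attacker's work by 65,536, which is why adding the port to the guess was enough to make the original attack impractical.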

It should be noted that this attack did not work for DNSSEC-signed domains since their answers are digitally signed. However, even now in 2020, DNSSEC is far from universal.

Defeating Source Port Randomization with Fragmentation

Another way to avoid having to guess the source port number and message ID is to split the DNS response in two. As is often the case in computer security, old attacks become new again when attackers discover new capabilities. In 2012, researchers Amir Herzberg and Haya Schulman from Bar Ilan University discovered that it was possible for a remote attacker to defeat the protections provided by source port randomization. This new attack leveraged another feature of UDP: fragmentation. For a primer on the topic of UDP fragmentation, check out our previous blog post on the subject by Marek Majkowski.

The key to this attack is the fact that all the randomness that needs to be guessed in a DNS poisoning attack is concentrated at the beginning of the DNS message (UDP header and DNS header). If the UDP response packet (sometimes called a datagram) is split into two fragments, the first half containing the message ID and source port and the second containing part of the DNS response, then all an attacker needs to do is forge the second fragment and make sure that the fake second fragment arrives at the resolver before the true second fragment does. When a datagram is fragmented, each fragment carries the same 16-bit ID (called the IP-ID), which is used to reassemble the datagram at the other end of the connection. Since the second fragment only has the IP-ID as entropy (again, this is a familiar refrain in this area), this attack is feasible with a relatively small number of forged packets. The downside of this attack is the precondition that the response must be fragmented in the first place, and the fragment must be carefully altered to pass the original section counts and UDP checksum.

Also discussed in the original and follow-up papers is a method of forcing two remote servers to send packets between each other which are fragmented at an attacker-controlled point, making this attack much more feasible. The details are in the paper, but it boils down to the fact that the control mechanism for describing the maximum transmission unit (MTU) between two servers, which determines at which point packets are fragmented, can be set via a forged UDP packet.

We explored this risk last year in a previous blog post when we introduced our multi-path DCV service, which mitigates it in the context of certificate issuance by making DNS queries from multiple vantage points. Nevertheless, fragmentation-based attacks are proving less and less effective as DNS providers move to eliminate support for fragmented DNS packets (one of the major goals of DNS Flag Day 2020).

Defeating Source Port Randomization via ICMP error messages

Another way to defeat the source port randomization is to use some measurable property of the server that makes the source port easier to guess. If the attacker could ask the server which port number is being used for a pending query, that would make the construction of a spoofed packet much easier. No such thing exists, but it turns out there is something close enough – the attacker can discover which ports are surely closed (and thus avoid having to send traffic). One such mechanism is the ICMP “port unreachable” message.

Let’s say the target receives a UDP datagram destined for its IP and some port. The datagram ends up either being accepted and silently discarded by the application, or rejected because the port is closed. If the port is closed, or more importantly, closed to the IP address that the UDP datagram was sent from, the target will send back an ICMP message notifying the attacker that the port is closed. This is handy to know, since the attacker no longer has to bother guessing the pending message ID on this port and can move on to other ports. A single scan of the server effectively reduces the search space of valid UDP responses from 2^32 (over four billion) to 2^17 (around a hundred thousand), at least in theory.
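To make the size of that reduction concrete, here is a small back-of-the-envelope calculation. It assumes the full 16-bit range for both the source port and the message ID, which is an idealization: real resolvers draw from a narrower ephemeral port range.

```python
# Idealized search-space arithmetic for a spoofed DNS response.
# Assumption: both the source port and the message ID use all 16 bits.
PORT_BITS = 16
TXID_BITS = 16

# Blind guessing: every (port, message ID) pair is a candidate.
blind_guesses = 2 ** (PORT_BITS + TXID_BITS)

# With a port oracle: scan the ports first, then guess IDs on the open one.
scan_then_guess = 2 ** PORT_BITS + 2 ** TXID_BITS

print(f"blind guessing:  {blind_guesses:,} combinations")
print(f"scan, then guess: {scan_then_guess:,} probes")
```

The second figure is the sum of two independent 16-bit searches rather than their product, which is where the drop to 2^17 comes from.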

This trick doesn’t always work. Many resolvers use “connected” UDP sockets instead of “open” UDP sockets to exchange messages between the resolver and nameserver. Connected sockets are tied to the peer address and port on the OS layer, which makes it impossible for an attacker to guess which “connected” UDP sockets are established between the target and the victim, and since the attacker isn’t the victim, it can’t directly observe the outcome of the probe.

To overcome this, the researchers found a very clever trick: they leverage ICMP rate limits as a side channel to reveal whether a given port is open or not. ICMP rate limiting was introduced (somewhat ironically, given this attack) as a security feature to prevent a server from being used as an unwitting participant in a reflection attack. In broad terms, it is used to limit how many ICMP responses a server will send out in a given time period. Say an attacker wanted to scan 10,000 ports and sent a burst of 10,000 UDP packets to a server configured with an ICMP rate limit of 50 per second, then only the first 50 would get an ICMP “port unreachable” message in reply.

Rate limiting seems innocuous until you remember one of the core rules of data security: don’t let private information influence publicly measurable metrics. ICMP rate limiting violates this rule because the rate limiter’s behavior can be influenced by an attacker making guesses as to whether a “secret” port number is open or not.

An attacker wants to know whether the target has an open port, so it sends a spoofed UDP message from the authoritative server to that port. If the port is open, no ICMP reply is sent and the rate counter remains unchanged. If the port is inaccessible, then an ICMP reply is sent (back to the authoritative server, not to the attacker) and the rate is increased by one. Although the attacker doesn’t see the ICMP response, it has influenced the counter. The counter itself isn’t known outside the server, but whether it has hit the rate limit or not can be measured by any outside observer by sending a UDP packet and waiting for a reply. If an ICMP “port unreachable” reply comes back, the rate limit hasn’t been reached. No reply means the rate limit has been met. This leaks one bit of information about the counter to the outside observer, which in the end is enough to reveal the supposedly secret information (whether the spoofed request got through or not).

Diagram inspired by original paper

Concretely, the attack works as follows: the attacker sends a bunch (large enough to trigger the rate limiting) of probe messages to the target, but with a forged source address of the victim. In the case where there are no open ports in the probed set, the target will send out the same amount of ICMP “port unreachable” responses back to the victim and trigger the rate limit on outgoing ICMP messages. The attacker can now send an additional verification message from its own address and observe whether an ICMP response comes back or not. If it does then there was at least one port open in the set and the attacker can divide the set and try again, or do a linear scan by inserting the suspected port number into a set of known closed ports. Using this approach, the attacker can narrow down to the open ports and try to guess the message ID until it is successful or gives up, similarly to the original Kaminsky attack.
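The side channel can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the `Target` class, its per-window counter, and the helper names are hypothetical, no real packets are sent, and each probe round is assumed to fit exactly in one rate-limit window.

```python
class Target:
    """Toy model of a host with a global ICMP rate limit (hypothetical)."""

    def __init__(self, open_ports, icmp_limit=50):
        self.open_ports = set(open_ports)
        self.icmp_limit = icmp_limit
        self.icmp_sent = 0

    def new_window(self):
        # A new rate-limit window resets the ICMP budget.
        self.icmp_sent = 0

    def receive_udp(self, port):
        """Return True if an ICMP "port unreachable" reply is emitted."""
        if port in self.open_ports:
            return False              # open port: no ICMP, counter untouched
        if self.icmp_sent >= self.icmp_limit:
            return False              # closed port, but the budget is spent
        self.icmp_sent += 1
        return True


def batch_has_open_port(target, ports, known_closed_port):
    """Probe `ports` with a spoofed source, then verify from our own address."""
    target.new_window()
    # Pad with known-closed ports so the batch is exactly as large as the limit.
    padding = [known_closed_port] * (target.icmp_limit - len(ports))
    for port in list(ports) + padding:
        target.receive_udp(port)      # replies go to the victim, not to us
    # Verification probe: a reply means the budget was NOT exhausted,
    # so at least one probed port must have been open.
    return target.receive_udp(known_closed_port)


def find_open_port(target, port_range, known_closed_port):
    """Scan in limit-sized batches, then halve the batch that leaked a hit."""
    ports = list(port_range)
    step = target.icmp_limit
    for start in range(0, len(ports), step):
        batch = ports[start:start + step]
        if not batch_has_open_port(target, batch, known_closed_port):
            continue
        while len(batch) > 1:
            half = batch[:len(batch) // 2]
            if batch_has_open_port(target, half, known_closed_port):
                batch = half
            else:
                batch = batch[len(batch) // 2:]
        return batch[0]
    return None


resolver = Target(open_ports={40123})
print(find_open_port(resolver, range(40000, 41000), known_closed_port=1))
```

Note that the attacker only ever observes the single verification reply per round; everything else is inferred from whether the shared counter was exhausted.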

In practice there are some hurdles to successfully mounting this attack.

  • First, the target IP, or a set of target IPs must be discovered. This might be trivial in some cases – a single forwarder, or a fixed set of IPs that can be discovered by probing and observing attacker controlled zones, but more difficult if the target IPs are partitioned across zones as the attacker can’t see the resolver egress IP unless she can monitor the traffic for the victim domain.
  • The attack also requires a large enough ICMP outgoing rate limit in order to be able to scan with a reasonable speed. The scan speed is critical, as it must be completed while the query to the victim nameserver is still pending. As the scan speed is effectively fixed, the paper instead describes a method to potentially extend the window of opportunity by triggering the victim’s response rate limiting (RRL), a technique to protect against floods of forged DNS queries. This may work if the victim implements RRL and the target resolver doesn’t implement a retry over TCP (A Quantitative Study of the Deployment of DNS Rate Limiting shows about 16% of nameservers implement some sort of RRL).
  • Generally, busy resolvers will have ephemeral ports opening and closing, which introduces false positive open ports for the attacker, and ports open for different pending queries than the one being attacked.

We’ve implemented an additional mitigation in 1.1.1.1 to prevent message ID guessing: if the resolver detects an ID enumeration attempt, it will stop accepting any more guesses and switch over to TCP. This reduces the number of attempts available to the attacker even if it guesses the IP address and port correctly, similar to how the number of password login attempts is limited.

Outlook

Ultimately these are just mitigations, and the attacker might be willing to play the long game. As long as the transport layer is insecure and DNSSEC is not widely deployed, there will be different methods of chipping away at these mitigations.

It should be noted that trying to hide source IPs or open port numbers is a form of security through obscurity. Without strong cryptographic authentication, it will always be possible to use spoofing to poison DNS resolvers. The silver lining here is that DNSSEC exists, and is designed to protect against this type of attack, and DNS servers are moving to explore cryptographically strong transports such as TLS for communicating between resolvers and authoritative servers.

At Cloudflare, we’ve been helping to reduce the friction of DNSSEC deployment, while also helping to improve transport security in the long run. There is also an effort to increase entropy in DNS messages with RFC 7873 – Domain Name System (DNS) Cookies, and to make DNS over TCP support mandatory with RFC 7766 – DNS Transport over TCP – Implementation Requirements, with even more documentation around ways to mitigate this type of issue available in different places. All of these efforts are complementary, which is a good thing. The DNS ecosystem consists of many different parties and software with different requirements and opinions. As long as operators support at least one of the preventive measures, these types of attacks will become more and more difficult.

If you are an operator of an authoritative DNS server, you should consider taking the following steps to protect yourself from this attack:

We’d like to thank the researchers for responsibly disclosing this attack and look forward to working with them in the future on efforts to strengthen the DNS.

Unwrap the SERVFAIL

Post Syndicated from Anbang Wen original https://blog.cloudflare.com/unwrap-the-servfail/

We recently released a new version of Cloudflare Resolver which adds a piece of information called “Extended DNS Errors” (EDE) along with the response code under certain circumstances. This will be helpful in tracing DNS resolution errors and figuring out what went wrong behind the scenes.

A tight-lipped agent

The DNS protocol was designed to map domain names to IP addresses. To inform the client about the result of the lookup, the protocol has a 4-bit field called the response code (RCODE). The logic to serve a response might look something like this:

function lookup(domain) {
    ...
    switch result {
    case "No error condition":
        return NOERROR with client expected answer
    case "No record for the request type":
        return NOERROR
    case "The request domain does not exist":
        return NXDOMAIN
    case "Refuse to perform the specified operation for policy reasons":
        return REFUSED
    default("Server failure: unable to process this query due to a problem with the name server"):
        return SERVFAIL
    }
}

try {
    lookup(domain)
} catch {
    return SERVFAIL
}

Although the context hasn’t changed much, protocol extensions such as DNSSEC have been added, which makes the RCODE run out of space to express the server’s internal status. To keep backward compatibility, DNS servers have to squeeze various statuses into existing ones. This behavior could confuse the client, especially with the “catch-all” SERVFAIL: something went wrong but what exactly?

Most often, end users don’t talk to authoritative name servers directly, but use a stub and/or a recursive resolver as an agent to acquire the information it needs. When a user receives  SERVFAIL, the failure can be one of the following:

  • The stub resolver fails to send the request.
  • The stub resolver doesn’t get a response.
  • The recursive resolver, which the stub resolver sends its query to, is overloaded.
  • The recursive resolver is unable to communicate with upstream authoritative servers.
  • The recursive resolver fails to verify the DNSSEC chain.
  • The authoritative server takes too long to respond.

In such cases, it is nearly impossible for the user to know exactly what’s wrong. The resolver is usually the one to be blamed, because, as an agent, it fails to get back the answer, and doesn’t return a clear reason for the failure in the response.

Keep backward compatibility

It seems we need to return more information, but (there’s always a but) we also need to keep the behavior of existing clients unchanged.

One way is to extend the RCODE space, which came with the Extension Mechanisms for DNS (EDNS). It defines an 8-bit EXTENDED-RCODE that serves as the high-order bits of the existing 4-bit RCODE; together they make up a 12-bit integer. Unfortunately, this changes how the RCODE is processed and requires both client and server to fully support the logic.
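As a rough illustration of how the two fields combine, the sketch below joins and splits the 12-bit value. The function names are made up for this example; the bit layout (EXTENDED-RCODE as the upper 8 bits, header RCODE as the lower 4) follows the EDNS specification.

```python
def combine_rcode(header_rcode, extended_rcode):
    """Join the 4-bit header RCODE and the 8-bit EXTENDED-RCODE into 12 bits."""
    if not 0 <= header_rcode <= 0xF:
        raise ValueError("header RCODE must fit in 4 bits")
    if not 0 <= extended_rcode <= 0xFF:
        raise ValueError("EXTENDED-RCODE must fit in 8 bits")
    return (extended_rcode << 4) | header_rcode


def split_rcode(full_rcode):
    """Split a 12-bit RCODE into (header RCODE, EXTENDED-RCODE)."""
    if not 0 <= full_rcode <= 0xFFF:
        raise ValueError("full RCODE must fit in 12 bits")
    return full_rcode & 0xF, full_rcode >> 4


# SERVFAIL (2) fits in the header alone; BADVERS (16) needs the extension.
print(split_rcode(2))    # header=2, extended=0
print(split_rcode(16))   # header=0, extended=1
```

A client that ignores the OPT record sees only the low 4 bits, which is why a value such as BADVERS is indistinguishable from NOERROR to a client without EDNS support.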

Another approach is to provide out-of-band data without touching the current RCODE. This is how Extended DNS Errors is defined. It introduces a new option to EDNS, containing an INFO-CODE to describe error details with an EXTRA-TEXT as an optional supplement. The option can be repeated as many times as needed, so it’s possible for the client to get a full error chain with detailed messages. The INFO-CODE is similar to RCODE, but is 16 bits wide, while the EXTRA-TEXT is a UTF-8 encoded string. For example, let’s say a client sends a request to a resolver, and the requested domain has two name servers. The client may receive a SERVFAIL response with an OPT record (see below) which contains two extended errors, one from one of the authoritative servers that shows it’s not ready to serve, and the other from the resolver, showing it cannot connect to the other name server.

;; OPT PSEUDOSECTION:
; ...
; EDE: 14 (Not Ready)
; EDE: 23 (Network Error): (cannot reach upstream 192.0.2.1)
; ...
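For readers curious about the wire format, here is a minimal sketch of encoding and decoding that option inside the OPT record’s RDATA. The helper names are hypothetical; option code 15 and the layout (16-bit INFO-CODE followed by UTF-8 EXTRA-TEXT) are taken from the Extended DNS Errors specification, and the sketch ignores everything else a full DNS parser would need.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code assigned to Extended DNS Errors


def encode_ede(info_code, extra_text=""):
    """Build one EDE option: OPTION-CODE, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT."""
    payload = struct.pack("!H", info_code) + extra_text.encode("utf-8")
    return struct.pack("!HH", EDE_OPTION_CODE, len(payload)) + payload


def decode_ede_options(opt_rdata):
    """Return [(info_code, extra_text), ...] for every EDE option in OPT RDATA."""
    errors = []
    pos = 0
    while pos + 4 <= len(opt_rdata):
        code, length = struct.unpack_from("!HH", opt_rdata, pos)
        pos += 4
        body = opt_rdata[pos:pos + length]
        pos += length
        # Unknown options are skipped, which is what keeps old clients working.
        if code == EDE_OPTION_CODE and len(body) >= 2:
            (info_code,) = struct.unpack_from("!H", body)
            errors.append((info_code, body[2:].decode("utf-8", errors="replace")))
    return errors


rdata = encode_ede(14) + encode_ede(23, "cannot reach upstream 192.0.2.1")
for info_code, text in decode_ede_options(rdata):
    print(info_code, repr(text))
```

Because each error is its own option, repeating the option is all it takes to carry an error chain like the one shown above.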

Google has something similar in their DoH JSON API, which provides diagnostic information in the “Comment” field.

Let’s dig into it

Our 1.1.1.1 service has initial support for the draft version of Extended DNS Errors, while we are still trying to find the best practice. As we mentioned above, this is not a breaking change, and existing clients will not be affected. The additional options can be safely ignored without any problem, since the RCODE stays the same.

If you have a newer version of dig, you can simply check it out with a known problematic domain. As you can see, due to DNSSEC verification failing, the RCODE is still SERVFAIL, but the extended error shows the failure is “DNSSEC Bogus”.

$ dig @1.1.1.1 dnssec-failed.org

; <<>> DiG 9.16.4-Debian <<>> @1.1.1.1 dnssec-failed.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1111
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 6 (DNSSEC Bogus)
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; Query time: 111 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Sep 01 00:00:00 PDT 2020
;; MSG SIZE  rcvd: 52

Note that Extended DNS Error relies on EDNS. So to be able to get one, the client needs to support EDNS, and needs to enable it in the request. At the time of writing this blog post, we see about 17% of queries that 1.1.1.1 received had EDNS enabled within a short time range. We hope this information will help you uncover the root cause of a SERVFAIL in the future.

DNS Flag Day 2020

Post Syndicated from Christian Elmerot original https://blog.cloudflare.com/dns-flag-day-2020/

October 1 was this year’s DNS Flag Day. Read on to find out all about DNS Flag Day and how it affects Cloudflare’s DNS services (hint: it doesn’t, we already did the work to be compliant).

What is DNS Flag Day?

DNS Flag Day is an initiative by several DNS vendors and operators to increase the compliance of implementations with DNS standards. The goal is to make DNS more secure, reliable and robust. Rather than a push for new features, DNS flag day is meant to ensure that workarounds for non-compliance can be reduced and a common set of functionalities can be established and relied upon.

Last year’s flag day was February 1, and it set forth that servers and clients must be able to properly handle the Extensions to DNS (EDNS0) protocol (the first RFC on EDNS0, RFC 2671, dates from 1999). This way, by assuming clients have a working implementation of EDNS0, servers can resort to always sending messages as EDNS0. This is needed to support DNSSEC, the DNS security extensions. We were, of course, more than thrilled to support the effort, as we’re keen to push DNSSEC adoption forward.

The goal for this year’s flag day is to increase DNS messaging reliability by focusing on problems around IP fragmentation of DNS packets. The intention is to reduce DNS message fragmentation which continues to be a problem. We can do that by ensuring cleartext DNS messages sent over UDP are not too large, as large messages risk being fragmented during the transport. Additionally, when sending or receiving large DNS messages, we have the ability to do so over TCP.

Problem with DNS transport over UDP

A potential issue with sending DNS messages over UDP is that the sender has no indication of the recipient actually receiving the message. When using TCP, each packet being sent is acknowledged (ACKed) by the recipient, and the sender will attempt to resend any packets not being ACKed. UDP, although it may be faster than TCP, does not have the same mechanism of messaging reliability. Anyone still wishing to use UDP as their transport protocol of choice will have to implement this reliability mechanism in higher layers of the network stack. For instance, this is what is being done in QUIC, the new Internet transport protocol used by HTTP/3 that is built on top of UDP.

Even the earliest DNS standards (RFC 1035) specified the use of sending DNS messages over TCP as well as over UDP. Unfortunately, the choice of supporting TCP or not was up to the implementer/operator, and then firewalls were sometimes set to block DNS over TCP. More recent updates to RFC 1035, on the other hand, require that the DNS server is available to query using DNS over TCP.

DNS message fragmentation

Sending data over networks and the Internet is constrained by how large each packet can be. Data is chopped up into a stream of packets, sized to adhere to the Maximum Transmission Unit (MTU) of the network. The MTU is typically 1500 bytes for IPv4; for IPv6, the minimum is 1280 bytes. Subtracting both the IP header size (20 bytes for IPv4, 40 bytes for IPv6) and the UDP header size (8 bytes) from the MTU, we end up with a maximum DNS message size of 1472 bytes for IPv4 and 1232 bytes for IPv6 in order for a message to fit within a single packet. If the message is any larger than that, it will have to be fragmented into more packets.
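The arithmetic is simple enough to write down directly. The sketch below assumes headers without IP options or IPv6 extension headers, which would shrink the usable payload further.

```python
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20   # without IP options
IPV6_HEADER_BYTES = 40   # without extension headers


def max_dns_payload(mtu, ip_header_bytes):
    """Largest DNS message that fits in one unfragmented UDP datagram."""
    return mtu - ip_header_bytes - UDP_HEADER_BYTES


print("IPv4, MTU 1500:", max_dns_payload(1500, IPV4_HEADER_BYTES))
print("IPv6, MTU 1280:", max_dns_payload(1280, IPV6_HEADER_BYTES))
```

The IPv6 figure is the more conservative of the two, since it is derived from the minimum MTU every IPv6 link must support.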

Sending large messages causes them to get fragmented into more than one packet. This is not a problem with TCP transports since each packet is ACKed to ensure proper delivery. However, the same does not hold true when sending large DNS messages over UDP. For many intents and purposes, UDP has been treated as a second-class citizen to TCP as far as network routing is concerned. It is quite common to see UDP packet fragments being dropped by routers and firewalls, potentially causing parts of a message to be lost. To avoid fragmentation over UDP it is better to truncate the DNS message and set the Truncation Flag in the DNS response. This tells the recipient that more data is available if the query is retried over TCP.
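On the client side, honoring truncation comes down to inspecting one bit in the 12-byte DNS header. The following is a minimal sketch with a hypothetical helper name; it only reads the flag and leaves the actual TCP retry to the caller.

```python
import struct

TC_FLAG = 0x0200  # "TrunCation" bit in the 16-bit flags field of the DNS header


def is_truncated(message):
    """Return True if a raw DNS message has the TC bit set."""
    if len(message) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    _msg_id, flags = struct.unpack_from("!HH", message)
    return bool(flags & TC_FLAG)


# Header only: ID, flags (QR | TC | RD), then four zeroed section counts.
truncated_reply = struct.pack("!HHHHHH", 0x1234, 0x8000 | TC_FLAG | 0x0100, 0, 0, 0, 0)
if is_truncated(truncated_reply):
    print("TC set: retry the same query over TCP")
```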

DNS Flag Day 2020 wants to ensure that DNS message fragmentation does not happen. When larger DNS messages need to be sent, we need to ensure it can be done reliably over TCP.

DNS servers need to support DNS message transport over TCP in order to be compliant with this year’s flag day. Also, DNS messages sent over UDP must never exceed the limit over which they risk being fragmented.

Cloudflare authoritative DNS and 1.1.1.1

We fully support the DNS Flag Day initiative, as it aims to make DNS more reliable and robust, and it ensures a common set of features for the DNS community to evolve on. In the DNS ecosystem, we are as much a client as we are a provider. When we perform DNS lookups on behalf of our customers and users, we rely on other providers to follow standards and be compliant. When they are not, and we can’t work around the issues, it leads to problems resolving names and reaching resources.

Both our public resolver 1.1.1.1 and our authoritative DNS service set and enforce reasonable limits on DNS message sizes when sent over UDP. Of course, both services are available over TCP. If you’re already using Cloudflare, there is nothing you need to do but to keep using our DNS services! We will continually work on improving DNS.

Oh, and you can test your domain on the DNS Flag Day site: https://dnsflagday.net/2020/

Secondary DNS – Deep Dive

Post Syndicated from Alex Fattouche original https://blog.cloudflare.com/secondary-dns-deep-dive/

How Does Secondary DNS Work?

If you already understand how Secondary DNS works, please feel free to skip this section. It does not provide any Cloudflare-specific information.

Secondary DNS has many use cases across the Internet; however, traditionally, it was used as a synchronized backup for when the primary DNS server was unable to respond to queries. A more modern approach involves focusing on redundancy across many different nameservers, which in many cases broadcast the same anycasted IP address.

Secondary DNS involves the unidirectional transfer of DNS zones from the primary to the Secondary DNS server(s). One primary can have any number of Secondary DNS servers that it must communicate with in order to keep track of any zone updates. A zone update is considered a change in the contents of a zone, which ultimately leads to a Start of Authority (SOA) serial number increase. The zone’s SOA serial is one of the key elements of Secondary DNS; it is how primary and secondary servers synchronize zones. Below is an example of what an SOA record might look like during a dig query.

example.com	3600	IN	SOA	ashley.ns.cloudflare.com. dns.cloudflare.com. 
2034097105  // Serial
10000 // Refresh
2400 // Retry
604800 // Expire
3600 // Minimum TTL

Each of the numbers is used in the following way:

  1. Serial – Used to keep track of the status of the zone, must be incremented at every change.
  2. Refresh – The maximum number of seconds that can elapse before a Secondary DNS server must check for a SOA serial change.
  3. Retry – The maximum number of seconds that can elapse before a Secondary DNS server must check for a SOA serial change, after previously failing to contact the primary.
  4. Expire – The maximum number of seconds that a Secondary DNS server can serve stale information, in the event the primary cannot be contacted.
  5. Minimum TTL – Per RFC 2308, the number of seconds that a DNS negative response should be cached for.

Using the above information, the Secondary DNS server stores an SOA record for each of the zones it is tracking. When the serial increases, it knows that the zone must have changed, and that a zone transfer must be initiated.  
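A minimal sketch of how a secondary might use these fields follows. The class and function names are hypothetical, and the serial comparison uses RFC 1982 serial number arithmetic, which is how 32-bit DNS serials are conventionally compared so that wrap-around still counts as an increase.

```python
from dataclasses import dataclass


@dataclass
class SOATimers:
    serial: int
    refresh: int  # seconds between routine serial checks
    retry: int    # seconds between checks after a failed attempt
    expire: int   # seconds after which stale data must no longer be served


def serial_is_newer(candidate, current):
    """RFC 1982 comparison for 32-bit serials (handles wrap-around)."""
    diff = (candidate - current) & 0xFFFFFFFF
    return 0 < diff < 0x80000000


def needs_transfer(local, primary_serial):
    return serial_is_newer(primary_serial, local.serial)


def next_check_delay(local, last_check_failed):
    return local.retry if last_check_failed else local.refresh


def zone_expired(local, seconds_since_last_contact):
    return seconds_since_last_contact >= local.expire


# Values taken from the example SOA record above.
soa = SOATimers(serial=2034097105, refresh=10000, retry=2400, expire=604800)
print(needs_transfer(soa, 2034097106))                # primary moved ahead
print(next_check_delay(soa, last_check_failed=True))  # fall back to Retry
```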

Serial Tracking

Serial increases can be detected in the following ways:

  1. The fastest way for the Secondary DNS server to keep track of a serial change is to have the primary server NOTIFY it any time a zone has changed, using the DNS protocol as specified in RFC 1996. Secondary DNS servers can then initiate a zone transfer immediately.
  2. Another way is for the Secondary DNS server to simply poll the primary every “Refresh” seconds. This isn’t as fast as the NOTIFY approach, but it is a good fallback in case the notifies have failed.

One of the issues with the basic NOTIFY protocol is that anyone on the Internet could potentially notify the Secondary DNS server of a zone update. If an initial SOA query is not performed by the Secondary DNS server before initiating a zone transfer, this is an easy way to perform an amplification attack. There are two common ways to prevent anyone on the Internet from being able to NOTIFY Secondary DNS servers:

  1. Using transaction signatures (TSIG) as per RFC 2845. These are to be placed as the last record in the extra records section of the DNS message. Usually the number of extra records (or ARCOUNT) should be no more than two in this case.
  2. Using IP based access control lists (ACL). This increases security but also prevents flexibility in server location and IP address allocation.

Generally, NOTIFY messages are sent over UDP; however, TCP can be used in the event the primary server has reason to believe that TCP is necessary (e.g. firewall issues).

Zone Transfers

In addition to serial tracking, it is important to ensure that a standard protocol is used between primary and Secondary DNS server(s), to efficiently transfer the zone. DNS zone transfer protocols do not attempt to solve the confidentiality, authentication and integrity triad (CIA); however, the use of TSIG on top of the basic zone transfer protocols can provide integrity and authentication. As a result of DNS being a public protocol, confidentiality during the zone transfer process is generally not a concern.

Authoritative Zone Transfer (AXFR)

AXFR is the original zone transfer protocol that was specified in RFC 1034 and RFC 1035 and later further explained in RFC 5936. AXFR is done over a TCP connection because a reliable protocol is needed to ensure packets are not lost during the transfer. Using this protocol, the primary DNS server will transfer all of the zone contents to the Secondary DNS server, in one connection, regardless of the serial number. AXFR is recommended to be used for the first zone transfer, when none of the records are propagated, and IXFR is recommended after that.

Incremental Zone Transfer (IXFR)

IXFR is the more sophisticated zone transfer protocol that was specified in RFC 1995. Unlike the AXFR protocol, during an IXFR, the primary server will only send the secondary server the records that have changed since its current version of the zone (based on the serial number). This means that when a Secondary DNS server wants to initiate an IXFR, it sends its current serial number to the primary DNS server. The primary DNS server will then format its response based on previous versions of changes made to the zone. IXFR messages must obey the following pattern:

  1. Current latest SOA
  2. Secondary server current SOA
  3. DNS record deletions
  4. Secondary server current SOA + changes
  5. DNS record additions
  6. Current latest SOA

Steps 2, 3, 4 and 5 can be repeated any number of times, as each repetition represents one change set of deletions and additions, ultimately leading to the latest serial announced in the first and last SOA.
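To show how a secondary could consume such a response, here is a toy parser that applies the change sets to an in-memory zone. It is a sketch under simplifying assumptions: records are plain tuples of (name, type, data), an SOA is represented only by its serial, and `apply_ixfr` is a hypothetical name rather than any library’s API.

```python
def apply_ixfr(zone, current_serial, response):
    """Apply an IXFR-style record list to `zone` (a set of record tuples)."""
    def is_soa(record):
        return record[1] == "SOA"

    if len(response) < 2 or not is_soa(response[0]) or response[-1] != response[0]:
        raise ValueError("response must start and end with the latest SOA")

    latest_serial = response[0][2]
    body = response[1:-1]
    zone = set(zone)
    serial = current_serial
    i = 0
    while i < len(body):
        # SOA of the version this change set starts from.
        if not is_soa(body[i]) or body[i][2] != serial:
            raise ValueError("change set does not start at our serial")
        i += 1
        # Deletions run until the next SOA.
        while i < len(body) and not is_soa(body[i]):
            zone.discard(body[i])
            i += 1
        if i >= len(body):
            raise ValueError("change set is missing its new SOA")
        # SOA of the version this change set produces.
        serial = body[i][2]
        i += 1
        # Additions run until the next SOA (or the end of the body).
        while i < len(body) and not is_soa(body[i]):
            zone.add(body[i])
            i += 1

    if serial != latest_serial:
        raise ValueError("change sets do not reach the latest serial")
    return zone, serial


def soa(serial):
    return ("example.com.", "SOA", serial)


old = ("www.example.com.", "A", "192.0.2.1")
new = ("www.example.com.", "A", "192.0.2.2")
zone, serial = apply_ixfr({old}, 1, [soa(2), soa(1), old, soa(2), new, soa(2)])
print(serial, sorted(zone))
```

If any of the checks fail, a real secondary would typically discard the partial result and fall back to a full AXFR.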

IXFR can be done over UDP or TCP, but again TCP is generally recommended to avoid packet loss.

How Does Secondary DNS Work at Cloudflare?

The DNS team loves microservice architecture! When we initially implemented Secondary DNS at Cloudflare, it was done using Mesos Marathon. This allowed us to separate each of our services into several different marathon apps, individually scaling apps as needed. All of these services live in our core data centers. The following services were created:

  1. Zone Transferer – responsible for attempting IXFR, followed by AXFR if IXFR fails.
  2. Zone Transfer Scheduler – responsible for periodically checking zone SOA serials for changes.
  3. Rest API – responsible for registering new zones and primary nameservers.

In addition to the marathon apps, we also had an app external to the cluster:

  1. Notify Listener – responsible for listening for notifies from primary servers and telling the Zone Transferer to initiate an AXFR/IXFR.

Each of these microservices communicates with the others through Kafka.

Figure 1: Secondary DNS Microservice Architecture

Once the Zone Transferer completes the AXFR/IXFR, the zone is passed to our zone builder and finally pushed out to our edge at each of our 200 locations.

Although this current architecture worked great in the beginning, it left us open to many vulnerabilities and scalability issues down the road. As our Secondary DNS product became more popular, it was important that we proactively scaled and reduced the technical debt as much as possible. As with many companies in the industry, Cloudflare has recently migrated all of our core data center services to Kubernetes, moving away from individually managed apps and Marathon clusters.

What this meant for Secondary DNS is that all of our Marathon-based services, as well as our NOTIFY Listener, had to be migrated to Kubernetes. Although this long migration ended up paying off, many difficult challenges arose along the way that required us to come up with unique solutions in order to have a seamless, zero downtime migration.

Challenges When Migrating to Kubernetes

Although the entire DNS team agreed that Kubernetes was the way forward for Secondary DNS, it also introduced several challenges. These challenges arose from a need to properly scale up across many distributed locations while also protecting each of our individual data centers. Since our core does not rely on anycast to automatically distribute requests, as we introduce more customers, it opens us up to denial-of-service attacks.

The two main issues we ran into during the migration were:

  1. How do we create a distributed and reliable system that makes use of Kubernetes principles while also making sure our customers know which IPs we will be communicating from?
  2. When opening up a public-facing UDP socket to the Internet, how do we protect ourselves while also preventing unnecessary spam towards primary nameservers?

Issue 1:

As was previously mentioned, one form of protection in the Secondary DNS protocol is to only allow certain IPs to initiate zone transfers. There is a fine line between primary servers allowlisting too many IPs and them having to frequently update their IP ACLs. We considered several solutions:

  1. Open source k8s controllers
  2. Altering Network Address Translation (NAT) entries
  3. Do not use k8s for zone transfers
  4. Allowlist all Cloudflare IPs and dynamically update
  5. Proxy egress traffic

Ultimately we decided to proxy our egress traffic from k8s, to the DNS primary servers, using static proxy addresses. Shadowsocks-libev was chosen as the SOCKS5 implementation because it is fast, secure and known to scale. In addition, it can handle both UDP/TCP and IPv4/IPv6.

Figure 2: Shadowsocks proxy Setup

The partnership of k8s and Shadowsocks combined with a large enough IP range brings many benefits:

  1. Horizontal scaling
  2. Efficient load balancing
  3. Primary server ACLs only need to be updated once
  4. It allows us to make use of Kubernetes for both the Zone Transferer and the local Shadowsocks proxy.
  5. Shadowsocks proxy can be reused by many different Cloudflare services.

Issue 2:

The Notify Listener requires listening on static IPs for NOTIFY messages coming from primary DNS servers. This is mostly a solved problem through the use of k8s services of type LoadBalancer; however, exposing this service directly to the Internet makes us uneasy because of its susceptibility to attacks. Fortunately, DDoS protection is one of Cloudflare’s strengths, which led us to the likely solution of dogfooding one of our own products, Spectrum.

Spectrum provides the following features to our service:

  1. Reverse proxy TCP/UDP traffic
  2. Filter out Malicious traffic
  3. Optimal routing from edge to core data centers
  4. Dual Stack technology

Figure 3: Spectrum interaction with Notify Listener

Figure 3 shows two interesting attributes of the system:

  1. Spectrum <-> k8s IPv4 only: This is because our custom k8s load balancer currently only supports IPv4; however, Spectrum has no issue terminating the IPv6 connection and establishing a new IPv4 connection.
  2. Spectrum <-> k8s routing decisions based on L4 protocol: This is because k8s only supports one of TCP/UDP/SCTP per service of type LoadBalancer. Once again, Spectrum has no issues proxying this correctly.

One of the problems with using an L4 proxy in between services is that source IP addresses get changed to the source IP address of the proxy (Spectrum in this case). Not knowing the source IP address means we have no idea who sent the NOTIFY message, opening us up to attack vectors. Fortunately, Spectrum’s proxy protocol feature is capable of adding custom headers to TCP/UDP packets which contain source IP/Port information.

As we are using miekg/dns for our Notify Listener, adding proxy headers to the DNS NOTIFY messages would cause failures in validation at the DNS server level. Instead, we implemented custom read and write decorators that do the following:

  1. Reader: Extract source address information on inbound NOTIFY messages. Place extracted information into new DNS records located in the additional section of the message.
  2. Writer: Remove additional records from the DNS message on outbound NOTIFY replies. Generate a new reply using proxy protocol headers.

There is no way to spoof these records, because the server only permits two extra records, one of which is the optional TSIG. Any other records will be overwritten.

Figure 4: Proxying Records Between Notifier and Spectrum

This custom decorator approach abstracts the proxying away from the Notify Listener through the use of the DNS protocol.  

Although knowing the source IP will block a significant amount of bad traffic, since NOTIFY messages can use both UDP and TCP, it is prone to IP spoofing. To ensure that the primary servers do not get spammed, we have made the following additions to the Zone Transferer:

  1. Always ensure that the SOA has actually been updated before initiating a zone transfer.
  2. Only allow at most one working transfer and one scheduled transfer per zone.
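The first check can be sketched as a serial comparison. This is an illustration rather than our production code: it assumes the comparison follows the serial number arithmetic of RFC 1982, so that a serial which wraps around 2^32 still counts as newer, and `serial_is_newer` is a hypothetical name.

```python
def serial_is_newer(candidate: int, current: int) -> bool:
    """Return True if `candidate` is ahead of `current` (RFC 1982 arithmetic).

    SOA serials are 32-bit counters, so "ahead" means the forward distance
    from `current` to `candidate` is non-zero and less than 2**31.
    """
    if candidate == current:
        return False
    return ((candidate - current) % 2**32) < 2**31
```

A transfer would only be initiated when this returns True for the serial advertised by the primary.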

Additional Technical Challenges

Zone Transferer Scheduling

As shown in figure 1, there are several ways of sending Kafka messages to the Zone Transferer in order to initiate a zone transfer. There is no benefit in having a large backlog of zone transfers for the same zone. Once a zone has been transferred, assuming no more changes, it does not need to be transferred again. This means that we should only have at most one transfer ongoing, and one scheduled transfer at the same time, for any zone.
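A minimal sketch of that rule, assuming a single process and in-memory state (the class and method names are made up for illustration):

```python
class ZoneScheduler:
    """Allow at most one running and one scheduled transfer per zone."""

    def __init__(self):
        self._running = set()    # zones with a transfer in progress
        self._scheduled = set()  # zones with one follow-up transfer queued

    def request(self, zone_id):
        """Handle a transfer request: 'start', 'scheduled', or 'ignored'."""
        if zone_id not in self._running:
            self._running.add(zone_id)
            return "start"
        if zone_id not in self._scheduled:
            self._scheduled.add(zone_id)
            return "scheduled"
        return "ignored"  # a follow-up is already queued; nothing to gain

    def finished(self, zone_id):
        """Mark the running transfer done.

        Returns True if a scheduled follow-up should start immediately.
        """
        if zone_id in self._scheduled:
            self._scheduled.discard(zone_id)
            return True  # the zone stays in _running for the follow-up
        self._running.discard(zone_id)
        return False
```

Any request that arrives while a follow-up is already queued is dropped, because the queued transfer will pick up the same changes.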

If we want to limit our number of scheduled messages to one per zone, this involves ignoring Kafka messages that get sent to the Zone Transferer. This is not as simple as ignoring specific messages in any random order. One of the benefits of Kafka is that it holds on to messages until the user actually decides to acknowledge them, by committing that message’s offset. Since Kafka is just a queue of messages, it has no concept of order other than first in first out (FIFO). If a user is capable of reading from the Kafka topic concurrently, it is entirely possible for a message in the middle of the queue to be committed before a message at the end of the queue.

Most of the time this isn’t an issue, because we know that one of the concurrent readers has read the message from the end of the queue and is processing it. There is one Kubernetes-related catch to this issue, though: pods are ephemeral. The kube master doesn’t care what your concurrent reader is doing, it will kill the pod and it’s up to your application to handle it.

Consider the following problem:

Figure 5: Kafka Partition‌‌
  1. Read offset 1. Start transferring zone 1.
  2. Read offset 2. Start transferring zone 2.
  3. Zone 2 transfer finishes. Commit offset 2, essentially also marking offset 1.
  4. Restart pod.
  5. Read offset 3. Start transferring zone 3.

If these events happen, zone 1 will never be transferred. It is important that zones stay up to date with the primary servers, otherwise stale data will be served from the Secondary DNS server. The solution to this problem involves the use of a list to track which messages have been read and completely processed. In this case, when a zone transfer has finished, it does not necessarily mean that the Kafka message should be immediately committed. The solution is as follows:

  1. Keep a list of Kafka messages, sorted based on offset.
  2. When a transfer finishes, remove its message from the list.
  3. If that message was the oldest in the list, commit the message’s offset.
Figure 6: Kafka Algorithm to Solve Message Loss

This solution is essentially soft committing Kafka messages, until we can confidently say that all other messages have been acknowledged. It’s important to note that this only truly works in a distributed manner if the Kafka messages are keyed by zone ID, which ensures that the same zone will always be processed by the same Kafka consumer.
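The soft-commit idea can be sketched as a small tracker. This is a variant written for illustration, not our implementation: it assumes one partition with contiguous offsets, and instead of committing only the finished message's offset it returns the highest offset below which everything has been processed. `OffsetTracker` is a hypothetical name, and note that many Kafka clients expect the committed value to be the next offset to read, i.e. the returned value plus one.

```python
import bisect


class OffsetTracker:
    """Track in-flight offsets and report when a commit is safe."""

    def __init__(self):
        self._in_flight = []       # sorted offsets read but not yet finished
        self._highest_read = None  # largest offset handed to a worker so far

    def started(self, offset):
        bisect.insort(self._in_flight, offset)
        if self._highest_read is None or offset > self._highest_read:
            self._highest_read = offset

    def finished(self, offset):
        """Return the offset that is now safe to commit, or None."""
        index = bisect.bisect_left(self._in_flight, offset)
        if index == len(self._in_flight) or self._in_flight[index] != offset:
            raise KeyError(offset)
        was_oldest = index == 0
        del self._in_flight[index]
        if not was_oldest:
            return None  # an older message is still being processed
        if self._in_flight:
            # Everything before the new oldest in-flight message is done.
            return self._in_flight[0] - 1
        return self._highest_read
```

Replaying the problem scenario: finishing offset 2 while offset 1 is still running returns None, so a pod restart at that point re-reads offset 1 rather than skipping it.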

Life of a Secondary DNS Request

Although Cloudflare has a large global network, as shown above, the zone transferring process does not take place at each of the edge datacenter locations (which would surely overwhelm many primary servers), but rather in our core data centers. In this case, how do we propagate to our edge in seconds? After transferring the zone, there are a couple more steps that need to be taken before the change can be seen at the edge.

  1. Zone Builder – This interacts with the Zone Transferer to build the zone according to what Cloudflare edge understands. This then writes to Quicksilver, our super fast, distributed KV store.
  2. Authoritative Server – This reads from Quicksilver and serves the built zone.
Figure 7: End to End Secondary DNS‌‌

What About Performance?

At the time of writing this post, according to dnsperf.com, Cloudflare leads in global performance for both Authoritative and Resolver DNS. Secondary DNS falls under the authoritative DNS category here. Let’s break down the performance of each of the different parts of the Secondary DNS pipeline, from the primary server updating its records, to them being present at the Cloudflare edge.

  1. Primary Server to Notify Listener – Our most accurate measurement is only precise to the second, but we know UDP/TCP communication is likely much faster than that.
  2. NOTIFY to Zone Transferer – This is negligible.
  3. Zone Transferer to Primary Server – 99% of the time we see ~800ms as the average latency for a zone transfer.
Figure 8: Zone XFR latency

4. Zone Transferer to Zone Builder – 99% of the time we see ~10ms to build a zone.

Figure 9: Zone Build time

5. Zone Builder to Quicksilver edge: 95% of the time we see less than 1s propagation.

Figure 10: Quicksilver propagation time

End to End latency: less than 5 seconds on average. Although we have several external probes running around the world to test propagation latencies, they lack precision due to their sleep intervals, location, provider and number of zones that need to run. The actual propagation latency is likely much lower than what is shown in figure 10. Each of the different colored dots is a separate data center location around the world.

Figure 11: End to End Latency

An additional test was performed manually to get a real-world estimate. The test had the following attributes:

Primary server: NS1
Number of records changed: 1
Start test timer event: Change record on NS1
Stop test timer event: Observe record change at Cloudflare edge using dig
Recorded timer value: 6 seconds

Conclusion

Cloudflare serves 15.8 trillion DNS queries per month, operating within 100ms of 99% of the Internet-connected population. The goal of Cloudflare-operated Secondary DNS is to allow our customers with custom DNS solutions, be it on-premise or some other DNS provider, to take advantage of Cloudflare’s DNS performance and, more recently, through Secondary Override, our proxying and security capabilities too. Secondary DNS is currently available on the Enterprise plan. If you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS, please refer to our support article.

Orange Clouding with Secondary DNS

Post Syndicated from Alex Fattouche original https://blog.cloudflare.com/orange-clouding-with-secondary-dns/

What is secondary DNS?

In a traditional sense, secondary DNS servers act as a backup to the primary authoritative DNS server. When a change is made to the records on the primary server, a zone transfer occurs, synchronizing the secondary DNS servers with the primary server. The secondary servers can then serve the records as if they were the primary server; however, changes can only be made by the primary server, not the secondary servers. This creates redundancy across many different servers that can be distributed as necessary.

There are many common ways to take advantage of Secondary DNS, some of which are:

  1. Secondary DNS as passive backup – The secondary DNS server sits idle until the primary server goes down, at which point a failover can occur and the secondary can start serving records.
  2. Secondary DNS as active backup – The secondary DNS server works alongside the primary server to serve records.
  3. Secondary DNS with a hidden primary – The nameserver records at the registrar point towards the secondary servers only, essentially treating them as the primary nameservers.

What is secondary DNS Override?

Secondary DNS Override builds on the Secondary DNS with a hidden primary model by allowing our customers to not only serve records as they tell us to, but also enable them to proxy any A/AAAA/CNAME records through Cloudflare’s network. This is similar to how Cloudflare as a primary DNS provider currently works.

Consider the following example:

example.com Cloudflare IP – 192.0.2.0
example.com origin IP – 203.0.113.0

In order to take advantage of Cloudflare’s security and performance services, we need to make sure that the origin IP stays hidden from the Internet.

Figure 1: Secondary DNS without a hidden primary nameserver

Figure 1 shows that without a hidden primary nameserver, the resolver can choose to query either one. This opens up two issues:

  1. Violates RFC 1034 and RFC 2182 because the Cloudflare server will be responding differently than the primary nameserver.
  2. The origin IP will be exposed to the Internet.
Figure 2: Secondary DNS with a hidden primary nameserver

Figure 2 shows the resolver always querying the Cloudflare Secondary DNS server.

How does Secondary DNS Override work?

The Secondary DNS Override UI looks similar to the primary UI; the only difference is that records cannot be edited.

Figure 3: Secondary DNS Override Dashboard

In figure 3, all of the records have been transferred from the primary DNS server. test-orange and test-aaaa-orange have been overridden to proxy through the Cloudflare network, while test-grey and test-mx are treated as regular DNS records.

Behind the scenes we store override records that pair with transferred records based on the name. For secondary override we don’t care about the type when overriding records, because of two things:

  1. According to RFC 1912 you cannot have a CNAME record with the same name as any other record. (This does not apply to some DNSSEC records; see RFC 2181.)
  2. A and AAAA records are both address type records which should be either all proxied or all not proxied under the same name.

This means if you have several A and several AAAA records all with the name “example.com”, if one of them is proxied, all of them will be proxied. The UI helps abstract the idea that we are storing additional override records through the “orange cloud” button, which when clicked, will create an override record which applies to all A/AAAA or CNAME records with that name.

CNAME at the Apex

Normally, putting a CNAME at the apex of a zone is not allowed. For example:

example.com CNAME other-domain.com

This is not allowed because it means that there will be at least one other SOA record and one other NS record with the same name, violating RFC 1912 as mentioned above. Cloudflare can overcome this through the use of CNAME Flattening, which is a common technique used within the primary DNS product today. CNAME flattening allows us to return address records instead of the CNAME record when a query comes into our authoritative server.
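As a rough illustration of flattening, the sketch below follows a CNAME chain through an in-memory table and returns the resulting address records under the queried name. The `records` dictionary and the `flatten_cname` function are assumptions made for the example; a real authoritative server resolves the target through DNS lookups rather than a local table.

```python
def flatten_cname(name, records, max_depth=8):
    """Follow CNAMEs from `name` and return address records labelled `name`.

    `records` maps a name to a list of (type, value) pairs.
    """
    current = name
    for _ in range(max_depth):
        rrset = records.get(current, [])
        addresses = [(rtype, value) for rtype, value in rrset
                     if rtype in ("A", "AAAA")]
        if addresses:
            # Answer with the target's addresses, but under the queried name.
            return [(name, rtype, value) for rtype, value in addresses]
        targets = [value for rtype, value in rrset if rtype == "CNAME"]
        if not targets:
            return []
        current = targets[0]
    raise RuntimeError("CNAME chain too long")
```

With example.com pointing at other-domain.com, a query for example.com is answered with other-domain.com's A and AAAA records, so no CNAME is ever returned at the apex.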

Contrary to what was said above regarding the prevention of editing records through the Secondary DNS Override UI, the CNAME at the apex is the one exception to this rule. Users are able to create a CNAME at the apex in addition to the regular secondary DNS records, however the same rules defined in RFC 1912 also apply here. What this means is that the CNAME at the apex record can be treated as a regular DNS record or a proxied record, depending on what the user decides. Regardless of the proxy status of the CNAME at the apex record, it will override any other A/AAAA records that have been transferred from the primary DNS server.

Merging Secondary, Override and CNAME at Apex Records

At record edit time we do all of the merging of the secondary, override and CNAME at the apex records. This means that when a DNS request comes in to our authoritative server at the edge, we can still return the records in blazing fast times. The workflow is shown in figure 4.

Figure 4: Record Merging process

Within the zone builder the steps are as follows:

  1. Check if there is any CNAME at the apex. If so, override all other A/AAAA secondary records at the apex.
  2. For each secondary record, check if there is a matching override record. If so, apply the proxy status of the override record to all secondary records with that name.
  3. Leave all other secondary records as is.
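The three steps above can be sketched as a single pass over the transferred records. The record layout (dictionaries with `name`, `type`, `content`) and the `build_zone` function are assumptions made for this example, not the zone builder's actual data model.

```python
ADDRESS_TYPES = {"A", "AAAA"}
PROXIABLE_TYPES = {"A", "AAAA", "CNAME"}


def build_zone(apex, secondary, overrides, apex_cname=None):
    """Merge secondary, override and CNAME-at-apex records.

    secondary:  list of dicts with "name", "type" and "content"
    overrides:  dict mapping a record name to its proxied flag
    apex_cname: optional dict with "content" and "proxied"
    """
    built = []
    if apex_cname is not None:
        built.append({"name": apex, "type": "CNAME",
                      "content": apex_cname["content"],
                      "proxied": apex_cname["proxied"]})
    for record in secondary:
        # Step 1: a CNAME at the apex replaces transferred apex A/AAAA records.
        if (apex_cname is not None and record["name"] == apex
                and record["type"] in ADDRESS_TYPES):
            continue
        # Step 2: overrides match on name only and apply to A/AAAA/CNAME.
        proxied = (record["type"] in PROXIABLE_TYPES
                   and bool(overrides.get(record["name"], False)))
        # Step 3: everything else passes through unproxied.
        built.append({**record, "proxied": proxied})
    return built
```

Because this runs at edit time, the edge only ever sees the already-merged result.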

Getting Started

Secondary DNS Override is a great option for any users that want to take advantage of the Cloudflare network, without transferring all of their zones to Cloudflare DNS as a primary provider. Security and access control can be managed on the primary side, without worrying about unauthorized edits of information on the Cloudflare side.

Secondary DNS Override is currently available on the Enterprise plan. If you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS Override, please refer to our support article.

Bringing Your Own IPs to Cloudflare (BYOIP)

Post Syndicated from Tom Brightbill original https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/

Today we’re thrilled to announce general availability of Bring Your Own IP (BYOIP) across our Layer 7 products as well as Spectrum and Magic Transit services. When BYOIP is configured, the Cloudflare edge will announce a customer’s own IP prefixes and the prefixes can be used with our Layer 7 services, Spectrum, or Magic Transit. If you’re not familiar with the term, an IP prefix is a range of IP addresses. Routers create a table of reachable prefixes, known as a routing table, to ensure that packets are delivered correctly across the Internet.
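For readers who prefer code to definitions, here is a small sketch of what a prefix and a routing table lookup look like, using Python's standard `ipaddress` module. The prefixes and next-hop labels are documentation examples chosen for illustration, and `longest_prefix_match` is a hypothetical helper; real routers use far more efficient data structures.

```python
import ipaddress

# A /24 prefix is a contiguous range of 256 addresses.
prefix = ipaddress.ip_network("203.0.113.0/24")
address_count = prefix.num_addresses


def longest_prefix_match(routing_table, address):
    """Return the next hop of the most specific prefix containing `address`.

    `routing_table` maps prefix strings to next-hop labels.
    """
    target = ipaddress.ip_address(address)
    best_network, best_hop = None, None
    for prefix_text, next_hop in routing_table.items():
        network = ipaddress.ip_network(prefix_text)
        if target in network and (best_network is None
                                  or network.prefixlen > best_network.prefixlen):
            best_network, best_hop = network, next_hop
    return best_hop
```

When two prefixes both contain an address, the more specific one (the longer prefix length) wins.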

As part of this announcement, we are listing BYOIP on the relevant product pages, publishing developer documentation, and adding UI support for controlling your prefixes. Previous support was API only.

Customers choose BYOIP with Cloudflare for a number of reasons. It may be the case that your IP prefix is already allow-listed in many important places, and updating firewall rules to also allow Cloudflare address space may represent a large administrative hurdle. Additionally, you may have hundreds of thousands, or even millions, of end users pointed directly to your IPs via DNS, and it would be hugely time consuming to get them all to update their records to point to Cloudflare IPs.

Over the last several quarters we have been building tooling and processes to support customers bringing their own IPs at scale. At the time of writing this post we’ve successfully onboarded hundreds of customer IP prefixes. Of these, 84% have been for Magic Transit deployments, 14% for Layer 7 deployments, and 2% for Spectrum deployments.

When you BYOIP with Cloudflare, this means we announce your IP space in over 200 cities around the world and tie your IP prefix to the service (or services!) of your choosing. Your IP space will be protected and accelerated as if it were Cloudflare’s own. We can support regional deployments for BYOIP prefixes as well if you have technical and/or legal requirements limiting where your prefixes can be announced, such as data sovereignty.

You can turn on advertisement of your IPs from the Cloudflare edge with a click of a button and be live across the world in a matter of minutes.

All BYOIP customers receive network analytics on their prefixes. Additionally all IPs in BYOIP prefixes can be considered static IPs. There are also benefits specific to the service you use with your IP prefix on Cloudflare.

Layer 7 + BYOIP:

Cloudflare has a robust Layer 7 product portfolio, including products like Bot Management, Rate Limiting, Web Application Firewall, and Content Delivery, to name just a few. You can choose to BYOIP with our Layer 7 products and receive all of their benefits on your IP addresses.

For Layer 7 services, we can support a variety of IP to domain mapping requests including sharing IPs between domains or putting domains on dedicated IPs, which can help meet requirements for things such as non-SNI support.

If you are also an SSL for SaaS customer, using BYOIP, you have increased flexibility to change IP address responses for custom_hostnames in the event an IP is unserviceable for some reason.

Spectrum + BYOIP:

Spectrum is Cloudflare’s solution to protect and accelerate applications that run any UDP or TCP protocol. The Spectrum API supports BYOIP today. Spectrum customers who use BYOIP can specify, through Spectrum’s API, which IPs they would like associated with a Spectrum application.

Magic Transit + BYOIP:

Magic Transit is a Layer 3 security service which processes all your network traffic by announcing your IP addresses and attracting that traffic to the Cloudflare edge for processing. Magic Transit supports sophisticated packet filtering and firewall configurations. BYOIP is a requirement for using the Magic Transit service. As Magic Transit is an IP level service, Cloudflare must be able to announce your IPs in order to provide this service.

Bringing Your IPs to Cloudflare: What is Required?

Before Cloudflare can announce your prefix we require some documentation to get started. The first is something called a ‘Letter of Authorization’ (LOA), which details information about your prefix and how you want Cloudflare to announce it. We then share this document with our Tier 1 transit providers in advance of provisioning your prefix. This step is done to ensure that Tier 1s are aware we have authorization to announce your prefixes.

Secondly, we require that your Internet Routing Registry (IRR) records are up to date and reflect the data in the LOA. This typically means ensuring the entry in your regional registry is updated (e.g. ARIN, RIPE, APNIC).

Once the administrivia is out of the way, work with your account team to learn when your prefixes will be ready to announce.

We also encourage customers to use RPKI and can support this for customer prefixes. We have blogged about RPKI and built extensive tooling to make adoption of this protocol easier. If you’re interested in BYOIP with RPKI support just let your account team know!

Configuration

Each customer prefix can be announced via the ‘dynamic advertisement’ toggle in either the UI or API, which will cause the Cloudflare edge to either announce or withdraw a prefix on your behalf. This can be done as soon as your account team lets you know your prefixes are ready to go.

Once the IPs are ready to be announced, you may want to set up ‘delegations’ for your prefixes. Delegations manage how the prefix can be used across multiple Cloudflare accounts and have slightly different implications depending on which service your prefix is bound to. A prefix is owned by a single account, but a delegation can extend some of the prefix functionality to other accounts. This is also captured on our developer docs. Today, delegations can affect Layer 7 and Spectrum BYOIP prefixes.

Layer 7: If you use BYOIP + Layer 7 and also use the SSL for SaaS service, a delegation to another account will allow that account to also use that prefix to validate custom hostnames in addition to the original account which owns the prefix. This means that multiple accounts can use the same IP prefix to serve up custom hostname traffic. Additionally, all of your IPs can serve traffic for custom hostnames, which means you can easily change IP addresses for these hostnames if an IP is blocked for any reason.

Spectrum: If you use BYOIP + Spectrum, via the Spectrum API, you can specify which IP in your prefix you want to create a Spectrum app with. If you create a delegation for a prefix to another account, that second account will also be able to specify an IP from that prefix to create an app.

If you are interested in learning more about BYOIP across either Magic Transit, CDN, or Spectrum, please reach out to your account team if you’re an existing customer or contact [email protected] if you’re a new prospect.

Secondary DNS — A faster, more resilient way to serve your DNS records

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/secondary-dns-a-faster-more-resilient-way-to-serve-your-dns-records/

What is secondary DNS, and why is it important?

In DNS, nameservers are responsible for serving DNS records for a zone. How the DNS records populate into the nameservers differs based on the type of nameserver.

A primary master is a nameserver that manages a zone’s DNS records. This is where the zone file is maintained and where DNS records are added, removed, and modified. However, relying on one DNS server can be risky. What if that server goes down, or your DNS provider has an outage? If you run a storefront, then your customers would have to wait until your DNS server is back up to access your site. If your website were a brick and mortar store, this would be effectively like boarding up the door while customers are trying to get in. This type of outage can be very costly.

Now imagine you have another DNS server that has a replica of your DNS records. Wouldn’t it be great to have it as a back-up if your primary nameserver went down? Or better yet, what if both served your DNS records at all times? This could help decrease the latency of DNS requests, distribute the load between DNS servers, and add resiliency to your infrastructure! And that’s precisely what Secondary DNS nameservers were built for.

As businesses grow, they often scale their DNS infrastructure. We’re seeing more customers move away from two or three on-premise DNS servers to using a managed DNS provider to having multiple DNS vendors—all to increase redundancy against the possibility of a DDoS attack taking down one of their providers. Cloudflare has data centers in over 200 cities, all of which run our DNS software allowing our authoritative DNS customers to benefit from DNS lookups averaging around 11ms globally. So we decided to expand this functionality to customers who want to use more than one DNS provider, or for those that find it too complicated to move away from their on-premise DNS server.

Customer Challenges

When we first built our secondary DNS product, our MVP was focused on functionality and not ease of use. We did this because we thought that this feature would be used by a small portion of our Enterprise customers and that they would be comfortable using the API. But the demand for secondary DNS was far greater than we initially imagined. Many customers are interested in the service, including those who aren’t comfortable managing DNS through the API.

Previously, setting up secondary DNS on a zone required a series of API calls: one for creating the zone, one for defining the IP address and settings of the master server, one for linking the master(s) to the zone, and one for initiating a zone transfer.

We heard from customers that this experience was frustrating. There were also a lot of places where the setup could go wrong: some customers would forget to link a master to their zone, others would forget a step when adding subsequent zones, and still others would have to spend hours debugging a typo in their API call. We believe secondary DNS customers should have as seamless an experience as our authoritative DNS customers, and shouldn’t be treated as secondary (pun intended) class citizens. When creating the onboarding UI, we asked ourselves, how can we simplify the experience to just a few input fields? How do we prevent customers from making easy, potentially messy mistakes, like forgetting to attach a master?

Enter: The new Secondary DNS Onboarding Experience

Starting today, enterprise customers who are entitled to secondary DNS will be able to configure their zone in the Cloudflare Dashboard. The time from when they type in their domain name to when they see their records in the dashboard is less than two minutes. We’ve added error prevention to stop customers from adding their zone until they’ve configured at least one master. Customers will also be able to review their transferred records before finishing the onboarding process, allowing them to see what was transferred, without juggling API calls and switching back and forth between the dashboard and a support article.

How It Looks

The “Add Site” flow in the Cloudflare Dashboard gives customers two options: Authoritative or Secondary DNS. Next, they will need to fill out the IP address of their master server, attach a TSIG (Transactional Signature) to authenticate zone transfers, and voila! In just a few clicks, records populate to your DNS table.

The Intricacies of Secondary DNS

As mentioned above, primary nameservers are where DNS records are managed, and secondary nameservers are responsible for holding the read-only replica of those records. But how do they get there? The communication between a primary master and a secondary nameserver is known as a zone transfer.

Master servers use SOA (Start of Authority) records to keep track of zone updates. Every time a zone file changes (say you add or remove a DNS record), the serial number of the SOA record is incremented as a way to signal secondary nameservers that the zone updated, and it’s time to fetch a fresh copy.

Primary masters can send a NOTIFY message to a secondary nameserver to signal a zone file change. Once the secondary receives the NOTIFY, it will do an SOA sanity check against the master and perform a zone transfer if it sees that the SOA value has increased. An AXFR or IXFR query can initiate the zone transfer. An AXFR query initiates a full zone transfer and is usually requested the very first time a zone is transferred. But AXFR transfers are not always necessary as most zone file changes are minute. This is why IXFR (incremental zone transfer) requests were created: they tell a master server which version of the zone a secondary currently holds, and the master sends the difference between the new version and the one the secondary has, so that only the new changes are transferred.
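The difference between the two transfer types can be sketched with plain sets. This is a conceptual illustration only: records are modelled as (name, type, content) tuples, the function names are hypothetical, and a real IXFR response (RFC 1995) frames its deletions and additions with SOA records rather than sending a dictionary.

```python
def compute_ixfr(old_zone, new_zone):
    """Master side: work out what changed between two zone versions."""
    return {"deleted": old_zone - new_zone, "added": new_zone - old_zone}


def apply_ixfr(zone, diff):
    """Secondary side: apply only the changes to the local copy."""
    return (zone - diff["deleted"]) | diff["added"]


def choose_transfer(local_zone):
    """A secondary with no copy needs a full transfer; otherwise a diff will do."""
    return "AXFR" if local_zone is None else "IXFR"
```

For a zone with thousands of records and a single changed A record, the diff carries two records instead of the whole zone.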

Some masters, unfortunately, do not support NOTIFY queries. This means that instead of the master notifying the secondary of zone updates, the secondary needs to periodically check the SOA of the primary server to see if the value has changed.

Securing Zone Transfers

Zone transfers between a primary and secondary server are unauthenticated on their own. TSIGs (Transactional Signatures) were developed as a means to add authentication to the DNS protocol, and have mostly been used for zone transfers. They provide mutual authentication between a client and a server by using a shared secret between the two parties and a one-way keyed hash function, which is attached as a TSIG record to a DNS message. The TSIG record guarantees that only secondary nameservers with the TSIG can pull zone transfers from a master. And vice versa, secondary servers will only accept zone transfers from masters that have the proper TSIG attached. Additionally, TSIGs provide data integrity and ensure that the DNS message was not modified en route.

We support TSIGs and highly recommend that you add one when configuring your master.
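The keyed-hash idea behind TSIG can be shown in a few lines with Python's standard `hmac` module. This is a conceptual sketch, not a TSIG implementation: real TSIG (RFC 8945) signs specific wire-format fields along with a timestamp, and the choice of HMAC-SHA256 and the function names here are assumptions for the example.

```python
import hashlib
import hmac


def sign(shared_secret: bytes, message: bytes) -> bytes:
    """Compute a keyed hash over the message using the shared secret."""
    return hmac.new(shared_secret, message, hashlib.sha256).digest()


def verify(shared_secret: bytes, message: bytes, mac: bytes) -> bool:
    """Check the MAC in constant time; fails if the key or message differ."""
    return hmac.compare_digest(sign(shared_secret, message), mac)
```

Only a party holding the shared secret can produce a MAC that verifies, and any change to the message in transit makes verification fail, which is where both the authentication and the integrity guarantees come from.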

Extending DNS Analytics to Secondary DNS

Setting up a secondary zone on Cloudflare is a simple process with the new onboarding UI. In just a few clicks, Cloudflare’s nameservers in all 200+ cities will begin responding to DNS queries. In addition to serving DNS records, secondary DNS customers will also be able to see the same DNS analytics that we provide to our authoritative DNS customers. The analytics show a breakdown of DNS traffic by record type, response code, and even geographical regions.

One of our customers, Big Cartel, runs an E-commerce platform that has helped people all over the world sell $2.5 billion of their work since 2005. As they grow, Cloudflare’s secondary DNS product helps keep their site fast and reliable:

“At Big Cartel, we provide an online storefront for our customers. We need to be always available and avoid any chances of downtime — eliminating all single points of failure is critical for us. With Cloudflare’s Secondary DNS, we can do just that! It keeps our DNS infrastructure more resilient while allowing our customers to benefit from fast query times. Additionally, using Cloudflare’s Secondary DNS analytics provides granular insights into how our traffic is balanced between our DNS providers” – Lee Jensen, Technical Director

Getting Started

Secondary DNS is currently available on the Enterprise plan. If you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS, please refer to our support article.

Making DNS record changes more reliable

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/making-dns-record-changes-more-reliable/

DNS is the very first step in accessing any website, API, or pretty much anything on the Internet, which makes it mission-critical to keeping your site up and running. This week, we are launching two significant changes that allow our customers to better maintain and update their DNS records. For customers who use Cloudflare as their authoritative DNS provider, we’ve added a much-asked-for feature: confirmation of DNS record edits. For our secondary DNS customers, we’re excited to provide a brand new onboarding experience.

Confirm and Commit

One of the benefits of using Cloudflare DNS is that changes quickly propagate to our 200+ data centers. And I mean very quickly: DNS propagation typically takes <5 seconds worldwide. Our UI was set up to allow customers to edit records, click out of the input box, and boom! The record has propagated!

There are a lot of advantages to fast DNS, but there’s also one clear downside – it leaves room for fat fingering. What if you accidentally toggle the proxy icon, or mistype the content of your DNS record? This could result in users not being able to access your website or API and could cause a significant outage. To protect customers from these kinds of mistakes, we’ve added a Save button for DNS record changes.

Now editing records in the DNS table allows you to take an extra look before committing the change.

The new confirmation layout applies to all record types and affects any content, TTL, or proxy status changes.

Let us know what you think by filling out the feedback survey linked at the top of the DNS tab in the dashboard.

Deploying Gateway using a Raspberry Pi, DNS over HTTPS and Pi-hole

Post Syndicated from Jason Farber original https://blog.cloudflare.com/deploying-gateway-using-a-raspberry-pi-dns-over-https-and-pi-hole/

Like many who are able, I am working remotely and in this post, I describe some of the ways to deploy Cloudflare Gateway directly from your home. Gateway’s DNS filtering protects networks from malware, phishing, ransomware and other security threats. It’s not only for corporate environments – it can be deployed on your browser or laptop to protect your computer or your home WiFi. Below you will learn how to deploy Gateway, including, but not limited to, DNS over HTTPS (DoH) using a Raspberry Pi, Pi-hole and DNSCrypt.

We recently launched Cloudflare Gateway and shortly thereafter, offered it for free until at least September to any company in need. Cloudflare leadership asked the global Solutions Engineering (SE) team, amongst others, to assist with the incoming onboarding calls. As an SE at Cloudflare, our role is to learn new products, such as Gateway, to educate, and to ensure the success of our prospects and customers. We talk to our customers daily, understand the challenges they face and consult on best practices. We were ready to help!

One way we stay on top of all the services that Cloudflare provides is by using them ourselves. In this post, I’ll talk about my experience setting up Cloudflare Gateway.

Gateway sits between your users, devices or network and the public Internet. Once you set up Cloudflare Gateway, the service will inspect and manage all Internet-bound DNS queries. In simple terms, you point your recursive DNS to Cloudflare and we enforce the policies you create, such as activating SafeSearch, an automated filter for adult and offensive content that’s built into popular search engines like Google, Bing, DuckDuckGo, Yandex and others.

DNS filtering can be enabled in several ways and places: on your entire laptop, in each of your individual browsers and devices, or across your entire home network. With DNS filtering front of mind, including DoH, I explored each model. The model you choose ultimately depends on your objective.

But first, let’s review what DNS and DNS over HTTPS are.

DNS is the protocol used to resolve hostnames (like www.cloudflare.com) into IP addresses so computers can talk to each other. DNS is an unencrypted clear text protocol, meaning that any eavesdropper or machine between the client and DNS server can see the contents of the DNS request. DNS over HTTPS adds security to DNS by encrypting DNS queries using HTTPS (the protocol we use to encrypt the web).
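
To make the difference concrete, here is a minimal Python sketch of what a DoH client puts on the wire. It is an illustration under stated assumptions, not Cloudflare's implementation: the function names are hypothetical, and the endpoint is Cloudflare's public resolver used as a stand-in (a Gateway location has its own unique hostname). The sketch encodes one question in the classic DNS wire format and wraps it in a GET URL in the style of RFC 8484. It only builds the request and does not send anything.

```python
import base64
import struct


def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query in wire format (qtype 1 = A record)."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def doh_get_url(endpoint: str, query: bytes) -> str:
    """Wrap a wire-format query in a DoH GET URL (base64url, padding stripped)."""
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


if __name__ == "__main__":
    # Stand-in endpoint; substitute the DoH hostname of your own location.
    print(doh_get_url("https://cloudflare-dns.com/dns-query",
                      build_dns_query("www.cloudflare.com")))
```

With plain DNS the same bytes would travel unencrypted over UDP port 53; with DoH they sit inside a TLS-protected HTTPS request, so an on-path observer sees only that you contacted the resolver.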

Let’s get started

Navigate to https://dash.teams.cloudflare.com. If you don’t already have an account, the sign up process only takes a few minutes.

The first step is to configure a Gateway location.

This is conceptually similar to how we handle HTTPS traffic: when our edge receives an HTTPS request, we match the incoming SNI header (or, for plain-text HTTP, the Host header) to the correct domain’s configuration. When our edge receives a DNS query, we need a similar mapping to identify the correct configuration. We attempt to match configurations in this order:

  1. DNS over HTTPS check and lookup based on unique hostname
  2. IPv4 check and lookup based on source IPv4 address
  3. Lookup based on IPv6 destination address
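
The matching order above can be sketched in a few lines of Python. This is a hypothetical model for illustration only: the `Location` fields, the `match_location` function and the example addresses (taken from the documentation ranges) are assumptions, not Gateway's actual data model.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Iterable, Optional


@dataclass(frozen=True)
class Location:
    name: str
    doh_hostname: str           # unique DoH hostname issued for the location
    source_ipv4: Optional[str]  # public IPv4 of the network, if one is registered
    dest_ipv6: str              # unique IPv6 resolver address for the location


def match_location(locations: Iterable[Location],
                   doh_hostname: Optional[str] = None,
                   source_ip: Optional[str] = None,
                   dest_ip: Optional[str] = None) -> Optional[Location]:
    """Return the first location matched, trying the three checks in order."""
    locations = list(locations)
    # 1. DNS over HTTPS: match on the unique hostname the query was sent to.
    if doh_hostname:
        wanted = doh_hostname.lower().rstrip(".")
        for loc in locations:
            if loc.doh_hostname.lower() == wanted:
                return loc
    # 2. IPv4: match on the source address of the query.
    if source_ip and ip_address(source_ip).version == 4:
        src = ip_address(source_ip)
        for loc in locations:
            if loc.source_ipv4 and ip_address(loc.source_ipv4) == src:
                return loc
    # 3. IPv6: match on the destination address the query was sent to.
    if dest_ip and ip_address(dest_ip).version == 6:
        dst = ip_address(dest_ip)
        for loc in locations:
            if ip_address(loc.dest_ipv6) == dst:
                return loc
    return None
```

Because the hostname check runs first, a DoH query is attributed to its location even when it arrives from an unregistered or shared source address.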

Let’s discuss each option.

DNS over HTTPS

The first way to match DNS requests to a location is to point your traffic at a unique DNS over HTTPS hostname. After you configure your first location, you are given a unique destination IPv6 address and a unique DoH endpoint. Take note of the hostname as you will need it shortly. I’ll first discuss deploying Gateway in a browser and then to your entire network.

DNS over HTTPS is my favorite method for deploying Gateway and securing DNS queries at the same time. Enabling DoH prevents anyone but the DNS server of your choosing from seeing your DNS queries.

Enabling DNS over HTTPS in browsers

By enabling it in a browser, only queries issued in that browser are affected. It’s available in most browsers and there are quite a few tutorials online to show you how to turn it on.

Browser | Supports DoH | Supports Custom Alternative Providers | Supports Custom Servers
Chrome  | Yes          | Yes                                   | No
Safari  | No           | No                                    | No
Edge    | Yes**        | Yes**                                 | No**
Firefox | Yes          | Yes                                   | Yes
Opera   | Yes*         | Yes*                                  | No*
Brave   | Yes*         | Yes*                                  | No*
Vivaldi | Yes*         | Yes*                                  | No*

* Chromium based browser. Same support as Chrome
** Most recent version of Edge is built on Chromium

Chromium based browsers

Using Chrome as an example on behalf of all the Chromium-based browsers, enabling DNS over HTTPS is straightforward, but as you can see in the table above, there is one issue: Chrome does not currently support custom servers. So while it is great that a user can protect their DNS queries, they cannot choose the provider, including Gateway.

Here is how to enable DoH in Chromium based browsers:

Navigate to chrome://flags and toggle the beta flag to enabled.

Firefox

Firefox is the exception to the rule because it supports both DNS over HTTPS and the ability to define a custom server. Mozilla provides detailed instructions about how to get started.

Once enabled, navigate to Preferences -> General -> Network Settings and select ‘Settings’. Scroll to the section ‘Enable DNS over HTTPS’, select ‘Custom’ and input your Gateway DoH address.

Optionally, you can enable Encrypted SNI (ESNI), which is an IETF draft for encrypting the SNI headers, by toggling the ‘network.security.esni.enabled’ preference in about:config to ‘true’. Read more about how Cloudflare is one of the few providers that supports ESNI by default.

Congratulations, you’ve configured Gateway using DNS over HTTPS! Keep in mind that only queries issued from the configured browser will be secured. Any other device connected to your network such as your mobile devices, gaming platforms, or smart TVs will still use your network’s default DNS server, likely assigned by your ISP.

Configuring Gateway for your entire home or business network

Deploying Gateway at the router level allows you to secure every device on your network without needing to configure each one individually.

Requirements include:

  • Access to your router’s administrative portal
  • A router that supports DHCP forwarding
  • Raspberry Pi with WiFi or Ethernet connectivity

There aren’t any consumer routers on the market that natively support DoH custom servers and likely few that natively support DoH at all. A newer router I purchased, the Netgear R7800, does not support either, but it is one of the most popular routers for flashing dd-wrt or open-wrt, which both support DoH. Unfortunately, neither of these popular firmwares supports custom servers.

While it’s rare to find a router that supports DoH out of the box, DoH with custom servers, or has potential to be flashed, it’s common for a router to support DHCP forwarding (dd-wrt and open-wrt both support DHCP forwarding). So, I installed Pi-hole on my Raspberry Pi and used it as my home network’s DNS and DHCP server.

Getting started with Pi-hole and dnscrypt-proxy

If your Raspberry Pi is new and hasn’t been configured yet, follow their guide to get started. (Note: by default, ssh is disabled, so you will need a keyboard and/or mouse to access your box in your terminal.)

Once your Raspberry Pi has been initialized, assign it a static IP address in the same network as your router. I hardcoded my router’s LAN address to 192.168.1.2

Using vim:
sudo vi /etc/dhcpcd.conf

Restart the service.
sudo /etc/init.d/dhcpcd restart

Check that your static IP is configured correctly.
ip addr show dev eth0

Now that your Raspberry Pi is configured, we need to install Pi-hole: https://github.com/pi-hole/pi-hole/#one-step-automated-install

I chose to use dnscrypt-proxy as the local service that Pi-hole will use to forward all DNS queries. You can find the latest version here.

To install dnscrypt-proxy on your pi-hole, follow these steps:

wget https://github.com/DNSCrypt/dnscrypt-proxy/releases/download/2.0.39/dnscrypt-proxy-linux_arm-2.0.39.tar.gz
tar -xf dnscrypt-proxy-linux_arm-2.0.39.tar.gz
mv linux-arm dnscrypt-proxy
cd dnscrypt-proxy
cp example-dnscrypt-proxy.toml dnscrypt-proxy.toml

The next step is to build a DoH stamp. A stamp is simply an encoded string that describes your DoH server and its options.
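
If you are curious what goes into a stamp, here is a minimal Python sketch of the DoH stamp layout as I understand it from the DNS Stamps specification. Treat it as an illustration rather than a replacement for the official stamp calculator: `build_doh_stamp` is a hypothetical helper, it leaves the certificate-hash set empty, and the hostname in the usage line is a placeholder for your own Gateway DoH hostname.

```python
import base64
import struct


def _length_prefixed(data: bytes) -> bytes:
    """Prefix a field with a single length byte."""
    if len(data) > 255:
        raise ValueError("stamp fields are limited to 255 bytes")
    return bytes([len(data)]) + data


def build_doh_stamp(hostname: str, path: str = "/dns-query",
                    addr: str = "", props: int = 0) -> str:
    """Encode a DoH server description as an sdns:// stamp."""
    payload = (
        b"\x02"                                   # protocol identifier for DoH
        + struct.pack("<Q", props)                # 64-bit little-endian property flags
        + _length_prefixed(addr.encode("ascii"))  # optional server IP, may be empty
        + _length_prefixed(b"")                   # empty certificate-hash set
        + _length_prefixed(hostname.encode("ascii"))
        + _length_prefixed(path.encode("ascii"))
    )
    # Stamps use URL-safe base64 without padding.
    return "sdns://" + base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Placeholder hostname: use the DoH hostname from your own location.
    print(build_doh_stamp("gateway-location.example"))
```

The `props` bits advertise properties such as DNSSEC support and logging policy; the sketch defaults them to zero and leaves the choice to you.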

As a reminder, you can find Gateway’s unique DoH address in your location configuration.

At the very bottom of your dnscrypt-proxy.toml configuration file, uncomment both lines beneath [static].

  • Change  [static.'myserver'] to [static.'gateway']
  • Replace the default stamp with the one generated above

The static section should look similar to this, with your own stamp in place of the placeholder:

[static]
  [static.'gateway']
  stamp = 'sdns://<your stamp>'

Also in the dnscrypt-proxy.toml configuration file, update the following settings:
server_names = ['gateway']
listen_addresses = ['127.0.0.1:5054']
fallback_resolvers = ['1.1.1.1:53', '1.0.0.1:53']
cache = false

Now we need to install dnscrypt-proxy as a service and configure Pi-hole to point to the listen_addresses defined above.

Install dnscrypt-proxy as a service:
sudo ./dnscrypt-proxy -service install

Start the service:
sudo ./dnscrypt-proxy -service start

You can validate the status of the service by running:
sudo service dnscrypt-proxy status

Or check that it is listening on port 5054:
netstat -an | grep 5054

Also, confirm the upstream is working by querying localhost on port 5054:
dig www.cloudflare.com  -p 5054 @127.0.0.1

You will see a matching request in the Gateway query log (note the timestamps match):

Configuring DNS and DHCP in the Pi-hole administrative console

Open your browser and navigate to the Pi-hole’s administrative console. In my case, it’s http://192.168.1.6/admin

Go to Settings -> DNS to modify the upstream DNS provider, which we’ve just configured to be dnscrypt-proxy.

Change the upstream server to 127.0.0.1#5054 and hit save. If you want redundancy, add a secondary address in Custom 2 (such as 1.1.1.1) or in Custom 3 (such as your IPv6 destination address).

Almost done!

In Settings->DHCP, enable the DHCP server:

Hit save.

At this point, your Pi-hole server is running in isolation and we need to deploy it to your network. The simplest way to ensure your Pi-hole is being used exclusively by every device is to use your Pi-hole as both a DNS server and a DHCP server. I’ve found that routers behave oddly if you outsource DNS but not DHCP, so I outsource both.

After I enabled the DHCP server on the Pi-hole, I set the router’s configuration to DHCP forwarding and defined the Pi-hole static address:

After applying the router’s configuration, I confirmed it was working properly by forgetting the network in my network settings and re-joining. This results in a new IPv4 address (from our new DHCP server) and, if all goes well, a new DNS server (the IP of Pi-hole).

Done!

Now that our entire network is using Gateway, we can configure Gateway Policies to secure our DNS queries!

IPv4 check and lookup based on source IPv4 address

For this method to work properly, Gateway requires that your network has a static IPv4 address. If your IP address does not change, then this is the quickest solution (although it still does not prevent third parties from seeing which domains you visit). However, if you are configuring Gateway in your home, like I am, and you don’t explicitly pay for a static address, then you most likely have a dynamic IP address. Such an address can change whenever your router restarts, intentionally or not.

Lookup based on IPv6 destination address

Another option for matching requests in Gateway is to configure your DNS server to point to a unique IPv6 address provided to you by Cloudflare. Any DNS query pointed to this address will be matched properly on our edge.

This might be a good option if you want to use Cloudflare Gateway on your entire laptop by setting your local DNS resolution to this address. However, if your home router or ISP does not support IPv6, DNS resolution won’t work.

Conclusion

In this blog post, we’ve discussed the various ways Gateway can be deployed and how DNS over HTTPS is one of the next big Internet privacy improvements. Deploying Gateway can be done on a per device basis, on your router or even with a Raspberry Pi.

Microsoft Buys Corp.com

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/04/microsoft_buys_.html

A few months ago, Brian Krebs told the story of the domain corp.com, and how it is basically a security nightmare:

At issue is a problem known as “namespace collision,” a situation where domain names intended to be used exclusively on an internal company network end up overlapping with domains that can resolve normally on the open Internet.

Windows computers on an internal corporate network validate other things on that network using a Microsoft innovation called Active Directory, which is the umbrella term for a broad range of identity-related services in Windows environments. A core part of the way these things find each other involves a Windows feature called “DNS name devolution,” which is a kind of network shorthand that makes it easier to find other computers or servers without having to specify a full, legitimate domain name for those resources.

For instance, if a company runs an internal network with the name internalnetwork.example.com, and an employee on that network wishes to access a shared drive called “drive1,” there’s no need to type “drive1.internalnetwork.example.com” into Windows Explorer; typing “\\drive1\” alone will suffice, and Windows takes care of the rest.

But things can get far trickier with an internal Windows domain that does not map back to a second-level domain the organization actually owns and controls. And unfortunately, in early versions of Windows that supported Active Directory — Windows 2000 Server, for example — the default or example Active Directory path was given as “corp,” and many companies apparently adopted this setting without modifying it to include a domain they controlled.

Compounding things further, some companies then went on to build (and/or assimilate) vast networks of networks on top of this erroneous setting.

Now, none of this was much of a security concern back in the day when it was impractical for employees to lug their bulky desktop computers and monitors outside of the corporate network. But what happens when an employee working at a company with an Active Directory network path called “corp” takes a company laptop to the local Starbucks?

Chances are good that at least some resources on the employee’s laptop will still try to access that internal “corp” domain. And because of the way DNS name devolution works on Windows, that company laptop online via the Starbucks wireless connection is likely to then seek those same resources at “corp.com.”

In practical terms, this means that whoever controls corp.com can passively intercept private communications from hundreds of thousands of computers that end up being taken outside of a corporate environment which uses this “corp” designation for its Active Directory domain.

Microsoft just bought it, so it wouldn’t fall into the hands of any bad actors:

In a written statement, Microsoft said it acquired the domain to protect its customers.

“To help in keeping systems protected we encourage customers to practice safe security habits when planning for internal domain and network names,” the statement reads. “We released a security advisory in June of 2009 and a security update that helps keep customers safe. In our ongoing commitment to customer security, we also acquired the Corp.com domain.”

The Mistake that Caused 1.1.1.3 to Block LGBTQIA+ Sites Today

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/the-mistake-that-caused-1-1-1-3-to-block-lgbtqia-sites-today/

Today we made a mistake. The mistake caused a number of LGBTQIA+ sites to inadvertently be blocked by the new 1.1.1.1 for Families service. I wanted to walk through what happened, why, and what we’ve done to fix it.

As is our tradition for the last three years, we roll out new products for the general public that uses the Internet on April 1. This year, one of those products was a filtered DNS service, 1.1.1.1 for Families. The service allows anyone who chooses to use it to restrict certain categories of sites.

Filtered vs Unfiltered DNS

Nothing about our new filtered DNS service changes the unfiltered nature of our original 1.1.1.1 service. However, we recognized that some people want a way to control what content is in their home. For instance, I block social media sites from resolving while I am trying to get work done because it makes me more productive. The number one request from users of 1.1.1.1 was that we create a version of the service for home use to block certain categories of sites. And so, earlier today, we launched 1.1.1.1 for Families.

Over time, we’ll provide the ability for users of 1.1.1.1 for Families to customize exactly what categories they block (e.g., do what I do with social media sites to stay productive). But, initially, we created two default settings that were the most requested types of content people wanted to block: Malware (which you can block by setting 1.1.1.2 and 1.0.0.2 as your DNS resolvers) and Malware + Adult Content (which you can block by setting 1.1.1.3 and 1.0.0.3 as your DNS resolvers).

Licensed Categorization Data

To get data for 1.1.1.1 for Families, we licensed feeds from multiple different providers who specialize in site categorization. We spent the last several months reviewing classification providers to choose the ones that had the highest accuracy and lowest false positives.

Malware, encompassing a range of widely agreed upon cyber security threats, was the easier of the two categories to define. For Adult Content, we aimed to mirror the Google SafeSearch criteria. Google has been thoughtful in this area and their SafeSearch tool is designed to limit search results for “sexually explicit content.” The definition is focused on pornography and largely follows the requirements of the US Children’s Internet Protection Act (CIPA), which schools and libraries in the United States are required to follow.

Because it was the default for the 1.1.1.3 service, and because we planned in the future to allow individuals to set their own specifications beyond the default, we intended the Adult Content category to be narrow. What we did not intend to include in the Adult Content category was LGBTQIA+ content. And yet, when it launched, we were horrified to receive reports that those sites were being filtered.

Choosing the Wrong Feed

So what went wrong? The data providers that we license content from have different categorizations; those categorizations do not line up perfectly between different providers. One of the providers has multiple “Adult Content” categories. One “Adult Content” category includes content that mirrors the Google SafeSearch/CIPA definition. Another “Adult Content” category includes a broader set of topics, including LGBTQIA+ sites.

While we had specifically reviewed the Adult Content category to ensure that it was narrowly tailored to mirror the Google SafeSearch/CIPA definition, when we released the production version this morning we included the wrong “Adult Content” category from the provider in the build. As a result, the first users who tried 1.1.1.3 saw a broader set of sites being filtered than was intended, including LGBTQIA+ content. We immediately worked to fix the issue.

Slow to Update Data Structures

In order to distribute the list of sites quickly to all our data centers we use a compact data structure. The upside is that we can replicate the data structure worldwide very efficiently. The downside is that generating a new version of the data structure takes several hours. The minute we saw that we’d made a mistake we pulled the incorrect data provider and began recreating the new data structure.

While the new data structure replicated across our network we pushed individual sites to an allow list immediately. We began compiling lists both from user reports as well as from other LGBTQIA+ resources. These updates went out instantly. We continuously added sites to the allow list as they were reported or we discovered them.

By 16:51 UTC, approximately two hours after we’d received the first report of the mistaken blocking, the data structure with the intended definition of Adult Content had been generated and we pushed it out live. The only users that would have seen over-broad blocking are those that had already switched to the 1.1.1.3 service. Users of 1.1.1.1 — which will remain unfiltered — and 1.1.1.2 would not have experienced this inadvertent blocking.

As of now, the filtering provided by the default setting of 1.1.1.3 is what we intended it to be and should roughly match what you find if you use Google SafeSearch; LGBTQIA+ sites are not being blocked. If you see sites being blocked that should not be, please report them to us here.

https://report.teams.cloudflare.com/

Protections for the Future

Going forward, we’ve set up a number of checks of known sites that should fall outside the intended categories, including many that we mistakenly listed today. Before defaults are updated in the future, our build system will confirm that none of these sites are listed. We hope this will help catch mistakes like this in the future.
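
A pre-publication check of this kind could look roughly like the following Python sketch. It is only an illustration of the idea, not our build system: the function names are hypothetical, matching is on exact hostnames (an assumption), and the domains in the usage note are placeholders.

```python
from typing import Iterable, List


def normalize(domain: str) -> str:
    """Lower-case a hostname and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def find_wrongly_blocked(blocklist: Iterable[str],
                         must_not_block: Iterable[str]) -> List[str]:
    """Return the known-good sites that appear in the candidate blocklist."""
    blocked = {normalize(d) for d in blocklist}
    return sorted({normalize(d) for d in must_not_block} & blocked)


def check_build(blocklist: Iterable[str], must_not_block: Iterable[str]) -> None:
    """Refuse to publish a blocklist that contains any known-good site."""
    offenders = find_wrongly_blocked(blocklist, must_not_block)
    if offenders:
        raise ValueError("refusing to publish, wrongly blocked: " + ", ".join(offenders))
```

For example, `check_build(["bad.example", "Support.Example."], ["support.example"])` raises an error and stops the release, while a blocklist that contains none of the known-good sites passes silently.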

I’m sorry for the error. While I understand how it happened, it should never have happened. I appreciate our team responding quickly to fix the mistake we made.

Introducing 1.1.1.1 for Families

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/introducing-1-1-1-1-for-families/

Two years ago today we announced 1.1.1.1, a secure, fast, privacy-first DNS resolver free for anyone to use. In those two years, 1.1.1.1 has grown beyond our wildest imagination. Today, we process more than 200 billion DNS requests per day making us the second largest public DNS resolver in the world behind only Google.

Yesterday, we announced the results of the 1.1.1.1 privacy examination. Cloudflare’s business has never involved selling user data or targeted advertising, so it was easy for us to commit to strong privacy protections for 1.1.1.1. We’ve also led the way supporting encrypted DNS technologies including DNS over TLS and DNS over HTTPS. It is long past time to stop transmitting DNS in plaintext and we’re excited that we see more and more encrypted DNS traffic every day.

1.1.1.1 for Families

Since launching 1.1.1.1, the number one request we have received is to provide a version of the product that automatically filters out bad sites. While 1.1.1.1 can safeguard user privacy and optimize efficiency, it is designed for direct, fast DNS resolution, not for blocking or filtering content. The requests we’ve received largely come from home users who want to ensure that they have a measure of protection from security threats and can keep adult content from being accessed by their kids. Today, we’re happy to answer those requests.

Introducing 1.1.1.1 for Families — the easiest way to add a layer of protection to your home network and protect it from malware and adult content. 1.1.1.1 for Families leverages Cloudflare’s global network to ensure that it is fast and secure around the world. And it includes the same strong privacy guarantees that we committed to when we launched 1.1.1.1 two years ago. And, just like 1.1.1.1, we’re providing it for free and it’s for any home anywhere in the world.

Two Flavors: 1.1.1.2 (No Malware) & 1.1.1.3 (No Malware or Adult Content)

1.1.1.1 for Families is easy to set up and install, requiring just changing two numbers in the settings of your home devices or network router: your primary DNS and your secondary DNS. Setting up 1.1.1.1 for Families usually takes less than a minute and we’ve provided instructions for common devices and routers through the installation guide.

1.1.1.1 for Families has two default options: one that blocks malware and the other that blocks malware and adult content. You choose which setting you want depending on which IP address you configure.

Malware Blocking Only
Primary DNS: 1.1.1.2
Secondary DNS: 1.0.0.2

Malware and Adult Content
Primary DNS: 1.1.1.3
Secondary DNS: 1.0.0.3

Additional Configuration

In the coming months, we will provide the ability to define additional configuration settings for 1.1.1.1 for Families. This will include options to create specific whitelists and blacklists of certain sites. You will be able to set the times of the day when categories, such as social media, are blocked and get reports on your household’s Internet usage.

1.1.1.1 for Families is built on top of the same site categorization and filtering technology that powers Cloudflare’s Gateway product. With the success of Gateway, we wanted to provide an easy-to-use service that can help any home network be fast, reliable, secure, and protected from potentially harmful content.

Not A Joke

Most of Cloudflare’s business involves selling services to businesses. However, we’ve made it a tradition every April 1 to launch a new consumer product that leverages our network to bring more speed, reliability, and security to every Internet user. While we make money selling to businesses, the products we launch at this time of the year are close to our hearts because of the broad impact they have for every Internet user.

This year, while many of us are confined to our homes, protecting our communities from COVID-19, and relying on our home networks more than ever, it seemed especially important to launch 1.1.1.1 for Families. We hope during these troubled times it will help provide a bit of peace of mind for households everywhere.

Announcing the Beta for WARP for macOS and Windows

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/announcing-the-beta-for-warp-for-macos-and-windows/

Last April 1 we announced WARP — an option within the 1.1.1.1 iOS and Android app to secure and speed up Internet connections. Today, millions of users have secured their mobile Internet connections with WARP.

While WARP started as an option within the 1.1.1.1 app, it’s really a technology that can benefit any device connected to the Internet. In fact, one of the most common requests we’ve gotten over the last year is support for WARP for macOS and Windows. Today we’re announcing exactly that: the start of the WARP beta for macOS and Windows.

What’s The Same: Fast, Secure, and Free

We always wanted to build a WARP client for macOS and Windows. We started with mobile because it was the hardest challenge. And it turned out to be a lot harder than we anticipated. While we announced the beta of 1.1.1.1 with WARP on April 1, 2019 it took us until late September before we were able to open it up to general availability. We don’t expect the wait for macOS and Windows WARP to be nearly as long.

The WARP client for macOS and Windows relies on the same fast, efficient WireGuard protocol to secure Internet connections and keep them safe from being spied on by your ISP. Also, just like WARP on the 1.1.1.1 mobile app, the basic service will be free on macOS and Windows.

WARP+ Gets You There Faster

We plan to add WARP+ support in the coming months to allow you to leverage Cloudflare’s Argo network for even faster Internet performance. We will provide a plan option for existing WARP+ subscribers to add additional devices at a discount. In the meantime, existing WARP+ users will be among the first to be invited to try WARP for macOS and Windows. If you are a WARP+ subscriber, check your 1.1.1.1 app over the coming weeks for a link to an invitation to try the new WARP for macOS and Windows clients.

If you’re not a WARP+ subscriber, you can add yourself to the waitlist by signing up on the page linked below. We’ll email you as soon as it’s ready for you to try.

https://one.one.one.one

Linux Support

We haven’t forgotten about Linux. About 10% of Cloudflare’s employees run Linux on their desktops. As soon as we get the macOS and Windows clients out we’ll turn our attention to building a WARP client for Linux.

Thank you to everyone who helped us make WARP fast, efficient, and reliable on mobile. It’s incredible how far it’s come over the last year. If you tried it early in the beta last year but aren’t using it now, I encourage you to give it another try. We’re looking forward to bringing WARP speed and security to even more devices.

Announcing the Results of the 1.1.1.1 Public DNS Resolver Privacy Examination

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/announcing-the-results-of-the-1-1-1-1-public-dns-resolver-privacy-examination/

On April 1, 2018, we took a big step toward improving Internet privacy and security with the launch of the 1.1.1.1 public DNS resolver — the Internet’s fastest, privacy-first public DNS resolver. And we really meant privacy first. We were not satisfied with the status quo and believed that secure DNS resolution with transparent privacy practices should be the new normal. So we committed to our public resolver users that we would not retain any personal data about requests made using our 1.1.1.1 resolver. We also built in technical measures to facilitate DNS over HTTPS to help keep your DNS queries secure. We’ve never wanted to know what individuals do on the Internet, and we took technical steps to ensure we can’t know.

We knew there would be skeptics. Many consumers believe that if they aren’t paying for a product, then they are the product. We don’t believe that has to be the case. So we committed to retaining a Big 4 accounting firm to perform an examination of our 1.1.1.1 resolver privacy commitments.

Today we’re excited to announce that the 1.1.1.1 resolver examination has been completed and a copy of the independent accountants’ report can be obtained from our compliance page.

The examination process

We gained a number of observations and lessons from the privacy examination of the 1.1.1.1 resolver. First, we learned that it takes much longer to agree on terms and complete an examination when you ask an accounting firm to do what we believe is the first of its kind examination of custom privacy commitments for a recursive resolver.

We also observed that privacy by design works. Not that we were surprised — we use privacy by design principles in all our products and services. Because we baked anonymization best practices into the 1.1.1.1 resolver when we built it, we were able to demonstrate that we didn’t have any personal data to sell. More specifically, in accordance with RFC 6235, we decided to truncate the client/source IP at our edge data centers so that we never store in non-volatile storage the full IP address of the 1.1.1.1 resolver user.

We knew that a truncated IP address would be enough to help us understand general Internet trends and where traffic is coming from. We further improved our privacy-first approach by replacing the truncated IP address with the network number (the ASN) in our internal logs. On top of that, we committed to only retaining those anonymized logs for a limited period of time. It’s the privacy version of belt plus suspenders plus another belt.
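
As a rough illustration of the two anonymization steps described above, here is a short Python sketch. It is not our production code: the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for the example, and the prefix-to-ASN table is a hypothetical stand-in built from documentation addresses and documentation AS numbers.

```python
import ipaddress
from typing import Dict, Optional


def truncate_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address so that only the network part remains."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def asn_for(addr: str, prefix_to_asn: Dict[str, int]) -> Optional[int]:
    """Look up the AS number of the longest matching prefix, if any."""
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, ASN) of the most specific match so far
    for prefix, asn in prefix_to_asn.items():
        network = ipaddress.ip_network(prefix)
        if network.version == ip.version and ip in network:
            if best is None or network.prefixlen > best[0]:
                best = (network.prefixlen, asn)
    return best[1] if best else None
```

Truncation keeps enough information to see where traffic comes from in aggregate, and logging only the ASN coarsens it further, since many unrelated users share one network number.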

Finally, we learned that aligning our examination of the 1.1.1.1 resolver with our SOC 2 report most efficiently demonstrated that we had the appropriate change control procedures and audit logs in place to confirm that our IP truncation logic and limited data retention periods were in effect during the examination period. The 1.1.1.1 resolver examination period of February 1, 2019, through October 31, 2019, was the earliest we could go back to while relying on our SOC 2 report.

Details on the examination

When we launched the 1.1.1.1 resolver, we committed that we would not track what individual users of our 1.1.1.1 resolver are searching for online. The examination validated that our system is configured to achieve what we think is the most important part of this commitment — we never write the querying IP addresses together with the DNS query to disk and therefore have no idea who is making a specific request using the 1.1.1.1 resolver. This means we don’t track which sites any individual visits, and we won’t sell your personal data, ever.

We want to be fully transparent that during the examination we uncovered that our routers randomly capture up to 0.05% of all requests that pass through them, including the querying IP address of resolver users. We do this separately from the 1.1.1.1 service for all traffic passing into our network and we retain such data for a limited period of time for use in connection with network troubleshooting and mitigating denial of service attacks.

To explain: if requests from a specific IP address flow through one of our data centers a large number of times, that address is often associated with malicious requests or a botnet. We need to keep that information to mitigate attacks against our network and to prevent our network from being used as an attack vector itself. This limited subsample of data is not linked up with DNS queries handled by the 1.1.1.1 service and does not have any impact on user privacy.
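To give a feel for how small a 0.05% sample is, here is a minimal sketch of per-packet random sampling. The post does not say how the routers choose packets, so the independent coin flip below is an assumption, and should_capture and count_captured are hypothetical names. Independent sampling hits the rate only on average, so a system that must never exceed the bound would need an explicit cap as well.

```python
import random

SAMPLE_RATE = 0.0005  # 0.05%, the upper bound stated in the post


def should_capture(rng, rate=SAMPLE_RATE):
    """Decide independently for each packet whether to capture it."""
    return rng.random() < rate


def count_captured(total_packets, seed=None):
    """Simulate total_packets arrivals and count how many get captured."""
    rng = random.Random(seed)
    return sum(should_capture(rng) for _ in range(total_packets))
```

Out of one million simulated packets, roughly 500 would be captured on average.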

We also want to acknowledge that when we made our privacy promises about how we would handle non-personally identifiable log data for 1.1.1.1 resolver requests, we made what we now see were some confusing statements about how we would handle those anonymous logs.

For example, we learned that our blog post commitment about retention of anonymous log data was not written clearly enough, and that our previous statements were unclear because we referred to temporary logs, transactional logs, and permanent logs in ways that could have been better defined. Our 1.1.1.1 resolver privacy FAQs, for instance, stated that we would not retain transactional logs for more than 24 hours but that some anonymous logs would be retained indefinitely. However, our blog post announcing the public resolver didn’t capture that distinction. You can see a clearer statement about our handling of anonymous logs on our privacy commitments page mentioned below.

With this in mind, we updated and clarified our privacy commitments for the 1.1.1.1 resolver as outlined below. The most critical part of these commitments remains unchanged: We don’t want to know what you do on the Internet — it’s none of our business — and we’ve taken the technical steps to ensure we can’t.

Our 1.1.1.1 public DNS resolver commitments

We have refined our commitments to 1.1.1.1 resolver privacy as part of our examination effort. The nature and intent of our commitments remain consistent with our original commitments. These updated commitments are what was included in the examination:

  1. Cloudflare will not sell or share public resolver users’ personal data with third parties or use personal data from the public resolver to target any user with advertisements.
  2. Cloudflare will only retain or use what is being asked, not information that will identify who is asking it. Except for randomly sampled network packets captured from at most 0.05% of all traffic sent to Cloudflare’s network infrastructure, Cloudflare will not retain the source IP from DNS queries to the public resolver in non-volatile storage (more on that below). The randomly sampled packets are solely used for network troubleshooting and DoS mitigation purposes.
  3. A public resolver user’s IP address (referred to as the client or source IP address) will not be stored in non-volatile storage. Cloudflare will anonymize source IP addresses via IP truncation methods (last octet for IPv4 and last 80 bits for IPv6). Cloudflare will delete the truncated IP address within 25 hours.
  4. Cloudflare will retain only the limited transaction and debug log data (“Public Resolver Logs”) for the legitimate operation of our Public Resolver and research purposes, and Cloudflare will delete the Public Resolver Logs within 25 hours.
  5. Cloudflare will not share the Public Resolver Logs with any third parties except for APNIC pursuant to a Research Cooperative Agreement. APNIC will only have limited access to query the anonymized data in the Public Resolver Logs and conduct research related to the operation of the DNS system.
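
The truncation described in commitment 3 can be sketched with Python's standard ipaddress module. This is an illustration under stated assumptions, not Cloudflare's implementation: the function name truncate_ip is hypothetical, and "truncation" is taken to mean zeroing the trailing bits (the last 8 bits for IPv4, the last 80 bits for IPv6).

```python
import ipaddress

# Bits zeroed per address family, following commitment 3:
# last octet (8 bits) for IPv4, last 80 bits for IPv6.
TRUNCATED_BITS = {4: 8, 6: 80}


def truncate_ip(addr):
    """Zero the trailing bits of an address and return it as a string."""
    ip = ipaddress.ip_address(addr)
    drop = TRUNCATED_BITS[ip.version]
    masked = (int(ip) >> drop) << drop
    # Rebuild with the same class so an IPv6 value never turns into IPv4.
    return str(type(ip)(masked))
```

With these masks, an IPv4 address keeps its first 24 bits and an IPv6 address keeps its first 48 bits, so many users behind the same prefix map to the same truncated value.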

Proving privacy commitments

We created the 1.1.1.1 resolver because we recognized significant privacy problems: ISPs, WiFi networks you connect to, your mobile network provider, and anyone else listening in on the Internet can see every site you visit and every app you use — even if the content is encrypted. Some DNS providers even sell data about your Internet activity or use it to target you with ads. DNS can also be used as a tool of censorship against many of the groups we protect through our Project Galileo.

If you use DNS-over-HTTPS or DNS-over-TLS to our 1.1.1.1 resolver, your DNS lookup request will be sent over a secure channel. This means that if you use the 1.1.1.1 resolver, then in addition to our privacy guarantees, an eavesdropper can’t see your DNS requests. We promise we won’t be looking at what you’re doing.
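
For readers curious what a DNS-over-HTTPS request looks like, the sketch below builds the URL for a GET request in the style of RFC 8484, where the DNS message is carried in a base64url-encoded dns parameter. It only constructs the URL and sends nothing. The endpoint constant and the function names build_query and doh_get_url are assumptions for illustration and do not come from this post.

```python
import base64
import struct

# Assumed endpoint for illustration; check the resolver's documentation.
DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"


def build_query(name, qtype=1, query_id=0):
    """Encode a single-question DNS query (qtype 1 = A, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_url(name, qtype=1):
    """Return a GET URL with the message base64url-encoded, no padding."""
    encoded = base64.urlsafe_b64encode(build_query(name, qtype)).rstrip(b"=")
    return f"{DOH_ENDPOINT}?dns={encoded.decode('ascii')}"
```

A client would fetch that URL over HTTPS with an Accept header of application/dns-message, so the query name travels inside the encrypted connection instead of in plaintext on port 53.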

We strongly believe that consumers should expect their service providers to be able to show proof that they are actually abiding by their privacy commitments. If we were able to have our 1.1.1.1 resolver privacy commitments examined by an independent accounting firm, we think other organizations can do the same. We encourage other providers to follow suit and help improve privacy and transparency for Internet users globally. And for our part, we will continue to engage well-respected auditing firms to audit our 1.1.1.1 resolver privacy commitments. We also appreciate the work that Mozilla has undertaken to encourage entities that operate recursive resolvers to adopt data handling practices that protect the privacy of user data.

Details of the 1.1.1.1 resolver privacy examination and our accountant’s opinion can be found on Cloudflare’s Compliance page.

Visit https://developers.cloudflare.com/1.1.1.1/ from any device to get started with the Internet’s fastest, privacy-first DNS service.

PS Cloudflare has traditionally used tomorrow, April 1, to release new products. Two years ago we launched the 1.1.1.1 free, fast, privacy-focused public DNS resolver. One year ago we launched Warp, our way of securing and accelerating mobile Internet access.

And tomorrow?

Then three key changes
One before the weft, also
Safety to the roost