Tag Archives: DNS

How to use domain with Amazon SES in multiple accounts or regions

Post Syndicated from Leonardo Azize original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-use-domain-with-amazon-ses-in-multiple-accounts-or-regions/

Sometimes customers want to use their email domain with Amazon Simple Email Service (Amazon SES) across multiple accounts, or in the same account but across multiple regions.

For example, AnyCompany is an insurance company with marketing and operations business units. The operations department sends transactional emails every time customers perform insurance simulations. The marketing department sends email advertisements to existing and prospective customers. Since they are different organizations inside AnyCompany, they want to have their own Amazon SES billing. At the same time, they still want to use the same AnyCompany domain.

Other use-cases include customers who want to setup multi-region redundancy, need to satisfy data residency requirements, or need to send emails on behalf of several different clients. In all of these cases, customers can use different regions, in the same or across different accounts.

This post shows how to verify and configure your domain on Amazon SES across multiple accounts or multiple regions.

Overview of solution

You can use the same domain with Amazon SES across multiple accounts or regions. Your options are: different accounts but the same region, different accounts and different regions, and the same account but different regions.

In all of these scenarios, you will have two SES instances running, each sending email for the example.com domain – let’s call them SES1 and SES2. Every time you configure a domain in Amazon SES, it generates a set of DNS records, unique to your domain, that you must add to your domain’s authoritative DNS server. Those records are different for each SES instance.

You will need to modify your DNS to add one TXT record, with multiple values, for domain verification. If you decide to use DomainKeys Identified Mail (DKIM), you will modify your DNS to add six CNAME records, three records from each SES instance.

When you configure a domain on Amazon SES, you can also configure a MAIL FROM domain. If you decide to do so, you will need to modify your DNS to add one TXT record for Sender Policy Framework (SPF) and one MX record for bounce and complaint notifications that email providers send you.

Furthermore, your domain can be configured to support DMARC for email spoofing detection. DMARC relies on the SPF or DKIM configuration described above. Below we walk you through these steps:

  • Verify domain
    You will take the TXT values from both the SES1 and SES2 instances and add them to DNS, so SES can validate that you own the domain
  • Complying with DMARC
    You will add a TXT value with the DMARC policy that applies to your domain. This is not tied to any specific SES instance
  • Custom MAIL FROM Domain and SPF
    You will take the TXT and MX records for your MAIL FROM domain from both the SES1 and SES2 instances and add them to DNS, so SES can comply with DMARC

Here is a sample matrix of the various configurations:

TXT record for domain verification* (same for all three scenarios: two accounts in the same region, two accounts in different regions, or one account in two regions)

1 record with multiple values:

_amazonses.example.com = “VALUE FROM SES1”
                         “VALUE FROM SES2”

CNAME records for DKIM verification (same for all three scenarios)

6 records, 3 from each SES instance:

record1-SES1._domainkey.example.com = VALUE FROM SES1
record2-SES1._domainkey.example.com = VALUE FROM SES1
record3-SES1._domainkey.example.com = VALUE FROM SES1
record1-SES2._domainkey.example.com = VALUE FROM SES2
record2-SES2._domainkey.example.com = VALUE FROM SES2
record3-SES2._domainkey.example.com = VALUE FROM SES2

TXT record for DMARC (same for all three scenarios)

1 record; not tied to any SES instance or region:

_dmarc.example.com = DMARC VALUE

MAIL FROM MX record to define the message sender for SES

Same region (one or two accounts): 1 record for the entire region:

mail.example.com = 10 feedback-smtp.us-east-1.amazonses.com

Different regions (one or two accounts): 2 records, one for each region:

mail1.example.com = 10 feedback-smtp.us-east-1.amazonses.com
mail2.example.com = 10 feedback-smtp.eu-west-1.amazonses.com

MAIL FROM TXT record for SPF

Same region: 1 record for the entire region:

mail.example.com = “v=spf1 include:amazonses.com ~all”

Different regions: 2 records, one for each region:

mail1.example.com = “v=spf1 include:amazonses.com ~all”
mail2.example.com = “v=spf1 include:amazonses.com ~all”

* Assuming your DNS supports multiple values for a TXT record

Set up SES1 and SES2

In this blog, we call SES1 your primary or existing SES instance. We assume that you have already set up SES1, but if not, you can still follow the instructions and set up both instances at the same time. The settings on SES2 will differ slightly, so you will need to add new DNS entries to support the two-instance setup.

In this document, we will take the configurations from the “Verification,” “DKIM,” and “Mail FROM Domain” sections of the SES Domains screen, configure SES2, and set up DNS correctly for the two-instance configuration.

Verify domain

Amazon SES requires that you verify your domain in DNS to confirm that you own it and to prevent others from using it. When you verify an entire domain, you are verifying all email addresses from that domain, so you don’t need to verify email addresses from that domain individually.

You can instruct multiple SES instances, across multiple accounts or regions, to verify your domain. The process to verify your domain requires you to add some records at your DNS provider. In this post, I am assuming Amazon Route 53 is the authoritative DNS server for the example.com domain.

Verifying a domain for SES involves initiating the verification in the SES console, then adding DNS records and values to confirm you have ownership of the domain. SES will automatically check DNS to complete the verification process. We assume you have done this step for the SES1 instance and already have an _amazonses.example.com TXT record with one value in your DNS. In this section you will add a second value, from SES2, to that TXT record. If you do not have SES1 set up in DNS, complete these steps twice, once for SES1 and again for SES2. This will prove to both SES instances that you own the domain and are entitled to send email from it.

Initiate Verification in SES Console

Just as you did on SES1, initiate a verification process in the second SES instance (SES2) for the same domain; in our case, example.com.

  1. Sign in to the AWS Management Console and open the Amazon SES console.
  2. In the navigation pane, under Identity Management, choose Domains.
  3. Choose Verify a New Domain.
  4. In the Verify a New Domain dialog box, enter the domain name (in our case, example.com).
  5. If you want to set up DKIM signing for this domain, choose Generate DKIM Settings.
  6. Choose Verify This Domain.
  7. In the Verify a New Domain dialog box, you will see a Domain Verification Record Set containing a Name, a Type, and a Value. Copy Name and Value and store them for the step below, where you will add this value to DNS.
    (This information is also available by choosing the domain name after you close the dialog box.)

To complete domain verification, add a TXT record with the displayed Name and Value to your domain’s DNS server. For information about Amazon SES TXT records and general guidance about how to add a TXT record to a DNS server, see Amazon SES domain verification TXT records.

Add DNS Values for SES2

To complete domain verification for your second account, edit the current _amazonses TXT record and add the Value from SES2 to it. If you do not have an _amazonses TXT record, create it, and add the Domain Verification values from both SES1 and SES2 to it. We are showing how to add the record in Route 53, but the steps should be similar in any DNS management service you use.

  1. Sign in to the AWS Management Console and open the Amazon Route 53 console.
  2. In the navigation pane, choose Hosted zones.
  3. Choose the domain name you are verifying.
  4. Choose the _amazonses TXT record you created when you verified your domain for SES1.
  5. Under Record details, choose Edit record.
  6. In the Value box, go to the end of the existing attribute value, and then press Enter.
  7. Add the attribute value for the additional account or region.
  8. Choose Save.
  9. To validate, run the following command:
    dig TXT _amazonses.example.com +short
  10. You should see the two values returned:
    "4AjLMzUu4nSjrz4QVqDD8rXq8X2AHr+JhGSl4foiMmU="
    "abcde12345Sjrz4QVqDD8rXq8X2AHr+JhGSl4foiMmU="

Please note:

  1. if your DNS provider does not allow underscores in record names, you can omit _amazonses from the Name.
  2. to help you easily identify this record within your domain’s DNS settings, you can optionally prefix the Value with “amazonses:”.
  3. some DNS providers automatically append the domain name to DNS record names. To avoid duplication of the domain name, you can add a period to the end of the domain name in the DNS record. This indicates that the record name is fully qualified and the DNS provider need not append an additional domain name.
  4. if your DNS server does not support two values for a TXT record, you can have one record named _amazonses.example.com and another one called example.com.

Finally, after some time, SES will complete its validation of the domain name and the status should change from “pending verification” to “verified”.

Verify DKIM

DomainKeys Identified Mail (DKIM) is a standard that allows senders to sign their email messages with a cryptographic key. Email providers then use these signatures to verify that the messages weren’t modified by a third party while in transit.

An email message that is sent using DKIM includes a DKIM-Signature header field that contains a cryptographically signed representation of the message. A provider that receives the message can use a public key, which is published in the sender’s DNS record, to decode the signature. Email providers then use this information to determine whether messages are authentic.

When you enable DKIM, SES generates CNAME records that you need to add to your DNS. Because it generates different values for each SES instance, you can use DKIM with multiple accounts and regions.

To complete the DKIM verification, copy the three (3) DKIM Names and Values from SES1 and three (3) from SES2 and add them to your DNS authoritative server as CNAME records.
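If you want to spot-check the CNAME records after adding them, you can query them with dig, just like the TXT record earlier. This sketch uses the placeholder names from the matrix above; your actual record names will be unique tokens generated by SES, with values pointing at dkim.amazonses.com:

dig CNAME record1-SES1._domainkey.example.com +short
record1-SES1.dkim.amazonses.com.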

You will know you are successful when, after some time, SES completes the DKIM verification and the status changes from “pending verification” to “verified”.

Configuring for DMARC compliance

Domain-based Message Authentication, Reporting and Conformance (DMARC) is an email authentication protocol that uses Sender Policy Framework (SPF) and/or DomainKeys Identified Mail (DKIM) to detect email spoofing. In order to comply with DMARC, you need to set up a “_dmarc” DNS record and either SPF or DKIM, or both. The DNS record for DMARC compliance is set up once per domain, but SPF and DKIM require DNS records for each SES instance.

  1. Set up the “_dmarc” record in DNS for your domain; one time per domain. See instructions here
  2. To validate it, run the following command:
    dig TXT _dmarc.example.com +short
    "v=DMARC1;p=quarantine;pct=25;rua=mailto:[email protected]"
  3. For DKIM and SPF follow the instructions below

Custom MAIL FROM Domain and SPF

Sender Policy Framework (SPF) is an email validation standard that’s designed to prevent email spoofing. Domain owners use SPF to tell email providers which servers are allowed to send email from their domains. SPF is defined in RFC 7208.

To comply with Sender Policy Framework (SPF), you will need to use a custom MAIL FROM domain. When you enable a MAIL FROM domain in the SES console, the service generates two records you need to configure in your DNS to document who is authorized to send messages for your domain: one MX record and one TXT record for mail.example.com. Save these records and enter them in your DNS authoritative server for example.com.

Configure MAIL FROM Domain for SES2

  1. Open the Amazon SES console at https://console.aws.amazon.com/ses/.
  2. In the navigation pane, under Identity Management, choose Domains.
  3. In the list of domains, choose the domain and proceed to the next step.
  4. Under MAIL FROM Domain, choose Set MAIL FROM Domain.
  5. On the Set MAIL FROM Domain window, do the following:
    • For MAIL FROM domain, enter the subdomain that you want to use as the MAIL FROM domain. In our case mail.example.com.
    • For Behavior if MX record not found, choose one of the following options:
      • Use amazonses.com as MAIL FROM – If the custom MAIL FROM domain’s MX record is not set up correctly, Amazon SES will use a subdomain of amazonses.com. The subdomain varies based on the AWS Region in which you use Amazon SES.
      • Reject message – If the custom MAIL FROM domain’s MX record is not set up correctly, Amazon SES will return a MailFromDomainNotVerified error. Emails that you attempt to send from this domain will be automatically rejected.
    • Click Set MAIL FROM Domain.

You will need to complete this step on SES1 as well as SES2. The MAIL FROM records are regional, and you will need to add both sets to your DNS authoritative server.

Set MAIL FROM records in DNS

From both SES1 and SES2, take the MX and TXT records provided by the MAIL FROM configuration and add them to the DNS authoritative server. If SES1 and SES2 are in the same region (us-east-1 in our example), you will publish exactly one MX record (mail.example.com in our example) into DNS, pointing to the endpoint for that region. If SES1 and SES2 are in different regions, you will create two different records (mail1.example.com and mail2.example.com) in DNS, each pointing to the endpoint for its specific region.

Verify MX record

Example of MX record where SES1 and SES2 are in the same region

dig MX mail.example.com +short
10 feedback-smtp.us-east-1.amazonses.com.

Example of MX records where SES1 and SES2 are in different regions

dig MX mail1.example.com +short
10 feedback-smtp.us-east-1.amazonses.com.

dig MX mail2.example.com +short
10 feedback-smtp.eu-west-1.amazonses.com.

Verify that it works

On both SES instances (SES1 and SES2), check that validations are complete. In the SES Console:

  • In the Verification section, Status should be “verified” (in green)
  • In the DKIM section, DKIM Verification Status should be “verified” (in green)
  • In the MAIL FROM Domain section, MAIL FROM domain status should be “verified” (in green)

Once everything is verified on both accounts or regions, your domain is correctly configured and ready to use.

Conclusion

In this post, we explained how to verify and use the same domain with Amazon SES in multiple accounts and regions while maintaining DMARC, DKIM, and SPF compliance and the security features related to email exchange.

While each customer has different needs, Amazon SES is flexible enough to let customers decide, organize, and stay in control of how they use Amazon SES to send email.

Author bio

Leonardo Azize Martins is a Cloud Infrastructure Architect at Professional Services for Public Sector.

His background is in development and infrastructure for web applications at large enterprises.

When not working, Leonardo enjoys spending time with family, reading technical content, watching movies and series, and playing with his daughter.

Contributor

Daniel Tet is a senior solutions architect at AWS specializing in Low-Code and No-Code solutions. For over twenty years, he has worked on projects for Franklin Templeton, Blackrock, Stanford Children’s Hospital, Napster, and Twitter. He has a Bachelor of Science in Computer Science and an MBA. He is passionate about making technology easy for common people; he enjoys camping and adventures in nature.


What happened on the Internet during the Facebook outage

Post Syndicated from Celso Martinho original https://blog.cloudflare.com/during-the-facebook-outage/


It’s been a few days now since Facebook, Instagram, and WhatsApp went AWOL and experienced one of the longest and roughest downtime periods in their existence.

When that happened, we reported our bird’s-eye view of the event and posted the blog Understanding How Facebook Disappeared from the Internet where we tried to explain what we saw and how DNS and BGP, two of the technologies at the center of the outage, played a role in the event.

In the meantime, more information has surfaced, and Facebook has published a blog post giving more details of what happened internally.

As we said before, these events are a gentle reminder that the Internet is a vast network of networks, and we, as industry players and end-users, are part of it and should work together.

In the aftermath of an event of this size, we don’t waste much time debating how peers handled the situation. We do, however, ask ourselves the more important questions: “How did this affect us?” and “What if this had happened to us?” Asking and answering these questions whenever something like this happens is a great and healthy exercise that helps us improve our own resilience.

Today, we’re going to show you how the Facebook and affiliate sites downtime affected us, and what we can see in our data.

1.1.1.1

1.1.1.1 is a fast and privacy-centric public DNS resolver operated by Cloudflare, used by millions of users, browsers, and devices worldwide. Let’s look at our telemetry and see what we find.

First, the obvious. If we look at the response rate, there was a massive spike in the number of SERVFAIL codes. SERVFAILs can happen for several reasons; we have an excellent blog called Unwrap the SERVFAIL that you should read if you’re curious.

In this case, we started serving SERVFAIL responses to all facebook.com and whatsapp.com DNS queries because our resolver couldn’t reach the upstream Facebook authoritative servers – about 60 times more SERVFAILs than on a typical day.


If we look at all the queries, not specific to Facebook or WhatsApp domains, and we split them by IPv4 and IPv6 clients, we can see that our load increased too.

As explained before, this is due to a snowball effect associated with applications and users retrying after the errors and generating even more traffic. In this case, 1.1.1.1 had to handle more A and AAAA queries than expected.


Here’s another fun one.

DNS vs. DoT and DoH. Typically, DNS queries and responses are sent in plaintext over UDP (or TCP sometimes), and that’s been the case for decades now. Naturally, this poses security and privacy risks to end-users as it allows in-transit attacks or traffic snooping.

With DNS over TLS (DoT) and DNS over HTTPS, clients can talk DNS using well-known, well-supported encryption and authentication protocols.

Our learning center has a good article on “DNS over TLS vs. DNS over HTTPS” that you can read. Browsers like Chrome, Firefox, and Edge have supported DoH for some time now, WARP uses DoH too, and you can even configure your operating system to use the new protocols.
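You can even try a DoH query yourself with curl against 1.1.1.1’s JSON endpoint. During the outage, the “Status” field in the response would have been 2 (the DNS code for SERVFAIL) instead of the usual 0:

curl -s -H 'accept: application/dns-json' \
  'https://cloudflare-dns.com/dns-query?name=facebook.com&type=A'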

When Facebook went offline, we saw the number of DoT and DoH SERVFAIL responses grow to over 300x the average rate.


So, we got hammered with lots of requests and errors, causing traffic spikes to our 1.1.1.1 resolver and causing an unexpected load in the edge network and systems. How did we perform during this stressful period?

Quite well. 1.1.1.1 kept its cool and continued serving the vast majority of requests around the famous 10ms mark. An insignificant fraction of p95 and p99 percentiles saw increased response times, probably due to timeouts trying to reach Facebook’s nameservers.


Another interesting perspective is the distribution of the ratio between SERVFAIL and good DNS answers, by country. In theory, the higher this ratio is, the more the country uses Facebook. Here’s the map with the countries that suffered the most:


Here’s the top twelve country list, ordered by those that apparently use Facebook, WhatsApp and Instagram the most:

Country SERVFAIL/Good Answers ratio
Turkey 7.34
Grenada 4.84
Congo 4.44
Lesotho 3.94
Nicaragua 3.57
South Sudan 3.47
Syrian Arab Republic 3.41
Serbia 3.25
Turkmenistan 3.23
United Arab Emirates 3.17
Togo 3.14
French Guiana 3.00

Impact on other sites

When Facebook, Instagram, and WhatsApp aren’t around, the world turns to other places to look for information on what’s going on, other forms of entertainment or other applications to communicate with their friends and family. Our data shows us those shifts. While Facebook was going down, other services and platforms were going up.

To get an idea of the changing traffic patterns we look at DNS queries as an indicator of increased traffic to specific sites or types of site.

Here are a few examples.

Other social media platforms saw a slight increase in use, compared to normal.


Traffic to messaging platforms like Telegram, Signal, Discord and Slack got a little push too.


Nothing like a little gaming time when Instagram is down, we guess, judging by the traffic to sites like Steam, Xbox, Minecraft and others.


And yes, people want to know what’s going on and fall back on news sites like CNN, New York Times, The Guardian, Wall Street Journal, Washington Post, Huffington Post, BBC, and others:


Attacks

One could speculate that the Internet was under attack from malicious hackers. Our Firewall doesn’t agree; nothing out of the ordinary stands out.


Network Error Logs

Network Error Logging, NEL for short, is an experimental technology supported in Chrome. A website can issue a Report-To header and ask the browser to send reports about network problems, like bad requests or DNS issues, to a specific endpoint.
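As a minimal sketch, the two response headers involved look like this; the reporting endpoint URL here is a hypothetical example:

Report-To: {"group":"default","max_age":86400,"endpoints":[{"url":"https://example.com/reports"}]}
NEL: {"report_to":"default","max_age":86400}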

Cloudflare uses NEL data to quickly help triage end-user connectivity issues when end-users reach our network. You can learn more about this feature in our help center.

If Facebook is down and their DNS isn’t responding, Chrome will start reporting NEL events every time one of the pages in our zones fails to load Facebook comments, posts, ads, or authentication buttons. Our NEL data showed this clearly.


WARP

Cloudflare announced WARP in 2019, and called it “A VPN for People Who Don’t Know What V.P.N. Stands For” and offered it for free to its customers. Today WARP is used by millions of people worldwide to securely and privately access the Internet on their desktop and mobile devices. Here’s what we saw during the outage by looking at traffic volume between WARP and Facebook’s network:


You can see how the steep drop in Facebook ASN traffic coincides with the start of the incident and how it compares to the same period the day before.

Our own traffic

People tend to think of Facebook as a place to visit. We log in, and we access Facebook, we post. It turns out that Facebook likes to visit us too, quite a lot. Like Google and other platforms, Facebook uses an army of crawlers to constantly check websites for data and updates. Those robots gather information about website content, such as titles, descriptions, thumbnail images, and metadata. You can learn more about this on “The Facebook Crawler” page and the Open Graph website.

Here’s what we see when traffic is coming from the Facebook ASN, supposedly from crawlers, to our CDN sites:


The robots went silent.

What about the traffic coming to our CDN sites from Facebook User-Agents? The gap is indisputable.


We see about 30% of a typical request rate hitting us. But it’s not zero; why is that?

We’ll let you in on a little secret: never trust User-Agent information; it’s broken. User-Agent spoofing is everywhere. Browsers, apps, and other clients deliberately change the User-Agent string when they fetch pages from the Internet to hide, obtain access to certain features, or bypass paywalls (because paywalled sites want sites like Facebook to index their content, so that they then get more traffic from links).

Fortunately, there are newer, privacy-centric standards emerging, like User-Agent Client Hints.

Core Web Vitals

Core Web Vitals are a subset of Web Vitals, an initiative by Google to provide a unified interface to measure real-world quality signals when a user visits a web page. Such signals include Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS).

We use Core Web Vitals with our privacy-centric Web Analytics product and collect anonymized data on how end-users experience the websites that enable this feature.

One of the metrics we can calculate using these signals is the page load time. Our theory is that if a page includes scripts coming from external sites (for example, Facebook “like” buttons, comments, ads), and they are unreachable, its total load time gets affected.

We used a list of about 400 domains that we know embed Facebook scripts in their pages and looked at the data.


Now let’s look at the Largest Contentful Paint. LCP marks the point in the page load timeline when the page’s main content has likely loaded. The faster the LCP is, the better the end-user experience.


Again, the page load experience got visibly degraded.

The outcome seems clear. Sites that use Facebook scripts in their pages took 1.5x longer to load during the outage, with some of them taking more than 2x the usual time. Facebook’s outage dragged down the performance of some other sites.

Conclusion

When Facebook, Instagram, and WhatsApp went down, the Web felt it. Some websites got slower or lost traffic, other services and platforms got unexpected load, and people lost the ability to communicate or do business normally.

Hypotheses About What Happened to Facebook

Post Syndicated from Bozho original https://techblog.bozho.net/hypotheses-about-what-happened-to-facebook/

Facebook was down. I’d recommend reading Cloudflare’s summary. Then I recommend reading Facebook’s own account of the incident. But let me expand on that. Facebook published announcements and withdrawals for certain BGP prefixes, which led to its DNS servers being removed from “the map of the internet” – they told everyone “the part of our network where our DNS servers are doesn’t exist”. That was a self-inflicted backbone failure, due to a bug in the auditing tool that checks whether the commands executed aren’t doing harmful things.

Facebook owns a lot of IPs. According to RIPEstat they are part of 399 prefixes (147 of them are IPv4). The DNS servers are located in two of those 399. Facebook uses a.ns.facebook.com, b.ns.facebook.com, c.ns.facebook.com and d.ns.facebook.com, which get queries whenever someone wants to know the IPs of Facebook-owned domains. These four nameservers are served by the same Autonomous System from just two prefixes – 129.134.30.0/23 and 185.89.218.0/23. Of course “4 nameservers” is a logical construct, there are probably many actual servers behind that (using anycast).
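You can list those nameservers yourself with dig (this is the output on a normal day; during the outage, the lookup itself failed because the servers were unreachable):

dig NS facebook.com +short
a.ns.facebook.com.
b.ns.facebook.com.
c.ns.facebook.com.
d.ns.facebook.com.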

I wrote a simple “script” to fetch all the withdrawals and announcements for all Facebook-owned prefixes (from the great API of RIPEstat). Facebook didn’t remove itself from the map entirely. As Cloudflare points out, only some prefixes were affected. It can be just these two, or a few others as well, but it seems that just a handful were affected. If we sort the resulting CSV from the above script by withdrawals, we’ll notice that 129.134.30.0/23 and 185.89.218.0/23 are pretty high up (alongside the 185.89 and 129.134 /24s, which are all included in the /23s). That perfectly matches Facebook’s account that their nameservers automatically withdraw themselves if they fail to connect to other parts of the infrastructure. Everything may have also been down, but the withdrawal logic is present only in the networks that have nameservers in them.
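For the curious, here is a minimal sketch of what such a script can look like, using the RIPEstat Data API’s announced-prefixes and bgp-updates endpoints; the jq field paths are assumptions worth double-checking against the API’s actual responses:

for prefix in $(curl -s 'https://stat.ripe.net/data/announced-prefixes/data.json?resource=AS32934' \
    | jq -r '.data.prefixes[].prefix'); do
  # Count withdrawals ("W") for this prefix during the outage window (assumed field names)
  withdrawals=$(curl -s "https://stat.ripe.net/data/bgp-updates/data.json?resource=${prefix}&starttime=2021-10-04T12:00&endtime=2021-10-04T20:00" \
    | jq '[.data.updates[] | select(.type == "W")] | length')
  echo "${prefix},${withdrawals}"
done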

So first, let me make four general observations that are not as obvious and as universal as they may sound, but they are worth discussing:

  • Use longer DNS TTLs if possible – if Facebook had a 6-hour TTL on its domains, we might not have noticed that their name servers were down. This is hard to ask of such a complex service that uses DNS for load-balancing and geographical distribution, but it’s worth considering. Also, if they killed their backbone and their entire infrastructure was down anyway, the DNS TTL would not have solved the issue. But…
  • We need improved caching logic for DNS. It can’t be just “present or not”; DNS caches could keep a “last known good state” in case of SERVFAIL and fall back to that. All of those DNS resolvers that had to ask the authoritative nameserver “where can I find facebook.com” knew where to find facebook.com just a minute ago. Then they got a failure and suddenly they are wondering “oh, where could Facebook be?”. It’s not that simple, of course, but such a cache improvement is worth considering. And again, if their entire infrastructure was down, this would not have helped.
  • Consider having an authoritative nameserver outside your main AS. If something bad happens to your AS routes (regardless of the reason), you may still have DNS working. That may have downsides – generally, it will be hard to manage and sync your DNS infrastructure. But at least having a spare set of nameservers and the option to quickly point glue records there is worth considering. It would not have saved Facebook in this case, as again, they claim the entire infrastructure was inaccessible due to a “broken” backbone.
  • Have a 100% test coverage on critical tools, such as the auditing tool that had a bug. 100% test coverage is rarely achievable in any project, but in such critical tools it’s a must.

The main explanation is the accidental outage. This is what Facebook engineers explain in the blogpost and other accounts, and that’s what seems to have happened. However, there are alternative hypotheses floating around, so let me briefly discuss all of the options.

  • Accidental outage due to misconfiguration – a very likely scenario. These things may happen to everyone, and Facebook is known for its “break things” mentality, so it’s not unlikely that they just didn’t have the right safeguards in place and that someone ran a buggy update. The scenarios for why and how that may have happened are many, and we can’t know from the outside (even after Facebook’s brief description). This remains the primary explanation, following my favorite, Hanlon’s razor. A bug in the audit tool is absolutely realistic (btw, I’d love Facebook to publish their internal tools).
  • Cyber attack – it cannot be known from the data we have, but this would be a sophisticated attack that gained access to their BGP administration interface, which I would assume is properly protected. Not impossible, but a 6-hour outage of a social network is not something a sophisticated actor (e.g. a nation state) would invest resources in. We can’t rule it out, as this might be “just a drill” for something bigger to follow. If I were an attacker that wanted to take Facebook down, I’d try to kill their DNS servers, or indeed, “de-route” them. If we didn’t know that Facebook lets its DNS servers cut themselves from the network in case of failures, the fact that so few prefixes were updated might be an indicator of a targeted attack, but this seems less and less likely.
  • Deliberate self-sabotage – 1.5 billion records were claimed to be leaked yesterday. At the same time, a Facebook whistleblower is testifying in the US Congress. Both of these stories are potentially damaging to Facebook’s reputation and shares. If they wanted to drown the news and the respective share-price plunge in a technical story that few people understand but everyone is talking about (and then have their share price rebound, because technical issues happen to everyone), then that’s the way to do it – just as a malicious actor would, but without all the hassle of gaining access from outside – de-route the prefixes for the DNS servers and you have a “perfect” outage. These coincidences have led people to assume such a plot, but given the observed outage and the explanation from Facebook on why the DNS prefixes were automatically withdrawn, this sounds unlikely.

Distinguishing between the three options is actually hard. You can mask a deliberate outage as an accident, and a malicious actor can make it look like deliberate self-sabotage. That’s why there are speculations. To me, however, based on all the data we have from RIPEstat and the various accounts by Cloudflare, Facebook, and other experts, it seems that a chain of mistakes (operational and possibly design ones) led to this.


Facebook Is Down

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/10/facebook-is-down.html

Facebook — along with Instagram and WhatsApp — went down globally today. Basically, someone deleted their BGP records, which made their DNS fall apart.

…at approximately 11:39 a.m. ET today (15:39 UTC), someone at Facebook caused an update to be made to the company’s Border Gateway Protocol (BGP) records. BGP is a mechanism by which Internet service providers of the world share information about which providers are responsible for routing Internet traffic to which specific groups of Internet addresses.

In simpler terms, sometime this morning Facebook took away the map telling the world’s computers how to find its various online properties. As a result, when one types Facebook.com into a web browser, the browser has no idea where to find Facebook.com, and so returns an error page.

In addition to stranding billions of users, the Facebook outage also has stranded its employees from communicating with one another using their internal Facebook tools. That’s because Facebook’s email and tools are all managed in house and via the same domains that are now stranded.

What I heard is that none of the employee keycards work, since they have to ping a now-unreachable server. So people can’t get into buildings and offices.

And every third-party site that relies on “log in with Facebook” is stuck as well.

The fix won’t be quick:

As a former network admin who worked on the internet at this level, I anticipate Facebook will be down for hours more. I suspect it will end up being Facebook’s longest and most severe failure to date before it’s fixed.

We all know the security risks of monocultures.

EDITED TO ADD (10/6): Good explanation of what happened. Shorter from Jonathan Zittrain: “Facebook basically locked its keys in the car.”

Understanding How Facebook Disappeared from the Internet

Post Syndicated from Tom Strickx original https://blog.cloudflare.com/october-2021-facebook-outage/


“Facebook can’t be down, can it?”, we thought, for a second.

Today at 1651 UTC, we opened an internal incident entitled “Facebook DNS lookup returning SERVFAIL” because we were worried that something was wrong with our DNS resolver 1.1.1.1.  But as we were about to post on our public status page we realized something else more serious was going on.

Social media quickly burst into flames, reporting what our engineers rapidly confirmed too. Facebook and its affiliated services WhatsApp and Instagram were, in fact, all down. Their DNS names stopped resolving, and their infrastructure IPs were unreachable. It was as if someone had “pulled the cables” from their data centers all at once and disconnected them from the Internet.

How’s that even possible?

Meet BGP

BGP stands for Border Gateway Protocol. It’s a mechanism to exchange routing information between autonomous systems (AS) on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver every network packet to their final destinations. Without BGP, the Internet routers wouldn’t know what to do, and the Internet wouldn’t work.

The Internet is literally a network of networks, and it’s bound together by BGP. BGP allows one network (say, Facebook) to advertise its presence to the other networks that form the Internet. As we write, Facebook is not advertising its presence, so ISPs and other networks can’t find Facebook’s network and it is unavailable.

The individual networks each have an ASN: an Autonomous System Number. An Autonomous System (AS) is an individual network with a unified internal routing policy. An AS can originate prefixes (say that they control a group of IP addresses), as well as transit prefixes (say they know how to reach specific groups of IP addresses).

Cloudflare’s ASN is AS13335. Every ASN needs to announce its prefix routes to the Internet using BGP; otherwise, no one will know how to connect and where to find us.

Our learning center has a good overview of what BGP and ASNs are and how they work.

In this simplified diagram, you can see six autonomous systems on the Internet and two possible routes that one packet can use to go from Start to End. AS1 → AS2 → AS3 being the fastest, and AS1 → AS6 → AS5 → AS4 → AS3 being the slowest, but that can be used if the first fails.


At 1658 UTC we noticed that Facebook had stopped announcing the routes to their DNS prefixes. That meant that, at least, Facebook’s DNS servers were unavailable. Because of this Cloudflare’s 1.1.1.1 DNS resolver could no longer respond to queries asking for the IP address of facebook.com or instagram.com.

route-views>show ip bgp 185.89.218.0/23
% Network not in table
route-views>

route-views>show ip bgp 129.134.30.0/23
% Network not in table
route-views>

Meanwhile, other Facebook IP addresses remained routed but weren’t particularly useful since without DNS Facebook and related services were effectively unavailable:

route-views>show ip bgp 129.134.30.0   
BGP routing table entry for 129.134.0.0/17, version 1025798334
Paths: (24 available, best #14, table default)
  Not advertised to any peer
  Refresh Epoch 2
  3303 6453 32934
    217.192.89.50 from 217.192.89.50 (138.187.128.158)
      Origin IGP, localpref 100, valid, external
      Community: 3303:1004 3303:1006 3303:3075 6453:3000 6453:3400 6453:3402
      path 7FE1408ED9C8 RPKI State not found
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
route-views>

We keep track of all the BGP updates and announcements we see in our global network. At our scale, the data we collect gives us a view of how the Internet is connected and where the traffic is meant to flow from and to everywhere on the planet.

A BGP UPDATE message informs a router of any changes you’ve made to a prefix advertisement or entirely withdraws the prefix. We can clearly see this in the number of updates we received from Facebook when checking our time-series BGP database. Normally this chart is fairly quiet: Facebook doesn’t make a lot of changes to its network minute to minute.

But at around 15:40 UTC we saw a peak of routing changes from Facebook. That’s when the trouble began.


If we split this view by routes announcements and withdrawals, we get an even better idea of what happened. Routes were withdrawn, Facebook’s DNS servers went offline, and one minute after the problem occurred, Cloudflare engineers were in a room wondering why 1.1.1.1 couldn’t resolve facebook.com and worrying that it was somehow a fault with our systems.


With those withdrawals, Facebook and its sites had effectively disconnected themselves from the Internet.

DNS gets affected

As a direct consequence of this, DNS resolvers all over the world stopped resolving their domain names.

➜  ~ dig @1.1.1.1 facebook.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com.			IN	A
➜  ~ dig @1.1.1.1 whatsapp.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31322
;whatsapp.com.			IN	A
➜  ~ dig @8.8.8.8 facebook.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com.			IN	A
➜  ~ dig @8.8.8.8 whatsapp.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31322
;whatsapp.com.			IN	A

This happens because DNS, like many other systems on the Internet, also has its own routing mechanism. When someone types the https://facebook.com URL in the browser, the DNS resolver, responsible for translating domain names into actual IP addresses to connect to, first checks if it has something in its cache and uses it. If not, it tries to grab the answer from the domain’s nameservers, typically hosted by the entity that owns the domain.

If the nameservers are unreachable or fail to respond because of some other reason, then a SERVFAIL is returned, and the browser issues an error to the user.

Again, our learning center provides a good explanation on how DNS works.


Because Facebook stopped announcing their DNS prefix routes through BGP, our DNS resolvers, and everyone else’s, had no way to connect to their nameservers. Consequently, 1.1.1.1, 8.8.8.8, and other major public DNS resolvers started issuing (and caching) SERVFAIL responses.

But that’s not all. Now human behavior and application logic kicks in and causes another exponential effect. A tsunami of additional DNS traffic follows.

This happened in part because apps won’t accept an error for an answer and start retrying, sometimes aggressively, and in part because end-users also won’t take an error for an answer and start reloading the pages, or killing and relaunching their apps, sometimes also aggressively.

This is the traffic increase (in number of requests) that we saw on 1.1.1.1:


So now, because Facebook and their sites are so big, we have DNS resolvers worldwide handling 30x more queries than usual and potentially causing latency and timeout issues to other platforms.

Fortunately, 1.1.1.1 was built to be Free, Private, Fast (as the independent DNS monitor DNSPerf can attest), and scalable, and we were able to keep servicing our users with minimal impact.

The vast majority of our DNS requests kept resolving in under 10ms. At the same time, a minimal fraction of p95 and p99 percentiles saw increased response times, probably due to expired TTLs forcing queries to the Facebook nameservers, which then timed out. The 10-second DNS timeout limit is well known among engineers.


Impacting other services

People look for alternatives and want to know more or discuss what’s going on. When Facebook became unreachable, we started seeing increased DNS queries to Twitter, Signal and other messaging and social media platforms.


We can also see another side effect of this unreachability in our WARP traffic to and from Facebook’s affected ASN 32934. This chart shows how traffic changed from 15:45 UTC to 16:45 UTC compared with three hours before in each country. All over the world WARP traffic to and from Facebook’s network simply disappeared.


The Internet

Today’s events are a gentle reminder that the Internet is a very complex and interdependent system of millions of systems and protocols working together. Trust, standardization, and cooperation between entities are at the center of making it work for almost five billion active users worldwide.

Update

At around 21:00 UTC we saw renewed BGP activity from Facebook’s network which peaked at 21:17 UTC.


This chart shows the availability of the DNS name ‘facebook.com’ on Cloudflare’s DNS resolver 1.1.1.1. It stopped being available at around 15:50 UTC and returned at 21:20 UTC.


Undoubtedly the Facebook, WhatsApp and Instagram services will take further time to come fully online, but as of 22:28 UTC Facebook appears to be reconnected to the global Internet, with DNS working again.

Tackling Email Spoofing and Phishing

Post Syndicated from Hannes Gerhart original https://blog.cloudflare.com/tackling-email-spoofing/


Today we’re rolling out a new tool to tackle email spoofing and phishing and improve email deliverability: the new Email Security DNS Wizard can be used to create DNS records that prevent others from sending malicious emails on behalf of your domain. This new feature also warns users about insecure DNS configurations on their domain and shows recommendations on how to fix them. The feature will first be rolled out to users on the Free plan and, over the next few weeks, be made available to Pro, Business, and Enterprise customers as well.


Before we dive into what magic this wizard is capable of, let’s take a step back and take a look at the problem: email spoofing and phishing.

What is email spoofing and phishing?

Spoofing is the process of posing as someone else, which can be used to gain some kind of illicit advantage. One example is domain spoofing, where someone hosts a website like mycoolwebpaqe.xyz to trick users of mycoolwebpage.xyz into providing sensitive information without knowing they landed on a false website. When looking at the two addresses side by side in a browser’s address bar, it’s very hard to spot the difference.


Then, there is email spoofing. In order to understand how this works, let’s take a look at a Cloudflare product update email I received on my personal email address. With most email providers you can look at the full source of an email which contains a number of headers and of course the body of the email.

Date: Thu, 23 Sep 2021 10:30:02 -0500 (CDT)
From: Cloudflare <[email protected]>
Reply-To: [email protected]
To: <my_personal_email_address>

Above you can see four headers of the email: when it was received, who it came from, whom I should reply to, and whom it was sent to (my personal email address). The value of the From header is used to display the sender in my email program.


When I receive an email as above, I automatically assume this email has been sent from Cloudflare. However, nobody is stopped from sending an email with a modified From header from their mail server. If my email provider is not performing the right checks, which we will cover later in this blog post, I could be tricked into believing that an email was sent from Cloudflare, but it actually was not.


This brings us to the second attack type: phishing. Let’s say a malicious actor has successfully used email spoofing to send emails to your company’s customers that seem to originate from one of your corporate service addresses. The content of these emails looks exactly like a legitimate email from your company, using the same styling and format. The email text could be an urgent message to update some account information, including a hyperlink to the alleged web portal. If the receiving mail server of a user does not flag the email as spam or of insecure origin, the user might click on the link, which could execute malicious code or lead them to a spoofed domain asking for sensitive information.

According to the FBI’s 2020 Internet Crime Report, phishing was the most common cyber crime in 2020, with over 240,000 victims leading to losses of over $50M. The number of victims has more than doubled since 2019 and is almost ten times higher than in 2018.


In order to understand how most phishing attacks are carried out, let’s take a closer look at the findings of the 2020 Verizon Data Breach Investigations Report. It lays out that phishing accounts for more than 80% of all “social actions”, another word for social engineering attacks, making it by far the most common type of such attack. Furthermore, the report states that 96% of social actions are sent via email, only 3% through a website, and 1% via phone or text.


This clearly shows that email phishing is a serious problem causing Internet users a big headache. So let’s see what domain owners can do to stop bad actors from misusing their domain for email phishing.

How can DNS help prevent this?

Luckily, there are three anti-spoofing mechanisms already built into the Domain Name System (DNS):

  • Sender Policy Framework (SPF)
  • DomainKeys Identified Mail (DKIM)
  • Domain-based Message Authentication Reporting and Conformance (DMARC)

However, it is not trivial to configure them correctly, especially for someone less experienced. If your configuration is too strict, legitimate emails will be dropped or marked as spam. And if you keep your configuration too relaxed, your domain might be misused for email spoofing.

Sender Policy Framework (SPF)

SPF is used to specify which IP addresses and domains are permitted to send email on behalf of your domain. An SPF policy is published as a TXT record on your domain, so everyone can access it via DNS. Let’s inspect the TXT record for cloudflare.com:

cloudflare.com 	TXT	"v=spf1 ip4:199.15.212.0/22 ip4:173.245.48.0/20 include:_spf.google.com include:spf1.mcsv.net include:spf.mandrillapp.com include:mail.zendesk.com include:stspg-customer.com include:_spf.salesforce.com -all"

An SPF TXT record always needs to start with v=spf1. It usually contains a list of IP addresses of sending email servers, using the ip4 or ip6 mechanism. The include mechanism is used to reference another SPF record on another domain. This is usually done if you are relying on other providers that need to send emails on your behalf. You can see a few examples in the SPF record of cloudflare.com above: we’re using Zendesk as customer support software and Mandrill for marketing and transactional emails.

Finally, there is the catch-all mechanism -all, which specifies how all incoming but unspecified emails should be treated. The catch-all mechanism is preceded by a qualifier that can be either + (Allow), ~ (Softfail) or – (Fail). Using the Allow qualifier is not recommended, as it basically makes the SPF record useless and allows all IP addresses and domains to send email on behalf of your domain. Softfail is interpreted differently by receiving mail servers, marking an email as spam or insecure, depending on the server. Fail tells a server not to accept any emails originating from unspecified sources.


These are the steps taken to ensure a received email is SPF compliant:

  1. The email is sent from the IP address 203.0.113.10 and contains a From header with the value of [email protected].
  2. After receiving the email, the receiver queries the SPF record on mycoolwebpage.xyz to retrieve the SPF policy for this domain.
  3. The receiver checks if the sending IP address 203.0.113.10 is listed in the SPF record. If it is, the email succeeds the SPF check. If it is not, the qualifier of the all mechanism defines the outcome.

For a full list of all mechanisms and more details about SPF, refer to RFC7208.
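You can look up any domain’s SPF policy with a plain TXT query; for example (output trimmed to the SPF record, since domains often publish several TXT records):

dig TXT cloudflare.com +short
"v=spf1 ip4:199.15.212.0/22 ip4:173.245.48.0/20 include:_spf.google.com … -all"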

DomainKeys Identified Mail (DKIM)

Okay, with SPF we’ve ensured that only permitted IP addresses and domains can send emails on behalf of cloudflare.com. But what if one of the IPs changes owner without us noticing and updating the SPF record? Or what if someone else who is also using Google’s email server on the same IP tries to spoof emails coming from cloudflare.com?

This is where DKIM comes in. DKIM provides a mechanism to cryptographically sign parts of an email — usually the body and certain headers — using public key encryption. Before we dive into how this works, let’s take a look at the DKIM record used for cloudflare.com:

google._domainkey.cloudflare.com.   TXT   "v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDMxbNxA2V84XMpZgzMgHHey3TQFvHkwlPF2a11Ex6PGD71Sp8elVMMCdZhPYqDlzbehg9aWVwPz0+n3oRD73o+JXoSswgUXPV82O8s8dGny/BAJklo0+y+Bh2Op4YPGhClT6mRO2i5Qiqo4vPCuc6GB34Fyx7yhreDNKY9BNMUtQIDAQAB"

The structure of a DKIM record is <selector>._domainkey.<domain>, where the selector is provided by your email provider. The content of the DKIM TXT record always starts with v=DKIM1 followed by the public key. We can see the public key type, referenced by the k tag, and the public key itself, preceded by the p tag.

Below is a simplified sequence how the signing and DKIM check work:

  1. The sending email server creates a hash from the email content.
  2. The sending email server encrypts this hash with the private DKIM key.
  3. The email is sent, containing the encrypted hash.
  4. The receiving email server retrieves the public key from the DKIM TXT record published on the email domain.
  5. The receiving email server decrypts the hash using the public DKIM key.
  6. The receiving email server generates the hash from the email content.
  7. If the decrypted hash and the generated hash match, email authenticity is proven. Otherwise, the DKIM check fails.

The full DKIM specification can be found in RFC4871 and RFC5672.
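You can retrieve the public key yourself by querying the selector-prefixed TXT record, as with the cloudflare.com record shown above (key trimmed for brevity):

dig TXT google._domainkey.cloudflare.com +short
"v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDMxbNxA2V84…"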

Domain-based Message Authentication Reporting and Conformance (DMARC)

Domain-based Message Authentication Reporting and Conformance, that’s definitely a mouthful. Let’s focus on two words: Reporting and Conformance. DMARC provides exactly that. Regular reports let the email sender know how many emails are non-conforming and potentially spoofed. Conformance provides a clear signal to the email receiver on how to treat non-conforming emails. Email receivers might impose their own policies for emails that fail SPF or DKIM checks even without a DMARC record. However, the policy configured in the DMARC record is an explicit instruction by the email sender, so it increases email receivers’ confidence in what to do with non-conforming emails.

Here is the DMARC record for cloudflare.com

_dmarc.cloudflare.com.   TXT   "v=DMARC1; p=reject; pct=100; rua=mailto:[email protected]"

The DMARC TXT record is always set on the _dmarc subdomain of the email domain and — similar to SPF and DKIM — the content needs to start with v=DMARC1. Then we see three additional tags:

The policy tag (p) indicates how email receivers should treat emails that fail SPF or DKIM checks. Possible values are none, reject, and quarantine. The none policy is also called monitoring only and allows emails failing the checks to still be accepted. By specifying quarantine, email receivers will put SPF or DKIM non-conforming emails in the Spam folder. With reject, emails are dropped and not delivered at all if they fail SPF or DKIM.

The percentage tag (pct) can be used to apply the specified policy to a subset of incoming emails. This is helpful if you’re just rolling out DMARC and want to make sure everything is configured correctly by testing on a subset.

The last tag we can find on the record is the reporting URI (rua). This is used to specify an email address that will receive aggregate reports (usually daily) about non-conforming emails.
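As with SPF and DKIM, you can inspect a domain’s DMARC policy with a TXT lookup on its _dmarc subdomain:

dig TXT _dmarc.cloudflare.com +short
"v=DMARC1; p=reject; pct=100; rua=mailto:[email protected]"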


Here is how DMARC works, step by step:

  1. An email is sent with a From header containing [email protected].
  2. The receiver queries SPF, DKIM and DMARC records from the domain mycoolwebpage.xyz to retrieve the required policies and the DKIM public key.
  3. The receiver performs SPF and DKIM checks as outlined previously. If both succeed, the email is accepted and delivered to the inbox. If either SPF or DKIM check fails, the DMARC policy will be followed and determines if the email is still accepted, dropped or sent to the spam folder.
  4. Finally, the receiver sends back an aggregate report. Depending on the email address specified in the rua tag, this report may be sent to a different email server that is responsible for that address.

Other optional tags and the complete DMARC specification are described in RFC7489.

A few numbers on the current adoption

Now that we’ve learned what the problem is and how to tackle it using SPF, DKIM, and DMARC, let’s see how widely they’re adopted.

Dmarc.org has published data on DMARC adoption across a representative set of domains. It shows that, by the end of 2020, fewer than 50% of domains had a DMARC record at all. And remember, without a DMARC record there is no clear enforcement of SPF and DKIM checks. The study further shows that, of the domains that do have a DMARC record, more than 65% use the monitoring-only policy (p=none), so there is significant potential to drive this adoption higher. The study does not mention whether these domains send or receive emails, but even if they didn’t, a secure configuration should include a DMARC record (more about this later).


Another report from August 1, 2021 tells a similar story for domains that belong to entities in the banking sector. Of 2,881 banking entities in the United States, only 44% have published a DMARC record on their domain. Of those that have a DMARC record, roughly 2 out of 5 have set the DMARC policy to none, and another 8% are considered “misconfigured”. Denmark has a very high DMARC adoption rate among banking-sector domains, at 94%, in contrast to Japan, where only 13% of domains use DMARC. SPF adoption is on average significantly higher than DMARC, which might have to do with the fact that SPF was first introduced as an experimental RFC in 2006, while DMARC only became a standard in 2015.

Country     Number of entities     SPF present     DMARC present
Denmark                     53             91%               94%
UK                          83             88%               76%
Canada                      24             96%               67%
USA                      2,881             91%               44%
Germany                     39             74%               36%
Japan                       90             82%               13%

This shows us there is quite some room for improvement.

Enter: Email Security DNS Wizard

So what are we doing to increase the adoption of SPF, DKIM, and DMARC and tackle email spoofing and phishing? Enter the new Cloudflare Email Security DNS Wizard.

Starting today, when you’re navigating to the DNS tab of the Cloudflare dashboard, you’ll see two new features:

  1. A new section called Email Security
  2. New warnings about insecure configurations

In order to start using the Email Security DNS Wizard, you can either directly click the link in a warning, which brings you to the relevant section of the wizard, or click Configure in the new Email Security section. The latter shows you the available options.

There are two scenarios: either you’re using your domain to send email, or you aren’t. If you do, you can configure SPF, DKIM, and DMARC records by following a few simple guided steps, starting with SPF.

If your domain is not sending email, there is an easy way to configure all three records with just one click. Once you click Submit, the wizard creates all three records, configured so that all emails fail the checks and are rejected by incoming email servers:

example.com			TXT	"v=spf1 -all"
*._domainkey.example.com.	TXT	"v=DKIM1; p="
_dmarc.example.com.		TXT	"v=DMARC1; p=reject; sp=reject; adkim=s; aspf=s;"
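If you want to confirm what is published, a short Python check using the third-party dnspython package (an assumption; dig works just as well) could look like this, with example.com standing in for your zone:

import dns.resolver

domain = "example.com"  # placeholder for your zone
names = {
    "SPF": domain,
    "DKIM": f"selector1._domainkey.{domain}",  # any selector matches the wildcard
    "DMARC": f"_dmarc.{domain}",
}
for label, name in names.items():
    try:
        for rdata in dns.resolver.resolve(name, "TXT"):
            print(label, name, b"".join(rdata.strings).decode())
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        print(label, name, "-> no TXT record found")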

Help tackle email spoofing and phishing

Out you go and make sure your domain is secured against email spoofing and phishing. Just head over to the DNS tab in the Cloudflare dashboard, or, if you are not yet using Cloudflare DNS, sign up for free in just a few minutes at cloudflare.com.

If you want to read more about SPF, DKIM and DMARC, go check out our new learning pages about email related DNS records.

Oblivious DNS-over-HTTPS

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/12/oblivious-dns-over-https.html

This new protocol, called Oblivious DNS-over-HTTPS (ODoH), hides the websites you visit from your ISP.

Here’s how it works: ODoH wraps a layer of encryption around the DNS query and passes it through a proxy server, which acts as a go-between for the Internet user and the website they want to visit. Because the DNS query is encrypted, the proxy can’t see what’s inside, but it acts as a shield to prevent the DNS resolver from seeing who sent the query to begin with.

IETF memo.

The paper:

Abstract: The Domain Name System (DNS) is the foundation of a human-usable Internet, responding to client queries for hostnames with corresponding IP addresses and records. Traditional DNS is also unencrypted, and leaks user information to network operators. Recent efforts to secure DNS using DNS over TLS (DoT) and DNS over HTTPS (DoH) have been gaining traction, ostensibly protecting traffic and hiding content from onlookers. However, one of the criticisms of DoT and DoH is brought to bear by the small number of large-scale deployments (e.g., Comcast, Google, Cloudflare): DNS resolvers can associate query contents with client identities in the form of IP addresses. Oblivious DNS over HTTPS (ODoH) safeguards against this problem. In this paper we ask what it would take to make ODoH practical? We describe ODoH, a practical DNS protocol aimed at resolving this issue by both protecting the client’s content and identity. We implement and deploy the protocol, and perform measurements to show that ODoH has comparable performance to protocols like DoH and DoT which are gaining widespread adoption, while improving client privacy, making ODoH a practical privacy enhancing replacement for the usage of DNS.

Slashdot thread.

How to protect a self-managed DNS service against DDoS attacks using AWS Global Accelerator and AWS Shield Advanced

Post Syndicated from Chido Chemambo original https://aws.amazon.com/blogs/security/how-to-protect-a-self-managed-dns-service-against-ddos-attacks-using-aws-global-accelerator-and-aws-shield-advanced/

In this blog post, I show you how to improve the distributed denial of service (DDoS) resilience of your self-managed Domain Name System (DNS) service by using AWS Global Accelerator and AWS Shield Advanced. You can use those services to incorporate some of the techniques used by Amazon Route 53 to protect against DDoS attacks.

DNS routes users to your application by quickly translating a human-readable domain name to a machine-readable IP address. When protecting the availability of your application against DDoS attacks, it’s important to consider every part of the stack, including domain name resolution. The recommended best practice is to create hosted zones on Route 53, a scalable, highly available DNS service that’s protected against large DDoS attacks and query floods. Route 53 uses anycast routing to serve DNS queries from more than 150 edge locations around the globe. With anycast routing, DNS queries are served from locations that are closer to your users and the globally distributed DDoS mitigation capacity of Amazon Web Services (AWS) reduces the impact of attacks.

Optionally, you can also build your own DNS service on Amazon Elastic Compute Cloud (Amazon EC2). For example, you can run your own proprietary DNS server to take advantage of custom features that you wrote to integrate with an existing DNS service that isn’t running on AWS. When you register a domain name, you’re usually required to provide at least two name servers that can respond to queries from your users. It’s possible to build a DNS service on only two instances, but that provides limited DDoS resilience.

Solution overview

To protect your self-managed DNS service using this solution, you need a strong understanding of DNS and how to operate a distributed, self-managed DNS service on Amazon EC2. This solution improves upon an existing self-managed DNS service by significantly enhancing its ability to withstand DDoS attacks. There are two components that you add to your application:

  • You use Global Accelerator to provide your application with two static IP addresses that act as a fixed entry point to Amazon EC2 instances in multiple AWS Regions. Global Accelerator uses anycast to route your traffic to a point of entry close to the source of the traffic. In addition to providing availability and performance benefits, this gives you access to global DDoS mitigation capacity through AWS.
  • You use Shield Advanced to monitor the availability of your application and automatically engage the AWS Shield Response Team (SRT) if its availability is affected by a DDoS attack. When you associate a Route 53 health check to your protected resources, Shield Advanced uses the health of the application as an input for detection and as a signal to SRT to contact your operations center when needed. You can also engage with SRT to write custom mitigations for your application. For your self-managed DNS service use case, this can include mitigations like DNS packet validation and suspicion scoring that gives a higher priority to queries that are more likely to be legitimate traffic for your application.

As part of this solution, you will build a DNS canary that uses Amazon CloudWatch to update the status of a Route 53 health check if your self-managed DNS service stops responding to queries. An example architecture using Amazon EC2 based DNS behind Global Accelerator and Shield is shown in figure 1.

Figure 1: Amazon EC2 based DNS behind Global Accelerator and Shield

Create and configure an accelerator

To begin, create an accelerator and add your existing DNS servers as endpoints. The newly created accelerator will receive queries and forward them to your DNS service.

To create and configure an accelerator

Step 1: Create an accelerator

  1. Navigate to the AWS Global Accelerator dashboard.
  2. Choose Create accelerator.
  3. Enter a name for your accelerator.
  4. Choose Next.

Step 2: Add listeners

Since DNS uses both TCP and UDP protocols, you must create separate listeners to handle requests for each protocol.

At the Add Listeners step, enter the following:

  1. Ports: 53
  2. Protocol: TCP
  3. Client affinity: None

Choose Add listener again to add the UDP listener. Enter the following:

  1. Ports: 53
  2. Protocol: UDP
  3. Client affinity: None
  4. Choose Next

To learn more about the different options available in this step, see To create a listener in Getting started with AWS Global Accelerator.

Step 3: Add endpoint groups

Starting with the TCP listener, enter the following settings:

  1. Region: Choose a Region that your DNS instances are located in, for example, us-east-1.
  2. Traffic dial: 100
  3. If you have additional DNS instances in another AWS Region, choose Add endpoint group and repeat steps 1 and 2, entering the appropriate Region.
  4. Repeat steps 1 through 3 to add endpoint groups for the UDP listener, and then choose Next.

To learn more about the different options available in this step, for example, Traffic dial, see Add endpoint groups in Getting started with AWS Global Accelerator.

Step 4: Add endpoints

Starting with the TCP listener, enter the following in the form boxes for each Region specified in the previous step:

  1. Endpoint type: Select EC2 instance from the drop-down list.
  2. Endpoint: Select a DNS instance from the drop-down list.
  3. Weight: 128

If you have additional DNS instances in the Region, choose Add endpoint and repeat the preceding steps, but select a DNS instance that hasn’t been added as an endpoint.

Repeat all of the preceding steps for the UDP listener, then choose Create accelerator.

To learn more about the different options available in this step, see Add endpoints in Getting started with AWS Global Accelerator.
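For readers who prefer the API over the console, Steps 1 through 4 map roughly onto the following boto3 sketch. This is an illustrative outline, not a verbatim reproduction of the console flow: the accelerator name, Region, and instance ID are placeholders, and the Global Accelerator API is global but is typically addressed via the us-west-2 endpoint.

import boto3

ga = boto3.client("globalaccelerator", region_name="us-west-2")

# Step 1: create the accelerator.
accelerator = ga.create_accelerator(Name="dns-accelerator")
acc_arn = accelerator["Accelerator"]["AcceleratorArn"]

# Step 2: one listener per protocol, both on port 53, client affinity off.
listeners = {}
for protocol in ("TCP", "UDP"):
    resp = ga.create_listener(
        AcceleratorArn=acc_arn,
        Protocol=protocol,
        PortRanges=[{"FromPort": 53, "ToPort": 53}],
        ClientAffinity="NONE",
    )
    listeners[protocol] = resp["Listener"]["ListenerArn"]

# Steps 3 and 4: one endpoint group per Region per listener, traffic dial
# at 100, with each DNS instance weighted 128.
for listener_arn in listeners.values():
    ga.create_endpoint_group(
        ListenerArn=listener_arn,
        EndpointGroupRegion="us-east-1",
        TrafficDialPercentage=100.0,
        EndpointConfigurations=[
            {"EndpointId": "i-0123456789abcdef0", "Weight": 128},
        ],
    )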

Step 5: Verification

When you choose the Create accelerator button, you’re redirected to a Global Accelerator console page that lists all the accelerators in your account. On this page, you can view the global IPs and DNS name allocated to your newly created accelerator, in addition to the current status.

Wait until the status of your accelerator changes to Deployed before proceeding with any tests.

Configure Shield Advanced and Shield Advanced proactive engagement

Protect your accelerator with Shield Advanced, monitor the health of your application, and configure proactive engagement. When you turn on proactive engagement, the SRT will directly contact you if an Amazon Route 53 health check associated with your protected resource becomes unhealthy during an event that’s detected by Shield Advanced.

To configure proactive engagement

Step 1: Create a Route 53 health check

If you already have a Route 53 health check that monitors the health of your DNS service, you can proceed to step 2 of this section. If you don’t yet have a health check, you can use this AWS CloudFormation template to create one. The template will:

  1. Create a Lambda function that queries your DNS server through the accelerator global IPs. This function posts metrics to CloudWatch to indicate whether the query was successful or not (a minimal sketch of such a function follows this list).
  2. Create a CloudWatch alarm that will detect when DNS queries fail.
  3. Create a Route 53 health check that tracks the CloudWatch alarm and changes status to unhealthy when the alarm changes to the Alarm state.
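Here is a minimal sketch of the canary function from step 1, using boto3 and the third-party dnspython package. The IP address, test name, and metric namespace are illustrative assumptions, not the template’s actual values.

import boto3
import dns.resolver

ACCELERATOR_IP = "198.51.100.1"  # placeholder: one of your two static IPs
TEST_NAME = "www.example.com"    # placeholder: a name your DNS service serves

cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    # Query the self-managed DNS service directly through the accelerator.
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [ACCELERATOR_IP]
    resolver.lifetime = 3  # seconds before the query is declared failed
    try:
        resolver.resolve(TEST_NAME, "A")
        success = 1.0
    except Exception:
        success = 0.0
    # Publish a success/failure metric; a CloudWatch alarm on this metric
    # then drives the Route 53 health check from steps 2 and 3.
    cloudwatch.put_metric_data(
        Namespace="SelfManagedDNS",
        MetricData=[{"MetricName": "QuerySuccess", "Value": success}],
    )
    return {"success": success}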

Step 2: Subscribe to Shield Advanced

Please note that AWS Shield Advanced carries a fee of $3,000 per month per organization. In addition, you also pay AWS Shield Advanced Data Transfer usage fees for AWS resources enabled for advanced protection.

  1. Navigate to the AWS Shield console.
  2. In the AWS Shield navigation bar, choose Getting started, and then choose Subscribe to Shield Advanced.
  3. On the Subscribe to Shield Advanced page, read the terms of agreement, and then select all of the check boxes to indicate that you accept the terms.
  4. Choose Subscribe to Shield Advanced.

Step 3: Add resources to protect

  1. Do one of the following, depending on whether you were already subscribed to Shield Advanced.
    • If you just subscribed to Shield Advanced by completing Step 2 above, choose Add resources to protect.
    • If you were already subscribed to Shield Advanced, open the Shield console and choose Protected Resources, and then choose Add resources to protect.
  2. In the Choose resources to protect with Shield Advanced page, select the Regions and resource types that you want to protect, then choose Load resources.
  3. Select the resources that you want to protect, and then choose Protect with Shield Advanced.
  4. In the Configure health check based DDoS detection page, under the Protected resources section, select a Route 53 health check to add—either one that you created previously, or a health check created by the AWS CloudFormation template—as the Associated Health Check.
  5. Choose Next until you reach the Review and configure DDoS mitigation and visibility page, and then review the settings and choose Finish configuration.

Step 4: Add contacts

  1. Navigate to the Overview tab of the AWS Shield console.
  2. In the Proactive engagements and contacts section, choose Edit under the Contacts heading.
  3. In the Add contact form, add the contact’s Email, Phone number, and Notes.
  4. Choose Save.

Step 5: Request proactive engagement

  1. Choose Edit proactive engagement feature.
  2. Select Enable.
  3. Choose Save.

Step 6: Configuration review with the SRT

After you enable proactive engagement, the state will be Proactive engagement requested and pending.

SRT will contact you to schedule a configuration review. The review will include a review of your Route 53 health check configuration and a consultation about custom mitigations that can be configured to support your DNS use case. Following this review, SRT will complete your request to enable proactive engagement.

Summary

DNS is a foundational part of the user experience for any application that is accessed via a human readable domain name. Your DNS service should be highly available, DDoS resilient, and accessible to your users with minimal latency. If you run your own DNS service on Amazon EC2, you can improve the DDoS resiliency using Global Accelerator and Shield Advanced. This solution provides your users with a low latency path to your DNS service and provides you with some of the DDoS mitigation that protects Route 53. To learn more about DDoS best practices, see AWS Best Practices for DDoS Resiliency.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Shield forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Chido Chemambo

Chido is a Security Engineer on the AWS Shield Team with 12 years of experience in the telecommunications industry. He specializes in network security and enjoys working with colleagues to improve AWS Shield, and with customers to improve their cloud architectures. Outside of work, Chido enjoys jumping rope, improving his development skills, and watching English Premier League soccer and Formula 1.

Improving DNS Privacy with Oblivious DoH in 1.1.1.1

Post Syndicated from Tanya Verma original https://blog.cloudflare.com/oblivious-dns/


Today we are announcing support for a new proposed DNS standard — co-authored by engineers from Cloudflare, Apple, and Fastly — that separates IP addresses from queries, so that no single entity can see both at the same time. Even better, we’ve made source code available, so anyone can try out ODoH, or run their own ODoH service!

But first, a bit of context. The Domain Name System (DNS) is the foundation of a human-usable Internet. It maps usable domain names, such as cloudflare.com, to IP addresses and other information needed to connect to that domain. A quick primer about the importance and issues with DNS can be read in a previous blog post. For this post, it’s enough to know that, in the initial design and still dominant usage of DNS, queries are sent in cleartext. This means anyone on the network path between your device and the DNS resolver can see both the query that contains the hostname (or website) you want, as well as the IP address that identifies your device.
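From the client’s point of view, the entire lookup is a single call; absent DoH or DoT, the query and response below travel the network unencrypted (standard-library Python, illustrative hostname):

import socket

# A traditional DNS lookup: the hostname leaves your machine in cleartext,
# and anyone on the path can see both it and your source IP address.
print(socket.gethostbyname("cloudflare.com"))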

To safeguard DNS from onlookers and third parties, the IETF standardized DNS encryption with DNS over HTTPS (DoH) and DNS over TLS (DoT). Both protocols prevent queries from being intercepted, redirected, or modified between the client and resolver. Client support for DoT and DoH is growing, having been implemented in recent versions of Firefox, iOS, and more. Even so, until there is wider deployment among Internet service providers, Cloudflare is one of only a few providers to offer a public DoH/DoT service. This has raised two main concerns. One concern is that the centralization of DNS introduces single points of failure (although, with data centers in more than 100 countries, Cloudflare is designed to always be reachable). The other concern is that the resolver can still link all queries to client IP addresses.

Cloudflare is committed to end-user privacy. Users of our public DNS resolver service are protected by a strong, audited privacy policy. However, for some, trusting Cloudflare with sensitive query information is a barrier to adoption, even with such a strong privacy policy. Instead of relying on privacy policies and audits, what if we could give users an option to remove that bar with technical guarantees?

Today, Cloudflare and partners are launching support for a protocol that does exactly that: Oblivious DNS over HTTPS, or ODoH for short.

ODoH Partners:

We’re excited to launch ODoH with several leading launch partners who are equally committed to privacy.

A key component of ODoH is a proxy that is disjoint from the target resolver. Today, we’re launching ODoH with several leading proxy partners, including PCCW, SURF, and Equinix.


“ODoH is a revolutionary new concept designed to keep users’ privacy at the center of everything. Our ODoH partnership with Cloudflare positions us well in the privacy and “Infrastructure of the Internet” space. As well as the enhanced security and performance of the underlying PCCW Global network, which can be accessed on-demand via Console Connect, the performance of the proxies on our network are now improved by Cloudflare’s 1.1.1.1 resolvers. This model for the first time completely decouples client proxy from the resolvers. This partnership strengthens our existing focus on privacy as the world moves to a more remote model and privacy becomes an even more critical feature.” — Michael Glynn, Vice President, Digital Automated Innovation, PCCW Global


“We are partnering with Cloudflare to implement better user privacy via ODoH. The move to ODoH is a true paradigm shift, where the users’ privacy or the IP address is not exposed to any provider, resulting in true privacy. With the launch of ODoH-pilot, we’re joining the power of Cloudflare’s network to meet the challenges of any users around the globe. The move to ODoH is not only a paradigm shift but it emphasizes how privacy is important to any users than ever, especially during 2020. It resonates with our core focus and belief around Privacy.” — Joost van Dijk, Technical Product Manager, SURF


How does Oblivious DNS over HTTPS (ODoH) work?

ODoH works by adding a layer of public key encryption, as well as a network proxy between clients and DoH servers such as 1.1.1.1. The combination of these two added elements guarantees that only the user has access to both the DNS messages and their own IP address at the same time.

Figure: the ODoH message path, with a proxy between the client and the target

There are three players in the ODoH path. Looking at the figure above, let’s begin with the target. The target decrypts queries encrypted by the client, via a proxy. Similarly, the target encrypts responses and returns them to the proxy. The standard says that the target may or may not be the resolver (we’ll touch on this later). The proxy does as a proxy is supposed to do, in that it forwards messages between client and target. The client behaves as it does in DNS and DoH, but differs by encrypting queries for the target, and decrypting the target’s responses. Any client that chooses to do so can specify a proxy and target of choice.

Together, the added encryption and proxying provide the following guarantees:

  1. The target sees only the query and the proxy’s IP address.
  2. The proxy has no visibility into the DNS messages, with no ability to identify, read, or modify either the query being sent by the client or the answer being returned by the target.
  3. Only the intended target can read the content of the query and produce a response.

These three guarantees improve client privacy while maintaining the security and integrity of DNS queries. However, each of these guarantees relies on one fundamental property — that the proxy and the target servers do not collude. So long as there is no collusion, an attacker succeeds only if both the proxy and target are compromised.

One aspect of this system worth highlighting is that the target is separate from the upstream recursive resolver that performs DNS resolution. In practice, for performance, we expect the target to be the same. In fact, 1.1.1.1 is now both a recursive resolver and a target! There is no reason that a target needs to exist separately from any resolver. If they are separated then the target is free to choose resolvers, and just act as a go-between. The only real requirement, remember, is that the proxy and target never collude.

Also, importantly, clients are in complete control of proxy and target selection. Without any need for TRR-like programs, clients can have privacy for their queries, in addition to security. Since the target only knows about the proxy, the target and any upstream resolver are oblivious to the existence of any client IP addresses. Crucially, this puts clients in greater control over their queries and the ways they might be used. For example, clients could select and alter their proxies and targets at any time, for any reason!

ODoH Message Flow

In ODoH, the ‘O’ stands for oblivious, and this property comes from the level of encryption of the DNS messages themselves. This added encryption is end-to-end between client and target, and independent from the connection-level encryption provided by TLS/HTTPS. One might ask why this additional encryption is required at all in the presence of a proxy. This is because two separate TLS connections are required to support proxy functionality. Specifically, the proxy terminates a TLS connection from the client, and initiates another TLS connection to the target. Between those two connections, the DNS message contents would otherwise appear in plaintext! For this reason, ODoH additionally encrypts messages between client and target, so the proxy has no access to the message contents.

The whole process begins with clients that encrypt their query for the target using HPKE. Clients obtain the target’s public key via DNS, where it is bundled into an HTTPS resource record and protected by DNSSEC. When the TTL for this key expires, clients request a new copy of the key as needed (just as they would for an A/AAAA record when that record’s TTL expires). The usage of a target’s DNSSEC-validated public key guarantees that only the intended target can decrypt the query and encrypt a response (answer).

Clients transmit these encrypted queries to a proxy over an HTTPS connection. Upon receipt, the proxy forwards the query to the designated target. The target then decrypts the query, produces a response by sending the query to a recursive resolver such as 1.1.1.1, and then encrypts the response to the client. The encrypted query from the client contains encapsulated keying material from which targets derive the response encryption symmetric key.

This response is then sent back to the proxy, and then subsequently forwarded to the client. All communication is authenticated and confidential since these DNS messages are end-to-end encrypted, despite being transmitted over two separate HTTPS connections (client-proxy and proxy-target). The message that otherwise appears to the proxy as plaintext is actually an encrypted garble.
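To see why the proxy only ever handles ciphertext, here is a simplified stand-in for that end-to-end layer in Python, using the third-party cryptography package. This is emphatically not the HPKE construction the spec actually uses; X25519 plus HKDF plus AES-GCM merely plays the same role in miniature.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key(shared_secret: bytes) -> bytes:
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"odoh-sketch").derive(shared_secret)

# Target publishes its public key (in real ODoH: an HTTPS record, DNSSEC-signed).
target_key = X25519PrivateKey.generate()
target_pub = target_key.public_key()

# Client: ephemeral key plus shared secret -> encrypt the query for the target.
client_key = X25519PrivateKey.generate()
key = derive_key(client_key.exchange(target_pub))
nonce = os.urandom(12)
encrypted_query = AESGCM(key).encrypt(nonce, b"example.com. IN A?", None)

# Proxy: forwards (nonce, encrypted_query, client's ephemeral public key)
# to the target; it can see none of the DNS contents.

# Target: recomputes the shared secret from the client's ephemeral public
# key -- the encapsulated keying material mentioned above -- and decrypts.
key2 = derive_key(target_key.exchange(client_key.public_key()))
assert AESGCM(key2).decrypt(nonce, encrypted_query, None) == b"example.com. IN A?"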

What about Performance? Do I have to trade performance to get privacy?

We’ve been doing lots of measurements to find out, and will be doing more as ODoH deploys more widely. Our initial set of measurement configurations spanned cities in the USA, Canada, and Brazil. Importantly, our measurements include not just 1.1.1.1, but also 8.8.8.8 and 9.9.9.9. The full set of measurements, so far, is documented for open access.

In those measurements, it was important to isolate the cost of proxying and additional encryption from the cost of TCP and TLS connection setup. This is because the TLS and TCP costs are incurred by DoH, anyway. So, in our setup, we ‘primed’ measurements by establishing connections once and reusing that connection for all measurements. We did this for both DoH and for ODoH, since the same strategy could be used in either case.

The first thing that we can say with confidence is that the additional encryption is marginal. We know this because we randomly selected 10,000 domains from the Tranco million dataset and measured both encryption of the A record with a different public key, as well as its decryption. The additional cost between a proxied DoH query/response and its ODoH counterpart is consistently less than 1ms at the 99th percentile.

The ODoH request-response pipeline, however, is much more than just encryption. A very useful way of looking at the measurements is a cumulative distribution chart (if you’re familiar with these kinds of charts, skip to the next paragraph). In contrast to most charts where we start along the x-axis, with cumulative distributions we often start with the y-axis.

The chart below shows the cumulative distributions for query/response times in DoH, ODoH, and DoH when transmitted over the Tor network. The dashed horizontal line that starts on the left at 0.5 is the 50% mark. Along this horizontal line, for any plotted curve, the part of the curve below the dashed line covers 50% of the data points. Now look at the x-axis, which is a measure of time. Lines that appear further to the left are faster than lines to the right. One last important detail is that the x-axis is plotted on a logarithmic scale. What does this mean? The labeled markers are evenly spaced on the chart, but each step of 10x represents an order of magnitude. So, while the time difference between the first two markers is 9ms, the difference between the 3rd and 4th markers is 900ms.

Chart: cumulative distribution of query/response times for DoH, ODoH, and DoH over the Tor network

In this chart, the middle curve represents ODoH measurements. We also measured the performance of privacy-preserving alternatives, for example, DoH queries transmitted over the Tor network as represented by the right curve in the chart. (Additional privacy-preserving alternatives are captured in the open access technical report.) Compared to other privacy-oriented DNS variants, ODoH cuts query time in half, or better. This point is important since privacy and performance rarely play nicely together, so seeing this kind of improvement is encouraging!

The chart above also tells us that 50% of the time, ODoH queries are resolved in fewer than 228ms. Now compare the middle line to the left line, which represents ‘straight-line’ (or normal) DoH without any modification. That left plotline says that 50% of the time, DoH queries are resolved in fewer than 146ms. Looking below the 50% mark, the curves also tell us that, half of the time, the difference between them is never greater than 100ms. On the other side, looking at the curves above the 50% mark tells us that half of ODoH queries are competitive with DoH.

Those curves also hide a lot of information, so it is important to delve further into the measurements. The chart below has three different cumulative distribution curves that describe ODoH performance if we select proxies and targets by their latency. This is also an example of the insights that measurements can reveal, some of which are counterintuitive. For example, looking above 0.5, these curves say that half of ODoH query/response times are virtually indistinguishable, no matter the choice of proxy and target. Now shift attention below 0.5 and compare the two solid curves against the dashed curve that represents the overall average. This region suggests that selecting the lowest-latency proxy and target offers minimal improvement over the average but, most importantly, it shows that selecting the lowest-latency proxy leads to worse performance!

Chart: cumulative distributions of ODoH query/response times when proxies and targets are selected by latency

Open questions remain, of course. This first set of measurements were executed largely in North America. Does performance change at a global level? How does this affect client performance, in practice? We’re working on finding out, and this release will help us to do that.

Interesting! Can I experiment with ODoH? Is there an open ODoH service?

Yes, and yes! We have open sourced our interoperable ODoH implementations in Rust, odoh-rs and Go, odoh-go, as well as integrated the target into the Cloudflare DNS Resolver. That’s right, 1.1.1.1 is ready to receive queries via ODoH.

We have also open sourced test clients in Rust, odoh-client-rs, and Go, odoh-client-go, to demo ODoH queries. You can also check out the HPKE configuration used by ODoH for message encryption to 1.1.1.1 by querying the service directly:

$ dig -t type65 +dnssec @ns1.cloudflare.com odoh.cloudflare-dns.com 

; <<>> DiG 9.10.6 <<>> -t type65 +dnssec @ns1.cloudflare.com odoh.cloudflare-dns.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19923
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;odoh.cloudflare-dns.com.	IN	TYPE65

;; ANSWER SECTION:
odoh.cloudflare-dns.com. 300	IN	TYPE65	\# 108 00010000010003026832000400086810F8F96810F9F9000600202606 470000000000000000006810F8F92606470000000000000000006810 F9F98001002E002CFF0200280020000100010020ED82DBE32CCDE189 BC6C643A80B5FAFF82548D21601C613408BACAAE6467B30A
odoh.cloudflare-dns.com. 300	IN	RRSIG	TYPE65 13 3 300 20201119163629 20201117143629 34505 odoh.cloudflare-dns.com. yny5+ApxPSO6Q4aegv09ZnBmPiXxDEnX5Xv21TAchxbxt1VhqlHpb5Oc 8yQPNGXb0fb+NyibmHlvTXjphYjcPA==

;; Query time: 21 msec
;; SERVER: 173.245.58.100#53(173.245.58.100)
;; WHEN: Wed Nov 18 07:36:29 PST 2020
;; MSG SIZE  rcvd: 291

We are working to add ODoH to existing stub resolvers such as cloudflared. If you’re interested in adding support to a client, or if you encounter bugs with the implementations, please drop us a line at [email protected]! Announcements about the ODoH specification and server will be sent to the IETF DPRIVE mailing list. You can subscribe and follow announcements and discussion about the specification here.

We are committed to moving it forward in the IETF and are already seeing interest from client vendors. Eric Rescorla, CTO of Firefox, says, “Oblivious DoH is a great addition to the secure DNS ecosystem. We’re excited to see it starting to take off and are looking forward to experimenting with it in Firefox.” We hope that more operators join us along the way and provide support for the protocol, by running either proxies or targets, and we hope client support will increase as the available infrastructure increases, too.

The ODoH protocol is a practical approach to improving the privacy of users, and aims to improve the overall adoption of encrypted DNS protocols without compromising performance and user experience on the Internet.

Acknowledgements

Marek Vavruša and Anbang Wen were instrumental in getting the 1.1.1.1 resolver to support ODoH. Chris Wood and Peter Wu helped get the ODoH libraries ready and tested.

Helping build the next generation of privacy-preserving protocols

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/next-generation-privacy-protocols/


Over the last ten years, Cloudflare has become an important part of Internet infrastructure, powering websites, APIs, and web services to help make them more secure and efficient. The Internet is growing in terms of its capacity and the number of people using it and evolving in terms of its design and functionality. As a player in the Internet ecosystem, Cloudflare has a responsibility to help the Internet grow in a way that respects and provides value for its users. Today, we’re making several announcements around improving Internet protocols with respect to something important to our customers and Internet users worldwide: privacy.

These initiatives are:

  • Encrypted Client Hello (ECH), which encrypts privacy-sensitive TLS handshake metadata such as the hostname
  • Oblivious DNS-over-HTTPS (ODoH), which hides the source of a DNS query from the resolver
  • OPAQUE, which keeps passwords hidden even from the server during authentication

Each of these projects impacts an aspect of the Internet that influences our online lives and digital footprints. Whether we know it or not, there is a lot of private information about us and our lives floating around online. This is something we can help fix.

For over a year, we have been working through standards bodies like the IETF and partnering with the biggest names in Internet technology (including Mozilla, Google, Equinix, and more) to design, deploy, and test these new privacy-preserving protocols at Internet scale. Each of these three protocols touches on a critical aspect of our online lives, and we expect them to help make real improvements to privacy online as they gain adoption.

A continuing tradition at Cloudflare

One of Cloudflare’s core missions is to support and develop technology that helps build a better Internet. As an industry, we’ve made exceptional progress in making the Internet more secure and robust. Cloudflare is proud to have played a part in this progress through multiple initiatives over the years.

Here are a few highlights:

  • Universal SSL™. We’ve been one of the driving forces for encrypting the web. We launched Universal SSL in 2014 to give website encryption to our customers for free and have actively been working along with certificate authorities like Let’s Encrypt, web browsers, and website operators to help remove mixed content. Before Universal SSL launched to give all Cloudflare customers HTTPS for free, only 30% of connections to websites were encrypted. Through the industry’s efforts, that number is now 80% — and a much more significant proportion of overall Internet traffic. Along with doing our part to encrypt the web, we have supported the Certificate Transparency project via Nimbus and Merkle Town, which has improved accountability for the certificate ecosystem HTTPS relies on for trust.
  • TLS 1.3 and QUIC. We’ve also been a proponent of upgrading existing security protocols. Take Transport Layer Security (TLS), the underlying protocol that secures HTTPS. Cloudflare engineers helped contribute to the design of TLS 1.3, the latest version of the standard, and in 2016 we launched support for an early version of the protocol. This early deployment helped lead to improvements to the final version of the protocol. TLS 1.3 is now the most widely used encryption protocol on the web and a vital component of the emerging QUIC standard, of which we were also early adopters.
  • Securing Routing, Naming, and Time. We’ve made major efforts to help secure other critical components of the Internet. Our efforts to help secure Internet routing through our RPKI toolkit, measurement studies, and “Is BGP Safe Yet” tool have significantly improved the Internet’s resilience against disruptive route leaks. Our time service (time.cloudflare.com) has helped keep people’s clocks in sync with more secure protocols like NTS and Roughtime. We’ve also made DNS more secure by supporting DNS-over-HTTPS and DNS-over-TLS in 1.1.1.1 at launch, along with one-click DNSSEC in our authoritative DNS service and registrar.

Continuing to improve the security of the systems of trust online is critical to the Internet’s growth. However, there is a more fundamental principle at play: respect. The infrastructure underlying the Internet should be designed to respect its users.

Building an Internet that respects users

When you sign in to a specific website or service with a privacy policy, you know what that site is expected to do with your data. It’s explicit. There is no such visibility to the users when it comes to the operators of the Internet itself. You may have an agreement with your Internet Service Provider (ISP) and the site you’re visiting, but it’s doubtful that you even know which networks your data is traversing. Most people don’t have a concept of the Internet beyond what they see on their screen, so it’s hard to imagine that people would accept or even understand what a privacy policy from a transit wholesaler or an inspection middlebox would even mean.

Without encryption, Internet browsing information is implicitly shared with countless third parties online as information passes between networks. Without secure routing, users’ traffic can be hijacked and disrupted. Without privacy-preserving protocols, users’ online life is not as private as they would think or expect. The infrastructure of the Internet wasn’t built in a way that reflects their expectations.

Figure: Normal network flow
Figure: Network flow with malicious route leak

The good news is that the Internet is continuously evolving. One of the groups that help guide that evolution is the Internet Architecture Board (IAB). The IAB provides architectural oversight to the Internet Engineering Task Force (IETF), the Internet’s main standard-setting body. The IAB recently published RFC 8890, which states that individual end-users should be prioritized when designing Internet protocols. It says that if there’s a conflict between the interests of end-users and the interest of service providers, corporations, or governments, IETF decisions should favor end users. One of the prime interests of end-users is the right to privacy, and the IAB published RFC 6973 to indicate how Internet protocols should take privacy into account.

Today’s technical blog posts are about improvements to the Internet designed to respect user privacy. Privacy is a complex topic that spans multiple disciplines, so it’s essential to clarify what we mean by “improving privacy.” We are specifically talking about changing the protocols that handle privacy-sensitive information exposed “on-the-wire” and modifying them so that this data is exposed to fewer parties. This data continues to exist. It’s just no longer available or visible to third parties without building a mechanism to collect it at a higher layer of the Internet stack, the application layer. These changes go beyond website encryption; they go deep into the design of the systems that are foundational to making the Internet what it is.

The toolbox: cryptography and secure proxies

Two tools for making sure data can be used without being seen are cryptography and secure proxies.


Cryptography allows information to be transformed into a format that a very limited number of people (those with the key) can understand. Some describe cryptography as a tool that transforms data security problems into key management problems. This is a humorous but fair description. Cryptography makes it easier to reason about privacy because only key holders can view data.

Another tool for protecting access to data is isolation/segmentation. By physically limiting which parties have access to information, you effectively build privacy walls. A popular architecture is to rely on policy-aware proxies to pass data from one place to another. Such proxies can be configured to strip sensitive data or block data transfers between parties according to what the privacy policy says.

Both these tools are useful individually, but they can be even more effective if combined. Onion routing (the cryptographic technique underlying Tor) is one example of how proxies and encryption can be used in tandem to enforce strong privacy. Broadly, if party A wants to send data to party B, they can encrypt the data with party B’s key and encrypt the metadata with a proxy’s key and send it to the proxy.
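A toy sketch of that layering keeps the idea concrete, with symmetric keys standing in for the parties’ real public keys (real onion routing uses public-key cryptography): the proxy can peel its own layer and learn only where to forward, while the payload stays sealed for party B.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

proxy_key, b_key = AESGCM.generate_key(128), AESGCM.generate_key(128)

# Inner layer: the payload, sealed for B. Outer layer: routing metadata
# plus the inner ciphertext, sealed for the proxy. (In practice the inner
# nonce travels alongside the ciphertext.)
inner_nonce, outer_nonce = os.urandom(12), os.urandom(12)
inner = AESGCM(b_key).encrypt(inner_nonce, b"secret payload for B", None)
outer = AESGCM(proxy_key).encrypt(outer_nonce, b"deliver-to:B|" + inner, None)

# The proxy peels one layer: it learns the next hop, but the payload
# remains encrypted under B's key.
metadata, _, inner_ct = AESGCM(proxy_key).decrypt(outer_nonce, outer, None).partition(b"|")
print(metadata)  # b'deliver-to:B'

# Only B can read the payload.
print(AESGCM(b_key).decrypt(inner_nonce, inner_ct, None))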

Platforms and services built on top of the Internet can build in consent systems, like privacy policies presented through user interfaces. The infrastructure of the Internet relies on layers of underlying protocols. Because these layers of the Internet are so far below where the user interacts with them, it’s almost impossible to build a concept of user consent. In order to respect users and protect them from privacy issues, the protocols that glue the Internet together should be designed with privacy enabled by default.

Data vs. metadata

The transition from a mostly unencrypted web to an encrypted web has done a lot for end-user privacy. For example, the “coffeeshop stalker” is no longer an issue for most sites. When accessing the majority of sites online, users are no longer broadcasting every aspect of their web browsing experience (search queries, browser versions, authentication cookies, etc.) over the Internet for any participant on the path to see. Suppose a site is configured correctly to use HTTPS. In that case, users can be confident their data is secure from onlookers and reaches only the intended party because their connections are both encrypted and authenticated.

However, HTTPS only protects the content of web requests. Even if you only browse sites over HTTPS, that doesn’t mean that your browsing patterns are private. This is because HTTPS fails to encrypt a critical aspect of the exchange: the metadata. When you make a phone call, the metadata is the phone number, not the call’s contents. Metadata is the data about the data.

To illustrate the difference and why it matters, here’s a diagram of what happens when you visit a website like an imageboard. Say you’re going to a specific page on that board (https://<imageboard>.com/room101/) that has specific embedded images hosted on <embarassing>.com.

Figure: Page load for an imageboard, returning an HTML page with an image from an embarassing site
Figure: Subresource fetch for the image from an embarassing site

The space inside the dotted line here represents the parts of the Internet that your data needs to transit. These include your local area network or coffee shop, your ISP, an Internet transit provider, and possibly the network portion of the cloud provider that hosts the server. Users often don’t have a relationship with these entities or a contract to prevent these parties from doing anything with the user’s data. And even if those entities don’t look at the data, a well-placed observer intercepting Internet traffic could see anything sent unencrypted. It would be best if they just didn’t see it at all. In this example, the fact that the user visited <imageboard>.com can be seen by an observer, which is expected. However, even though the page content is encrypted, it’s still possible to learn which specific page the user visited, since <embarassing>.com is also visible in the subresource fetch.

It’s a general rule that if data is available to on-path parties on the Internet, some of these on-path parties will use this data. It’s also true that these on-path parties need some metadata in order to facilitate the transport of this data. This balance is explored in RFC 8558, which explains how protocols should be designed thoughtfully with respect to the balance between too much metadata (bad for privacy) and too little metadata (bad for operations).

In an ideal world, Internet protocols would be designed with the principle of least privilege. They would provide the minimum amount of information needed for the on-path parties (the pipes) to do the job of transporting the data to the right place and keep everything else confidential by default. Current protocols, including TLS 1.3 and QUIC, are important steps towards this ideal but fall short with respect to metadata privacy.

Knowing both who you are and what you do online can lead to profiling

Today’s announcements reflect two metadata protection levels: the first involves limiting the amount of metadata available to third-party observers (like ISPs). The second involves restricting the amount of metadata that users share with service providers themselves.

Hostnames are an example of metadata that needs to be protected from third-party observers, which DoH and ECH intend to do. However, it doesn’t make sense to hide the hostname from the site you’re visiting. It also doesn’t make sense to hide it from a directory service like DNS. A DNS server needs to know which hostname you’re resolving to resolve it for you!

A privacy issue arises when a service provider knows about both what sites you’re visiting and who you are. Individual websites do not have this dangerous combination of information (except in the case of third party cookies, which are going away soon in browsers), but DNS providers do. Thankfully, it’s not actually necessary for a DNS resolver to know *both* the hostname of the service you’re going to and which IP you’re coming from. Disentangling the two, which is the goal of ODoH, is good for privacy.

The Internet is part of ‘our’ Infrastructure

Roads should be well-paved, well lit, have accurate signage, and be optimally connected. They aren’t designed to stop a car based on who’s inside it. Nor should they be! Like transportation infrastructure, Internet infrastructure is responsible for getting data where it needs to go, not looking inside packets, and making judgments. But the Internet is made of computers and software, and software tends to be written to make decisions based on the data it has available to it.

Privacy-preserving protocols attempt to eliminate the temptation for infrastructure providers and others to peek inside and make decisions based on personal data. A non-privacy preserving protocol like HTTP keeps data and metadata, like passwords, IP addresses, and hostnames, as explicit parts of the data sent over the wire. The fact that they are explicit means that they are available to any observer to collect and act on. A protocol like HTTPS improves upon this by making some of the data (such as passwords and site content) invisible on the wire using encryption.

The three protocols we are exploring today extend this concept.

  • ECH takes most of the unencrypted metadata in TLS (including the hostname) and encrypts it with a key that was fetched ahead of time.
  • ODoH (a new variant of DoH co-designed by Apple, Cloudflare, and Fastly engineers) uses proxies and onion-like encryption to make the source of a DNS query invisible to the DNS resolver. This protects the user’s IP address when resolving hostnames.
  • OPAQUE uses a new cryptographic technique to keep passwords hidden even from the server. Utilizing a construction called an Oblivious Pseudo-Random Function (as seen in Privacy Pass), the server does not learn the password; it only learns whether or not the user knows the password.

By making sure Internet infrastructure acts more like physical infrastructure, user privacy is more easily protected. The Internet is more private if private data can only be collected where the user has a chance to consent to its collection.

Doing it together

As much as we’re excited about working on new ways to make the Internet more private, innovation at a global scale doesn’t happen in a vacuum. Each of these projects is the output of a collaborative group of individuals working out in the open in organizations like the IETF and the IRTF. Protocols must come about through a consensus process that involves all the parties that make up the interconnected set of systems that power the Internet. From browser builders to cryptographers, from DNS operators to website administrators, this is truly a global team effort.

We also recognize that sweeping technical changes to the Internet will inevitably also impact the technical community. Adopting these new protocols may have legal and policy implications. We are actively working with governments and civil society groups to help educate them about the impact of these potential changes.

We’re looking forward to sharing our work today and hope that more interested parties join in developing these protocols. The projects we are announcing today were designed by experts from academia, industry, and the hobbyist community, and were built by engineers from Cloudflare Research (including the work of interns, which we will highlight) with support from everyone at Cloudflare.

If you’re interested in this type of work, we’re hiring!

Good-bye ESNI, hello ECH!

Post Syndicated from Christopher Patton original https://blog.cloudflare.com/encrypted-client-hello/


Most communication on the modern Internet is encrypted to ensure that its content is intelligible only to the endpoints, i.e., client and server. Encryption, however, requires a key and so the endpoints must agree on an encryption key without revealing the key to would-be attackers. The most widely used cryptographic protocol for this task, called key exchange, is the Transport Layer Security (TLS) handshake.

In this post we’ll dive into Encrypted Client Hello (ECH), a new extension for TLS that promises to significantly enhance the privacy of this critical Internet protocol. Today, a number of privacy-sensitive parameters of the TLS connection are negotiated in the clear. This leaves a trove of metadata available to network observers, including the endpoints’ identities, how they use the connection, and so on.

ECH encrypts the full handshake so that this metadata is kept secret. Crucially, this closes a long-standing privacy leak by protecting the Server Name Indication (SNI) from eavesdroppers on the network. Keeping the SNI secret is important because it is the clearest signal of which server a given client is communicating with. However, and perhaps more significantly, ECH also lays the groundwork for adding future security features and performance enhancements to TLS while minimizing their impact on the privacy of end users.

ECH is the product of close collaboration, facilitated by the IETF, between academics and tech industry leaders, including Cloudflare, our friends at Fastly and Mozilla (both of whom are the affiliations of co-authors of the standard), and many others. This feature represents a significant upgrade to the TLS protocol, one that builds on bleeding-edge technologies, like DNS-over-HTTPS, that are only now coming into their own. As such, the protocol is not yet ready for Internet-scale deployment. This article is intended as a signpost on the road to full handshake encryption.

Background

The story of TLS is the story of the Internet. As our reliance on the Internet has grown, so too has the protocol evolved to address ever-changing operational requirements, use cases, and threat models. The client and server don’t just exchange a key; they negotiate a wide variety of features and parameters: the exact method of key exchange; the encryption algorithm; who is authenticated and how; which application layer protocol to use after the handshake; and much, much more. All of these parameters impact the security properties of the communication channel in one way or another.

SNI is a prime example of a parameter that impacts the channel’s security. The SNI extension is used by the client to indicate to the server the website it wants to reach. This is essential for the modern Internet, as it’s common nowadays for many origin servers to sit behind a single TLS operator. In this setting, the operator uses the SNI to determine who will authenticate the connection: without it, there would be no way of knowing which TLS certificate to present to the client. The problem is that SNI leaks to the network the identity of the origin server the client wants to connect to, potentially allowing eavesdroppers to infer a lot of information about their communication. (Of course, there are other ways for a network observer to identify the origin — the origin’s IP address, for example. But co-locating with other origins on the same IP address makes it much harder to use this metric to determine the origin than it is to simply inspect the SNI.)

Although protecting SNI is the impetus for ECH, it is by no means the only privacy-sensitive handshake parameter that the client and server negotiate. Another is the ALPN extension, which is used to decide which application-layer protocol to use once the TLS connection is established. The client sends the list of applications it supports — whether it’s HTTPS, email, instant messaging, or the myriad other applications that use TLS for transport security — and the server selects one from this list, and sends its selection to the client. By doing so, the client and server leak to the network a clear signal of their capabilities and what the connection might be used for.
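Both leaks are easy to see from a client’s perspective. In Python’s standard ssl module, for example, the server_hostname argument becomes the cleartext SNI extension and set_alpn_protocols fills the cleartext ALPN list; these are exactly the fields ECH sets out to protect. The hostname here is illustrative.

import socket
import ssl

ctx = ssl.create_default_context()
# Sent in the clear in the ClientHello: the ALPN list of supported protocols.
ctx.set_alpn_protocols(["h2", "http/1.1"])

with socket.create_connection(("cloudflare.com", 443)) as raw:
    # server_hostname is sent in the clear as the SNI extension.
    with ctx.wrap_socket(raw, server_hostname="cloudflare.com") as tls:
        print(tls.selected_alpn_protocol())  # e.g. 'h2', the server's pick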

Some features are so privacy-sensitive that their inclusion in the handshake is a non-starter. One idea that has been floated is to replace the key exchange at the heart of TLS with password-authenticated key-exchange (PAKE). This would allow password-based authentication to be used alongside (or in lieu of) certificate-based authentication, making TLS more robust and suitable for a wider range of applications. The privacy issue here is analogous to SNI: servers typically associate a unique identifier to each client (e.g., a username or email address) that is used to retrieve the client’s credentials; and the client must, somehow, convey this identity to the server during the course of the handshake. If sent in the clear, then this personally identifiable information would be easily accessible to any network observer.

A necessary ingredient for addressing all of these privacy leaks is handshake encryption, i.e., the encryption of handshake messages in addition to application data. Sounds simple enough, but this solution presents another problem: how do the client and server pick an encryption key if, after all, the handshake is itself a means of exchanging a key? Some parameters must be sent in the clear, of course, so the goal of ECH is to encrypt all handshake parameters except those that are essential to completing the key exchange.

In order to understand ECH and the design decisions underpinning it, it helps to understand a little bit about the history of handshake encryption in TLS.

Handshake encryption in TLS

TLS had no handshake encryption at all prior to the latest version, TLS 1.3. In the wake of the Snowden revelations in 2013, the IETF community began to consider ways of countering the threat that mass surveillance posed to the open Internet. When the process of standardizing TLS 1.3 began in 2014, one of its design goals was to encrypt as much of the handshake as possible. Unfortunately, the final standard falls short of full handshake encryption, and several parameters, including SNI, are still sent in the clear. Let’s take a closer look to see why.

The TLS 1.3 protocol flow is illustrated in Figure 1. Handshake encryption begins as soon as the client and server compute a fresh shared secret. To do this, the client sends a key share in its ClientHello message, and the server responds in its ServerHello with its own key share. Having exchanged these shares, the client and server can derive a shared secret. Each subsequent handshake message is encrypted using the handshake traffic key derived from the shared secret. Application data is encrypted using a different key, called the application traffic key, which is also derived from the shared secret. These derived keys have different security properties: to emphasize this, they are illustrated with different colors.
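
To make the idea of deriving distinct keys concrete, here is a toy Python sketch using HKDF with SHA-256. It is illustrative only: the real TLS 1.3 key schedule uses HKDF with specific labels and transcript hashes, and the labels below are made up.

import hashlib, hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int = 32) -> bytes:
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

shared_secret = b"\x00" * 32             # stand-in for the (EC)DHE output
prk = hkdf_extract(b"", shared_secret)
handshake_key = hkdf_expand(prk, b"hs traffic")    # protects handshake messages
application_key = hkdf_expand(prk, b"ap traffic")  # protects application data
assert handshake_key != application_key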

The first handshake message that is encrypted is the server’s EncryptedExtensions. The purpose of this message is to protect the server’s sensitive handshake parameters, including the server’s ALPN extension, which contains the application selected from the client’s ALPN list. Key-exchange parameters are sent unencrypted in the ClientHello and ServerHello.

Figure 1: The TLS 1.3 handshake.

All of the client’s handshake parameters, sensitive or not, are sent in the ClientHello. Looking at Figure 1, you might be able to think of ways of reworking the handshake so that some of them can be encrypted, perhaps at the cost of additional latency (i.e., more round trips over the network). However, extensions like SNI create a kind of “chicken-and-egg” problem.

The client doesn’t encrypt anything until it has verified the server’s identity (this is the job of the Certificate and CertificateVerify messages) and the server has confirmed that it knows the shared secret (the job of the Finished message). These measures ensure the key exchange is authenticated, thereby preventing monster-in-the-middle (MITM) attacks in which the adversary impersonates the server to the client in a way that allows it to decrypt messages sent by the client. Because SNI is needed by the server to select the certificate, it needs to be transmitted before the key exchange is authenticated.

In general, ensuring confidentiality of handshake parameters used for authentication is only possible if the client and server already share an encryption key. But where might this key come from?

Full handshake encryption in the early days of TLS 1.3. Interestingly, full handshake encryption was once proposed as a core feature of TLS 1.3. In early versions of the protocol (draft-10, circa 2015), the server would offer the client a long-lived public key during the handshake, which the client would use for encryption in subsequent handshakes. (This design came from a protocol called OPTLS, which in turn was borrowed from the original QUIC proposal.) Called "0-RTT", this mode's primary purpose was to allow the client to begin sending application data prior to completing a handshake. In addition, it would have allowed the client to encrypt its first flight of handshake messages following the ClientHello, including its own EncryptedExtensions, which might have been used to protect the client's sensitive handshake parameters.

Ultimately this feature was not included in the final standard (RFC 8446, published in 2018), mainly because its usefulness was outweighed by its added complexity. In particular, it does nothing to protect the initial handshake in which the client learns the server’s public key. Parameters that are required for server authentication of the initial handshake, like SNI, would still be transmitted in the clear.

Nevertheless, this scheme is notable as the forerunner of other handshake encryption mechanisms, like ECH, that use public key encryption to protect sensitive ClientHello parameters. The main problem these mechanisms must solve is key distribution.

Before ECH there was (and is!) ESNI

The immediate predecessor of ECH was the Encrypted SNI (ESNI) extension. As its name implies, the goal of ESNI was to provide confidentiality of the SNI. To do so, the client would encrypt its SNI extension under the server’s public key and send the ciphertext to the server. The server would attempt to decrypt the ciphertext using the secret key corresponding to its public key. If decryption were to succeed, then the server would proceed with the connection using the decrypted SNI. Otherwise, it would simply abort the handshake. The high-level flow of this simple protocol is illustrated in Figure 2.

Good-bye ESNI, hello ECH!
Figure 2: The TLS 1.3 handshake with the ESNI extension. It is identical to the TLS 1.3 handshake, except the SNI extension has been replaced with ESNI.

For key distribution, ESNI relied on another critical protocol: the Domain Name System (DNS). In order to use ESNI to connect to a website, the client would piggyback a request for a TXT record containing the ESNI public key on its standard A/AAAA queries. For example, to get the key for crypto.dance, the client would request the TXT record of _esni.crypto.dance:

$ dig _esni.crypto.dance TXT +short
"/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EAAhMBAQQAAAAAX67XsAAAAABftsCwAAA="

The base64-encoded blob contains an ESNI public key and related parameters such as the encryption algorithm.
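
If you're curious, you can poke at the blob yourself. Here's a minimal Python sketch that decodes it and reads the leading 16-bit version field; per the ESNI draft, the rest of the structure encodes the key share, cipher suites, padded length, and validity window.

import base64, struct

blob = base64.b64decode(
    "/wGuNThxACQAHQAgXzyda0XSJRQWzDG7lk/r01r1ZQy+MdNxKg/mAqSnt0EA"
    "AhMBAQQAAAAAX67XsAAAAABftsCwAAA="
)
version, = struct.unpack_from("!H", blob, 0)
print(hex(version))  # 0xff01, the ESNIKeys version used by the draft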

But what’s the point of encrypting SNI if we’re just going to leak the server name to network observers via a plaintext DNS query? Deploying ESNI this way became feasible with the introduction of DNS-over-HTTPS (DoH), which enables encryption of DNS queries to resolvers that provide the DoH service (1.1.1.1 is an example of such a service). Another crucial feature of DoH is that it provides an authenticated channel for transmitting the ESNI public key from the DoH server to the client. This prevents cache-poisoning attacks that originate from the client’s local network: in the absence of DoH, a local attacker could prevent the client from offering the ESNI extension by returning an empty TXT record, or coerce the client into using ESNI with a key it controls.

While ESNI took a significant step forward, it falls short of our goal of achieving full handshake encryption. Apart from being incomplete — it only protects SNI — it is vulnerable to a handful of sophisticated attacks, which, while hard to pull off, point to theoretical weaknesses in the protocol’s design that need to be addressed.

ESNI was deployed by Cloudflare and enabled by Firefox, on an opt-in basis, in 2018, an experience that laid bare some of the challenges of relying on DNS for key distribution. Cloudflare rotates its ESNI key every hour in order to minimize the collateral damage in case a key ever gets compromised. DNS artifacts are sometimes cached for much longer, however, so there is a decent chance of a client having a stale public key. While Cloudflare's ESNI service tolerates this to a degree, every key must eventually expire. The question that the ESNI protocol left open is how the client should proceed if decryption fails and it can't access the current public key, via DNS or otherwise.

Another problem with relying on DNS for key distribution is that several endpoints might be authoritative for the same origin server, but have different capabilities. For example, a request for the A record of “example.com” might return one of two different IP addresses, each operated by a different CDN. The TXT record for “_esni.example.com” would contain the public key for one of these CDNs, but certainly not both. The DNS protocol does not provide a way of atomically tying together resource records that correspond to the same endpoint. In particular, it’s possible for a client to inadvertently offer the ESNI extension to an endpoint that doesn’t support it, causing the handshake to fail. Fixing this problem requires changes to the DNS protocol. (More on this below.)

The future of ESNI. In the next section, we’ll describe the ECH specification and how it addresses the shortcomings of ESNI. Despite its limitations, however, the practical privacy benefit that ESNI provides is significant. Cloudflare intends to continue its support for ESNI until ECH is production-ready.

The ins and outs of ECH

The goal of ECH is to encrypt the entire ClientHello, thereby closing the gap left in TLS 1.3 and ESNI by protecting all privacy-sensitive handshake parameters. Similar to ESNI, the protocol uses a public key, distributed via DNS and obtained using DoH, for encryption during the client's first flight. But ECH improves upon key distribution in ways that make the protocol more robust to DNS cache inconsistencies. Whereas the ESNI server aborts the connection if decryption fails, the ECH server attempts to complete the handshake and supply the client with a public key it can use to retry the connection.

But how can the server complete the handshake if it’s unable to decrypt the ClientHello? As illustrated in Figure 3, the ECH protocol actually involves two ClientHello messages: the ClientHelloOuter, which is sent in the clear, as usual; and the ClientHelloInner, which is encrypted and sent as an extension of the ClientHelloOuter. The server completes the handshake with just one of these ClientHellos: if decryption succeeds, then it proceeds with the ClientHelloInner; otherwise, it proceeds with the ClientHelloOuter.

Figure 3: The TLS 1.3 handshake with the ECH extension.

The ClientHelloInner is composed of the handshake parameters the client wants to use for the connection. This includes sensitive values, like the SNI of the origin server it wants to reach (called the backend server in ECH parlance), the ALPN list, and so on. The ClientHelloOuter, while also a fully-fledged ClientHello message, is not used for the intended connection. Instead, the handshake is completed by the ECH service provider itself (called the client-facing server), signaling to the client that its intended destination couldn't be reached due to decryption failure. In this case, the service provider also sends along the correct ECH public key with which the client can retry the handshake, thereby "correcting" the client's configuration. (This mechanism is similar to how the server distributed its public key for 0-RTT mode in the early days of TLS 1.3.)

At a minimum, both ClientHellos must contain the handshake parameters that are required for a server-authenticated key exchange. In particular, while the ClientHelloInner contains the real SNI, the ClientHelloOuter also contains an SNI value: that of the client-facing server, which the client expects to authenticate the connection in case of ECH decryption failure. If the connection is established using the ClientHelloOuter, then the client is expected to immediately abort the connection and retry the handshake with the public key provided by the server. It's not necessary for the client to specify an ALPN list in the ClientHelloOuter, nor any other extension used to guide post-handshake behavior. All of these parameters are encapsulated by the encrypted ClientHelloInner.
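
To make the retry logic concrete, here is a toy Python model of the flow described above. It is a sketch only: there is no real cryptography or TLS here, and hpke_seal/hpke_open stand in for HPKE operations on a made-up key.

SERVER_KEYS = {"current": "key-v2"}  # the client-facing server's ECH key

def hpke_seal(public_key, plaintext):
    return (public_key, plaintext)  # pretend public-key encryption

def hpke_open(ciphertext):
    key, plaintext = ciphertext
    return plaintext if key == SERVER_KEYS["current"] else None

def server_handle(outer):
    inner = hpke_open(outer["ech"])
    if inner is not None:
        return {"used": "inner", "sni": inner["sni"]}
    # Decryption failed: complete the handshake with the outer hello and
    # include the current key as a "retry configuration".
    return {"used": "outer", "sni": outer["sni"], "retry_key": SERVER_KEYS["current"]}

def connect(backend, provider, ech_key):
    inner = {"sni": backend}                                  # the real, sensitive SNI
    outer = {"sni": provider, "ech": hpke_seal(ech_key, inner)}
    reply = server_handle(outer)
    if reply["used"] == "inner":
        return "connected to " + reply["sni"]
    return connect(backend, provider, reply["retry_key"])     # retry with corrected key

print(connect("example.com", "provider.example", "key-v1"))  # stale key, then retry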

This design resolves — quite elegantly, I think — most of the challenges for securely deploying handshake encryption encountered by earlier mechanisms. Importantly, the design of ECH was not conceived in a vacuum. The protocol reflects the diverse perspectives of the IETF community, and its development dovetails with other IETF standards that are crucial to the success of ECH.

The first is an important new DNS feature known as the HTTPS resource record type. At a high level, this record type is intended to allow multiple HTTPS endpoints that are authoritative for the same domain name to advertise different capabilities for TLS. This makes it possible to rely on DNS for key distribution, resolving one of the deployment challenges uncovered by the initial ESNI deployment. For a deep dive into this new record type and what it means for the Internet more broadly, check out Alessandro Ghedini’s recent blog post on the subject.

The second is the CFRG’s Hybrid Public Key Encryption (HPKE) standard, which specifies an extensible framework for building public key encryption schemes suitable for a wide variety of applications. In particular, ECH delegates all of the details of its handshake encryption mechanism to HPKE, resulting in a much simpler and easier-to-analyze specification. (Incidentally, HPKE is also one of the main ingredients of Oblivious DNS-over-HTTPS.)

The road ahead

The current ECH specification is the culmination of a multi-year collaboration. At this point, the overall design of the protocol is fairly stable. In fact, the next draft of the specification will be the first to be targeted for interop testing among implementations. Still, there remain a number of details that need to be sorted out. Let’s end this post with a brief overview of the road ahead.

Resistance to traffic analysis

Ultimately, the goal of ECH is to ensure that TLS connections made to different origin servers behind the same ECH service provider are indistinguishable from one another. In other words, when you connect to an origin behind, say, Cloudflare, no one on the network between you and Cloudflare should be able to discern which origin you reached, or which privacy-sensitive handshake parameters you and the origin negotiated. Apart from an immediate privacy boost, this property, if achieved, paves the way for the deployment of new features for TLS without compromising privacy.

Encrypting the ClientHello is an important step towards achieving this goal, but we need to do a bit more. An important attack vector we haven’t discussed yet is traffic analysis. This refers to the collection and analysis of properties of the communication channel that betray part of the ciphertext’s contents, but without cracking the underlying encryption scheme. For example, the length of the encrypted ClientHello might leak enough information about the SNI for the adversary to make an educated guess as to its value (this risk is especially high for domain names that are either particularly short or particularly long). It is therefore crucial that the length of each ciphertext is independent of the values of privacy-sensitive parameters. The current ECH specification provides some mitigations, but their coverage is incomplete. Thus, improving ECH’s resistance to traffic analysis is an important direction for future work.
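
One mitigation of this kind is padding: rounding every encrypted ClientHello up to a coarse length bucket so that its size says little about the name inside. A minimal sketch of the idea (the 32-byte bucket size is an arbitrary choice for illustration):

def pad_to_multiple(message: bytes, block: int = 32) -> bytes:
    # Round the length up to a multiple of `block` so the ciphertext length
    # reveals only a coarse bucket, not the exact SNI length.
    padding = (-len(message)) % block
    return message + b"\x00" * padding

assert len(pad_to_multiple(b"a" * 17)) == 32
assert len(pad_to_multiple(b"a" * 40)) == 64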

The spectre of ossification

An important open question for ECH is the impact it will have on network operations.

One of the lessons learned from the deployment of TLS 1.3 is that upgrading a core Internet protocol can trigger unexpected network behavior. Cloudflare was one of the first major TLS operators to deploy TLS 1.3 at scale; when browsers like Firefox and Chrome began to enable it on an experimental basis, they observed a significantly higher rate of connection failures compared to TLS 1.2. The root cause of these failures was network ossification, i.e., the tendency of middleboxes — network appliances between clients and servers that monitor and sometimes intercept traffic — to run software that expects traffic to look and behave a certain way. Changing the protocol before middleboxes had the chance to update their software led to middleboxes trying to parse packets they didn’t recognize, triggering software bugs that, in some instances, caused connections to be dropped completely.

This problem was so widespread that, instead of waiting for network operators to update their software, the design of TLS 1.3 was altered in order to mitigate the impact of network ossification. The ingenious solution was to make TLS 1.3 “look like” another protocol that middleboxes are known to tolerate. Specifically, the wire format and even the contents of handshake messages were made to resemble TLS 1.2. These two protocols aren’t identical, of course — a curious network observer can still distinguish between them — but they look and behave similarly enough that the majority of existing middleboxes don’t treat them differently. Empirically, this strategy was found to reduce the connection failure rate enough to make deployment of TLS 1.3 viable.

Once again, ECH represents a significant upgrade for TLS for which the spectre of network ossification looms large. The ClientHello contains parameters, like SNI, that have existed in the handshake for a long time, and we don’t yet know what the impact will be of encrypting them. In anticipation of the deployment issues ossification might cause, the ECH protocol has been designed to look as much like a standard TLS 1.3 handshake as possible. The most notable difference is the ECH extension itself: if middleboxes ignore it — as they should, if they are compliant with the TLS 1.3 standard — then the rest of the handshake will look and behave very much as usual.

It remains to be seen whether this strategy will be enough to ensure the wide-scale deployment of ECH. If so, it is notable that this new feature will help to mitigate the impact of future TLS upgrades on network operations. Encrypting the full handshake reduces the risk of ossification since it means that there are fewer visible protocol features for software to ossify on. We believe this will be good for the health of the Internet overall.

Conclusion

The old TLS handshake is (unintentionally) leaky. Operational requirements of both the client and server have led to privacy-sensitive parameters, like SNI, being negotiated completely in the clear and available to network observers. The ECH extension aims to close this gap by enabling encryption of the full handshake. This represents a significant upgrade to TLS, one that will help preserve end-user privacy as the protocol continues to evolve.

The ECH standard is a work-in-progress. As this work continues, Cloudflare is committed to doing its part to ensure this important upgrade for TLS reaches Internet-scale deployment.

Improving the Resiliency of Our Infrastructure DNS Zone

Post Syndicated from Ryan Timken original https://blog.cloudflare.com/improving-the-resiliency-of-our-infrastructure-dns-zone/

In this blog post we will discuss how we made our infrastructure DNS zone more reliable by using multiple primary nameservers to leverage our own DNS product running on our edge as well as a third-party DNS provider.

Authoritative Nameservers

You can think of an authoritative nameserver as the source of truth for the records of a given DNS zone. When a recursive resolver wants to look up a record, it will eventually need to talk to the authoritative nameserver(s) for the zone in question. If you’d like to read more on the topic, our learning center provides some additional information.

Here’s an example of our authoritative nameservers (replacing our actual domain with example.com):

~$ dig NS example.com +short
ns1.example.com.
ns2.example.com.
ns3.example.com.

As you can see, there are three nameservers listed. You’ll notice that the nameservers happen to reside in the same zone, but they don’t have to. Those three nameservers point to six anycasted IP addresses (3 x IPv4, 3 x IPv6) announced from our edge, comprising data centers from 200+ cities around the world.

The Problem

We store the hostnames for all of our machines, both the ones at the edge and the ones at core data centers, in a DNS zone we refer to as our infrastructure zone. By using our own DNS product to run our infrastructure zone, we get the reliability and performance of our Global Anycast Network. However, what would happen if those nameservers became unavailable due to an issue on our edge? Eventually DNS lookups would fail, as resolvers would not be able to connect to the only authoritative nameservers configured on the zone: the ones hosted on our edge. This is the main problem we set out to solve.

When an incident occurs, our engineering teams are busy investigating, debugging, and fixing the issue at hand. If they cannot resolve the hostnames of the infrastructure they need to use, it will lead to confusion and delays in resolving the incident.

Imagine you are part of a team tackling an incident. As you quickly begin to investigate, you try to connect to a specific machine to gather some debugging information, but you get a DNS resolution error. What now? You know you are using the right hostname. Is this a result of the incident at hand, some side effect, or is it completely unrelated? You need to keep investigating, so you try another way. Maybe you have memorized the IP address of the machine and you can connect. More likely, you ask another resolver which still has the answer in its cache and resolve the IP that way. One thing is certain: the extra “mini-debug” step costs you precious time and detracts from debugging the real root cause.

Unavailable nameservers and the problems they cause are not unique to Cloudflare, and are not specific to our use case. Even if we hosted our authoritative nameservers with a single external provider, they could also experience their own issues causing the nameservers to fail.

Within the DNS community it is well understood that the more diversity, the better. Some common methods include diversifying network/routing configurations and software choices within a standard primary/secondary setup. However, no matter how resilient the network or software stack is, a single authority is still responsible for the zone. Luckily, there are ways to solve this problem!

Our Solution: Multiple Primary Nameservers

Our solution was to set up our infrastructure zone with multiple primary nameservers. This type of setup is referred to as split authority, multi-primary, or primary/primary. This way, instead of using just one provider for our authoritative nameservers, we use two.

We added three nameservers from our additional provider to our zone at our registrar. Using multiple primaries allows changes to our zone at each provider to be controlled separately and completely by us. Instead of using zone transfers to keep our zone in sync, we use OctoDNS to independently and simultaneously manage the zone at both providers. We’ll talk more about our use of OctoDNS in a bit.

This setup is similar to using a primary and secondary server. The main difference is that the nameservers operate independently from one another, and do not use the usual DNS AXFR/IXFR method to keep the zone up to date. If you’d like to learn more about this type of solution, here is a great blog post by Dina Kozlov, our Product Manager for DNS, about Secondary DNS.

The nameservers in our zone after adding an additional provider would look something like:

~$ dig NS example.com +short
ns1.example.com.
ns2.example.com.
ns3.example.com.
ns1.additional-provider.net.
ns2.additional-provider.net.
ns3.additional-provider.net.

Predictable Query Routing

Currently, we cannot coordinate which provider is used when querying a record within the zone. Recursive resolvers have different methods of choosing which nameserver to use when presented with multiple NS records. A popular choice is to use the server with the lowest RTT (round-trip time). For a detailed explanation, consult the great APNIC blog post on how recursive resolvers select an authoritative nameserver.

We needed to enforce specific routing decisions between the two authorities. Our requirement was for a weighted routing policy preferring Cloudflare based on availability checks for requests originating from our infrastructure servers. This reduces RTT, since the queries originate from the same servers hosting our nameservers in the same data center, and do not need to travel further to the external provider when all is well.

Our edge infrastructure servers are configured to use DNSDist as their primary system resolver. DNSDist load balances queries using multiple upstream recursive DNS servers (Unbound) in each data center to provide DNS resolution. This setup is used for internal DNS resolution only, and as such, it is independent from our authoritative DNS and public resolver 1.1.1.1 service.

Additionally, we modified our DNSDist configuration by adding two server pools for our infrastructure zone authoritative servers, and set up active checks with weighted routes to always use the Cloudflare pool when available. With the active checking and weighted routing in place, our queries are always routed internally when available. In the event of a Cloudflare authoritative DNS failure, DNSDist will route all requests for the zone to our external provider.
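
As an illustration of the policy (plain Python rather than actual DNSDist configuration syntax, with simplified pool names and a stand-in for the active checks):

POOLS = {
    "cloudflare": ["ns1.example.com.", "ns2.example.com.", "ns3.example.com."],
    "external":   ["ns1.additional-provider.net.", "ns2.additional-provider.net."],
}

check_results = {"cloudflare": True, "external": True}  # fed by active checks

def healthy(pool_name):
    # Stand-in for DNSDist's active health checks (periodic test queries).
    return check_results.get(pool_name, False)

def pick_pool():
    # Weighted in Cloudflare's favor: only fail over when the checks fail.
    return "cloudflare" if healthy("cloudflare") else "external"

assert pick_pool() == "cloudflare"   # normal operation: stay on-net, lowest RTT
check_results["cloudflare"] = False  # simulated authoritative DNS failure
assert pick_pool() == "external"     # queries fail over to the external provider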

Maintaining Our Infrastructure DNS Zone

In addition to a more reliable infrastructure zone, we also wanted to further automate the provisioning of DNS records. We had been relying on an older manual tool that had become quite slow at handling our growing number of DNS records. In a nutshell, this tool queried our provisioning database to gather all of the machine names, and then created, deleted, or updated the required DNS records using the Cloudflare API. We had to run this tool whenever we provisioned or decommissioned machines that serve our customers’ requests.

Part of the procedure for running this tool was to first run it in dry-run mode, and then paste the results for review in our team chat room. This review step ensured the changes the tool found were expected and safe to run, and it is something we want to keep as part of our plan to automate the process.

Here’s what the old tool looked like:

$ cf-provision update-example -l ${USER}@cloudflare.com -u
Running update-example.sh -t /var/tmp/cf-provision -l [email protected] -u
* Loading up possible records …

deleting: {"name":"node44.example.com","ttl":300, "type":"A","content":"192.0.2.2","proxied":false}
 rec_id:abcdefg
updating: { "name":"build-ts.example.com", "ttl":120, "type":"TXT", "content":"1596722274"} rec_id:123456 previous content: 1596712896

Result    : SUCCESS
Task      : BUILD
Dest dir  :
Started   : Thu, 06 Aug 2020 13:57:40 +0000
Finished  : Thu, 06 Aug 2020 13:58:16 +0000
Elapsed   : 36 secs
Records Changed In API  : 1 record(s) changed
Records Deleted In API  : 1 record(s) deleted

Zone Management with OctoDNS

As we mentioned earlier, when using a multi-primary setup for our infrastructure zone, we are required to maintain the zone data outside of traditional DNS replication. Before rolling out our own solution we wanted to see what was out there, and we stumbled upon OctoDNS, a project from GitHub. OctoDNS provides a set of tools that make it easy to manage your DNS records across multiple providers.

OctoDNS uses a pluggable architecture. This means we can rewrite our old script as a plugin (which is called a ‘source’), and use other existing provider plugins (called ‘providers’) to interact with both the Cloudflare and our other provider’s APIs. Should we decide to sync to more external DNS providers in the future, we would just need to add them to our OctoDNS configuration. This allows our records to stay up to date, and, as an added benefit, records OctoDNS doesn’t know about will be removed during operation (for example, changes made outside of OctoDNS). This ensures that manual changes do not diverge from what is present in the provisioning database.

Our goal was to keep the zone management workflow as simple as possible. At Cloudflare, we use TeamCity for CI/CD, and we realized that it could not only facilitate code builds of our OctoDNS implementation, but it could also be used to deploy the zone.

There are a number of benefits to using our existing TeamCity infrastructure.

  • Our DevTools team can reliably manage it as a service
  • It has granular permissions, which allow us to control who can deploy the changes
  • It provides storage of the logs for auditing
  • It allows easy rollback of zone revisions
  • It integrates easily with our chat ops workflow via Google Chat webhooks

Below is a high-level overview of how we manage our zone through OctoDNS.

There are three steps in the workflow:

  1. Build – evaluate the sources and build the complete zone.
  2. Compare – parse the built zone and compare to the records on the providers. Changes found are sent to SRE for evaluation.
  3. Deploy – deploy the changes to the providers.

Step 1: Build

Sources
OctoDNS consumes data from our internal systems via custom source modules. So far, we have built two source modules for querying data:

  1. ProvAPI queries our internal provisioning API providing data center and node configuration.
  2. NetBox queries our internal NetBox deployment providing hardware metadata.

Additionally, static records are defined in a YAML file. These sources form the infrastructure zone source of truth.
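
As a rough illustration, a minimal custom source might look like the sketch below. The class follows octoDNS's source interface as we understand it; fetch_nodes() and the record data are hypothetical stand-ins for a query against the provisioning API.

from octodns.record import Record
from octodns.source.base import BaseSource

def fetch_nodes(api_url):
    # Hypothetical stand-in for querying the internal provisioning API.
    return [{"hostname": "node44", "ipv4": "192.0.2.2"}]

class ProvAPISource(BaseSource):
    SUPPORTS_GEO = False
    SUPPORTS = {"A"}

    def __init__(self, id, api_url):
        super().__init__(id)
        self.api_url = api_url

    def populate(self, zone, target=False, lenient=False):
        # Turn each provisioned machine into an A record in the zone.
        for node in fetch_nodes(self.api_url):
            record = Record.new(
                zone,
                node["hostname"],
                {"ttl": 300, "type": "A", "value": node["ipv4"]},
                source=self,
            )
            zone.add_record(record, lenient=lenient)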

Providers
The build process establishes a staging area per execution. The YAML provider builds a static YAML file containing the complete zone. The hosts file provider is used to generate an emergency hosts file for the zone. Contents of the staging area are revision-controlled in CI, which allows us to easily deploy previous versions if required.

Step 2: Compare

Following a successful zone build, CI executes the compare build. During this phase, OctoDNS performs an octodns-sync in dry-run mode. The compare build consumes the staged YAML zone configuration and compares the records against those at our authoritative providers via their respective APIs. Any identified zone changes are parsed, and a summary line is generated and sent to the SRE chat room for approval. SREs are automatically linked to the changes and the associated CI build for deployment.

Step 3: Deploy

The deployment CI build is access-controlled and scoped to the SRE group using our single sign-on provider. Following successful approval and peer review, an SRE can deploy the changes by executing the deploy build.

The deployment process is simple: consume the YAML zone data from our staging area and deploy the changes to the zone’s authoritative providers via ‘octodns-sync --doit’. The hosts file generated for the zone is packaged and deployed, to be used in the event of complete DNS failure.

The chat message summarizes the changes awaiting deployment. When the deploy is finished, the thread is updated to indicate which user initiated it.

Future Improvements

In the future, we would like to automate the process further by reducing the need for approvals. There is usually no harm in adding new records, and it is done very often during the provisioning of new machines. Removing the need to approve those records would take out another step in the provisioning process, which is something we are always looking to optimize.

Introducing OctoDNS and an additional provider allowed us to make our infrastructure DNS zone more reliable and easier to manage. We can now easily include new sources of record data, with OctoDNS allowing us to focus more on new and exciting projects and less on managing DNS records.

SAD DNS Explained

Post Syndicated from Marek Vavruša original https://blog.cloudflare.com/sad-dns-explained/

This week, at the ACM CCS 2020 conference, researchers from UC Riverside and Tsinghua University announced a new attack against the Domain Name System (DNS) called SAD DNS (Side channel AttackeD DNS). This attack leverages recent features of the networking stack in modern operating systems (like Linux) to allow attackers to revive a classic attack category: DNS cache poisoning. As part of a coordinated disclosure effort earlier this year, the researchers contacted Cloudflare and other major DNS providers, and we are happy to announce that our 1.1.1.1 public resolver is no longer vulnerable to this attack.

In this post, we’ll explain what the vulnerability was, how it relates to previous attacks of this sort, what mitigation measures we have taken to protect our users, and future directions the industry should consider to prevent this class of attacks from being a problem in the future.

DNS Basics

The Domain Name System (DNS) is what allows users of the Internet to get around without memorizing long sequences of numbers. What’s often called the “phonebook of the Internet” is more like a helpful system of translators that take natural language domain names (like blog.cloudflare.com or gov.uk) and translate them into the native language of the Internet: IP addresses (like 192.0.2.254 or [2001:db8::cf]). This translation happens behind the scenes so that users only need to remember hostnames and don’t have to get bogged down with remembering IP addresses.

DNS is both a system and a protocol. It refers to the hierarchical system of computers that manage the data related to naming on a network and it refers to the language these computers use to speak to each other to communicate answers about naming. The DNS protocol consists of pairs of messages that correspond to questions and responses. Each DNS question (query) and answer (reply) follows a standard format and contains a set of parameters that contain relevant information such as the name of interest (such as blog.cloudflare.com) and the type of response record desired (such as A for IPv4 or AAAA for IPv6).

The DNS Protocol and Spoofing

These DNS messages are exchanged over a network between machines using a transport protocol. Originally, DNS used UDP, a simple stateless protocol in which messages are endowed with a set of metadata indicating a source port and a destination port. More recently, DNS has adapted to use more complex transport protocols such as TCP and even advanced protocols like TLS or HTTPS, which incorporate encryption and strong authentication into the mix (see Peter Wu’s blog post about DNS protocol encryption).

Still, the most common transport protocol for message exchange is UDP, which has the advantages of being fast, ubiquitous and requiring no setup. Because UDP is stateless, the pairing of a response to an outstanding query is based on two main factors: the source address and port pair, and information in the DNS message. Given that UDP is both stateless and unauthenticated, anyone, and not just the recipient, can send a response with a forged source address and port, which opens up a range of potential problems.

Figure: the blue portions of the DNS message contribute randomness

Since the transport layer is inherently unreliable and untrusted, the DNS protocol was designed with additional mechanisms to protect against forged responses. The first two bytes in the message form a message or transaction ID that must be the same in the query and response. When a DNS client sends a query, it will set the ID to a random value and expect the value in the response to match. This unpredictability introduces entropy into the protocol, which makes it less likely that a malicious party will be able to construct a valid DNS reply without first seeing the query. Other variables, like the DNS query name and query type, are also used to pair a query with its response, but these are trivial to guess and don’t introduce additional entropy.

Those paying close attention to the diagram may notice that the amount of entropy introduced by this measure is only around 16 bits, which means that there are fewer than a hundred thousand possibilities to go through to find the matching reply to a given query. More on this later.
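
As a minimal Python sketch of how little structure guards a UDP response (simplified: a real client also checks the source address/port and the question section):

import struct, random

def make_query_header(qid=None):
    # The fixed 12-byte DNS header: ID, flags, then four section counts.
    qid = random.randrange(0x10000) if qid is None else qid  # 16 bits of entropy
    flags = 0x0100  # standard query with recursion desired
    return struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0), qid

def response_matches(query_id, response_header):
    # Pair a reply with the outstanding query by comparing transaction IDs.
    return struct.unpack_from("!H", response_header)[0] == query_id

# A reply echoes the query's ID, so the query header doubles as a stand-in here.
header, qid = make_query_header()
assert response_matches(qid, header)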

The DNS Ecosystem

DNS servers fall into a few main categories: recursive resolvers (like 1.1.1.1 or 8.8.8.8) and nameservers (like the DNS root servers or Cloudflare Authoritative DNS). There are also elements of the ecosystem that act as “forwarders” such as dnsmasq. In a typical DNS lookup, these DNS servers work together to complete the task of delivering the IP address for a specified domain to the client (the client is usually a stub resolver – a simple resolver built into an operating system). For more detailed information about the DNS ecosystem, take a look at our learning site. The SAD DNS attack targets the communication between recursive resolvers and nameservers.

Each of the participants in DNS (client, resolver, nameserver) uses the DNS protocol to communicate with each other. Most of the latest innovations in DNS revolve around upgrading the transport between users and recursive resolvers to use encryption. Upgrading the transport protocol between resolvers and authoritative servers is a bit more complicated, as it requires a new discovery mechanism to instruct the resolver when to (and when not to) use a more secure channel. Aside from a few examples like our work with Facebook to encrypt recursive-to-authoritative traffic with DNS-over-TLS, most of these exchanges still happen over UDP. This is the core issue that enables this new attack on DNS, and one that we’ve seen before.

Kaminsky’s Attack

Prior to 2008, recursive resolvers typically used a single open port (usually port 53) to send and receive messages to authoritative nameservers. This made guessing the source port trivial, so the only variable an attacker needed to guess to forge a response to a query was the 16-bit message ID. The attack Kaminsky described was relatively simple: whenever a recursive resolver queried the authoritative name server for a given domain, an attacker would flood the resolver with DNS responses for some or all of the 65 thousand or so possible message IDs. If the malicious answer with the right message ID arrived before the response from the authoritative server, then the DNS cache would be effectively poisoned, returning the attacker’s chosen answer instead of the real one for as long as the DNS response was valid (called the TTL, or time-to-live).

For popular domains, resolvers contact authoritative servers once per TTL (which can be as short as 5 minutes), so there are plenty of opportunities to mount this attack. Forwarders that cache DNS responses are also vulnerable to this type of attack.

In response to this attack, DNS resolvers started doing source port randomization and carefully checking the security ranking of cached data. To poison these updated resolvers, forged responses would not only need to guess the message ID, but they would also have to guess the source port, bringing the number of guesses from the tens of thousands to over four billion. This made the attack effectively infeasible. Furthermore, the IETF published RFC 5452 on how to harden DNS against guessing attacks.

It should be noted that this attack did not work for DNSSEC-signed domains since their answers are digitally signed. However, even now in 2020, DNSSEC is far from universal.

Defeating Source Port Randomization with Fragmentation

Another way to avoid having to guess the source port number and message ID is to split the DNS response in two. As is often the case in computer security, old attacks become new again when attackers discover new capabilities. In 2012, researchers Amir Herzberg and Haya Schulman from Bar Ilan University discovered that it was possible for a remote attacker to defeat the protections provided by source port randomization. This new attack leveraged another feature of UDP: fragmentation. For a primer on the topic of UDP fragmentation, check out our previous blog post on the subject by Marek Majkowski.

The key to this attack is the fact that all the randomness that needs to be guessed in a DNS poisoning attack is concentrated at the beginning of the DNS message (the UDP header and DNS header). If the UDP response packet (sometimes called a datagram) is split into two fragments, the first half containing the message ID and source port and the second containing part of the DNS response, then all an attacker needs to do is forge the second fragment and make sure that the fake second fragment arrives at the resolver before the true second fragment does. When a datagram is fragmented, it is assigned a 16-bit ID (called the IP-ID), which its fragments share and which is used to reassemble them at the other end of the connection. Since the second fragment only has the IP-ID as entropy (again, this is a familiar refrain in this area), this attack is feasible with a relatively small number of forged packets. The downside of this attack is the precondition that the response must be fragmented in the first place, and the fragment must be carefully altered to pass the original section counts and UDP checksum.

Also discussed in the original and follow-up papers is a method of forcing two remote servers to send packets between each other which are fragmented at an attacker-controlled point, making this attack much more feasible. The details are in the paper, but it boils down to the fact that the control mechanism for describing the maximum transmission unit (MTU) between two servers — which determines at which point packets are fragmented — can be set via a forged UDP packet.

We explored this risk in a previous blog post in the context of certificate issuance last year when we introduced our multi-path DCV service, which mitigates this risk in the context of certificate issuance by making DNS queries from multiple vantage points. Nevertheless, fragmentation-based attacks are proving less and less effective as DNS providers move to eliminate support for fragmented DNS packets (one of the major goals of DNS Flag Day 2020).

Defeating Source Port Randomization via ICMP error messages

Another way to defeat the source port randomization is to use some measurable property of the server that makes the source port easier to guess. If the attacker could ask the server which port number is being used for a pending query, that would make the construction of a spoofed packet much easier. No such thing exists, but it turns out there is something close enough – the attacker can discover which ports are surely closed (and thus avoid having to send traffic). One such mechanism is the ICMP “port unreachable” message.

Let’s say the target receives a UDP datagram destined for its IP and some port. The datagram either ends up being accepted and silently discarded by the application, or rejected because the port is closed. If the port is closed, or more importantly, closed to the IP address that the UDP datagram was sent from, the target will send back an ICMP message notifying the attacker that the port is closed. This is handy to know, since the attacker now doesn’t have to bother trying to guess the pending message ID on that port and can move on to other ports. A single scan of the server effectively reduces the search space of valid UDP responses from 2^32 (over four billion) to 2^17 (around a hundred thousand), at least in theory.

This trick doesn’t always work. Many resolvers use “connected” UDP sockets instead of “open” UDP sockets to exchange messages between the resolver and nameserver. Connected sockets are tied to the peer address and port on the OS layer, which makes it impossible for an attacker to guess which “connected” UDP sockets are established between the target and the victim, and since the attacker isn’t the victim, it can’t directly observe the outcome of the probe.

To overcome this, the researchers found a very clever trick: they leverage ICMP rate limits as a side channel to reveal whether a given port is open or not. ICMP rate limiting was introduced (somewhat ironically, given this attack) as a security feature to prevent a server from being used as an unwitting participant in a reflection attack. In broad terms, it is used to limit how many ICMP responses a server will send out in a given time period. Say an attacker wanted to scan 10,000 ports and sent a burst of 10,000 UDP packets to a server configured with an ICMP rate limit of 50 per second, then only the first 50 would get an ICMP “port unreachable” message in reply.

Rate limiting seems innocuous until you remember one of the core rules of data security: don’t let private information influence publicly measurable metrics. ICMP rate limiting violates this rule because the rate limiter’s behavior can be influenced by an attacker making guesses as to whether a “secret” port number is open or not.

don’t let private information influence publicly measurable metrics

An attacker wants to know whether the target has an open port, so it sends a spoofed UDP message from the authoritative server to that port. If the port is open, no ICMP reply is sent and the rate counter remains unchanged. If the port is inaccessible, then an ICMP reply is sent (back to the authoritative server, not to the attacker) and the rate is increased by one. Although the attacker doesn’t see the ICMP response, it has influenced the counter. The counter itself isn’t known outside the server, but whether it has hit the rate limit or not can be measured by any outside observer by sending a UDP packet and waiting for a reply. If an ICMP “port unreachable” reply comes back, the rate limit hasn’t been reached. No reply means the rate limit has been met. This leaks one bit of information about the counter to the outside observer, which in the end is enough to reveal the supposedly secret information (whether the spoofed request got through or not).

Diagram inspired by the original paper

Concretely, the attack works as follows: the attacker sends a bunch (large enough to trigger the rate limiting) of probe messages to the target, but with a forged source address of the victim. In the case where there are no open ports in the probed set, the target will send out the same amount of ICMP “port unreachable” responses back to the victim and trigger the rate limit on outgoing ICMP messages. The attacker can now send an additional verification message from its own address and observe whether an ICMP response comes back or not. If it does then there was at least one port open in the set and the attacker can divide the set and try again, or do a linear scan by inserting the suspected port number into a set of known closed ports. Using this approach, the attacker can narrow down to the open ports and try to guess the message ID until it is successful or gives up, similarly to the original Kaminsky attack.
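
The following toy Python simulation may help clarify the side channel. It models only the per-window rate limiter and the port set; a real scan must also contend with timing, packet loss, and the limiter refilling every second.

ICMP_RATE_LIMIT = 50  # ICMP replies allowed per window, as in the example above

class Target:
    # Toy model of the target's UDP ports plus a global ICMP rate limiter.
    def __init__(self, open_ports):
        self.open_ports = set(open_ports)
        self.budget = ICMP_RATE_LIMIT  # refills every second in reality

    def probe(self, port):
        # Returns True if an ICMP "port unreachable" reply would be emitted.
        if port in self.open_ports or self.budget == 0:
            return False
        self.budget -= 1
        return True

def batch_contains_open_port(target, ports):
    for port in ports:
        target.probe(port)  # spoofed probes; any replies go to the victim, not us
    # Verification probe from the attacker's own address to a known-closed port:
    # a reply means the limit wasn't hit, so some probe must have hit an open port.
    return target.probe(1)  # port 1 assumed closed

window = list(range(40000, 40050))  # 50 closed ports: limit is hit, no reply
print(batch_contains_open_port(Target({34567}), window))  # False

window = list(range(34550, 34600))  # includes the open port 34567
print(batch_contains_open_port(Target({34567}), window))  # True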

In practice there are some hurdles to successfully mounting this attack.

  • First, the target IP, or a set of target IPs must be discovered. This might be trivial in some cases – a single forwarder, or a fixed set of IPs that can be discovered by probing and observing attacker controlled zones, but more difficult if the target IPs are partitioned across zones as the attacker can’t see the resolver egress IP unless she can monitor the traffic for the victim domain.
  • The attack also requires a large enough ICMP outgoing rate limit in order to be able to scan with a reasonable speed. The scan speed is critical, as it must be completed while the query to the victim nameserver is still pending. As the scan speed is effectively fixed, the paper instead describes a method to potentially extend the window of opportunity by triggering the victim’s response rate limiting (RRL), a technique to protect against floods of forged DNS queries. This may work if the victim implements RRL and the target resolver doesn’t implement a retry over TCP (A Quantitative Study of the Deployment of DNS Rate Limiting shows about 16% of nameservers implement some sort of RRL).
  • Generally, busy resolvers will have ephemeral ports opening and closing, which introduces false positive open ports for the attacker, and ports open for different pending queries than the one being attacked.

We’ve implemented an additional mitigation in 1.1.1.1 to prevent message ID guessing: if the resolver detects an ID enumeration attempt, it stops accepting any more guesses and switches over to TCP. This reduces the number of attempts available to the attacker even if it guesses the IP address and port correctly, similar to how the number of password login attempts is limited.

Outlook

Ultimately these are just mitigations, and the attacker might be willing to play the long game. As long as the transport layer is insecure and DNSSEC is not widely deployed, there will be different methods of chipping away at these mitigations.

It should be noted that trying to hide source IPs or open port numbers is a form of security through obscurity. Without strong cryptographic authentication, it will always be possible to use spoofing to poison DNS resolvers. The silver lining here is that DNSSEC exists, and is designed to protect against this type of attack, and DNS servers are moving to explore cryptographically strong transports such as TLS for communicating between resolvers and authoritative servers.

At Cloudflare, we’ve been helping to reduce the friction of DNSSEC deployment, while also helping to improve transport security in the long run. There are also efforts to increase the entropy in DNS messages with DNS Cookies (RFC 7873) and to make DNS over TCP support mandatory (RFC 7766, DNS Transport over TCP - Implementation Requirements), with even more documentation around ways to mitigate this type of issue available in different places. All of these efforts are complementary, which is a good thing. The DNS ecosystem consists of many different parties and software with different requirements and opinions; as long as operators support at least one of these preventive measures, attacks of this type will become more and more difficult.

If you are an operator of an authoritative DNS server, you should consider taking steps to protect yourself from this attack, such as signing your zones with DNSSEC and ensuring your nameservers support DNS over TCP.

We’d like to thank the researchers for responsibly disclosing this attack and look forward to working with them in the future on efforts to strengthen the DNS.

Unwrap the SERVFAIL

Post Syndicated from Anbang Wen original https://blog.cloudflare.com/unwrap-the-servfail/

We recently released a new version of Cloudflare Resolver which adds a piece of information called “Extended DNS Errors” (EDE) along with the response code under certain circumstances. This will be helpful in tracing DNS resolution errors and figuring out what went wrong behind the scenes.

A tight-lipped agent

The DNS protocol was designed to map domain names to IP addresses. To inform the client about the result of a lookup, the protocol has a 4-bit field, called the response code (RCODE). The logic to serve a response might look something like this:

function lookup(domain) {
    ...
    switch result {
    case "No error condition":
        return NOERROR with the answer the client expects
    case "No record for the requested type":
        return NOERROR
    case "The requested domain does not exist":
        return NXDOMAIN
    case "Refused to perform the specified operation for policy reasons":
        return REFUSED
    default: // "Server failure: unable to process this query due to a problem with the name server"
        return SERVFAIL
    }
}

try {
    lookup(domain)
} catch {
    return SERVFAIL
}

Although this basic logic hasn’t changed much, protocol extensions such as DNSSEC have been added over the years, and the 4-bit RCODE has run out of space to express the server’s internal status. To keep backward compatibility, DNS servers have to squeeze various statuses into the existing codes. This behavior can confuse the client, especially with the “catch-all” SERVFAIL: something went wrong, but what exactly?

Most often, end users don’t talk to authoritative name servers directly, but use a stub and/or a recursive resolver as an agent to acquire the information they need. When a user receives a SERVFAIL, the failure can be one of the following:

  • The stub resolver fails to send the request.
  • The stub resolver doesn’t get a response.
  • The recursive resolver, which the stub resolver sends its query to, is overloaded.
  • The recursive resolver is unable to communicate with upstream authoritative servers.
  • The recursive resolver fails to verify the DNSSEC chain.
  • The authoritative server takes too long to respond.

In such cases, it is nearly impossible for the user to know exactly what’s wrong. The resolver is usually the one to be blamed, because, as an agent, it fails to get back the answer, and doesn’t return a clear reason for the failure in the response.

Keep backward compatibility

It seems we need to return more information, but (there’s always a but) we also need to keep the behavior of existing clients unchanged.

One way is to extend the RCODE space, which came with the Extension Mechanisms for DNS (EDNS). It defines an 8-bit EXTENDED-RCODE that acts as the high-order bits of the current 4-bit RCODE; together they make up a 12-bit integer. Unfortunately, this changes how the RCODE is processed and requires both client and server to fully support the new logic.
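
As a small Python illustration of the bit layout EDNS defines (a sketch, not a full implementation):

def extended_rcode(opt_extended_rcode, header_rcode):
    # The OPT record's byte supplies the high 8 bits; the DNS header's
    # 4-bit RCODE supplies the low 4 bits.
    return (opt_extended_rcode << 4) | (header_rcode & 0x0F)

# Example: extended RCODE 16 (BADVERS) is carried as EXTENDED-RCODE=1, RCODE=0.
assert extended_rcode(1, 0) == 16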

Another approach is to provide out-of-band data without touching the current RCODE. This is how Extended DNS Errors is defined. It introduces a new EDNS option containing an INFO-CODE to describe error details, with an EXTRA-TEXT as an optional supplement. The option can be repeated as many times as needed, so it’s possible for the client to get a full error chain with detailed messages. The INFO-CODE works much like the RCODE but is 16 bits wide, while the EXTRA-TEXT is a UTF-8 encoded string. For example, let’s say a client sends a request to a resolver, and the requested domain has two name servers. The client may receive a SERVFAIL response with an OPT record (see below) which contains two extended errors: one from one of the authoritative servers, showing it’s not ready to serve, and one from the resolver, showing it cannot connect to the other name server.

;; OPT PSEUDOSECTION:
; ...
; EDE: 14 (Not Ready)
; EDE: 23 (Network Error): (cannot reach upstream 192.0.2.1)
; ...
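
For the curious, here is a minimal Python sketch of pulling EDE options out of a raw EDNS option list. The option code (15) and wire layout follow the EDE specification; the sample bytes are constructed by hand.

import struct

EDE_OPTION_CODE = 15  # Extended DNS Errors

def parse_ede_options(edns_options):
    # Walk the EDNS option list and yield (info_code, extra_text) pairs.
    offset = 0
    while offset + 4 <= len(edns_options):
        code, length = struct.unpack_from("!HH", edns_options, offset)
        offset += 4
        data = edns_options[offset:offset + length]
        offset += length
        if code == EDE_OPTION_CODE and len(data) >= 2:
            info_code = struct.unpack_from("!H", data)[0]
            yield info_code, data[2:].decode("utf-8", errors="replace")

text = b"cannot reach upstream 192.0.2.1"
opts = struct.pack("!HHH", EDE_OPTION_CODE, 2 + len(text), 23) + text
print(list(parse_ede_options(opts)))  # [(23, 'cannot reach upstream 192.0.2.1')]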

Google has something similar in their DoH JSON API, which provides diagnostic information in the “Comment” field.

Let’s dig into it

Our 1.1.1.1 service has initial support for the draft version of Extended DNS Errors, while we are still working out best practices. As we mentioned above, this is not a breaking change, and existing clients will not be affected: the additional options can be safely ignored, since the RCODE stays the same.

If you have a newer version of dig, you can simply check it out with a known problematic domain. As you can see, due to DNSSEC verification failing, the RCODE is still SERVFAIL, but the extended error shows the failure is “DNSSEC Bogus”.

$ dig @1.1.1.1 dnssec-failed.org

; <<>> DiG 9.16.4-Debian <<>> @1.1.1.1 dnssec-failed.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1111
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 6 (DNSSEC Bogus)
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; Query time: 111 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Sep 01 00:00:00 PDT 2020
;; MSG SIZE  rcvd: 52

Note that Extended DNS Errors rely on EDNS, so to receive one, the client needs to support EDNS and enable it in the request. At the time of writing this blog post, about 17% of the queries 1.1.1.1 received over a short time window had EDNS enabled. We hope this information will help you uncover the root cause of a SERVFAIL in the future.

DNS Flag Day 2020

Post Syndicated from Christian Elmerot original https://blog.cloudflare.com/dns-flag-day-2020/

October 1 was this year’s DNS Flag Day. Read on to find out all about DNS Flag Day and how it affects Cloudflare’s DNS services (hint: it doesn’t, we already did the work to be compliant).

What is DNS Flag Day?

DNS Flag Day is an initiative by several DNS vendors and operators to increase the compliance of implementations with DNS standards. The goal is to make DNS more secure, reliable and robust. Rather than a push for new features, DNS flag day is meant to ensure that workarounds for non-compliance can be reduced and a common set of functionalities can be established and relied upon.

Last year's flag day was February 1, and it set forth that servers and clients must be able to properly handle the Extension Mechanisms for DNS (EDNS0) protocol (the first RFC about EDNS0, RFC 2671, dates from 1999). This way, by assuming clients have a working implementation of EDNS0, servers can always send messages as EDNS0. This is needed to support DNSSEC, the DNS security extensions. We were, of course, more than thrilled to support the effort, as we're keen to push DNSSEC adoption forward.

DNS Flag Day 2020

The goal for this year's flag day is to increase DNS messaging reliability by focusing on problems around IP fragmentation of DNS packets, which continues to be an issue. We can do that by ensuring cleartext DNS messages sent over UDP are not too large, as large messages risk being fragmented in transit, and by sending or receiving large DNS messages over TCP.

Problem with DNS transport over UDP

A potential issue with sending DNS messages over UDP is that the sender has no indication of the recipient actually receiving the message. When using TCP, each packet being sent is acknowledged (ACKed) by the recipient, and the sender will attempt to resend any packets not being ACKed. UDP, although it may be faster than TCP, does not have the same mechanism of messaging reliability. Anyone still wishing to use UDP as their transport protocol of choice will have to implement this reliability mechanism in higher layers of the network stack. For instance, this is what is being done in QUIC, the new Internet transport protocol used by HTTP/3 that is built on top of UDP.

Even the earliest DNS standards (RFC 1035) specified that DNS messages could be sent over TCP as well as over UDP. Unfortunately, whether to support TCP was left up to the implementer/operator, and firewalls were sometimes configured to block DNS over TCP. More recent updates to RFC 1035, on the other hand, require that DNS servers be available to query over TCP.

DNS message fragmentation

Sending data over networks and the Internet is subject to a limit on how large each packet can be. Data is chopped up into a stream of packets sized to fit the Maximum Transmission Unit (MTU) of the network. The MTU is typically 1500 bytes for IPv4, and for IPv6 the minimum is 1280 bytes. Subtracting both the IP header size (20 bytes for IPv4, 40 bytes for IPv6) and the UDP header size (8 bytes) from the MTU, we end up with a maximum DNS message size of 1472 bytes for IPv4 and 1232 bytes for IPv6 if the message is to fit within a single packet. Any larger, and the message will have to be fragmented across multiple packets.
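To make the arithmetic concrete, here is the same calculation written out as Go constants; this is just a restatement of the header sizes above, nothing protocol-specific.

package main

import "fmt"

const (
    ipv4MTU    = 1500 // typical Ethernet MTU
    ipv6MinMTU = 1280 // minimum MTU an IPv6 link must support
    ipv4Header = 20   // IPv4 header, without options
    ipv6Header = 40   // IPv6 fixed header
    udpHeader  = 8    // UDP header

    maxDNSOverUDPv4 = ipv4MTU - ipv4Header - udpHeader    // 1472 bytes
    maxDNSOverUDPv6 = ipv6MinMTU - ipv6Header - udpHeader // 1232 bytes
)

func main() {
    fmt.Println(maxDNSOverUDPv4, maxDNSOverUDPv6) // prints: 1472 1232
}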

Sending large messages causes them to be fragmented into more than one packet. This is not a problem over TCP, since each packet is ACKed to ensure proper delivery. However, the same does not hold true for large DNS messages over UDP. For many intents and purposes, UDP has been treated as a second-class citizen to TCP as far as network routing is concerned: it is quite common to see UDP packet fragments dropped by routers and firewalls, potentially causing parts of a message to be lost. To avoid fragmentation over UDP, it is better to truncate the DNS message and set the Truncation (TC) flag in the DNS response. This tells the recipient that more data is available if the query is retried over TCP.

DNS Flag Day 2020 wants to ensure that DNS message fragmentation does not happen. When larger DNS messages need to be sent, we need to ensure it can be done reliably over TCP.

DNS servers need to support DNS message transport over TCP in order to be compliant with this year’s flag day. Also, DNS messages sent over UDP must never exceed the limit over which they risk being fragmented.
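On the client side, the expected behavior looks roughly like the sketch below, written against the Go miekg/dns library (the resolver address and query name are only examples): advertise a conservative EDNS buffer size over UDP, and retry over TCP when the response comes back truncated.

package main

import (
    "fmt"
    "log"

    "github.com/miekg/dns"
)

func main() {
    m := new(dns.Msg)
    m.SetQuestion(dns.Fqdn("example.com"), dns.TypeTXT)
    m.SetEdns0(1232, false) // advertise a buffer size that avoids fragmentation

    c := new(dns.Client) // UDP by default
    r, _, err := c.Exchange(m, "1.1.1.1:53")
    if err != nil {
        log.Fatal(err)
    }

    // A truncated response (TC flag) means the full answer did not fit
    // within the UDP size limit; ask the same question again over TCP.
    if r.Truncated {
        c.Net = "tcp"
        if r, _, err = c.Exchange(m, "1.1.1.1:53"); err != nil {
            log.Fatal(err)
        }
    }
    fmt.Println(r)
}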

Cloudflare authoritative DNS and 1.1.1.1

We fully support the DNS Flag Day initiative, as it aims to make DNS more reliable and robust, and it ensures a common set of features for the DNS community to evolve on. In the DNS ecosystem, we are as much a client as we are a provider. When we perform DNS lookups on behalf of our customers and users, we rely on other providers to follow standards and be compliant. When they are not, and we can’t work around the issues, it leads to problems resolving names and reaching resources.

Both our public resolver 1.1.1.1 and our authoritative DNS service set and enforce reasonable limits on DNS message sizes sent over UDP, and of course both services are available over TCP. If you're already using Cloudflare, there is nothing you need to do but keep using our DNS services! We will continually work on improving DNS.

Oh, and you can test your domain on the DNS Flag Day site: https://dnsflagday.net/2020/

Secondary DNS – Deep Dive

Post Syndicated from Alex Fattouche original https://blog.cloudflare.com/secondary-dns-deep-dive/

How Does Secondary DNS Work?

If you already understand how Secondary DNS works, please feel free to skip this section. It does not provide any Cloudflare-specific information.

Secondary DNS has many use cases across the Internet; however, traditionally, it was used as a synchronized backup for when the primary DNS server was unable to respond to queries. A more modern approach involves focusing on redundancy across many different nameservers, which in many cases broadcast the same anycasted IP address.

Secondary DNS involves the unidirectional transfer of DNS zones from the primary to the Secondary DNS server(s). One primary can have any number of Secondary DNS servers that it must communicate with in order to keep track of any zone updates. A zone update is considered a change in the contents of a zone, which ultimately leads to a Start of Authority (SOA) serial number increase. The zone's SOA serial is one of the key elements of Secondary DNS; it is how primary and secondary servers synchronize zones. Below is an example of what an SOA record might look like during a dig query.

example.com	3600	IN	SOA	ashley.ns.cloudflare.com. dns.cloudflare.com. 
2034097105  // Serial
10000 // Refresh
2400 // Retry
604800 // Expire
3600 // Minimum TTL

Each of the numbers is used in the following way:

  1. Serial – Used to keep track of the status of the zone, must be incremented at every change.
  2. Refresh – The maximum number of seconds that can elapse before a Secondary DNS server must check for a SOA serial change.
  3. Retry – The maximum number of seconds that can elapse before a Secondary DNS server must check for a SOA serial change, after previously failing to contact the primary.
  4. Expire – The maximum number of seconds that a Secondary DNS server can serve stale information, in the event the primary cannot be contacted.
  5. Minimum TTL – Per RFC 2308, the number of seconds that a DNS negative response should be cached for.

Using the above information, the Secondary DNS server stores an SOA record for each of the zones it is tracking. When the serial increases, it knows that the zone must have changed, and that a zone transfer must be initiated.  
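As an illustration, here is a minimal sketch of that serial check using the Go miekg/dns library; the zone name, primary address, and stored serial are hypothetical.

package main

import (
    "fmt"
    "log"

    "github.com/miekg/dns"
)

// zoneChanged asks the primary for the zone's SOA and reports whether the
// serial differs from the one we last transferred. Comparing for inequality
// is enough to detect a change; ordering serials properly requires the
// wrap-around arithmetic of RFC 1982.
func zoneChanged(zone, primary string, lastSerial uint32) (bool, error) {
    m := new(dns.Msg)
    m.SetQuestion(dns.Fqdn(zone), dns.TypeSOA)

    r, _, err := new(dns.Client).Exchange(m, primary)
    if err != nil {
        return false, err
    }
    for _, rr := range r.Answer {
        if soa, ok := rr.(*dns.SOA); ok {
            return soa.Serial != lastSerial, nil
        }
    }
    return false, fmt.Errorf("no SOA in answer for %s", zone)
}

func main() {
    changed, err := zoneChanged("example.com", "192.0.2.53:53", 2034097105)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("zone changed:", changed)
}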

Serial Tracking

Serial increases can be detected in the following ways:

  1. The fastest way for a Secondary DNS server to keep track of a serial change is to have the primary server NOTIFY it any time a zone has changed, using the DNS protocol as specified in RFC 1996; the Secondary DNS server can then instantly initiate a zone transfer.
  2. Another way is for the Secondary DNS server to simply poll the primary every “Refresh” seconds. This isn’t as fast as the NOTIFY approach, but it is a good fallback in case the notifies have failed.

One of the issues with the basic NOTIFY protocol is that anyone on the Internet could potentially notify the Secondary DNS server of a zone update. If an initial SOA query is not performed by the Secondary DNS server before initiating a zone transfer, this is an easy way to perform an amplification attack. There are two common ways to prevent anyone on the Internet from being able to NOTIFY Secondary DNS servers:

  1. Using transaction signatures (TSIG) as per RFC 2845. These are to be placed as the last record in the extra records section of the DNS message. Usually the number of extra records (or ARCOUNT) should be no more than two in this case.
  2. Using IP based access control lists (ACL). This increases security but also prevents flexibility in server location and IP address allocation.

Generally, NOTIFY messages are sent over UDP; however, TCP can be used if the primary server has reason to believe TCP is necessary (e.g. firewall issues).
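To make this concrete, here is a sketch of sending a TSIG-signed NOTIFY with the Go miekg/dns library; the key name, secret, and secondary address are hypothetical placeholders.

package main

import (
    "log"
    "time"

    "github.com/miekg/dns"
)

func main() {
    // Hypothetical TSIG key: the name must be fully qualified and the
    // base64 secret shared with the secondary ahead of time.
    keyName, secret := "transfer-key.", "c2VjcmV0IHNoYXJlZCBrZXk="

    m := new(dns.Msg)
    m.SetNotify("example.com.")
    // The TSIG record is placed last in the additional section.
    m.SetTsig(keyName, dns.HmacSHA256, 300, time.Now().Unix())

    c := new(dns.Client) // UDP by default; set c.Net = "tcp" if needed
    c.TsigSecret = map[string]string{keyName: secret}

    if _, _, err := c.Exchange(m, "192.0.2.100:53"); err != nil {
        log.Fatal(err)
    }
}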

Zone Transfers

In addition to serial tracking, it is important to ensure that a standard protocol is used between primary and Secondary DNS server(s), to efficiently transfer the zone. DNS zone transfer protocols do not attempt to solve the confidentiality, authentication and integrity triad (CIA); however, the use of TSIG on top of the basic zone transfer protocols can provide integrity and authentication. As a result of DNS being a public protocol, confidentiality during the zone transfer process is generally not a concern.

Authoritative Zone Transfer (AXFR)

AXFR is the original zone transfer protocol that was specified in RFC 1034 and RFC 1035 and later further explained in RFC 5936. AXFR is done over a TCP connection because a reliable protocol is needed to ensure packets are not lost during the transfer. Using this protocol, the primary DNS server will transfer all of the zone contents to the Secondary DNS server, in one connection, regardless of the serial number. AXFR is recommended to be used for the first zone transfer, when none of the records are propagated, and IXFR is recommended after that.

Incremental Zone Transfer (IXFR)

IXFR is the more sophisticated zone transfer protocol that was specified in RFC 1995. Unlike the AXFR protocol, during an IXFR, the primary server will only send the secondary server the records that have changed since its current version of the zone (based on the serial number). This means that when a Secondary DNS server wants to initiate an IXFR, it sends its current serial number to the primary DNS server. The primary DNS server will then format its response based on previous versions of changes made to the zone. IXFR messages must obey the following pattern:

  1. Current latest SOA
  2. Secondary server current SOA
  3. DNS record deletions
  4. Secondary server current SOA + changes
  5. DNS record additions
  6. Current latest SOA

Steps 2 through 6 can be repeated any number of times, as each repetition represents one change set of deletions and additions, ultimately leading to a new serial.

IXFR can be done over UDP or TCP, but again TCP is generally recommended to avoid packet loss.
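For illustration, here is a minimal sketch of pulling a zone with the Go miekg/dns library; the zone name and primary address are hypothetical. The same Transfer type also accepts an IXFR request built with SetIxfr.

package main

import (
    "fmt"
    "log"

    "github.com/miekg/dns"
)

func main() {
    m := new(dns.Msg)
    m.SetAxfr("example.com.")
    // For an incremental transfer, use instead:
    // m.SetIxfr("example.com.", lastSerial, ns, mbox)

    t := new(dns.Transfer) // zone transfers run over TCP
    envelopes, err := t.In(m, "192.0.2.53:53")
    if err != nil {
        log.Fatal(err)
    }
    for e := range envelopes {
        if e.Error != nil {
            log.Fatal(e.Error)
        }
        for _, rr := range e.RR {
            fmt.Println(rr)
        }
    }
}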

How Does Secondary DNS Work at Cloudflare?

The DNS team loves microservice architecture! When we initially implemented Secondary DNS at Cloudflare, it was done using Mesos Marathon. This allowed us to separate each of our services into several different Marathon apps, individually scaling each app as needed. All of these services live in our core data centers. The following services were created:

  1. Zone Transferer – responsible for attempting IXFR, followed by AXFR if IXFR fails.
  2. Zone Transfer Scheduler – responsible for periodically checking zone SOA serials for changes.
  3. Rest API – responsible for registering new zones and primary nameservers.

In addition to the marathon apps, we also had an app external to the cluster:

  1. Notify Listener – responsible for listening for notifies from primary servers and telling the Zone Transferer to initiate an AXFR/IXFR.

Each of these microservices communicates with the others through Kafka.

Figure 1: Secondary DNS Microservice Architecture

Once the Zone Transferer completes the AXFR/IXFR, it passes the zone through to our Zone Builder, which finally pushes it out to our edge at each of our 200 locations.

Although this current architecture worked great in the beginning, it left us open to many vulnerabilities and scalability issues down the road. As our Secondary DNS product became more popular, it was important that we proactively scaled and reduced the technical debt as much as possible. As with many companies in the industry, Cloudflare has recently migrated all of our core data center services to Kubernetes, moving away from individually managed apps and Marathon clusters.

What this meant for Secondary DNS is that all of our Marathon-based services, as well as our NOTIFY Listener, had to be migrated to Kubernetes. Although this long migration ended up paying off, many difficult challenges arose along the way that required us to come up with unique solutions in order to have a seamless, zero downtime migration.

Challenges When Migrating to Kubernetes

Although the entire DNS team agreed that Kubernetes was the way forward for Secondary DNS, it also introduced several challenges. These challenges arose from a need to properly scale up across many distributed locations while also protecting each of our individual data centers. Since our core data centers do not rely on anycast to automatically distribute requests, introducing more customers opens us up to denial-of-service attacks.

The two main issues we ran into during the migration were:

  1. How do we create a distributed and reliable system that makes use of Kubernetes principles, while also making sure our customers know which IPs we will be communicating from?
  2. When opening up a public-facing UDP socket to the Internet, how do we protect ourselves while also preventing unnecessary spam towards primary nameservers?

Issue 1:

As was previously mentioned, one form of protection in the Secondary DNS protocol is to only allow certain IPs to initiate zone transfers. There is a fine line between primary servers allowlisting too many IPs and having to update their IP ACLs too frequently. We considered several solutions:

  1. Open source k8s controllers
  2. Altering Network Address Translation (NAT) entries
  3. Do not use k8s for zone transfers
  4. Allowlist all Cloudflare IPs and dynamically update
  5. Proxy egress traffic

Ultimately, we decided to proxy our egress traffic from k8s to the DNS primary servers through static proxy addresses. Shadowsocks-libev was chosen as the SOCKS5 implementation because it is fast, secure, and known to scale. In addition, it can handle both UDP/TCP and IPv4/IPv6.

Figure 2: Shadowsocks proxy Setup

The partnership of k8s and Shadowsocks combined with a large enough IP range brings many benefits:

  1. Horizontal scaling
  2. Efficient load balancing
  3. Primary server ACLs only need to be updated once
  4. It allows us to make use of kubernetes for both the Zone Transferer and the Local ShadowSocks Proxy.
  5. Shadowsocks proxy can be reused by many different Cloudflare services.

Issue 2:

The Notify Listener requires listening on static IPs for NOTIFY messages coming from primary DNS servers. This is mostly a solved problem through the use of k8s services of type LoadBalancer; however, exposing this service directly to the Internet makes us uneasy because of its susceptibility to attacks. Fortunately, DDoS protection is one of Cloudflare's strengths, which led us to the natural solution of dogfooding one of our own products, Spectrum.

Spectrum provides the following features to our service:

  1. Reverse proxy TCP/UDP traffic
  2. Filter out malicious traffic
  3. Optimal routing from edge to core data centers
  4. Dual Stack technology
Figure 3: Spectrum interaction with Notify Listener

Figure 3 shows two interesting attributes of the system:

  1. Spectrum <-> k8s is IPv4 only. This is because our custom k8s load balancer currently only supports IPv4; however, Spectrum has no issue terminating the IPv6 connection and establishing a new IPv4 connection.
  2. Spectrum <-> k8s routing decisions are based on the L4 protocol. This is because k8s only supports one of TCP/UDP/SCTP per service of type LoadBalancer. Once again, Spectrum has no issue proxying this correctly.

One of the problems with using a L4 proxy in between services is that source IP addresses get changed to the source IP address of the proxy (Spectrum in this case). Not knowing the source IP address means we have no idea who sent the NOTIFY message, opening us up to attack vectors. Fortunately, Spectrum’s proxy protocol feature is capable of adding custom headers to TCP/UDP packets which contain source IP/Port information.

As we are using miekg/dns for our Notify Listener, adding proxy protocol headers to the DNS NOTIFY messages would cause validation failures at the DNS server level. Instead, we implemented custom read and write decorators that do the following:

  1. Reader: Extract source address information on inbound NOTIFY messages. Place extracted information into new DNS records located in the additional section of the message.
  2. Writer: Remove additional records from the DNS message on outbound NOTIFY replies. Generate a new reply using proxy protocol headers.

There is no way to spoof these records, because the server only permits two extra records, one of which is the optional TSIG. Any other records will be overwritten.
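As a rough illustration of the reader side, the sketch below stuffs a recovered source address into the additional section with miekg/dns; the record name and encoding are hypothetical, not the exact format our service uses.

package main

import (
    "fmt"
    "net"

    "github.com/miekg/dns"
)

// annotateSource records the real client address, recovered from the proxy
// protocol header, as a TXT record in the additional section of an inbound
// NOTIFY message.
func annotateSource(m *dns.Msg, src net.Addr) {
    txt := &dns.TXT{
        Hdr: dns.RR_Header{
            Name:   "source.proxy.", // hypothetical record name
            Rrtype: dns.TypeTXT,
            Class:  dns.ClassINET,
        },
        Txt: []string{src.String()},
    }
    m.Extra = append(m.Extra, txt)
}

func main() {
    m := new(dns.Msg)
    m.SetNotify("example.com.")
    annotateSource(m, &net.UDPAddr{IP: net.ParseIP("192.0.2.7"), Port: 4242})
    fmt.Println(m)
}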

Figure 4: Proxying Records Between Notifier and Spectrum

This custom decorator approach abstracts the proxying away from the Notify Listener through the use of the DNS protocol.  

Although knowing the source IP will block a significant amount of bad traffic, NOTIFY messages can be sent over UDP as well as TCP, so the source IP is still prone to spoofing. To ensure that the primary servers do not get spammed, we have made the following additions to the Zone Transferer:

  1. Always ensure that the SOA has actually been updated before initiating a zone transfer.
  2. Only allow at most one working transfer and one scheduled transfer per zone.

Additional Technical Challenges

Zone Transferer Scheduling

As shown in figure 1, there are several ways of sending Kafka messages to the Zone Transferer in order to initiate a zone transfer. There is no benefit in having a large backlog of zone transfers for the same zone. Once a zone has been transferred, assuming no more changes, it does not need to be transferred again. This means that we should only have at most one transfer ongoing, and one scheduled transfer at the same time, for any zone.

If we want to limit our number of scheduled messages to one per zone, we have to ignore some of the Kafka messages that get sent to the Zone Transferer. This is not as simple as ignoring specific messages in any random order. One of the benefits of Kafka is that it holds on to messages until the user actually decides to acknowledge them, by committing that message's offset. Since Kafka is just a queue of messages, it has no concept of order other than first in, first out (FIFO). If a consumer reads from the Kafka topic concurrently, it is entirely possible for a message in the middle of the queue to be committed before an earlier message has finished processing.

Most of the time this isn’t an issue, because we know that one of the concurrent readers has read the message from the end of the queue and is processing it. There is one Kubernetes-related catch to this issue, though: pods are ephemeral. The kube master doesn’t care what your concurrent reader is doing, it will kill the pod and it’s up to your application to handle it.

Consider the following problem:

Figure 5: Kafka Partition
  1. Read offset 1. Start transferring zone 1.
  2. Read offset 2. Start transferring zone 2.
  3. Zone 2 transfer finishes. Commit offset 2, which implicitly also marks offset 1 as consumed.
  4. Restart pod.
  5. Read offset 3. Start transferring zone 3.

If these events happen, zone 1 will never be transferred. It is important that zones stay up to date with the primary servers, otherwise stale data will be served from the Secondary DNS server. The solution to this problem involves using a list to track which messages have been read and completely processed. In this case, when a zone transfer has finished, it does not necessarily mean that the Kafka message should be immediately committed. The solution is as follows:

  1. Keep a list of Kafka messages, sorted by offset.
  2. When a zone transfer finishes, remove its message from the list.
  3. If that message was the oldest in the list, commit the message's offset.
Figure 6: Kafka Algorithm to Solve Message Loss

This solution essentially soft-commits Kafka messages until we can confidently say that all earlier messages have been acknowledged. It's important to note that this only truly works in a distributed manner if the Kafka messages are keyed by zone ID, which ensures the same zone will always be processed by the same Kafka consumer.
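Here is a self-contained sketch of that bookkeeping in Go, independent of any particular Kafka client library. Note that it preserves at-least-once semantics: after a restart, some already-transferred zones may be processed again.

package main

import (
    "fmt"
    "sort"
)

// offsetTracker soft-commits offsets: an offset is only safe to commit once
// every earlier in-flight offset has finished processing.
type offsetTracker struct {
    inFlight []int64 // offsets read but not yet fully processed, kept sorted
}

// start records that the message at the given offset is being processed.
func (t *offsetTracker) start(off int64) {
    i := sort.Search(len(t.inFlight), func(i int) bool { return t.inFlight[i] >= off })
    t.inFlight = append(t.inFlight, 0)
    copy(t.inFlight[i+1:], t.inFlight[i:])
    t.inFlight[i] = off
}

// finish marks the offset as processed. It returns the offset to commit and
// true only when the finished message was the oldest one still in flight.
func (t *offsetTracker) finish(off int64) (int64, bool) {
    i := sort.Search(len(t.inFlight), func(i int) bool { return t.inFlight[i] >= off })
    if i == len(t.inFlight) || t.inFlight[i] != off {
        return 0, false
    }
    t.inFlight = append(t.inFlight[:i], t.inFlight[i+1:]...)
    if i == 0 {
        return off, true // oldest in-flight message done: safe to commit
    }
    return 0, false
}

func main() {
    var t offsetTracker
    t.start(1)
    t.start(2)
    fmt.Println(t.finish(2)) // 0 false: offset 1 is still in flight
    fmt.Println(t.finish(1)) // 1 true: now safe to commit offset 1
}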

Life of a Secondary DNS Request

Although Cloudflare has a large global network, as shown above, the zone transferring process does not take place at each of the edge data center locations (which would surely overwhelm many primary servers), but rather in our core data centers. In that case, how do we propagate changes to our edge in seconds? After transferring the zone, there are a couple more steps that need to be taken before the change can be seen at the edge.

  1. Zone Builder – This interacts with the Zone Transferer to build the zone according to what Cloudflare edge understands. This then writes to Quicksilver, our super fast, distributed KV store.
  2. Authoritative Server – This reads from Quicksilver and serves the built zone.
Figure 7: End to End Secondary DNS

What About Performance?

At the time of writing this post, according to dnsperf.com, Cloudflare leads in global performance for both authoritative and resolver DNS; Secondary DNS falls under the authoritative DNS category here. Let's break down the performance of each part of the Secondary DNS pipeline, from the primary server updating its records to them being present at the Cloudflare edge.

  1. Primary Server to Notify Listener – Our most accurate measurement is only precise to the second, but we know UDP/TCP communication is likely much faster than that.
  2. NOTIFY to Zone Transferer – This is negligible.
  3. Zone Transferer to Primary Server – 99% of the time we see ~800ms as the average latency for a zone transfer.
Figure 8: Zone XFR latency

4. Zone Transferer to Zone Builder – 99% of the time we see ~10ms to build a zone.

Figure 9: Zone Build time

5. Zone Builder to Quicksilver edge: 95% of the time we see less than 1s propagation.

Figure 10: Quicksilver propagation time

End to End latency: less than 5 seconds on average. Although we have several external probes running around the world to test propagation latencies, they lack precision due to their sleep intervals, location, provider, and the number of zones they need to run against. The actual propagation latency is likely much lower than what is shown in figure 11, where each of the different colored dots is a separate data center location around the world.

Figure 11: End to End Latency

An additional test was performed manually to get a real-world estimate; the test had the following attributes:

Primary server: NS1
Number of records changed: 1
Start test timer event: Change record on NS1
Stop test timer event: Observe record change at Cloudflare edge using dig
Recorded timer value: 6 seconds

Conclusion

Cloudflare serves 15.8 trillion DNS queries per month, operating within 100ms of 99% of the Internet-connected population. The goal of Cloudflare-operated Secondary DNS is to allow customers with custom DNS solutions, whether on-premises or at another DNS provider, to take advantage of Cloudflare's DNS performance and, more recently through Secondary Override, our proxying and security capabilities too. Secondary DNS is currently available on the Enterprise plan, so if you'd like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS, please refer to our support article.