Tag Archives: DNS

Zero-Trust DNS

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/05/zero-trust-dns.html

Microsoft is working on a promising-looking protocol to lock down DNS.

ZTDNS aims to solve this decades-old problem by integrating the Windows DNS engine with the Windows Filtering Platform—the core component of the Windows Firewall—directly into client devices.

Jake Williams, VP of research and development at consultancy Hunter Strategy, said the union of these previously disparate engines would allow updates to be made to the Windows firewall on a per-domain name basis. The result, he said, is a mechanism that allows organizations to, in essence, tell clients “only use our DNS server, that uses TLS, and will only resolve certain domains.” Microsoft calls this DNS server or servers the “protective DNS server.”

By default, the firewall will deny resolutions to all domains except those enumerated in allow lists. A separate allow list will contain IP address subnets that clients need to run authorized software. Key to making this work at scale inside an organization with rapidly changing needs. Networking security expert Royce Williams (no relation to Jake Williams) called this a “sort of a bidirectional API for the firewall layer, so you can both trigger firewall actions (by input *to* the firewall), and trigger external actions based on firewall state (output *from* the firewall). So instead of having to reinvent the firewall wheel if you are an AV vendor or whatever, you just hook into WFP.”

Improving authoritative DNS with the official release of Foundation DNS

Post Syndicated from Hannes Gerhart original https://blog.cloudflare.com/foundation-dns-launch


We are very excited to announce the official release of Foundation DNS, with new advanced nameservers, even more resilience, and advanced analytics to meet the complex requirements of our enterprise customers. Foundation DNS is one of Cloudflare’s largest leaps forward in our authoritative DNS offering since its launch in 2010, and we know our customers are interested in an enterprise-ready authoritative DNS service with the highest level of performance, reliability, security, flexibility, and advanced analytics.

Starting today, every new enterprise contract that includes authoritative DNS will have access to the Foundation DNS feature set and existing enterprise customers will have Foundation DNS features made available to them over the course of this year. If you are an existing enterprise customer already using our authoritative DNS services, and you’re interested in getting your hands on Foundation DNS earlier, just reach out to your account team, and they can enable it for you. Let’s get started…

Why is DNS so important?

From an end user perspective, DNS makes the Internet usable. DNS is the phone book of the Internet which translates hostnames like www.cloudflare.com into IP addresses that our browsers, applications, and devices use to connect to services. Without DNS, users would have to remember IP addresses like 108.162.193.147 or 2606:4700:58::adf5:3b93 every time they wanted to visit a website on their mobile device or desktop – imagine having to remember something like that instead of just www.cloudflare.com. DNS is used in every end user application on the Internet, from social media to banking to healthcare portals. People’s usage of the Internet is entirely reliant on DNS.

From a business perspective, DNS is the very first step in reaching websites and connecting to applications. Devices need to know where to connect in order to reach services, authenticate users, and provide the information being requested. Resolving DNS queries quickly can be the difference between a website or application being perceived as responsive or not and can have a real impact on user experience.

When DNS outages occur, the impacts are obvious. Imagine your go-to ecommerce site not loading, just like what happened with the outage Dyn experienced in 2016, which took down multiple popular ecommerce sites among others. Or, if you are part of a company, and customers aren’t able to reach your website to purchase the goods or services you are selling, a DNS outage will literally lose you money. DNS is often taken for granted, but make no mistake, you’ll notice it when it’s not working properly. Thankfully, if you use Cloudflare Authoritative DNS, these are problems you don’t worry about very much.

There is always room for improvement

Cloudflare has been providing authoritative DNS services for over a decade. Our authoritative DNS service hosts millions of domains across many different top level domains (TLDs). We have customers of all sizes, from single domains with just a few records to customers with tens of millions of records spread across multiple domains. Our enterprise customers, rightfully, demand the highest level of performance, reliability, security, and flexibility from our DNS service, along with detailed analytics. While our customers love our authoritative DNS, we recognize there is always room for improvement in some of those categories. To that end, we set off to make some major improvements to our DNS architecture, with new features as well as structural changes. We are proudly calling this improved offering Foundation DNS.

Meet Foundation DNS

As our new enterprise authoritative DNS offering, Foundation DNS was designed to enhance the reliability, security, flexibility, and analytics of our existing authoritative DNS service. Before we dive into all the specifics of Foundation DNS, here is a quick summary of what Foundation DNS brings to our authoritative DNS offering:

  • Advanced nameservers bring DNS reliability to the next level.
  • New zone-level DNS settings provide more flexible configuration of DNS specific settings.
  • Unique DNSSEC keys per account and zone provide additional security and flexibility for DNSSEC.
  • GraphQL-based DNS analytics provide even more insights into your DNS queries.
  • A new release process ensures enterprise customers have the utmost stability and reliability.
  • Simpler DNS pricing with more generous quotas for DNS-only zones and DNS records.

Now, let’s dive deeper into each of these new Foundation DNS features:

Advanced nameservers

With Foundation DNS, we’re introducing advanced nameservers with a specific focus on enhancing reliability for your enterprise. You might be familiar with our standard authoritative nameservers which come as a pair per zone and use names within the cloudflare.com domain. Here’s an example:

$ dig mycoolwebpage.xyz ns +noall +answer 
mycoolwebpage.xyz.	86400	IN	NS	kelly.ns.cloudflare.com.
mycoolwebpage.xyz.	86400	IN	NS	christian.ns.cloudflare.com.

Now, let’s look at the same zone using Foundation DNS advanced nameservers:

$ dig mycoolwebpage.xyz ns +noall +answer 
mycoolwebpage.xyz.	86400	IN	NS	blue.foundationdns.com.
mycoolwebpage.xyz.	86400	IN	NS	blue.foundationdns.net.
mycoolwebpage.xyz.	86400	IN	NS	blue.foundationdns.org.

Advanced nameservers improve reliability in a few different ways. The first improvement comes from the Foundation DNS authoritative servers being spread across multiple TLDs. This provides protection from larger scale DNS outages and DDoS attacks that could potentially affect DNS infrastructure further up the tree, including TLD name servers. Foundation DNS authoritative nameservers are now located across multiple branches of the global DNS tree structure, further insulating our customers from these potential outages and attacks.

You might also have noticed that there is an additional nameserver listed with Foundation DNS. While this is an improvement, it’s not for the reason you might think it is. If we resolve each one of these nameservers to their respective IP addresses, we can make this a little easier to understand. Let’s do that here starting with our standard nameservers:

$ dig kelly.ns.cloudflare.com. +noall +answer
kelly.ns.cloudflare.com. 86353  IN      A       108.162.194.91
kelly.ns.cloudflare.com. 86353  IN      A       162.159.38.91
kelly.ns.cloudflare.com. 86353  IN      A       172.64.34.91

$ dig christian.ns.cloudflare.com. +noall +answer
christian.ns.cloudflare.com. 86353 IN   A       108.162.195.247
christian.ns.cloudflare.com. 86353 IN   A       162.159.44.247
christian.ns.cloudflare.com. 86353 IN   A       172.64.35.247

There are six total IP addresses for the two nameservers. As it turns out, this is all DNS resolvers actually care about when querying authoritative nameservers. DNS resolvers usually don’t track the actual domain names of authoritative servers; they simply maintain an unordered list of IP addresses that they can use to resolve queries for a given domain. So with our standard authoritative nameservers, we give resolvers six IP addresses to use to resolve DNS queries. Now, let’s look at the IP addresses for our Foundation DNS advanced nameservers:

$ dig blue.foundationdns.com. +noall +answer
blue.foundationdns.com. 300     IN      A       108.162.198.1
blue.foundationdns.com. 300     IN      A       162.159.60.1
blue.foundationdns.com. 300     IN      A       172.64.40.1

$ dig blue.foundationdns.net. +noall +answer
blue.foundationdns.net. 300     IN      A       108.162.198.1
blue.foundationdns.net. 300     IN      A       162.159.60.1
blue.foundationdns.net. 300     IN      A       172.64.40.1

$ dig blue.foundationdns.org. +noall +answer
blue.foundationdns.org. 300     IN      A       108.162.198.1
blue.foundationdns.org. 300     IN      A       162.159.60.1
blue.foundationdns.org. 300     IN      A       172.64.40.1

Would you look at that! Foundation DNS provides the same IP addresses for each of the authoritative nameservers that we provide to a zone. So in this case, we have only provided three IP addresses for resolvers to use to resolve DNS queries. And you might be wondering,“isn’t six better than three? Isn’t this a downgrade?” It turns out more isn’t always better. Let’s talk about why.

You are probably aware of Cloudflare’s use of Anycast and, as you might assume, our DNS services leverage Anycast to ensure that our authoritative DNS servers are available globally and as close as possible to users and resolvers across the Internet. Our standard nameservers are all advertised out of every Cloudflare data center by a single Anycast group. If we zoom in on Europe, you can see that in a standard nameserver deployment, both nameservers are advertised from every data center.

We can take those six IP addresses from our standard nameservers above and perform a lookup for their “hostname.bind” TXT record which will show us the airport code or physical location of the closest data center where our DNS queries are being resolved from. This output helps explain the reason why more isn’t always better.

$ dig @108.162.194.91 ch txt hostname.bind +short
"LHR"

$ dig @162.159.38.91 ch txt hostname.bind +short
"LHR"

$ dig @172.64.34.91 ch txt hostname.bind +short
"LHR"

$ dig @108.162.195.247 ch txt hostname.bind +short
"LHR"

$ dig @162.159.44.247 ch txt hostname.bind +short
"LHR"

$ dig @172.64.35.247 ch txt hostname.bind +short
"LHR"

As you can see, when queried from near London, all six of those IP addresses route to the same London (LHR) data center. Meaning that when a resolver in London is resolving DNS queries for a domain using Cloudflare’s standard authoritative DNS, no matter which nameserver IP address is being queried, they are always connecting to the same physical location.

You might be asking, “So what? What does that mean to me?” Let’s look at an example. If you wanted to resolve a domain using Cloudflare standard nameservers from London, and I am using a public resolver that is also located in London, the resolver will always connect to the Cloudflare LHR data center regardless of which nameserver it’s trying to reach. It doesn’t have any other option, because of Anycast.

Because of Anycast, should the LHR data center go offline completely, all the traffic intended for LHR would be routed to other nearby data centers and resolvers would continue to function normally. However, in the unlikely scenario where the LHR data center was online, but our DNS services aren’t able to respond to DNS queries, the resolver would have no way to resolve these DNS queries since they can’t reach out to any other data center. We could have 100 IP addresses, and it would not help us in this scenario. Eventually, cached responses will expire, and the domain will eventually stop being resolved.

Foundation DNS advanced nameservers are changing the way we use Anycast by leveraging two Anycast groups, which breaks the previous paradigm of every authoritative nameserver IP being advertised from every data center. Using two Anycast groups means that Foundation DNS authoritative nameservers actually have different physical locations from one another, rather than all being advertised from each data center. Here is how that same region would look using two Anycast groups:

Let’s go back and finish our comparison of six authoritative IP addresses for standard authoritative DNS vs three IP addresses for Foundation DNS now that it’s understood that Foundation DNS is using two Anycast groups for advertising nameservers. Let’s see where Foundation DNS servers are being advertised from for our example:

$ dig @108.162.198.1 ch txt hostname.bind +short
"LHR"

$ dig @162.159.60.1 ch txt hostname.bind +short
"LHR"

$ dig @172.64.40.1 ch txt hostname.bind +short
"MAN"

Look at that! One of our three nameserver IP addresses is being advertised out of a different data center, Manchester (MAN), making Foundation DNS more reliable and resilient for the previously mentioned outage scenario. It’s worth mentioning that in some cities, Cloudflare operates out of multiple data centers which will result in all three queries returning the same airport code. While we guarantee that at least one of those IP addresses is being advertised out of a different data center, we understand some customers may want to test for themselves. In those cases, an additional query can show that IP addresses are being advertised out of different data centers.

$ dig @108.162.198.1 +nsid | grep NSID:
; NSID: 39 34 6d 33 39 ("94m39")

In the “94m30” returned in the response, the number before the “m” represents the data center that answered the query. As long as that number is different in one of the three responses, you know that one of your Foundation DNS authoritative nameservers is being advertised out of a different physical location.

With Foundation DNS leveraging two Anycast groups, the previous outage scenario is handled seamlessly. DNS resolvers monitor requests to all the authoritative nameservers returned for a given domain, but primarily use the nameserver that is providing the fastest responses.

With this configuration, DNS resolvers are able to send requests to two different Cloudflare data centers, so, should a failure happen at one physical location, queries are then automatically sent to the second data center where they can be properly resolved.

Foundation DNS advanced nameservers are a big step forward in reliability for our enterprise customers. We welcome our enterprise customers to enable advanced nameservers for existing zones today. Migrating to Foundation DNS won’t involve any downtime either because even after Foundation DNS advanced nameservers are enabled for a zone, the previous standard authoritative DNS nameservers will continue to function and respond to queries for the zone. Customers don’t need to plan for a cutover or other service-impacting event to migrate to Foundation DNS advanced nameservers.

New zone-level DNS settings

Historically, we have received regular requests from our enterprise customers to adjust specific DNS settings that were not exposed via our API or dashboard, such as enabling secondary DNS overrides. When customers wanted these settings adjusted, they had to reach out to their account teams, who would change the configurations. With Foundation DNS, we are exposing the most commonly requested settings via the API and dashboard to give our customers increased flexibility with their Cloudflare authoritative DNS solution.

Enterprise customers can now configure the following DNS settings on their zones:

Setting Zone Type Description
Foundation DNS advanced nameservers Primary and secondary zones Allows you to enable advanced nameservers on your zone.
Secondary DNS override Secondary zones Allows you to enable Secondary DNS Override on your zone in order to proxy HTTP/S traffic through Cloudflare.
Multi-provider DNS Primary and secondary zones Allows you to have multiple authoritative DNS providers while using Cloudflare as a primary nameserver.

Unique DNSSEC keys per account and zone

DNSSEC, which stands for Domain Name System Security Extensions, adds security to a domain or zone by providing a way to check that the response you receive for a DNS query is authentic and hasn’t been modified. DNSSEC prevents DNS cache poisoning (DNS spoofing) which helps ensure that DNS resolvers are responding to DNS queries with the correct IP addresses.

Since we launched Universal DNSSEC in 2015, we’ve made quite a few improvements, like adding support for pre-signed DNSSEC for secondary zones and multi-signer DNSSEC. By default, Cloudflare signs DNS records on the fly (live signing) as we respond to DNS queries. This allows Cloudflare to host a DNSSEC-secured domain while dynamically allocating IP addresses for the proxied origins. It also enables certain load balancing use cases since the IP addresses served in the DNS response in these cases change based on steering.

Cloudflare uses the Elliptic Curve algorithm ECDSA P-256, which is stronger than most RSA keys used today. It uses less CPU to generate signatures, making them more efficient to generate on the fly. Usually two keys are used as part of DNSSEC, the Zone Signing Key (ZSK) and the Key Signing Key (KSK). At the simplest level, the ZSK is used for signing the DNS records that are served in response to queries and the KSK is used to sign the DNSKEYs, including the ZSK to ensure its authenticity.

Today, Cloudflare uses a shared ZSK and KSK globally for all DNSSEC signing, and since we use such a strong cryptographic algorithm, we know how secure this key set is and as such, do not believe there is a need to regularly rotate the ZSK or KSK – at least for security reasons. There are customers, however, that have policies that require the rotation of these keys at certain intervals. Because of this, we’ve added the ability for our new Foundation DNS advanced nameservers to rotate both their ZSK and KSK as needed per account or per zone. This will first be available via the API and subsequently through the Cloudflare dashboard. So now, customers with strict policy requirements around their DNSSEC key rotation can meet those requirements with Cloudflare Foundation DNS.

GraphQL-based DNS analytics

For those who are not familiar with it, GraphQL is a query language for APIs and a runtime for executing those queries. It allows clients to request exactly what they need, no more, no less, enabling them to aggregate data from multiple sources through a single API call, and supports real-time updates through subscriptions.

As you might know, Cloudflare has had a GraphQL API for a while now, but as part of Foundation DNS we are adding a new DNS dataset to that API that is only available with our new Foundation DNS advanced nameservers.

The new DNS dataset in our GraphQL API can be used to fetch information about the DNS queries a zone has received. This faster and more powerful alternative to our current DNS Analytics API allows you to query data from large time periods quickly and efficiently without running into limits or timeouts. The GraphQL API is more flexible with regard to which queries it accepts, and exposes more information than the DNS Analytics API.

For example, you can run this query to fetch the mean and 90th percentile processing time of your queries, grouped by source IP address, in 15 minute buckets. A query like this would be useful to see which IPs are querying your records most often for a given time range:

{
  "query": "{
    viewer {
      zones(filter: { zoneTag: $zoneTag }) {
        dnsAnalyticsAdaptiveGroups(
          filter: $filter
          limit: 10
          orderBy: [datetime_ASC]
        ) {
          avg {
            processingTimeUs
          }
          quantiles {
            processingTimeUsP90
          }
          dimensions {
            datetimeFifteenMinutes
            sourceIP
          }
        }
      }
    }
  }",
  "variables": {
    "zoneTag": "<zone-tag>",
    "filter": {
      "datetime_geq": "2024-05-01T00:00:00Z",
      "datetime_leq": "2024-06-01T00:00:00Z"
    }
  }
}

Previously, a query like this wouldn’t have been possible for several reasons. The first is that we have added new fields like sourceIP, which allows us to filter data based on which client IP addresses (usually resolvers) are making DNS queries. The second is that the GraphQL API query is able to process and return data from much larger time ranges. A DNS zone with sufficiently large amounts of queries was previously only able to query across a few days of traffic, while the new GraphQL API can provide data for a period of up to 31 days. We are planning further enhancements to that range, as well as how far back historical data can be stored and queried.

The GraphQL API also allows us to add a new DNS analytics section to the Cloudflare dashboard. Customers will be able to track the most queried records, see which data centers are answering those queries, see how many queries are being made, and much more.

The new DNS dataset in our GraphQL API and the new DNS analytics page work together to help our DNS customers to monitor, analyze, and troubleshoot their Foundation DNS deployments.

New release process

Cloudflare’s Authoritative DNS product receives software updates roughly once a week. Cloudflare has a sophisticated release process that helps prevent regressions from affecting production traffic. While uncommon, it’s possible to have issues only surface once the new release is subject to the volume and uniqueness of production traffic.

Because our enterprise customers desire stability even more than new features, new releases will be subject to a two-week soak time with our standard nameservers before our Foundation DNS advanced nameservers are upgraded. After two weeks with no issues, the Foundation DNS advanced nameservers will be upgraded as well.

Zones using Foundation DNS advanced nameservers will see increased reliability as they are better protected against regressions in new software releases.

Simpler DNS pricing

Historically, Cloudflare has charged for Authoritative DNS based on monthly DNS queries and the number of domains in the account. Our enterprise DNS customers are often interested in DNS-only zones, which are DNS zones hosted in Cloudflare that do not use our reverse proxy (layer 7) services such as our CDN, WAF, or Bot Management. With Foundation DNS, we’re making pricing simpler for the vast majority of those customers by including 10,000 DNS only domains by default. This change means most customers will only pay for the number of DNS queries they consume.

We’re also including 1 million DNS records across all domains in an account. But that doesn’t mean we can’t support more. In fact, the biggest single zone on our platform has over 3.9 million records, while our largest DNS account is just shy of 30 million DNS records spread across multiple zones. With Cloudflare DNS, there is no trouble handling even the largest deployments.

There is more to come

We are just getting started. In the future, we will add more exclusive features to Foundation DNS. One example is a highly requested feature: per-record scoped API tokens and user permissions. This will allow you to configure permissions on an even more granular level. For example, you could specify that a particular member of your account is only allowed to create and manage records of the type TXT and MX, so they don’t accidentally delete or edit address records impacting web traffic to your domain. Another example would be to specify permissions based on subdomain to further restrict the scope of specific users.

If you’re an existing enterprise customer and want to use Foundation DNS, get in touch with your account team to provision Foundation DNS on your account.

Advanced DNS Protection: mitigating sophisticated DNS DDoS attacks

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/advanced-dns-protection


We’re proud to introduce the Advanced DNS Protection system, a robust defense mechanism designed to protect against the most sophisticated DNS-based DDoS attacks. This system is engineered to provide top-tier security, ensuring your digital infrastructure remains resilient in the face of evolving threats.

Our existing systems have been successfully detecting and mitigating ‘simpler’ DDoS attacks against DNS, but they’ve struggled with the more complex ones. The Advanced DNS Protection system is able to bridge that gap by leveraging new techniques that we will showcase in this blog post.

Advanced DNS Protection is currently in beta and available for all Magic Transit customers at no additional cost. Read on to learn more about DNS DDoS attacks, how the new system works, and what new functionality is expected down the road.

Register your interest to learn more about how we can help keep your DNS servers protected, available, and performant.

A third of all DDoS attacks target DNS servers

Distributed Denial of Service (DDoS) attacks are a type of cyber attack that aim to disrupt and take down websites and other online services. When DDoS attacks succeed and websites are taken offline, it can lead to significant revenue loss and damage to reputation.

Distribution of DDoS attack types for 2023

One common way to disrupt and take down a website is to flood its servers with more traffic than it can handle. This is known as an HTTP flood attack. It is a type of DDoS attack that targets the website directly with a lot of HTTP requests. According to our last DDoS trends report, in 2023 our systems automatically mitigated 5.2 million HTTP DDoS attacks — accounting for 37% of all DDoS attacks.

Diagram of an HTTP flood attack

However, there is another way to take down websites: by targeting them indirectly. Instead of flooding the website servers, the threat actor floods the DNS servers. If the DNS servers are overwhelmed with more queries than their capacity, hostname to IP address translation fails and the website experiences an indirectly inflicted outage because the DNS server cannot respond to legitimate queries.

One notable example is the attack that targeted Dyn, a DNS provider, in October 2016. It was a devastating DDoS attack launched by the infamous Mirai botnet. It caused disruptions for major sites like Airbnb, Netflix, and Amazon, and it took Dyn an entire day to restore services. That’s a long time for service disruptions that can lead to significant reputation and revenue impact.

Over seven years later, Mirai attacks and DNS attacks are still incredibly common. In 2023, DNS attacks were the second most common attack type — with a 33% share of all DDoS attacks (4.6 million attacks). Attacks launched by Mirai-variant botnets were the fifth most common type of network-layer DDoS attack, accounting for 3% of all network-layer DDoS attacks.

Diagram of a DNS query flood attack

What are sophisticated DNS-based DDoS attacks?

DNS-based DDoS attacks can be easier to mitigate when there is a recurring pattern in each query. This is what’s called the “attack fingerprint”. Fingerprint-based mitigation systems can identify those patterns and then deploy a mitigation rule that surgically filters the attack traffic without impacting legitimate traffic.

For example, let’s take a scenario where an attacker sends a flood of DNS queries to their target. In this example, the attacker only randomized the source IP address. All other query fields remained consistent. The mitigation system detected the pattern (source port is 1024 and the queried domain is example.com) and will generate an ephemeral mitigation rule to filter those queries.

A simplified diagram of the attack fingerprinting concept

However, there are DNS-based DDoS attacks that are much more sophisticated and randomized, lacking an apparent attack pattern. Without a consistent pattern to lock on to, it becomes virtually impossible to mitigate the attack using a fingerprint-based mitigation system. Moreover, even if an attack pattern is detected in a highly randomized attack, the pattern would probably be so generic that it would mistakenly mitigate legitimate user traffic and/or not catch the entire attack.

In this example, the attacker also randomized the queried domain in their DNS query flood attack. Simultaneously, a legitimate client (or server) is also querying example.com. They were assigned a random port number which happened to be 1024. The mitigation system detected a pattern (source port is 1024 and the queried domain is example.com) that caught only the part of the attack that matched the fingerprint. The mitigation system missed the part of the attack that queried other hostnames. Lastly, the mitigation system mistakenly caught legitimate traffic that happened to appear similar to the attack traffic.

A simplified diagram of a randomized DNS flood attack

This is just one very simple example of how fingerprinting can fail in stopping randomized DDoS attacks. This challenge is amplified when attackers “launder” their attack traffic through reputable public DNS resolvers (a DNS resolver, also known as a recursive DNS server, is a type of DNS server that is responsible for tracking down the IP address of a website from various other DNS servers). This is known as a DNS laundering attack.

Diagram of the DNS resolution process

During a DNS laundering attack, the attacker queries subdomains of a real domain that is managed by the victim’s authoritative DNS server. The prefix that defines the subdomain is randomized and is never used more than once. Due to the randomization element, recursive DNS servers will never have a cached response and will need to forward the query to the victim’s authoritative DNS server. The authoritative DNS server is then bombarded by so many queries until it cannot serve legitimate queries or even crashes altogether.

Diagram of a DNS Laundering attack

The complexity of sophisticated DNS DDoS attacks lies in their paradoxical nature: while they are relatively easy to detect, effectively mitigating them is significantly more difficult. This difficulty stems from the fact that authoritative DNS servers cannot simply block queries from recursive DNS servers, as these servers also make legitimate requests. Moreover, the authoritative DNS server is unable to filter queries aimed at the targeted domain because it is a genuine domain that needs to remain accessible.

Mitigating sophisticated DNS-based DDoS attacks with the Advanced DNS Protection system

The rise in these types of sophisticated DNS-based DDoS attacks motivated us to develop a new solution — a solution that would better protect our customers and bridge the gap of more traditional fingerprinting approaches. This solution came to be the Advanced DNS Protection system. Similar to the Advanced TCP Protection system, it is a software-defined system that we built, and it is powered by our stateful mitigation platform, flowtrackd (flow tracking daemon).

The Advanced DNS Protection system complements our existing suite of DDoS defense systems. Following the same approach as our other DDoS defense systems, the Advanced DNS Protection system is also a distributed system, and an instance of it runs on every Cloudflare server around the world. Once the system has been initiated, each instance can detect and mitigate attacks autonomously without requiring any centralized regulation. Detection and mitigation is instantaneous (zero seconds). Each instance also communicates with other instances on other servers in a data center. They gossip and share threat intelligence to deliver a comprehensive mitigation within each data center.

Screenshots from the Cloudflare dashboard showcasing a DNS-based DDoS attack that was mitigated by the Advanced DNS Protection system 

Together, our fingerprinting-based systems (the DDoS protection managed rulesets) and our stateful mitigation systems provide a robust multi-layered defense strategy to defend against the most sophisticated and randomized DNS-based DDoS attacks. The system is also customizable, allowing Cloudflare customers to tailor it for their needs. Review our documentation for more information on configuration options.

Diagram of Cloudflare’s DDoS protection systems

We’ve also added new DNS-centric data points to help customers better understand their DNS traffic patterns and attacks. These new data points are available in a new “DNS Protection” tab within the Cloudflare Network Analytics dashboard. The new tab provides insights about which DNS queries are passed and dropped, as well as the characteristics of those queries, including the queried domain name and the record type. The analytics can also be fetched by using the Cloudflare GraphQL API and by exporting logs into your own monitoring dashboards via Logpush.

DNS queries: discerning good from bad

To protect against sophisticated and highly randomized DNS-based DDoS attacks, we needed to get better at deciding which DNS queries are likely to be legitimate for our customers. However, it’s not easy to infer what’s legitimate and what’s likely to be a part of an attack just based on the query name. We can’t rely solely on fingerprint-based detection mechanisms, since sometimes seemingly random queries, like abc123.example.com, can be legitimate. The opposite is true as well: a query for mailserver.example.com might look legitimate, but can end up not being a real subdomain for a customer.

To make matters worse, our Layer 3 packet routing-based mitigation service, Magic Transit, uses direct server return (DSR), meaning we can not see the DNS origin server’s responses to give us feedback about which queries are ultimately legitimate.

Diagram of Magic Transit with Direct Server Return (DSR)

We decided that the best way to combat these attacks is to build a data model of each customer’s expected DNS queries, based on a historical record that we build. With this model in hand, we can decide with higher confidence which queries are likely to be legitimate, and drop the ones that we think are not, shielding our customer’s DNS servers.

This is the basis of Advanced DNS Protection. It inspects every DNS query sent to our Magic Transit customers, and passes or drops them based on the data model and each customer’s individual settings.

To do so, each server at our global network continually sends certain DNS-related data such as query type (for example, A record) and the queried domains (but not the source of the query) to our core data centers, where we periodically compute DNS query traffic profiles for each customer. Those profiles are distributed across our global network, where they are consulted to help us more confidently and accurately decide which queries are good and which are bad. We drop the bad queries and pass on the good ones, taking into account a customer’s tolerance for unexpected DNS queries based on their configurations.

Solving the technical challenges that emerged when designing the Advanced DNS Protection system

In building this system, we faced several specific technical challenges:

Data processing

We process tens of millions of DNS queries per day across our global network for our Magic Transit customers, not counting Cloudflare’s suite of other DNS products, and use the DNS-related data mentioned above to build custom query traffic profiles. Analyzing this type of data requires careful treatment of our data pipelines. When building these traffic profiles, we use sample-on-write and adaptive bitrate technologies when writing and reading the necessary data, respectively, to ensure that we capture the data with a fine granularity while protecting our data infrastructure, and we drop information that might impact the privacy of end users.

Compact representation of query data

Some of our customers see tens of millions of DNS queries per day alone. This amount of data would be prohibitively expensive to store and distribute in an uncompressed format. To solve this challenge, we decided to use a counting Bloom filter for each customer’s traffic profile. This is a probabilistic data structure that allows us to succinctly store and distribute each customer’s DNS profile, and then efficiently query it at packet processing time.

Data distribution

We periodically need to recompute and redistribute every customer’s DNS traffic profile between our data centers to each server in our fleet. We used our very own R2 storage service to greatly simplify this task. With regional hints and custom domains enabled, we enabled caching and used only a handful of R2 buckets. Each time we need to update the global view of the customer data models across our edge fleet, 98% of the bits transferred are served from cache.

Built-in tolerance

When new domain names are put into service, our data models will not immediately be aware of them because queries with these names have never been seen before. This and other reasons for potential false positives mandate that we need to build a certain amount of tolerance into the system to allow through potentially legitimate queries. We do so by leveraging token bucket algorithms. Customers can configure the size of the token buckets by changing the sensitivity levels of the Advanced DNS Protection system. The lower the sensitivity, the larger the token bucket — and vice versa. A larger token bucket provides more tolerance for unexpected DNS queries and expected DNS queries that deviate from the profile. A high sensitivity level translates to a smaller token bucket and a stricter approach.

Leveraging Cloudflare’s global software-defined network

At the end of the day, these are the types of challenges that Cloudflare is excellent at solving. Our customers trust us with handling their traffic, and ensuring their Internet properties are protected, available and performant. We take that trust extremely seriously.

The Advanced DNS Protection system leverages our global infrastructure and data processing capabilities alongside intelligent algorithms and data structures to protect our customers.

If you are not yet a Cloudflare customer, let us know if you’d like to protect your DNS servers. Existing Cloudflare customers can enable the new systems by contacting their account team or Cloudflare Support.

Remediating new DNSSEC resource exhaustion vulnerabilities

Post Syndicated from Vicky Shrestha original https://blog.cloudflare.com/remediating-new-dnssec-resource-exhaustion-vulnerabilities


Cloudflare has been part of a multivendor, industry-wide effort to mitigate two critical DNSSEC vulnerabilities. These vulnerabilities exposed significant risks to critical infrastructures that provide DNS resolution services. Cloudflare provides DNS resolution for anyone to use for free with our public resolver 1.1.1.1 service. Mitigations for Cloudflare’s public resolver 1.1.1.1 service were applied before these vulnerabilities were disclosed publicly. Internal resolvers using unbound (open source software) were upgraded promptly after a new software version fixing these vulnerabilities was released.

All Cloudflare DNS infrastructure was protected from both of these vulnerabilities before they were disclosed and is safe today. These vulnerabilities do not affect our Authoritative DNS or DNS firewall products.

All major DNS software vendors have released new versions of their software. All other major DNS resolver providers have also applied appropriate mitigations. Please update your DNS resolver software immediately, if you haven’t done so already.

Background

Domain name system (DNS) security extensions, commonly known as DNSSEC, are extensions to the DNS protocol that add authentication and integrity capabilities. DNSSEC uses cryptographic keys and signatures that allow DNS responses to be validated as authentic. DNSSEC protocol specifications have certain requirements that prioritize availability at the cost of increased complexity and computational cost for the validating DNS resolvers. The mitigations for the vulnerabilities discussed in this blog require local policies to be applied that relax these requirements in order to avoid exhausting the resources of validators.

The design of the DNS and DNSSEC protocols follows the Robustness principle: “be conservative in what you do, be liberal in what you accept from others”. There have been many vulnerabilities in the past that have taken advantage of protocol requirements following this principle. Malicious actors can exploit these vulnerabilities to attack DNS infrastructure, in this case by causing additional work for DNS resolvers by crafting DNSSEC responses with complex configurations. As is often the case, we find ourselves having to create a pragmatic balance between the flexibility that allows a protocol to adapt and evolve and the need to safeguard the stability and security of the services we operate.

Cloudflare’s public resolver 1.1.1.1 is a privacy-centric public resolver service. We have been using stricter validations and limits aimed at protecting our own infrastructure in addition to shielding authoritative DNS servers operated outside our network. As a result, we often receive complaints about resolution failures. Experience shows us that strict validations and limits can impact availability in some edge cases, especially when DNS domains are improperly configured. However, these strict validations and limits are necessary to improve the overall reliability and resilience of the DNS infrastructure.

The vulnerabilities and how we mitigated them are described below.

Keytrap vulnerability (CVE-2023-50387)

Introduction

A DNSSEC signed zone can contain multiple keys (DNSKEY) to sign the contents of a DNS zone and a Resource Record Set (RRSET) in a DNS response can have multiple signatures (RRSIG). Multiple keys and signatures are required to support things like key rollover, algorithm rollover, and multi-signer DNSSEC. DNSSEC protocol specifications require a validating DNS resolver to try every possible combination of keys and signatures when validating a DNS response.

During validation, a resolver looks at the key tag of every signature and tries to find the associated key that was used to sign it. A key tag is an unsigned 16-bit number calculated as a checksum over the key’s resource data (RDATA). Key tags are intended to allow efficient pairing of a signature with the key which has supposedly created it.  However, key tags are not unique, and it is possible that multiple keys can have the same key tag. A malicious actor can easily craft a DNS response with multiple keys having the same key tag together with multiple signatures, none of which might validate. A validating resolver would have to try every combination (number of keys multiplied by number of signatures) when trying to validate this response. This increases the computational cost of the validating resolver many-fold, degrading performance for all its users. This is known as the Keytrap vulnerability.

Variations of this vulnerability include using multiple signatures with one key, using one signature with multiple keys having colliding key tags, and using multiple keys with corresponding hashes added to the parent delegation signer record.

Mitigation

We have limited the maximum number of keys we will accept at a zone cut. A zone cut is where a parent zone delegates to a child zone, e.g. where the .com zone delegates cloudflare.com to Cloudflare nameservers. Even with this limit already in place and various other protections built for our platform, we realized that it would still be computationally costly to process a malicious DNS answer from an authoritative DNS server.

To address and further mitigate this vulnerability, we added a signature validations limit per RRSET and a total signature validations limit per resolution task. One resolution task might include multiple recursive queries to external authoritative DNS servers in order to answer a single DNS question. Clients queries exceeding these limits will fail to resolve and will receive a response with an Extended DNS Error (EDE) code 0. Furthermore, we added metrics which will allow us to detect attacks attempting to exploit this vulnerability.

NSEC3 iteration and closest encloser proof vulnerability (CVE-2023-50868)

Introduction

NSEC3 is an alternative approach for authenticated denial of existence. You can learn more about authenticated denial of existence here. NSEC3 uses a hash derived from DNS names instead of the DNS names directly in an attempt to prevent zone enumeration and the standard supports multiple iterations for hash calculations. However, because the full DNS name is used as input to the hash calculation, increasing hashing iterations beyond the initial doesn’t provide any additional value and is not recommended in RFC9276. This complication is further inflated while finding the closest enclosure proof. A malicious DNS response from an authoritative DNS server can set a high NSEC3 iteration count and long DNS names with multiple DNS labels to exhaust the computing resources of a validating resolver by making it do unnecessary hash computations.

Mitigation

For this vulnerability, we applied a similar mitigation technique as we did for Keytrap. We added a limit for total hash calculations per resolution task to answer a single DNS question. Similarly, clients queries exceeding this limit will fail to resolve and will receive a response with an EDE code 27. We also added metrics to track hash calculations allowing early detection of attacks attempting to exploit this vulnerability.

Timeline

Date and time in UTC

Event

2023-11-03 16:05

John Todd from Quad9 invites Cloudflare to participate in a joint task force to discuss a new DNS vulnerability. 

2023-11-07 14:30

A group of DNS vendors and service providers meet to discuss the vulnerability during IETF 118. Discussions and collaboration continues in a closed chat group hosted at DNS-OARC

2023-12-08 20:20

Cloudflare public resolver 1.1.1.1 is fully patched to mitigate Keytrap vulnerability (CVE-2023-50387)

2024-01-17 22:39

Cloudflare public resolver 1.1.1.1 is fully patched to mitigate NSEC3 iteration count and closest encloser vulnerability (CVE-2023-50868)

2024-02-13 13:04

Unbound package is released 

2024-02-13 23:00

Cloudflare internal CDN resolver is fully patched to mitigate both CVE-2023-50387 and CVE-2023-50868

Credits

We would like to thank Elias Heftrig, Haya Schulmann, Niklas Vogel, Michael Waidner from the German National Research Center for Applied Cybersecurity ATHENE, for discovering the Keytrap vulnerability and doing a responsible disclosure.

We would like to thank Petr Špaček from Internet Systems Consortium (ISC) for discovering the NSEC3 iteration and closest encloser proof vulnerability and doing a responsible disclosure.

We would like to thank John Todd from Quad9  and the DNS Operations Analysis and Research Center (DNS-OARC) for facilitating coordination amongst various stakeholders.

And finally, we would like to thank the DNS-OARC community members, representing various DNS vendors and service providers, who all came together and worked tirelessly to fix these vulnerabilities, working towards a common goal of making the internet resilient and secure.

DDoS threat report for 2023 Q4

Post Syndicated from Omer Yoachimik http://blog.cloudflare.com/author/omer/ original https://blog.cloudflare.com/ddos-threat-report-2023-q4


Welcome to the sixteenth edition of Cloudflare’s DDoS Threat Report. This edition covers DDoS trends and key findings for the fourth and final quarter of the year 2023, complete with a review of major trends throughout the year.

What are DDoS attacks?

DDoS attacks, or distributed denial-of-service attacks, are a type of cyber attack that aims to disrupt websites and online services for users, making them unavailable by overwhelming them with more traffic than they can handle. They are similar to car gridlocks that jam roads, preventing drivers from getting to their destination.

There are three main types of DDoS attacks that we will cover in this report. The first is an HTTP request intensive DDoS attack that aims to overwhelm HTTP servers with more requests than they can handle to cause a denial of service event. The second is an IP packet intensive DDoS attack that aims to overwhelm in-line appliances such as routers, firewalls, and servers with more packets than they can handle. The third is a bit-intensive attack that aims to saturate and clog the Internet link causing that ‘gridlock’ that we discussed. In this report, we will highlight various techniques and insights on all three types of attacks.

Previous editions of the report can be found here, and are also available on our interactive hub, Cloudflare Radar. Cloudflare Radar showcases global Internet traffic, attacks, and technology trends and insights, with drill-down and filtering capabilities for zooming in on insights of specific countries, industries, and service providers. Cloudflare Radar also offers a free API allowing academics, data sleuths, and other web enthusiasts to investigate Internet usage across the globe.

To learn how we prepare this report, refer to our Methodologies.

Key findings

  1. In Q4, we observed a 117% year-over-year increase in network-layer DDoS attacks, and overall increased DDoS activity targeting retail, shipment and public relations websites during and around Black Friday and the holiday season.
  2. In Q4, DDoS attack traffic targeting Taiwan registered a 3,370% growth, compared to the previous year, amidst the upcoming general election and reported tensions with China. The percentage of DDoS attack traffic targeting Israeli websites grew by 27% quarter-over-quarter, and the percentage of DDoS attack traffic targeting Palestinian websites grew by 1,126% quarter-over-quarter — as the military conflict between Israel and Hamas continues.
  3. In Q4, there was a staggering 61,839% surge in DDoS attack traffic targeting Environmental Services websites compared to the previous year, coinciding with the 28th United Nations Climate Change Conference (COP 28).

For an in-depth analysis of these key findings and additional insights that could redefine your understanding of current cybersecurity challenges, read on!

Illustration of a DDoS attack

Hyper-volumetric HTTP DDoS attacks

2023 was the year of uncharted territories. DDoS attacks reached new heights — in size and sophistication. The wider Internet community, including Cloudflare, faced a persistent and deliberately engineered campaign of thousands of hyper-volumetric DDoS attacks at never before seen rates.

These attacks were highly complex and exploited an HTTP/2 vulnerability. Cloudflare developed purpose-built technology to mitigate the vulnerability’s effect and worked with others in the industry to responsibly disclose it.

As part of this DDoS campaign, in Q3 our systems mitigated the largest attack we’ve ever seen — 201 million requests per second (rps). That’s almost 8 times larger than our previous 2022 record of 26 million rps.

Largest HTTP DDoS attacks as seen by Cloudflare, by year

Growth in network-layer DDoS attacks

After the hyper-volumetric campaign subsided, we saw an unexpected drop in HTTP DDoS attacks. Overall in 2023, our automated defenses mitigated over 5.2 million HTTP DDoS attacks consisting of over 26 trillion requests. That averages at 594 HTTP DDoS attacks and 3 billion mitigated requests every hour.

Despite these astronomical figures, the amount of HTTP DDoS attack requests actually declined by 20% compared to 2022. This decline was not just annual but was also observed in 2023 Q4 where the number of HTTP DDoS attack requests decreased by 7% YoY and 18% QoQ.

On the network-layer, we saw a completely different trend. Our automated defenses mitigated 8.7 million network-layer DDoS attacks in 2023. This represents an 85% increase compared to 2022.

In 2023 Q4, Cloudflare’s automated defenses mitigated over 80 petabytes of network-layer attacks. On average, our systems auto-mitigated 996 network-layer DDoS attacks and 27 terabytes every hour. The number of network-layer DDoS attacks in 2023 Q4 increased by 175% YoY and 25% QoQ.

HTTP and Network-layer DDoS attacks by quarter

DDoS attacks increase during and around COP 28

In the final quarter of 2023, the landscape of cyber threats witnessed a significant shift. While the Cryptocurrency sector was initially leading in terms of the volume of HTTP DDoS attack requests, a new target emerged as a primary victim. The Environmental Services industry experienced an unprecedented surge in HTTP DDoS attacks, with these attacks constituting half of all its HTTP traffic. This marked a staggering 618-fold increase compared to the previous year, highlighting a disturbing trend in the cyber threat landscape.

This surge in cyber attacks coincided with COP 28, which ran from November 30th to December 12th, 2023. The conference was a pivotal event, signaling what many considered the ‘beginning of the end’ for the fossil fuel era. It was observed that in the period leading up to COP 28, there was a noticeable spike in HTTP attacks targeting Environmental Services websites. This pattern wasn’t isolated to this event alone.

Looking back at historical data, particularly during COP 26 and COP 27, as well as other UN environment-related resolutions or announcements, a similar pattern emerges. Each of these events was accompanied by a corresponding increase in cyber attacks aimed at Environmental Services websites.

In February and March 2023, significant environmental events like the UN’s resolution on climate justice and the launch of United Nations Environment Programme’s Freshwater Challenge potentially heightened the profile of environmental websites, possibly correlating with an increase in attacks on these sites​​​​.

This recurring pattern underscores the growing intersection between environmental issues and cyber security, a nexus that is increasingly becoming a focal point for attackers in the digital age.

DDoS attacks and Iron Swords

It’s not just UN resolutions that trigger DDoS attacks. Cyber attacks, and particularly DDoS attacks, have long been a tool of war and disruption. We witnessed an increase in DDoS attack activity in the Ukraine-Russia war, and now we’re also witnessing it in the Israel-Hamas war. We first reported the cyber activity in our report Cyber attacks in the Israel-Hamas war, and we continued to monitor the activity throughout Q4.

Operation “Iron Swords” is the military offensive launched by Israel against Hamas following the Hamas-led 7 October attack. During this ongoing armed conflict, we continue to see DDoS attacks targeting both sides.

DDoS attacks targeting Israeli and Palestinian websites, by industry

Relative to each region’s traffic, the Palestinian territories was the second most attacked region by HTTP DDoS attacks in Q4. Over 10% of all HTTP requests towards Palestinian websites were DDoS attacks, a total of 1.3 billion DDoS requests — representing a 1,126% increase in QoQ. 90% of these DDoS attacks targeted Palestinian Banking websites. Another 8% targeted Information Technology and Internet platforms.

Top attacked Palestinian industries

Similarly, our systems automatically mitigated over 2.2 billion HTTP DDoS requests targeting Israeli websites. While 2.2 billion represents a decrease compared to the previous quarter and year, it did amount to a larger percentage out of the total Israel-bound traffic. This normalized figure represents a 27% increase QoQ but a 92% decrease YoY. Notwithstanding the larger amount of attack traffic, Israel was the 77th most attacked region relative to its own traffic. It was also the 33rd most attacked by total volume of attacks, whereas the Palestinian territories was 42nd.

Of those Israeli websites attacked, Newspaper & Media were the main target — receiving almost 40% of all Israel-bound HTTP DDoS attacks. The second most attacked industry was the Computer Software industry. The Banking, Financial Institutions, and Insurance (BFSI) industry came in third.

Top attacked Israeli industries

On the network layer, we see the same trend. Palestinian networks were targeted by 470 terabytes of attack traffic — accounting for over 68% of all traffic towards Palestinian networks. Surpassed only by China, this figure placed the Palestinian territories as the second most attacked region in the world, by network-layer DDoS attack, relative to all Palestinian territories-bound traffic. By absolute volume of traffic, it came in third. Those 470 terabytes accounted for approximately 1% of all DDoS traffic that Cloudflare mitigated.

Israeli networks, though, were targeted by only 2.4 terabytes of attack traffic, placing it as the 8th most attacked country by network-layer DDoS attacks (normalized). Those 2.4 terabytes accounted for almost 10% of all traffic towards Israeli networks.

Top attacked countries

When we turned the picture around, we saw that 3% of all bytes that were ingested in our Israeli-based data centers were network-layer DDoS attacks. In our Palestinian-based data centers, that figure was significantly higher — approximately 17% of all bytes.

On the application layer, we saw that 4% of HTTP requests originating from Palestinian IP addresses were DDoS attacks, and almost 2% of HTTP requests originating from Israeli IP addresses were DDoS attacks as well.

Main sources of DDoS attacks

In the third quarter of 2022, China was the largest source of HTTP DDoS attack traffic. However, since the fourth quarter of 2022, the US took the first place as the largest source of HTTP DDoS attacks and has maintained that undesirable position for five consecutive quarters. Similarly, our data centers in the US are the ones ingesting the most network-layer DDoS attack traffic — over 38% of all attack bytes.

HTTP DDoS attacks originating from China and the US by quarter

Together, China and the US account for a little over a quarter of all HTTP DDoS attack traffic in the world. Brazil, Germany, Indonesia, and Argentina account for the next twenty-five percent.

Top source of HTTP DDoS attacks

These large figures usually correspond to large markets. For this reason, we also normalize the attack traffic originating from each country by comparing their outbound traffic. When we do this, we often get small island nations or smaller market countries that a disproportionate amount of attack traffic originates from. In Q4, 40% of Saint Helena’s outbound traffic were HTTP DDoS attacks — placing it at the top. Following the ‘remote volcanic tropical island’, Libya came in second, Swaziland (also known as Eswatini) in third. Argentina and Egypt follow in fourth and fifth place.

Top source of HTTP DDoS attacks with respect to each country’s traffic

On the network layer, Zimbabwe came in first place. Almost 80% of all traffic we ingested in our Zimbabwe-based data center was malicious. In second place, Paraguay, and Madagascar in third.

Top source of Network-layer DDoS attacks with respect to each country’s traffic

Most attacked industries

By volume of attack traffic, Cryptocurrency was the most attacked industry in Q4. Over 330 billion HTTP requests targeted it. This figure accounts for over 4% of all HTTP DDoS traffic for the quarter. The second most attacked industry was Gaming & Gambling. These industries are known for being coveted targets and attract a lot of traffic and attacks.

Top industries targeted by HTTP DDoS attacks

On the network layer, the Information Technology and Internet industry was the most attacked — over 45% of all network-layer DDoS attack traffic was aimed at it. Following far behind were the Banking, Financial Services and Insurance (BFSI), Gaming & Gambling, and Telecommunications industries.

Top industries targeted by Network-layer DDoS attacks

To change perspectives, here too, we normalized the attack traffic by the total traffic for a specific industry. When we do that, we get a different picture.

Top attacked industries by HTTP DDoS attacks, by region

We already mentioned in the beginning of this report that the Environmental Services industry was the most attacked relative to its own traffic. In second place was the Packaging and Freight Delivery industry, which is interesting because of its timely correlation with online shopping during Black Friday and the winter holiday season. Purchased gifts and goods need to get to their destination somehow, and it seems as though attackers tried to interfere with that. On a similar note, DDoS attacks on retail companies increased by 23% compared to the previous year.

Top industries targeted by HTTP DDoS attacks with respect to each industry’s traffic

On the network layer, Public Relations and Communications was the most targeted industry — 36% of its traffic was malicious. This too is very interesting given its timing. Public Relations and Communications companies are usually linked to managing public perception and communication. Disrupting their operations can have immediate and widespread reputational impacts which becomes even more critical during the Q4 holiday season. This quarter often sees increased PR and communication activities due to holidays, end-of-year summaries, and preparation for the new year, making it a critical operational period — one that some may want to disrupt.

Top industries targeted by Network-layer DDoS attacks with respect to each industry’s traffic

Most attacked countries and regions

Singapore was the main target of HTTP DDoS attacks in Q4. Over 317 billion HTTP requests, 4% of all global DDoS traffic, were aimed at Singaporean websites. The US followed closely in second and Canada in third. Taiwan came in as the fourth most attacked region — amidst the upcoming general elections and the tensions with China. Taiwan-bound attacks in Q4 traffic increased by 847% compared to the previous year, and 2,858% compared to the previous quarter. This increase is not limited to the absolute values. When normalized, the percentage of HTTP DDoS attack traffic targeting Taiwan relative to all Taiwan-bound traffic also significantly increased. It increased by 624% quarter-over-quarter and 3,370% year-over-year.

Top targeted countries by HTTP DDoS attacks

While China came in as the ninth most attacked country by HTTP DDoS attacks, it’s the number one most attacked country by network-layer attacks. 45% of all network-layer DDoS traffic that Cloudflare mitigated globally was China-bound. The rest of the countries were so far behind that it is almost negligible.

Top targeted countries by Network-layer DDoS attacks

When normalizing the data, Iraq, Palestinian territories, and Morocco take the lead as the most attacked regions with respect to their total inbound traffic. What’s interesting is that Singapore comes up as fourth. So not only did Singapore face the largest amount of HTTP DDoS attack traffic, but that traffic also made up a significant amount of the total Singapore-bound traffic. By contrast, the US was second most attacked by volume (per the application-layer graph above), but came in the fiftieth place with respect to the total US-bound traffic.

Top targeted countries by HTTP DDoS attacks with respect to each country’s traffic

Similar to Singapore, but arguably more dramatic, China is both the number one most attacked country by network-layer DDoS attack traffic, and also with respect to all China-bound traffic. Almost 86% of all China-bound traffic was mitigated by Cloudflare as network-layer DDoS attacks. The Palestinian territories, Brazil, Norway, and again Singapore followed with large percentages of attack traffic.

Top targeted countries by Network-layer DDoS attacks with respect to each country’s traffic

Attack vectors and attributes

The majority of DDoS attacks are short and small relative to Cloudflare’s scale. However, unprotected websites and networks can still suffer disruption from short and small attacks without proper inline automated protection — underscoring the need for organizations to be proactive in adopting a robust security posture.

In 2023 Q4, 91% of attacks ended within 10 minutes, 97% peaked below 500 megabits per second (mbps), and 88% never exceeded 50 thousand packets per second (pps).

Two out of every 100 network-layer DDoS attacks lasted more than an hour, and exceeded 1 gigabit per second (gbps). One out of every 100 attacks exceeded 1 million packets per second. Furthermore, the amount of network-layer DDoS attacks exceeding 100 million packets per second increased by 15% quarter-over-quarter.

DDoS attack stats you should know

One of those large attacks was a Mirai-botnet attack that peaked at 160 million packets per second. The packet per second rate was not the largest we’ve ever seen. The largest we’ve ever seen was 754 million packets per second. That attack occurred in 2020, and we have yet to see anything larger.

This more recent attack, though, was unique in its bits per second rate. This was the largest network-layer DDoS attack we’ve seen in Q4. It peaked at 1.9 terabits per second and originated from a Mirai botnet. It was a multi-vector attack, meaning it combined multiple attack methods. Some of those methods included UDP fragments flood, UDP/Echo flood, SYN Flood, ACK Flood, and TCP malformed flags.

This attack targeted a known European Cloud Provider and originated from over 18 thousand unique IP addresses that are assumed to be spoofed. It was automatically detected and mitigated by Cloudflare’s defenses.

This goes to show that even the largest attacks end very quickly. Previous large attacks we’ve seen ended within seconds — underlining the need for an in-line automated defense system. Though still rare, attacks in the terabit range are becoming more and more prominent.

1.9 Terabit per second Mirai DDoS attacks

The use of Mirai-variant botnets is still very common. In Q4, almost 3% of all attacks originate from Mirai. Though, of all attack methods, DNS-based attacks remain the attackers’ favorite. Together, DNS Floods and DNS Amplification attacks account for almost 53% of all attacks in Q4. SYN Flood follows in second and UDP floods in third. We’ll cover the two DNS attack types here, and you can visit the hyperlinks to learn more about UDP and SYN floods in our Learning Center.

DNS floods and amplification attacks

DNS floods and DNS amplification attacks both exploit the Domain Name System (DNS), but they operate differently. DNS is like a phone book for the Internet, translating human-friendly domain names like “www.cloudfare.com” into numerical IP addresses that computers use to identify each other on the network.

Simply put, DNS-based DDoS attacks comprise the method computers and servers used to identify one another to cause an outage or disruption, without actually ‘taking down’ a server. For example, a server may be up and running, but the DNS server is down. So clients won’t be able to connect to it and will experience it as an outage.

A DNS flood attack bombards a DNS server with an overwhelming number of DNS queries. This is usually done using a DDoS botnet. The sheer volume of queries can overwhelm the DNS server, making it difficult or impossible for it to respond to legitimate queries. This can result in the aforementioned service disruptions, delays or even an outage for those trying to access the websites or services that rely on the targeted DNS server.

On the other hand, a DNS amplification attack involves sending a small query with a spoofed IP address (the address of the victim) to a DNS server. The trick here is that the DNS response is significantly larger than the request. The server then sends this large response to the victim’s IP address. By exploiting open DNS resolvers, the attacker can amplify the volume of traffic sent to the victim, leading to a much more significant impact. This type of attack not only disrupts the victim but also can congest entire networks.

In both cases, the attacks exploit the critical role of DNS in network operations. Mitigation strategies typically include securing DNS servers against misuse, implementing rate limiting to manage traffic, and filtering DNS traffic to identify and block malicious requests.

Top attack vectors

Amongst the emerging threats we track, we recorded a 1,161% increase in ACK-RST Floods as well as a 515% increase in CLDAP floods, and a 243% increase in SPSS floods, in each case as compared to last quarter. Let’s walk through some of these attacks and how they’re meant to cause disruption.

Top emerging attack vectors

ACK-RST floods

An ACK-RST Flood exploits the Transmission Control Protocol (TCP) by sending numerous ACK and RST packets to the victim. This overwhelms the victim’s ability to process and respond to these packets, leading to service disruption. The attack is effective because each ACK or RST packet prompts a response from the victim’s system, consuming its resources. ACK-RST Floods are often difficult to filter since they mimic legitimate traffic, making detection and mitigation challenging.

CLDAP floods

CLDAP (Connectionless Lightweight Directory Access Protocol) is a variant of LDAP (Lightweight Directory Access Protocol). It’s used for querying and modifying directory services running over IP networks. CLDAP is connectionless, using UDP instead of TCP, making it faster but less reliable. Because it uses UDP, there’s no handshake requirement which allows attackers to spoof the IP address thus allowing attackers to exploit it as a reflection vector. In these attacks, small queries are sent with a spoofed source IP address (the victim’s IP), causing servers to send large responses to the victim, overwhelming it. Mitigation involves filtering and monitoring unusual CLDAP traffic.

SPSS floods

Floods abusing the SPSS (Source Port Service Sweep) protocol is a network attack method that involves sending packets from numerous random or spoofed source ports to various destination ports on a targeted system or network. The aim of this attack is two-fold: first, to overwhelm the victim’s processing capabilities, causing service disruptions or network outages, and second, it can be used to scan for open ports and identify vulnerable services. The flood is achieved by sending a large volume of packets, which can saturate the victim’s network resources and exhaust the capacities of its firewalls and intrusion detection systems. To mitigate such attacks, it’s essential to leverage in-line automated detection capabilities.

Cloudflare is here to help – no matter the attack type, size, or duration

Cloudflare’s mission is to help build a better Internet, and we believe that a better Internet is one that is secure, performant, and available to all. No matter the attack type, the attack size, the attack duration or the motivation behind the attack, Cloudflare’s defenses stand strong. Since we pioneered unmetered DDoS Protection in 2017, we’ve made and kept our commitment to make enterprise-grade DDoS protection free for all organizations alike — and of course, without compromising performance. This is made possible by our unique technology and robust network architecture.

It’s important to remember that security is a process, not a single product or flip of a switch. Atop of our automated DDoS protection systems, we offer comprehensive bundled features such as firewall, bot detection, API protection, and caching to bolster your defenses. Our multi-layered approach optimizes your security posture and minimizes potential impact. We’ve also put together a list of recommendations to help you optimize your defenses against DDoS attacks, and you can follow our step-by-step wizards to secure your applications and prevent DDoS attacks. And, if you’d like to benefit from our easy to use, best-in-class protection against DDoS and other attacks on the Internet, you can sign up — for free! — at cloudflare.com. If you’re under attack, register or call the cyber emergency hotline number shown here for a rapid response.

Using DNS to estimate the worldwide state of IPv6 adoption

Post Syndicated from Carlos Rodrigues http://blog.cloudflare.com/author/carlos-rodrigues/ original https://blog.cloudflare.com/ipv6-from-dns-pov


In order for one device to talk to other devices on the Internet using the aptly named Internet Protocol (IP), it must first be assigned a unique numerical address. What this address looks like depends on the version of IP being used: IPv4 or IPv6.

IPv4 was first deployed in 1983. It’s the IP version that gave birth to the modern Internet and still remains dominant today. IPv6 can be traced back to as early as 1998, but only in the last decade did it start to gain significant traction — rising from less than 1% to somewhere between 30 and 40%, depending on who’s reporting and what and how they’re measuring.

With the growth in connected devices far exceeding the number of IPv4 addresses available, and its costs rising, the much larger address space provided by IPv6 should have made it the dominant protocol by now. However, as we’ll see, this is not the case.

Cloudflare has been a strong advocate of IPv6 for many years and, through Cloudflare Radar, we’ve been closely following IPv6 adoption across the Internet. At three years old, Radar is still a relatively recent platform. To go further back in time, we can briefly turn to our friends at APNIC1 — one of the five Regional Internet Registries (RIRs). Through their data, going back to 2012, we can see that IPv6 experienced a period of seemingly exponential growth until mid-2017, after which it entered a period of linear growth that’s still ongoing:

IPv6 adoption is slowed down by the lack of compatibility between both protocols — devices must be assigned both an IPv4 and an IPv6 address — along with the fact that virtually all devices on the Internet still support IPv4. Nevertheless, IPv6 is critical for the future of the Internet, and continued effort is required to increase its deployment.

Cloudflare Radar, like APNIC and most other sources today, publishes numbers that primarily reflect the extent to which Internet Service Providers (ISPs) have deployed IPv6: the client side. It’s a very important angle, and one that directly impacts end users, but there’s also the other end of the equation: the server side.

With this in mind, we invite you to follow us on a quick experiment where we aim for a glimpse of server side IPv6 adoption, and how often clients are actually (or likely) able to talk to servers over IPv6. We’ll rely on DNS for this exploration and, as they say, the results may surprise you.

IPv6 Adoption on the Client Side (from HTTP)

By the end of October 2023, from Cloudflare’s perspective, IPv6 adoption across the Internet was at roughly 36% of all traffic, with slight variations depending on the time of day and day of week. When excluding bots the estimate goes up to just over 46%, while excluding humans pushes it down close to 24%. These numbers refer to the share of HTTP requests served over IPv6 across all IPv6-enabled content (the default setting).

For this exercise, what matters most is the number for both humans and bots. There are many reasons for the adoption gap between both kinds of traffic — from varying levels of IPv6 support in the plethora of client software used, to varying levels of IPv6 deployment inside the many networks where traffic comes from, to the varying size of such networks, etc. — but that’s a rabbit hole for another day. If you’re curious about the numbers for a particular country or network, you can find them on Cloudflare Radar and in our Year in Review report for 2023.

It Takes Two to Dance

You, the reader, might point out that measuring the client side of the client-server equation only tells half the story: for an IPv6-capable client to establish a connection with a server via IPv6, the server must also be IPv6-capable.

This raises two questions:

  1. What’s the extent of IPv6 adoption on the server side?
  2. How well does IPv6 adoption on the client side align with adoption on the server side?

There are several possible answers, depending on whether we’re talking about users, devices, bytes transferred, and so on. We’ll focus on connections (it will become clear why in a moment), and the combined question we’re asking is:

How often can an IPv6-capable client use IPv6 when connecting to servers on the Internet, under typical usage patterns?

Typical usage patterns include people going about their day visiting some websites more often than others or automated clients calling APIs. We’ll turn to DNS to get this perspective.

Enter DNS

Before a client can attempt to establish a connection with a server by name, using either the classic IPv4 protocol or the more modern IPv6, it must look up the server’s IP address in the phonebook of the Internet, the Domain Name System (DNS).

Looking up a hostname in DNS is a recursive process. To find the IP address of a server, the domain hierarchy (the dot-separated components of a server’s name) must be followed across several DNS authoritative servers until one of them returns the desired response2. Most clients, however, don’t do this directly and instead ask an intermediary server called a recursive resolver to do it for them. Cloudflare operates one such recursive resolver that anyone can use: 1.1.1.1.

As a simplified example, when a client asks 1.1.1.1 for the IP address where “www.example.com” lives, 1.1.1.1 will go out and ask the DNS root servers3 about “.com”, then ask the .com DNS servers about “example.com”, and finally ask the example.com DNS servers about “www.example.com”, which has direct knowledge of it and answers with an IP address. To make things faster for the next client asking a similar question, 1.1.1.1 caches (remembers for a while) both the final answer and the steps in between.

This means 1.1.1.1 is in a very good position to count how often clients try to look up IPv4 addresses (A-type queries) vs. how often they try to look up IPv6 addresses (AAAA-type queries), covering most of the observable Internet.

But how does a client know when to ask for a server’s IPv4 address or its IPv6 address?

The short answer is that clients with IPv6 available to them just ask for both — doing separate A and AAAA lookups for every server they wish to connect to. These IPv6-capable clients will prioritize connecting over IPv6 when they get a non-empty AAAA answer, whether they also get a non-empty A answer (which they almost always get, as we’ll see). The algorithm driving this preference for modernity is called Happy Eyeballs, if you’re interested in the details.

We’re now ready to start looking at some actual data…

IPv6 Adoption on the Client Side (from DNS)

The first step is establishing a baseline by measuring client IPv6 deployment from 1.1.1.1’s perspective and comparing it with the numbers from HTTP requests that we started with.

It’s tempting to count how often clients connect to 1.1.1.1 using IPv6, but the results are misleading for a couple of reasons, the strongest one being hidden in plain sight: 1.1.1.1 is the most memorable address of the set of IPv4 and IPv6 addresses that clients can use to perform DNS lookups through the 1.1.1.1 service. Ideally, IPv6-capable clients using 1.1.1.1 as their recursive resolver should have all four of the following IP addresses configured, not just the first two:

  • 1.1.1.1 (IPv4)
  • 1.0.0.1 (IPv4)
  • 2606:4700:4700::1111 (IPv6)
  • 2606:4700:4700::1001 (IPv6)

But, when manual configuration is involved4, humans find IPv6 addresses less memorable than IPv4 addresses and are less likely to configure them, viewing the IPv4 addresses as enough.

A related, but less obvious, confounding factor is that many IPv6-capable clients will still perform DNS lookups over IPv4 even when they have 1.1.1.1’s IPv6 addresses configured, as spreading lookups over the available addresses is a popular default option.

A more sensible approach to assess IPv6 adoption from DNS client activity is calculating the percentage of AAAA-type queries over the total amount of A-type queries, assuming IPv6-clients always perform both5, as mentioned earlier.

Then, from 1.1.1.1’s perspective, IPv6 adoption on the client side is estimated at 30.5% by query volume. This is a bit under what we observed from HTTP traffic over the same time period (35.9%) but such a difference between two different perspectives is not unexpected.

A Note on TTLs

It’s not only recursive resolvers that cache DNS responses, most DNS clients have their own local caches as well. Your web browser, operating system, and even your home router, keep answers around hoping to speed up subsequent queries.

How long each answer remains in cache depends on the time-to-live (TTL) field sent back with DNS records. If you’re familiar with DNS, you might be wondering if A and AAAA records have similar TTLs. If they don’t, we may be getting fewer queries for just one of these two types (because it gets cached for longer at the client level), biasing the resulting adoption figures.

The pie charts here break down the minimum TTLs sent back by 1.1.1.1 in response to A and AAAA queries6. There is some difference between both types, but the difference is very small.

IPv6 Adoption on the Server Side

The following graph shows how often A and AAAA-type queries get non-empty responses, shedding light on server side IPv6 adoption and getting us closer to the answer we’re after:

IPv6 adoption by servers is estimated at 43.3% by query volume, noticeably higher than what was observed for clients.

How Often Both Sides Swipe Right

If 30.5% of IP address lookups handled by 1.1.1.1 could make use of an IPv6 address to connect to their intended destinations, but only 43.3% of those lookups get a non-empty response, that can give us a pretty good basis for how often IPv6 connections are made between client and server — roughly 13.2% of the time.

The Potential Impact of Popular Domains

IPv6 server side adoption measured by query volume for the domains in Radar’s Top 100 list is 60.8%. If we exclude these domains from our overall calculations, the previous 13.2% figure drops to 8%. This is a significant difference, but not unexpected, as the Top 100 domains make up over 55% of all A and AAAA queries to 1.1.1.1.

If just a few more of these highly popular domains were to deploy IPv6 today, observed adoption would noticeably increase and, with it, the chance of IPv6-capable clients establishing connections using IPv6.

Closing Thoughts

Observing the extent of IPv6 adoption across the Internet can mean different things:

  • Counting users with IPv6-capable Internet access;
  • Counting IPv6-capable devices or software on those devices (clients and/or servers);
  • Calculating the amount of traffic flowing through IPv6 connections, measured in bytes;
  • Counting the fraction of connections (or individual requests) over IPv6.

In this exercise we chose to look at connections and requests. Keeping in mind that the underlying reality can only be truly understood by considering several different perspectives, we saw three different IPv6 adoption figures:

  • 35.9% (client side) when counting HTTP requests served from Cloudflare’s CDN;
  • 30.5% (client side) when counting A and AAAA-type DNS queries handled by 1.1.1.1;
  • 43.3% (server side) of positive responses to AAAA-type DNS queries, also from 1.1.1.1.

We combined the client side and server side figures from the DNS perspective to estimate how often connections to third-party servers are likely to be established over IPv6 rather than IPv4: just 13.2% of the time.

To improve on these numbers, ISPs, cloud and hosting providers, and corporations alike must increase the rate at which they make IPv6 available for devices in their networks. But large websites and content sources also have a critical role to play in enabling IPv6-capable clients to use IPv6 more often, as 39.2% of queries for domains in the Radar Top 100 (representing over half of all A and AAAA queries to 1.1.1.1) are still limited to IPv4-only responses.

On the road to full IPv6 adoption, the Internet isn’t quite there yet. But continued effort from all those involved can help it to continue to move forward, and perhaps even accelerate progress.

On the server side, Cloudflare has been helping with this worldwide effort for many years by providing free IPv6 support for all domains. On the client side, the 1.1.1.1 app automatically enables your device for IPv6 even if your ISP doesn’t provide any IPv6 support. And, if you happen to manage an IPv6-only network, 1.1.1.1’s DNS64 support also has you covered.

***
1Cloudflare’s public DNS resolver (1.1.1.1) is operated in partnership with APNIC. You can read more about it in the original announcement blog post and in 1.1.1.1’s privacy policy.
2There’s more information on how DNS works in the “What is DNS?” section of our website. If you’re a hands-on learner, we suggest you take a look at Julia Evans’ “mess with dns”.
3Any recursive resolver will already know the IP addresses of the 13 root servers beforehand. Cloudflare also participates at the topmost level of DNS by providing anycast service to the E and F-Root instances, which means 1.1.1.1 doesn’t need to go far for that first lookup step.
4When using the 1.1.1.1 app, all four IP addresses are configured automatically.
5For simplification, we assume the amount of IPv6-only clients is still negligibly small. It’s a reasonable assumption in general, and other datasets available to us confirm it.
61.1.1.1, like other recursive resolvers, returns adjusted TTLs: the record’s original TTL minus the number of seconds since the record was last cached. Empty A and AAAA answers get cached for the amount of time defined in the domain’s Start of Authority (SOA) record, as specified by RFC 2308.

Amazon SES: Email Authentication and Getting Value out of Your DMARC Policy

Post Syndicated from Bruno Giorgini original https://aws.amazon.com/blogs/messaging-and-targeting/email-authenctication-dmarc-policy/

Amazon SES: Email Authentication and Getting Value out of Your DMARC Policy

Introduction

For enterprises of all sizes, email is a critical piece of infrastructure that supports large volumes of communication. To enhance the security and trustworthiness of email communication, many organizations turn to email sending providers (ESPs) like Amazon Simple Email Service (Amazon SES). These ESPs allow users to send authenticated emails from their domains, employing industry-standard protocols such as the Sender Policy Framework (SPF) and DomainKeys Identified Mail (DKIM). Messages authenticated with SPF or DKIM will successfully pass your domain’s Domain-based Message Authentication, Reporting, and Conformance (DMARC) policy. This blog post will focus on the DMARC policy enforcement mechanism. The blog will explore some of the reasons why email may fail DMARC policy evaluation and propose solutions to fix any failures that you identify. For an introduction to DMARC and how to carefully choose your email sending domain identity, you can refer to Choosing the Right Domain for Optimal Deliverability with Amazon SES The relationship between DMARC compliance and email deliverability rates is crucial for organizations aiming to maintain a positive sender reputation and ensure successful email delivery. There are many advantages when organizations have this correctly setup, these include:

  • Improved Email Deliverability
  • Reduction in Email Spoofing and Phishing
  • Positive Sender Reputation
  • Reduced Risk of Email Marked as Spam
  • Better Email Engagement Metrics
  • Enhanced Brand Reputation

With this foundation, let’s explore the intricacies of DMARC and how it can benefit your organization’s email communication.

What is DMARC?

DMARC is a mechanism for domain owners to advertise SPF and DKIM protection and to tell receivers how to act if those authentication methods fail. The domain’s DMARC policy protects your domain from third parties attempting to spoof the domain in the “From” header of emails. Malicious email messages that aim to send phishing attempts using your domain will be subject to DMARC policy evaluation, which may result in their quarantine or rejection by the email receiving organization. This stringent policy ensures that emails received by email recipients are genuinely from the claimed sending domain, thereby minimizing the risk of people falling victim to email-based scams. Domain owners publish DMARC policies as a TXT record in the domain’s _dmarc.<domain> DNS record. For example, if the domain used in the “From” header is example.com, then the domain’s DMARC policy would be located in a DNS TXT record named _dmarc.example.com. The DMARC policy can have one of three policy modes:

  • A typical DMARC deployment of an existing domain will start with publishing "p=none". A none policy means that the domain owner is in a monitoring phase; the domain owner is monitoring for messages that aren’t authenticated with SPF and DKIM and seeks to ensure all email is properly authenticated
  • When the domain owner is comfortable that all legitimate use cases are properly authenticated with SPF and/or DKIM, they may change the DMARC policy to "p=quarantine". A quarantine policy means that messages which fail to produce a domain-aligned authenticated identifier via SPF or DKIM will be quarantined by the mail receiving organization. The mail receiving organization may filter these messages into Junk folders, or take another action that they feel best protects their recipients.
  • Finally, domain owners who are confident that all of the legitimate messages using their domain are authenticated with SPF or DKIM, may change the DMARC policy to "p=reject". A reject policy means that messages which fail to produce a domain-aligned authenticated identifier via SPF or DKIM will be rejected by the mail receiving organization.

The following are examples of a TXT record that contains a DMARC policy, depending on the desired policy (the ‘p’ tag):

  Name Type Value
1 _dmarc.example.com TXT “v=DMARC1;p=reject;rua=mailto:[email protected]
2 _dmarc.example.com TXT “v=DMARC1;p=quarantine;rua=mailto:[email protected]
3 _dmarc.example.com TXT “v=DMARC1;p=none;rua=mailto:[email protected]
Table 1 – Example DMARC policy

This policy tells email providers to apply the DMARC policy to messages that fail to produce a DKIM or SPF authenticated identifier that is aligned to the domain in the “From” header. Alignment means that one or both of the following occurs:

  • The messages pass the SPF policy for the MAIL FROM domain and the MAIL FROM domain is the same as the domain in the “From” header, or a subdomain. Reference Using a custom MAIL FROM domain to learn more about how to send SPF aligned messages with SES.
  • The messages have a DKIM signature signed by a public key in DNS at a location within the domain of the “From” header. Reference Authenticating Email with DKIM in Amazon SES to learn more about how to send DKIM aligned messages with SES.

DMARC reporting

The rua tag in the domain’s DMARC policy indicates the location to which mail receiving organizations should send aggregate reports about messages that pass or fail SPF and DKIM alignment. Domain owners analyze these reports to discover messages which are using the domain in the “From” header but are not properly authenticated with SPF or DKIM. The domain owner will attempt to ensure that all legitimate messages are authenticated through analysis of the DMARC aggregate reports over time. Mail receiving organizations which support sending DMARC reports typically send these aggregated reports once per day, although these practices differ from provider to provider.

What does a typical DMARC deployment look like?

A DMARC deployment is the process of:

  1. Ensuring that all emails using the domain in the “From” header are authenticated with DKIM and SPF domain-aligned identifiers. Focus on DKIM as the primary means of authentication.
  2. Publishing a DMARC policy (none, quarantine, or reject) for the domain that reflects how the domain owner would like mail receiving organizations to handle unauthenticated email claiming to be from their domain.

New domains and subdomains

Deploying a DMARC policy is easy for organizations that have created a new domain or subdomain for the purpose of a new email sending use case on SES; for example email marketing, transaction emails, or one-time pass codes (OTP). These domains can start with the "p=reject" DMARC enforcement policy because the policy will not affect existing email sending programs. This strict enforcement is to ensure that there is no unauthenticated use of the domain and its subdomains.

Existing domains

For existing domains, a DMARC deployment is an iterative process because the domain may have a history of email sending by one or multiple email sending programs. It is important to gain a complete understanding of how the domain and its subdomains are being used for email sending before publishing a restrictive DMARC policy (p=quarantine or p=reject) because doing so would affect any unauthenticated email sending programs using the domain in the “From” header of messages. To get started with the DMARC implementation, these are a few actions to take:

  • Publish a p=none DMARC policy (sometimes referred to as monitoring mode), and set the rua tag to the location in which you would like to receive aggregate reports.
  • Analyze the aggregate reports. Mail receiving organizations will send reports which contain information to determine if the domain, and its subdomains, are being used for sending email, and how the messages are (or are not) being authenticated with a DKIM or SPF domain-aligned identifier. An easy to use analysis tool is the Dmarcian XML to Human Converter.
  • Avoid prematurely publishing a “p=quarantine” or “p=reject” policy. Doing so may result in blocked or reduced delivery of legitimate messages of existing email sending programs.

The image below illustrates how DMARC will be applied to an email received by the email receiving server and actions taken based on the enforcement policy:

DMARC flow Figure 1 – DMARC Flow

How do SPF and DKIM cause DMARC policies to pass

When you start sending emails using Amazon SES, messages that you send through Amazon SES automatically use a subdomain of amazonses.com as the default MAIL FROM domain. SPF evaluators will see that these messages pass the SPF policy evaluation because the default MAIL FROM domain has a SPF policy which includes the IP addresses of the SES infrastructure that sent the message. SPF authentication will result in an “SPF=PASS” and the authenticated identifier is the domain of the MAIL FROM address. The published SPF record applies to every message that is sent using SES regardless of whether you are using a shared or dedicated IP address. The amazonses.com SPF record lists all shared and dedicated IP addresses, so it is inclusive of all potential IP addresses that may be involved with sending email as the MAIL FROM domain. You can use ‘dig’ to look up the IP addresses that SES will use to send email:

dig txt amazonses.com | grep "v=spf1" amazonses.com. 850 IN TXT "v=spf1 ip4:199.255.192.0/22 ip4:199.127.232.0/22 ip4:54.240.0.0/18 ip4:69.169.224.0/20 ip4:23.249.208.0/20 ip4:23.251.224.0/19 ip4:76.223.176.0/20 ip4:54.240.64.0/19 ip4:54.240.96.0/19 ip4:52.82.172.0/22 ip4:76.223.128.0/19 -all"

Custom MAIL FROM domains

It is best practice for customers to configure a custom MAIL FROM domain, and not use the default amazonses.com MAIL FROM domain. The custom MAIL FROM domain will always be a subdomain of the customer’s verified domain identity. Once you configure the MAIL FROM domain, messages sent using SES will continue to result in an “SPF=PASS” as it does with the default MAIL FROM domain. Additionally, DMARC authentication will result in “DMARC=PASS” because the MAIL FROM domain and the domain in the “From” header are in alignment. It’s important to understand that customers must use a custom MAIL FROM domain if they want “SPF=PASS” to result in a “DMARC=PASS”.

For example, an Amazon SES-verified example.com domain will have the custom MAIL FROM domain “bounce.example.com”. The configured SPF record will be:

dig txt bounce.example.com | grep "v=spf1" "v=spf1 include:amazonses.com ~all"

Note: The chosen MAIL FROM domain could be any sub-domain of your choice. If you have the same domain identity configured in multiple regions, then you should create region-specific custom MAIL FROM domains for each region. e.g. bounce-us-east-1.example.com and bounce-eu-west-2.example.com so that asynchronously bounced messages are delivered directly to the region from which the messages were sent.

DKIM results in DMARC pass

For customers that establish Amazon SES Domain verification using DKIM signatures, DKIM authentication will result in a DKIM=PASS, and DMARC authentication will result in “DMARC=PASS” because the domain that publishes the DKIM signature is aligned to the domain in the “From” header (the SES domain identity).

DKIM and SPF together

Email messages are fully authenticated when the messages pass both DKIM and SPF, and both DKIM and SPF authenticated identifiers are domain-aligned. If only DKIM is domain-aligned, then the messages will still pass the DMARC policy, even if the SPF “pass” is unaligned. Mail receivers will consider the full context of SPF and DKIM when determining how they will handle the disposition of the messages you send, so it is best to fully authenticate your messages whenever possible. Amazon SES has taken care of the heavy lifting of the email authentication process away from our customers, and so, establishing SPF, DKIM and DMARC authentication has been reduced to a few clicks which allows SES customers to get started easily and scale fast.

Why is DMARC failing?

There are scenarios when you may notice that messages fail DMARC, whether your messages are fully authenticated, or partially authenticated. The following are things that you should look out for:

Email Content Modification

Sometimes email content is modified during the delivery to the recipients’ mail servers. This modification could be as a result of a security device or anti-spam agent along the delivery path (for example: the message Subject may be modified with an “[EXTERNAL]” warning to recipients). The modified message invalidates the DKIM signature which causes a DKIM failure. Remember, the purpose of DKIM is to ensure that the content of an email has not been tampered with during the delivery process. If this happens, the DKIM authentication will fail with an authentication error similar to “DKIM-signature body hash not verified“.

Solutions:

  • If you control the full path that the email message will traverse from sender to recipient, ensure that no intermediary mail servers modify the email content in transit.
  • Ensure that you configure a custom MAIL FROM domain so that the messages have a domain-aligned SPF identifier.
  • Keep the DMARC policy in monitoring mode (p=none) until these issues are identified/solved.

Email Forwarding

Email Forwarding There are multiple scenarios in which a message may be forwarded, and they may result in both/either SPF and DKIM failing to produce a domain-aligned authenticated identifier. For SPF, it means that the forwarding mail server is not listed in the MAIL FROM domain’s SPF policy. It is best practice for a forwarding mail server to avoid SPF failures and assume responsibility of mail handling for the messages it forwards by rewriting the MAIL FROM address to be in the domain controlled by the forwarding server. Forwarding servers that do not rewrite the MAIL FROM address pose a risk of impersonation attacks and phishing. Do not add the IP addresses of forwarding servers to your MAIL FROM domain’s SPF policy unless you are in complete control of all sources of mail being forwarded through this infrastructure. For DKIM, it means that the messages are being modified in some way that causes DKIM signature validation failure (see Email Content Modification section above). A responsible forwarding server will rewrite the MAIL FROM domain so that the messages pass SPF with a non-aligned authenticated identifier. These servers will attempt to forward the message without alteration in order to preserve DKIM signatures, but that is sometimes challenging to do in practice. In this scenario, since the messages carry no domain-aligned authenticated identifier, the messages will fail the DMARC policy.

Solution:

  • Email forwarding is an expected type of failure of which you will see in the DMARC aggregate reports. The domain owner must weigh the risk of causing forwarded messages to be rejected against the risk of not publishing a reject DMARC policy. Reference 8.6. Interoperability Considerations. Forwarding servers that wish to forward messages that they know will result in a DMARC failure will commonly rewrite the “From” header address of messages it forwards so that the messages pass a DMARC policy for a domain that the forwarding server is responsible for. The way to identify forwarding servers that rewrite the “From” header in this situation is to publish “p=quarantine pct=0 t=y” in your domain’s DMARC policy before publishing “p=reject”.

Multiple email sending providers are sending using the same domain

Multiple email sending providers: There are situations where an organization will have multiple business units sending email using the same domain, and these business units may be using an email sending provider other than SES. If neither SPF nor DKIM is configured with domain-alignment for these email sending providers, you will see DMARC failures in the DMARC aggregate report.

Solution:

  • Analyze the DMARC aggregate reports to identify other email sending providers, track down the business units responsible for each email sending program, and follow the instructions offered by the email sending provider about how to configure SPF and DKIM to produce a domain-aligned authenticated identifier.

What does a DMARC aggregate report look like?

The following XML example shows the general format of a DMARC aggregate report that you will receive from participating email service providers.

<?xml version="1.0" encoding="UTF-8" ?> 
<feedback> 
  <report_metadata> 
    <org_name>email-service-provider-domain.com</org_name> 
    <email>[email protected]</email> 
    <extra_contact_info>https://email-service-provider-domain.com/> 
    <report_id>620501112281841510</report_id> 
    <date_range> 
      <begin>1685404800</begin> 
      <end>1685491199</end> 
    </date_range> 
  </report_metadata> 
  <policy_published> 
    <domain>example.com</domain>
    <adkim>r</adkim> 
    <aspf>r</aspf> 
    <p>none</p> 
    <sp>none</sp> 
    <pct>100</pct> 
  </policy_published> 
  <record> 
    <row> 
      <source_ip>192.0.2.10</source_ip>
      <count>1</count> 
      <policy_evaluated> 
        <disposition>none</disposition> 
        <dkim>pass</dkim> 
        <spf>fail</spf> 
      </policy_evaluated> 
    </row> 
    <identifiers> 
      <header_from>example.com</header_from>
    </identifiers> 
    <auth_results> 
      <dkim> 
        <domain>example.com</domain> 
        <result>pass</result> 
        <selector>gm5h7da67oqhnr3ccji35fdskt</selector> 
      </dkim> 
      <dkim> 
        <domain>amazonses.com</domain> 
        <result>pass</result> 
        <selector>224i4yxa5dv7c2xz3womw6peua</selector> 
      </dkim> 
      <spf> 
        <domain>amazonses.com</domain> 
        <result>pass</result> 
      </spf> 
    </auth_results> 
  </record> 
</feedback> 

 

How to address DMARC deployment for domains confirmed to be unused for email (dangling or otherwise)

Deploying DMARC for unused or dangling domains is a proactive step to prevent abuse or unauthorized use of your domain. Once you have confirmed that all subdomains being used for sending email have the desired DMARC policies, you can publish a ‘p=reject’ tag on the organizational domain, which will prevent unauthorized usage of unused subdomains without the need to publish DMARC policies for every conceivable subdomain. For more advanced subdomain policy scenarios, read the “tree walk” definitions in https://datatracker.ietf.org/doc/draft-ietf-dmarc-dmarcbis/

Conclusion:

In conclusion, DMARC is not only a technology but also a commitment to email security, integrity, and trust. By embracing DMARC best practices, organizations can protect their users, maintain a positive brand reputation, and ensure seamless email deliverability. Every message from SES passes SPF and DKIM for “amazonses.com”, but the authenticated identifiers are not always in alignment with the domain in the “From” header which carries the DMARC policy. If email authentication is not fully configured, your messages are susceptible to delivery issues like spam filtering, or being rejected or blocked by the recipient ESP. As a best practice, you can configure both DKIM and SPF to attain optimum deliverability while sending email with SES.

 

About the Authors

Bruno Giorgini Bruno Giorgini is a Senior Solutions Architect specializing in Pinpoint and SES. With over two decades of experience in the IT industry, Bruno has been dedicated to assisting customers of all sizes in achieving their objectives. When he is not crafting innovative solutions for clients, Bruno enjoys spending quality time with his wife and son, exploring the scenic hiking trails around the SF Bay Area.
Jesse Thompson Jesse Thompson is an Email Deliverability Manager with the Amazon Simple Email Service team. His background is in enterprise IT development and operations, with a focus on email abuse mitigation and encouragement of authenticity practices with open standard protocols. Jesse’s favorite activity outside of technology is recreational curling.
Sesan Komaiya Sesan Komaiya is a Solutions Architect at Amazon Web Services. He works with a variety of customers, helping them with cloud adoption, cost optimization and emerging technologies. Sesan has over 15 year’s experience in Enterprise IT and has been at AWS for 5 years. In his free time, Sesan enjoys watching various sporting activities like Soccer, Tennis and Moto sport. He has 2 kids that also keeps him busy at home.
Mudassar Bashir Mudassar Bashir is a Solutions Architect at Amazon Web Services. He has over ten years of experience in enterprise software engineering. His interests include web applications, containerization, and serverless technologies. He works with different customers, helping them with cloud adoption strategies.
Priya Priya Singh is a Cloud Support Engineer at AWS and subject matter expert in Amazon Simple Email Service. She has a 6 years of diverse experience in supporting enterprise customers across different industries. Along with Amazon SES, she is a Cloudfront enthusiast. She loves helping customers in solving issues related to Cloudfront and SES in their environment.

 

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

Post Syndicated from Suleman Ahmad original http://blog.cloudflare.com/connection-coalescing-with-origin-frames-fewer-dns-queries-fewer-connections/

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

This blog reports and summarizes the contents of a Cloudflare research paper which appeared at the ACM Internet Measurement Conference, that measures and prototypes connection coalescing with ORIGIN Frames.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

Some readers might be surprised to hear that a single visit to a web page can cause a browser to make tens, sometimes even hundreds, of web connections. Take this very blog as an example. If it is your first visit to the Cloudflare blog, or it has been a while since your last visit, your browser will make multiple connections to render the page. The browser will make DNS queries to find IP addresses corresponding to blog.cloudflare.com and then subsequent requests to retrieve any necessary subresources on the web page needed to successfully render the complete page. How many? Looking below, at the time of writing, there are 32 different hostnames used to load the Cloudflare Blog. That means 32 DNS queries and at least 32 TCP (or QUIC) connections, unless the client is able to reuse (or coalesce) some of those connections.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

Each new web connection not only introduces additional load on a server's processing capabilities – potentially leading to scalability challenges during peak usage hours – but also exposes client metadata to the network, such as the plaintext hostnames being accessed by an individual. Such meta information can potentially reveal a user’s online activities and browsing behaviors to on-path network adversaries and eavesdroppers!

In this blog we’re going to take a closer look at “connection coalescing”. Since our initial look at IP-based coalescing in 2021, we have done further large-scale measurements and modeling across the Internet, to understand and predict if and where coalescing would work best. Since IP coalescing is difficult to manage at large scale, last year we implemented and experimented with a promising standard called the HTTP/2 ORIGIN Frame extension that we leveraged to coalesce connections to our edge without worrying about managing IP addresses.

All told, there are opportunities being missed by many large providers. We hope that this blog (and our publication at ACM IMC 2022 with full details) offers a first step that helps servers and clients take advantage of the ORIGIN Frame standard.

Setting the stage

At a high level, as a user navigates the web, the browser renders web pages by retrieving dependent subresources to construct the complete web page. This process bears a striking resemblance to the way physical products are assembled in a factory. In this sense, a modern web page can be considered like an assembly plant. It relies on a ‘supply chain’ of resources that are needed to produce the final product.

An assembly plant in the physical world can place a single order for different parts and get a single shipment from the supplier (similar to the kitting process for maximizing value and minimizing response time); no matter the manufacturer of those parts or where they are made — one ‘connection’ to the supplier is all that is needed. Any single truck from a supplier to an assembly plant can be filled with parts from multiple manufacturers.

The design of the web causes browsers to typically do the opposite in nature. To retrieve the images, JavaScript, and other resources on a web page (the parts), web clients (assembly plants) have to make at least one connection to every hostname (the manufacturers) defined in the HTML that is returned by the server (the supplier). It makes no difference if the connections to those hostnames go to the same server or not, for example they could go to a reverse proxy like Cloudflare. For each manufacturer a ‘new’ truck would be needed to transfer the materials to the assembly plant from the same supplier, or more formally, a new connection would need to be made to request a subresource from a hostname on the same web page.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections
Without connection coalescing

The number of connections used to load a web page can be surprisingly high. It is also common for the subresources to need yet other sub-subresources, and so new connections emerge as a result of earlier ones. Remember, too, that HTTP connections to hostnames are often preceded by DNS queries! Connection coalescing allows us to use fewer connections, or ‘reuse’ the same set of trucks to carry parts from multiple manufacturers from a single supplier.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections
With connection coalescing 

Connection coalescing in principle

Connection coalescing was introduced in HTTP/2, and carried over into HTTP/3. We’ve blogged about connection coalescing previously (for a detailed primer we encourage going over that blog). While the idea is simple, implementing it can present a number of engineering challenges. For example, recall from above that there are 32 hostnames (at the time of writing) to load the web page you are reading right now. Among the 32 hostnames are 16 unique domains (defined as “Effective TLD+1”). Can we create fewer connections or ‘coalesce’ existing connections for each unique domain? The answer is ‘Yes, but it depends’.

The exact number of connections to load the blog page is not at all obvious, and hard to know. There may be 32 hostnames attached to 16 domains but, counter-intuitively, this does not mean the answer to “how many unique connections?” is 16. The true answer could be as few as one connection if all the hostnames are reachable at a single server; or as many as 32 independent connections if a different and distinct server is needed to access each individual hostname.

Connection reuse comes in many forms, so it’s important to define “connection coalescing” in the HTTP space. For example, the reuse of an existing TCP or TLS connection to a hostname to make multiple requests for subresources from that same hostname is connection reuse, but not coalescing.

Coalescing occurs when an existing TLS channel for some hostname can be repurposed or used for connecting to a different hostname. For example, upon visiting blog.cloudflare.com, the HTML points to subresources at cdnjs.cloudflare.com. To reuse the same TLS connection for the subresources, it is necessary for both hostnames to appear together in the TLS certificate's “Server Alternative Name (SAN)” list, but this step alone is not sufficient to convince browsers to coalesce. After all, the cdnjs.cloudflare.com service may or may not be reachable at the same server as blog.cloudflare.com, despite being on the same certificate. So how can the browser know? Coalescing only works if servers set up the right conditions, but clients have to decide whether to coalesce or not – thus, browsers require a signal to coalesce beyond the SANs list on the certificate. Revisiting our analogy, the assembly plant may order a part from a manufacturer directly, not knowing that the supplier already has the same part in its warehouse.

There are two explicit signals a browser can use to decide whether connections can be coalesced: one is IP-based, the other ORIGIN Frame-based. The former requires the server operators to tightly bind DNS records to the HTTP resources available on the server. This is difficult to manage and deploy, and actually creates a risky dependency, because you have to place all the resources behind a specific set or a single IP address. The way IP addresses influence coalescing decisions varies among browsers, with some choosing to be more conservative and others more permissive. Alternatively, the HTTP ORIGIN Frame is an easier signal for the servers to orchestrate; it’s also flexible and has graceful failure with no interruption to service (for a specification compliant implementation).

A foundational difference between both these coalescing signals is: IP-based coalescing signals are implicit, even accidental, and force clients to infer coalescing possibilities that may exist, or not. None of this is surprising since IP addresses are designed to have no real relationship with names! In contrast, ORIGIN Frame is an explicit signal from servers to clients that coalescing is available no matter what DNS says for any particular hostname.

We have experimented with IP-based coalescing previously; for the purpose of this blog we will take a deeper look at ORIGIN Frame-based coalescing.

What is the ORIGIN Frame standard?

The ORIGIN Frame is an extension to the HTTP/2 and HTTP/3 specification, a special Frame sent on stream 0 or the control stream of the connection respectively. The Frame allows the servers to send an ‘origin-set’ to the clients on an existing established TLS connection, which includes hostnames that it is authorized for and will not incur any HTTP 421 errors. Hostnames in the origin-set MUST also appear in the certificate SAN list for the server, even if those hostnames are announced on different IP addresses via DNS.

Specifically, two different steps are required:

  1. Web servers must send a list enumerating the Origin Set (the hostnames that a given connection might be used for) in the ORIGIN Frame extension.
  2. The TLS certificate returned by the web server must cover the additional hostnames being returned in the ORIGIN Frame in the DNS names SAN entries.

At a high-level ORIGIN Frames are a supplement to the TLS certificate that operators can attach to say, “Psst! Hey, client, here are the names in the SANs that are available on this connection — you can coalesce!” Since the ORIGIN Frame is not part of the certificate itself, its contents can be made to change independently. No new certificate is required. There is also no dependency on IP addresses. For a coalesceable hostname, existing TCP/QUIC+TLS connections can be reused without requiring new connections or DNS queries.

Many websites today rely on content which is served by CDNs, like Cloudflare CDN service. The practice of using external CDN services offers websites the advantages of speed, reliability, and reduces the load of content served by their origin servers. When both the website, and the resources are served by the same CDN, despite being different hostnames, owned by different entities, it opens up some very interesting opportunities for CDN operators to allow connections to be reused and coalesced since they can control both the certificate management and connection requests for sending ORIGIN frames on behalf of the real origin server.

Unfortunately, there has been no way to turn the possibilities enabled by ORIGIN Frame into practice. To the best of our knowledge, until today, there has been no server implementation that supports ORIGIN Frames. Among browsers, only Firefox supports ORIGIN Frames. Since IP coalescing is challenging and ORIGIN Frame has no deployed support, is the engineering time and energy to better support coalescing worth the investment? We decided to find out with a large-scale Internet-wide measurement to understand the opportunities and predict the possibilities, and then implemented the ORIGIN Frame to experiment on production traffic.

Experiment #1: What is the scale of required changes?

In February 2021, we collected data for 500K of the most popular websites on the Internet, using a modified Web Page Test on 100 virtual machines. An automated Chrome (v88) browser instance was launched for every visit to a web page to eliminate caching effects (because we wanted to understand coalescing, not caching). On successful completion of each session, Chrome developer tools were used to retrieve and write the page load data as an HTTP Archive format (HAR) file with a full timeline of events, as well as additional information about certificates and their validation. Additionally, we parsed the certificate chains for the root web page and new TLS connections triggered by subresource requests to (i) identify certificate issuers for the hostnames, (ii) inspect the presence of the Subject Alternative Name (SAN) extension, and (iii) validate that DNS names resolve to the IP address used. Further details about our methodology and results can be found in the technical paper.

The first step was to understand what resources are requested by web pages to successfully render the page contents, and where these resources were present on the Internet. Connection coalescing becomes possible when subresource domains are ideally co-located. We approximated the location of a domain by finding its corresponding autonomous system (AS). For example, the domain attached to cdnjs is reachable via AS 13335 in the BGP routing table, and that AS number belongs to Cloudflare. The figure below describes the percentage of web pages and the number of unique ASes needed to fully load a web page.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

Around 14% of the web pages need two ASes to fully load i.e. pages that have a dependency on one additional AS for subresources. More than 50% of the web pages need to contact no more than six ASes to obtain all the necessary subresources. This finding as shown in the plot above implies that a relatively small number of operators serve the sub-resource content necessary for a majority (~50%) of the websites, and any usage of ORIGIN Frames would need only a few changes to have its intended impact. The potential for connection coalescing can therefore be optimistically approximated to the number of unique ASes needed to retrieve all subresources in a web page. In practice however, this may be superseded by operational factors such as SLAs or helped by flexible mappings between sockets, names, and IP addresses which we worked on previously at Cloudflare.

We then tried to understand the impact of coalescing on connection metrics. The measured and ideal number of DNS queries and TLS connections needed to load a web page are summarized by their CDFs in the figure below.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

Through modeling and extensive analysis, we identify that connection coalescing through ORIGIN Frames could reduce the number of DNS and TLS connections made by browsers by over 60% at the median. We performed this modeling by identifying the number of times the clients requested DNS records, and combined them with the ideal ORIGIN Frames to serve.

Many multi-origin servers such as those operated by CDNs tend to reuse certificates and serve the same certificate with multiple DNS SAN entries. This allows the operators to manage fewer certificates through their creation and renewal cycles. While theoretically one can have millions of names in the certificate, creating such certificates is unreasonable and a challenge to manage effectively. By continuing to rely on existing certificates, our modeling measurements bring to light the volume of changes required to enable perfect coalescing, while presenting information about the scale of changes needed, as highlighted in the figure below.

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

We identify that over 60% of the certificates served by websites do not need any modifications and could benefit from ORIGIN Frames, while with no more than 10 additions to the DNS SAN names in certificates we’re able to successfully coalesce connections to over 92% of the websites in our measurement. The most effective changes could be made by CDN providers by adding three or four of their most popular requested hostnames into each certificate.

Experiment #2: ORIGIN Frames in action

In order to validate our modeling expectations, we then took a more active approach in early 2022. Our next experiment focused on 5,000 websites that make extensive use of cdnjs.cloudflare.com as a subresource. By modifying our experimental TLS termination endpoint we deployed HTTP/2 ORIGIN Frame support as defined in the RFC standard. This involved changing the internal fork of net and http dependency modules of Golang which we have open sourced (see here, and here).

During the experiments, connecting to a website in the experiment set would return cdnjs.cloudflare.com in the ORIGIN frame, while the control set returned an arbitrary (unused) hostname. All existing edge certificates for the 5000 websites were also modified. For the experimental group, the corresponding certificates were renewed with cdnjs.cloudflare.com added to the SAN. To ensure integrity between control and experimental sets, control group domains certificates were also renewed with a valid and identical size third party domain used by none of the control domains. This is done to ensure that the relative size changes to the certificates is kept constant avoiding potential biases due to different certificate sizes. Our results were striking!

Connection coalescing with ORIGIN Frames: fewer DNS queries, fewer connections

Sampling 1% of the requests we received from Firefox to the websites in the experiment, we identified over 50% reduction in new TLS connections per second indicating a lesser number of cryptographic verification operations done by both the client and reduced server compute overheads. As expected there were no differences in the control set indicating the effectiveness of connection re-use as seen by the CDN or server operators.

Discussion and insights

While our modeling measurements indicated that we could anticipate some performance improvements, in practice it was not significantly better suggesting that ‘no-worse’ is the appropriate mental model regarding performance. The subtle interplay between resource object sizes, competing connections, and congestion control is subject to network conditions. Bottleneck-share capacity, for example, diminishes as fewer connections compete for bottleneck resources on network links. It would be interesting to revisit these measurements as more operators deploy support on their servers for ORIGIN Frames.

Apart from performance, one major benefit of ORIGIN frames is in terms of privacy. How? Well, each coalesced connection hides client metadata that is otherwise leaked from non-coalesced connections. Certain resources on a web page are loaded depending on how one is interacting with the website. This means for every new connection for retrieving some resource from the server, TLS plaintext metadata like SNI (in the absence of Encrypted Client Hello) and at least one plaintext DNS query, if transmitted over UDP or TCP on port 53, is exposed to the network. Coalescing connections helps remove the need for browsers to open new TLS connections, and the need to do extra DNS queries. This prevents metadata leakage from anyone listening on the network. ORIGIN Frames help minimize those signals from the network path, improving privacy by reducing the amount of cleartext information leaked on path to network eavesdroppers.

While the browsers benefit from reduced cryptographic computations needed to verify multiple certificates, a major advantage comes from the fact that it opens up very interesting future opportunities for resource scheduling at the endpoints (the browsers, and the origin servers) such as prioritization, or recent proposals like HTTP early hints to provide clients experiences where connections are not overloaded or competing for those resources. When coupled with CERTIFICATE Frames IETF draft, we can further eliminate the need for manual certificate modifications as a server can prove its authority of hostnames after connection establishment without any additional SAN entries on the website’s TLS certificate.

Conclusion and call to action

In summary, the current Internet ecosystem has a lot of opportunities for connection coalescing with only a few changes to certificates and their server infrastructure. Servers can significantly reduce the number of TLS handshakes by roughly 50%, while reducing the number of render blocking DNS queries by over 60%. Clients additionally reap these benefits in privacy by reducing cleartext DNS exposure to network on-lookers.

To help make this a reality we are currently planning to add support for both HTTP/2 and HTTP/3 ORIGIN Frames for our customers. We also encourage other operators that manage third party resources to adopt support of ORIGIN Frame to improve the Internet ecosystem.
Our paper submission was accepted to the ACM Internet Measurement Conference 2022 and is available for download. If you’d like to work on projects like this, where you get to see the rubber meet the road for new standards, visit our careers page!

Connection errors in Asia Pacific region on July 9, 2023

Post Syndicated from Christian Elmerot original http://blog.cloudflare.com/connection-errors-in-asia-pacific-region-on-july-9-2023/

Connection errors in Asia Pacific region on July 9, 2023

Connection errors in Asia Pacific region on July 9, 2023

On Sunday, July 9, 2023, early morning UTC time, we observed a high number of DNS resolution failures — up to 7% of all DNS queries across the Asia Pacific region — caused by invalid DNSSEC signatures from Verisign .com and .net Top Level Domain (TLD) nameservers. This resulted in connection errors for visitors of Internet properties on Cloudflare in the region.

The local instances of Verisign’s nameservers started to respond with expired DNSSEC signatures in the Asia Pacific region. In order to remediate the impact, we have rerouted upstream DNS queries towards Verisign to locations on the US west coast which are returning valid signatures.

We have already reached out to Verisign to get more information on the root cause. Until their issues have been resolved, we will keep our DNS traffic to .com and .net TLD nameservers rerouted, which might cause slightly increased latency for the first visitor to domains under .com and .net in the region.

Background

In order to proxy a domain’s traffic through Cloudflare’s network, there are two components involved with respect to the Domain Name System (DNS) from the perspective of a Cloudflare data center: external DNS resolution, and upstream or origin DNS resolution.

To understand this, let’s look at the domain name blog.cloudflare.com — which is proxied through Cloudflare’s network — and let’s assume it is configured to use origin.example as the origin server:

Connection errors in Asia Pacific region on July 9, 2023

Here, the external DNS resolution is the part where DNS queries to blog.cloudflare.com sent by public resolvers like 1.1.1.1 or 8.8.8.8 on behalf of a visitor return a set of Cloudflare Anycast IP addresses. This ensures that the visitor’s browser knows where to send an HTTPS request to load the website. Under the hood your browser performs a DNS query that looks something like this (the trailing dot indicates the DNS root zone):

$ dig blog.cloudflare.com. +short
104.18.28.7
104.18.29.7

(Your computer doesn’t actually use the dig command internally; we’ve used it to illustrate the process) Then when the next closest Cloudflare data center receives the HTTPS request for blog.cloudflare.com, it needs to fetch the content from the origin server (assuming it is not cached).

There are two ways Cloudflare can reach the origin server. If the DNS settings in Cloudflare contain IP addresses then we can connect directly to the origin. In some cases, our customers use a CNAME which means Cloudflare has to perform another DNS query to get the IP addresses associated with the CNAME. In the example above, blog.cloudflare.com has a CNAME record instructing us to look at origin.example for IP addresses. During the incident, only customers with CNAME records like this going to .com and .net domain names may have been affected.

The domain origin.example needs to be resolved by Cloudflare as part of the upstream or origin DNS resolution. This time, the Cloudflare data center needs to perform an outbound DNS query that looks like this:

$ dig origin.example. +short
192.0.2.1

DNS is a hierarchical protocol, so the recursive DNS resolver, which usually handles DNS resolution for whoever wants to resolve a domain name, needs to talk to several involved nameservers until it finally gets an answer from the authoritative nameservers of the domain (assuming no DNS responses are cached). This is the same process during the external DNS resolution and the origin DNS resolution. Here is an example for the origin DNS resolution:

Connection errors in Asia Pacific region on July 9, 2023

Inherently, DNS is a public system that was initially published without any means to validate the integrity of the DNS traffic. So in order to prevent someone from spoofing DNS responses, DNS Security Extensions (DNSSEC) were introduced as a means to authenticate that DNS responses really come from who they claim to come from. This is achieved by generating cryptographic signatures alongside existing DNS records like A, AAAA, MX, CNAME, etc. By validating a DNS record’s associated signature, it is possible to verify that a requested DNS record really comes from its authoritative nameserver and wasn’t altered en-route. If a signature cannot be validated successfully, recursive resolvers usually return an error indicating the invalid signature. This is exactly what happened on Sunday.

Incident timeline and impact

On Saturday, July 8, 2023, at 21:10 UTC our logs show the first instances of DNSSEC validation errors that happened during upstream DNS resolution from multiple Cloudflare data centers in the Asia Pacific region. It appeared DNS responses from the TLD nameservers of .com and .net of the type NSEC3 (a DNSSEC mechanism to prove non-existing DNS records) did not include signatures. About an hour later at 22:16 UTC, the first internal alerts went off (since it required issues to be consistent over a certain period of time), but error rates were still at a level at around 0.5% of all upstream DNS queries.

After several hours, the error rate had increased to a point in which ~13% of our upstream DNS queries in affected locations were failing. This percentage continued to fluctuate over the duration of the incident between the ranges of 10-15% of upstream DNS queries, and roughly 5-7% of all DNS queries (external & upstream resolution) to affected Cloudflare data centers in the Asia Pacific region.

Connection errors in Asia Pacific region on July 9, 2023

Initially it appeared as though only a single upstream nameserver was having issues with DNS resolution, however upon further investigation it was discovered that the issue was more widespread. Ultimately, we were able to verify that the Verisign nameservers for .com and .net were returning expired DNSSEC signatures on a portion of DNS queries in the Asia Pacific region. Based on our tests, other nameserver locations were correctly returning valid DNSSEC signatures.

In response, we rerouted our DNS traffic to the .com and .net TLD nameserver IP addresses to Verisign’s US west locations. After this change was propagated, the issue very quickly subsided and origin resolution error rates returned to normal levels.

All times are in UTC:

2023-07-08 21:10 First instances of DNSSEC validation errors appear in our logs for origin DNS resolution.

2023-07-08 22:16 First internal alerts for Asia Pacific data centers go off indicating origin DNS resolution error on our test domain. Very few errors for other domains at this point.

2023-07-09 02:58 Error rates have increased substantially since the first instance. An incident is declared.

2023-07-09 03:28 DNSSEC validation issues seem to be isolated to a single upstream provider. It is not abnormal that a single upstream has issues that propagate back to us, and in this case our logs were predominantly showing errors from domains that resolve to this specific upstream.

2023-07-09 04:52 We realize that DNSSEC validation issues are more widespread and affect multiple .com and .net domains. Validation issues continue to be isolated to the Asia Pacific region only, and appear to be intermittent.

2023-07-09 05:15 DNS queries via popular recursive resolvers like 8.8.8.8 and 1.1.1.1 do not return invalid DNSSEC signatures at this point. DNS queries using the local stub resolver continue to return DNSSEC errors.

2023-07-09 06:24 Responses from .com and .net Verisign nameservers in Singapore contain expired DNSSEC signatures, but responses from Verisign TLD nameservers in other locations are fine.

2023-07-09 06:41 We contact Verisign and notify them about expired DNSSEC signatures.

2023-07-09 06:50 In order to remediate the impact, we reroute DNS traffic via IPv4 for the .com and .net Verisign nameserver IPs to US west IPs instead. This immediately leads to a substantial drop in the error rate.

2023-07-09 07:06 We also reroute DNS traffic via IPv6 for the .com and .net Verisign nameserver IPs to US west IPs. This leads to the error rate going down to zero.

2023-07-10 09:23 The rerouting is still in place, but the underlying issue with expired signatures in the Asia Pacific region has still not been resolved.

2023-07-10 18:23 Versign gets back to us confirming that they “were serving stale data” at their local site and have resolved the issues.

Technical description of the error and how it happened

As mentioned in the introduction, the underlying cause for this incident was expired DNSSEC signatures for .net and .com zones. Expired DNSSEC signatures will cause a DNS response to be interpreted as invalid. There are two main scenarios in which this error was observed by a user:

  1. CNAME flattening for external DNS resolution. This is when our authoritative nameservers attempt to return the IP address(es) that a CNAME record target resolves to rather than the CNAME record itself.
  2. CNAME target lookup for origin DNS resolution. This is most commonly used when an HTTPS request is sent to a Cloudflare anycast IP address, and we need to determine what IP address to forward the request to. See How Cloudflare works for more details.

In both cases, behind the scenes the DNS query goes through an in-house recursive DNS resolver in order to lookup what a hostname resolves to. The purpose of this resolver is to cache queries, optimize future queries and provide DNSSEC validation. If the query from this resolver fails for whatever reason, our authoritative DNS system will not be able to perform the two scenarios outlined above.

Connection errors in Asia Pacific region on July 9, 2023

During the incident, the recursive resolver was failing to validate the DNSSEC signatures in DNS responses for domains ending in .com and .net. These signatures are sent in the form of an RRSIG record alongside the other DNS records they cover. Together they form a Resource Record set (RRset). Each RRSIG has the corresponding fields:

Connection errors in Asia Pacific region on July 9, 2023

As you can see, the main part of the payload is associated with the signature and its corresponding metadata. Each recursive resolver is responsible for not only checking the signature but also the expiration time of the signature. It is important to obey the expiration time in order to avoid returning responses for RRsets that have been signed by old keys, which could have potentially been compromised by that time.

During this incident, Verisign, the authoritative operator for the .com and .net TLD zones, was occasionally returning expired signatures in its DNS responses in the Asia Pacific region. As a result our recursive resolver was not able to validate the corresponding RRset. Ultimately this meant that a percentage of DNS queries would return SERVFAIL as response code to our authoritative nameserver. This in turn caused connection issues for users trying to connect to a domain on Cloudflare, because we weren't able to resolve the upstream target of affected domain names and thus didn’t know where to send proxied HTTPS requests to upstream servers.

Remediation and follow-up steps

Once we had identified the root cause we started to look at different ways to remedy the issue. We came to the conclusion that the fastest way to work around this very regionalized issue was to stop using the local route to Verisign's nameservers. This means that, at the time of writing this, our outgoing DNS traffic towards Verisign's nameservers in the Asia Pacific region now traverses the Pacific and ends up at the US west coast, rather than being served by closer nameservers. Due to the nature of DNS and the important role of DNS caching, this has less impact than you might initially expect. Frequently looked up names will be cached after the first request, and this only needs to happen once per data center, as we share and pool the local DNS recursor caches to maximize their efficiency.

Ideally, we would have been able to fix the issue right away as it potentially affected others in the region too, not just our customers. We will therefore work diligently to improve our incident communications channels with other providers in order to ensure that the DNS ecosystem remains robust against issues such as this. Being able to quickly get hold of other providers that can take action is vital when urgent issues like these arise.

Conclusion

This incident once again shows how impactful DNS failures are and how crucial this technology is for the Internet. We will investigate how we can improve our systems to detect and resolve issues like this more efficiently and quickly if they occur again in the future.

While Cloudflare was not the cause of this issue, and we are certain that we were not the only ones affected by this, we are still sorry for the disruption to our customers and to all the users who were unable to access Internet properties during this incident.

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile

Post Syndicated from Dirk-Jan van Helmond original http://blog.cloudflare.com/how-cloudflare-scaled-and-protected-eurovision-2023-voting/

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile

2023 was the first year that non-participating countries could vote for their favorites during the Eurovision Song Contest, adding millions of additional viewers and voters to an already impressive 162 million tuning in from the participating countries. It became a truly global event with a potential for disruption from multiple sources. To prepare for anything, Cloudflare helped scale and protect the voting application, used by millions of dedicated fans around the world to choose the winner.

In this blog we will cover how once.net built their platform based.io to monitor, manage and scale the Eurovision voting application to handle all traffic using many Cloudflare services. The speed with which DNS changes made through the Cloudflare API propagate globally allowed them to scale their backend within seconds. At the same time, Cloudflare Pages was ready to serve any amount of traffic to the voting landing page so fans didn’t miss a beat. And to cap it off, by combining Cloudflare CDN, DDoS protection, WAF, and Turnstile, they made sure that attackers didn’t steal any of the limelight.

The unsung heroes

Based.io is a resilient live data platform built by the once.net team, with the capability to scale up to 400 million concurrent connected users. It’s built from the ground up for speed and performance, consisting of an observable real time graph database, networking layer, cloud functions, analytics and infrastructure orchestration. Since all system information, traffic analysis and disruptions are monitored in real time, it makes the platform instantly responsive to variable demand, which enables real time scaling of your infrastructure during spikes, outages and attacks.

Although the based.io platform on its own is currently in closed beta, it is already serving a few flagship customers in production assisted by the software and services of the once.net team. One such customer is Tally, a platform used by multiple broadcasters in Europe to add live interaction to traditional television. Over 100 live shows have been performed using the platform. Another is Airhub, a startup that handles and logs automatic drone flights. And of course the star of this blog post, the Eurovision Song Contest.

Setting the stage

The Eurovision Song Contest is one of the world’s most popular broadcasted contests, and this year it reached 162 million people across 38 broadcasting countries. In addition, on TikTok the three live shows were viewed 4.8 million times, while 7.6 million people watched the Grand Final live on YouTube. With such an audience, it is no surprise that Cloudflare sees the impact of it on the Internet. Last year, we wrote a blog post where we showed lower than average traffic during, and higher than average traffic after the grand final. This year, the traffic from participating countries showed an even more remarkable surge:

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile
HTTP Requests per Second from Norway, with a similar pattern visible in countries such as the UK, Sweden and France. Internet traffic spiked at 21:20 UTC, when voting started.

Such large amounts of traffic are nothing new to the Eurovision Song Contest. Eurovision has relied on Cloudflare’s services for over a decade now and Cloudflare has helped to protect Eurovision.tv and improve its performance through noticeable faster load time to visitors from all corners of the world. Year after year, the team of Eurovision continued to use our services more, discovering additional features to improve performance and reliability further, with increasingly fine-grained control over their traffic flows. Eurovision.tv uses Page Rules to cache additional content on Cloudflare’s edge, speeding up delivery without sacrificing up-to-the-minute updates during the global event. Finally, to protect their backend and content management system, the team has placed their admin portals behind Cloudflare Zero Trust to delegate responsibilities down to individual levels.

Since then the contest itself has also evolved – sometimes by choice, sometimes by force. During the COVID-19 pandemic in-person cheering became impossible for many people due to a reduced live audience, resulting in the Eurovision Song Contest asking once.net to build a new iOS and Android application in which fans could cheer virtually. The feature was an instant hit, and it was clear that it would become part of this year’s contest as well.

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile
A screenshot of the official Eurovision Song Contest application showing the real-time number of connected fans (1) and allowing them to cheer (2) for their favorites.

In 2023, once.net was also asked to handle the paid voting from the regions where phone and SMS voting was not possible. It was the first time that Eurovision allowed voting online. The challenge that had to be overcome was the extreme peak demand on the platform when the show was live, and especially when the voting window started.

Complicating it further, was the fact that during last year’s show, there had been a large number of targeted and coordinated attacks.

To prepare for these spikes in demand and determined adversaries, once.net needed a platform that isn’t only resilient and highly scalable, but could also act as a mitigation layer in front of it. once.net selected Cloudflare for this functionality and integrated Cloudflare deeply with its real-time monitoring and management platform. To understand how and why, it’s essential to understand based.io underlying architecture.

The based.io platform

Instead of relying on network or HTTP load balancers, based.io uses a client-side service discovery pattern, selecting the most suitable server to connect to and leveraging Cloudflare's fast cache propagation infrastructure to handle spikes in traffic (both malicious and benign).

First, each server continuously registers a unique access key that has an expiration of 15 seconds, which must be used when a client connects to the server. In addition, the backend servers register their health (such as active connections, CPU, memory usage, requests per second, etc.) to the service registry every 300 milliseconds. Clients then request the optimal server URL and associated access key from a central discovery registry and proceed to establish a long lived connection with that server. When a server gets overloaded it will disconnect a certain amount of clients and those clients will go through the discovery process again.

The central discovery registry would normally be a huge bottleneck and attack target. based.io resolves this by putting the registry behind Cloudflare's global network with a cache time of three seconds. Since the system relies on real-time stats to distribute load and uses short lived access keys, it is crucial that the cache updates fast and reliably. This is where Cloudflare’s infrastructure proved its worth, both due to the fast updating cache and reducing load with Tiered Caching.

Not using load balancers means the based.io system allows clients to connect to the backend servers through Cloudflare, resulting in  better performance and a more resilient infrastructure by eliminating the load balancers as potential attack surface. It also results in a better distribution of connections, using the real-time information of server health, amount of active connections, active subscriptions.

Scaling up the platform happens automatically under load by deploying additional machines that can each handle 40,000 connected users. These are spun up in batches of a couple of hundred and as each machine spins up, it reaches out directly to the Cloudflare API to configure its own DNS record and proxy status. Thanks to Cloudflare’s high speed DNS system, these changes are then propagated globally within seconds, resulting in a total machine turn-up time of around three seconds. This means faster discovery of new servers and faster dynamic rebalancing from the clients. And since the voting window of the Eurovision Song Contest is only 45 minutes, with the main peak within minutes after the window opens, every second counts!

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile
High level architecture of the based.io platform used for the 2023 Eurovision Song Contest‌ ‌

To vote, users of the mobile app and viewers globally were pointed to the voting landing page, esc.vote. Building a frontend web application able to handle this kind of an audience is a challenge in itself. Although hosting it yourself and putting a CDN in front seems straightforward, this still requires you to own, configure and manage your origin infrastructure. once.net decided to leverage Cloudflare’s infrastructure directly by hosting the voting landing page on Cloudflare Pages. Deploying was as quick as a commit to their Git repository, and they never had to worry about reachability or scaling of the webpage.

once.net also used Cloudflare Turnstile to protect their payment API endpoints that were used to validate online votes. They used the invisible Turnstile widget to make sure the request was not coming from emulated browsers (e.g. Selenium). And best of all, using the invisible Turnstile widget the user did not have to go through extra steps, which allowed for a better user experience and better conversion.

Cloudflare Pages stealing the show!

After the two semi-finals went according to plan with approximately 200,000 concurrent users during each,May 13 brought the Grand Final. The once.net team made sure that there were enough machines ready to take the initial load, jumped on a call with Cloudflare to monitor and started looking at the number of concurrent users slowly increasing. During the event, there were a few attempts to DDoS the site, which were automatically and instantaneously mitigated without any noticeable impact to any visitors.

The based.io discovery registry server also got some attention. Since the cache TTL was set quite low at five seconds, a high rate of distributed traffic to it could still result in a significant load. Luckily, on its own, the highly optimized based.io server can already handle around 300,000 requests per second. Still, it was great to see that during the event the cache hit ratio for normal traffic was 20%, and during one significant attack the cache hit ratio peaked towards 80%. This showed how easy it is to leverage a combination of Cloudflare CDN and DDoS protection to mitigate such attacks, while still being able to serve dynamic and real time content.

When the curtains finally closed, 1.3 million concurrent users connected to the based.io platform at peak. The based.io platform handled a total of 350 million events and served seven million unique users in three hours. The voting landing page hosted by Cloudflare Pages served 2.3 million requests per second at peak, and made sure that the voting payments were by real human fans using Turnstile. Although the Cloudflare platform didn’t blink for such a flood of traffic, it is no surprise that it shows up as a short crescendo in our traffic statistics:

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile

Get in touch with us

If you’re also working on or with an application that would benefit from Cloudflare’s speed and security, but don’t know where to start, reach out and we’ll work together.

How Rust and Wasm power Cloudflare’s 1.1.1.1

Post Syndicated from Anbang Wen original https://blog.cloudflare.com/big-pineapple-intro/

How Rust and Wasm power Cloudflare's 1.1.1.1

How Rust and Wasm power Cloudflare's 1.1.1.1

On April 1, 2018, Cloudflare announced the 1.1.1.1 public DNS resolver. Over the years, we added the debug page for troubleshooting, global cache purge, 0 TTL for zones on Cloudflare, Upstream TLS, and 1.1.1.1 for families to the platform. In this post, we would like to share some behind the scenes details and changes.

When the project started, Knot Resolver was chosen as the DNS resolver. We started building a whole system on top of it, so that it could fit Cloudflare’s use case. Having a battle tested DNS recursive resolver, as well as a DNSSEC validator, was fantastic because we could spend our energy elsewhere, instead of worrying about the DNS protocol implementation.

Knot Resolver is quite flexible in terms of its Lua-based plugin system. It allowed us to quickly extend the core functionality to support various product features, like DoH/DoT, logging, BPF-based attack mitigation, cache sharing, and iteration logic override. As the traffic grew, we reached certain limitations.

Lessons we learned

Before going any deeper, let’s first have a bird’s-eye view of a simplified Cloudflare data center setup, which could help us understand what we are going to talk about later. At Cloudflare, every server is identical: the software stack running on one server is exactly the same as on another server, only the configuration may be different. This setup greatly reduces the complexity of fleet maintenance.

How Rust and Wasm power Cloudflare's 1.1.1.1
Figure 1 Data center layout

The resolver runs as a daemon process, kresd, and it doesn’t work alone. Requests, specifically DNS requests, are load-balanced to the servers inside a data center by Unimog. DoH requests are terminated at our TLS terminator. Configs and other small pieces of data can be delivered worldwide by Quicksilver in seconds. With all the help, the resolver can concentrate on its own goal – resolving DNS queries, and not worrying about transport protocol details. Now let’s talk about 3 key areas we wanted to improve here – blocking I/O in plugins, a more efficient use of cache space, and plugin isolation.

Callbacks blocking the event loop

Knot Resolver has a very flexible plugin system for extending its core functionality. The plugins are called modules, and they are based on callbacks. At certain points during request processing, these callbacks will be invoked with current query context. This gives a module the ability to inspect, modify, and even produce requests / responses. By design, these callbacks are supposed to be simple, in order to avoid blocking the underlying event loop. This matters because the service is single threaded, and the event loop is in charge of serving many requests at the same time. So even just one request being held up in a callback means that no other concurrent requests can be progressed until the callback finishes.

The setup worked well enough for us until we needed to do blocking operations, for example, to pull data from Quicksilver before responding to the client.

Cache efficiency

As requests for a domain could land on any node inside a data center, it would be wasteful to repetitively resolve a query when another node already has the answer. By intuition, the latency could be improved if the cache could be shared among the servers, and so we created a cache module which multicasted the newly added cache entries. Nodes inside the same data center could then subscribe to the events and update their local cache.

The default cache implementation in Knot Resolver is LMDB. It is fast and reliable for small to medium deployments. But in our case, cache eviction shortly became a problem. The cache itself doesn’t track for any TTL, popularity, etc. When it’s full, it just clears all the entries and starts over. Scenarios like zone enumeration could fill the cache with data that is unlikely to be retrieved later.

Furthermore, our multicast cache module made it worse by amplifying the less useful data to all the nodes, and led them to the cache high watermark at the same time. Then we saw a latency spike because all the nodes dropped the cache and started over around the same time.

Module isolation

With the list of Lua modules increasing, debugging issues became increasingly difficult. This is because a single Lua state is shared among all the modules, so one misbehaving module could affect another. For example, when something went wrong inside the Lua state, like having too many coroutines, or being out of memory, we got lucky if the program just crashed, but the resulting stack traces were hard to read. It is also difficult to forcibly tear down, or upgrade, a running module as it not only has state in the Lua runtime, but also FFI, so memory safety is not guaranteed.

Hello BigPineapple

We didn’t find any existing software that would meet our somewhat niche requirements, so eventually we started building something ourselves. The first attempt was to wrap Knot Resolver’s core with a thin service written in Rust (modified edgedns).

This proved to be difficult due to having to constantly convert between the storage, and C/FFI types, and some other quirks (for example, the ABI for looking up records from cache expects the returned records to be immutable until the next call, or the end of the read transaction). But we learned a lot from trying to implement this sort of split functionality where the host (the service) provides some resources to the guest (resolver core library), and how we would make that interface better.

In the later iterations, we replaced the entire recursive library with a new one based around an async runtime; and a redesigned module system was added to it, sneakily rewriting the service into Rust over time as we swapped out more and more components. That async runtime was tokio, which offered a neat thread pool interface for running both non-blocking and blocking tasks, as well as a good ecosystem for working with other crates (Rust libraries).

After that, as the futures combinators became tedious, we started converting everything to async/await. This was before the async/await feature that landed in Rust 1.39, which led us to use nightly (Rust beta) for a while and had some hiccups. When the async/await stabilized, it enabled us to write our request processing routine ergonomically, similar to Go.

All the tasks can be run concurrently, and certain I/O heavy ones can be broken down into smaller pieces, to benefit from a more granular scheduling. As the runtime executes tasks on a threadpool, instead of a single thread, it also benefits from work stealing. This avoids a problem we previously had, where a single request taking a lot of time to process, that blocks all the other requests on the event loop.

How Rust and Wasm power Cloudflare's 1.1.1.1
Figure 2 Components overview

Finally, we forged a platform that we are happy with, and we call it BigPineapple. The figure above shows an overview of its main components and the data flow between them. Inside BigPineapple, the server module gets inbound requests from the client, validates and transforms them into unified frame streams, which can then be processed by the worker module. The worker module has a set of workers, whose task is to figure out the answer to the question in the request. Each worker interacts with the cache module to check if the answer is there and still valid, otherwise it drives the recursor module to recursively iterate the query. The recursor doesn’t do any I/O, when it needs anything, it delegates the sub-task to the conductor module. The conductor then uses outbound queries to get the information from upstream nameservers. Through the whole process, some modules can interact with the Sandbox module, to extend its functionality by running the plugins inside.

Let’s look at some of them in more detail, and see how they helped us overcome the problems we had before.

Updated I/O architecture

A DNS resolver can be seen as an agent between a client and several authoritative nameservers: it receives requests from the client, recursively fetches data from the upstream nameservers, then composes the responses and sends them back to the client. So it has both inbound and outbound traffic, which are handled by the server and the conductor component respectively.

The server listens on a list of interfaces using different transport protocols. These are later abstracted into streams of “frames”. Each frame is a high level representation of a DNS message, with some extra metadata. Underneath, it can be a UDP packet, a segment of TCP stream, or the payload of a HTTP request, but they are all processed the same way. The frame is then converted into an asynchronous task, which in turn is picked up by a set of workers in charge of resolving these tasks. The finished tasks are converted back into responses, and sent back to the client.

This “frame” abstraction over the protocols and their encodings simplified the logic used to regulate the frame sources, such as enforcing fairness to prevent starving and controlling pacing to protect the server from being overwhelmed. One of the things we’ve learned with the previous implementations is that, for a service open to the public, a peak performance of the I/O matters less than the ability to pace clients fairly. This is mainly because the time and computational cost of each recursive request is vastly different (for example a cache hit from a cache miss), and it’s difficult to guess it beforehand. The cache misses in recursive service not only consume Cloudflare’s resources, but also the resources of the authoritative nameservers being queried, so we need to be mindful of that.

On the other side of the server is the conductor, which manages all the outbound connections. It helps to answer some questions before reaching out to the upstream: Which is the fastest nameserver to connect to in terms of latency? What to do if all the nameservers are not reachable? What protocol to use for the connection, and are there any better options? The conductor is able to make these decisions by tracking the upstream server’s metrics, such as RTT, QoS, etc. With that knowledge, it can also guess for things like upstream capacity, UDP packet loss, and take necessary actions, e.g. retry when it thinks the previous UDP packet didn’t reach the upstream.

How Rust and Wasm power Cloudflare's 1.1.1.1
Figure 3 I/O conductor

Figure 3 shows a simplified data flow about the conductor. It is called by the exchanger mentioned above, with upstream requests as input. The requests will be deduplicated first: meaning in a small window, if a lot of requests come to the conductor and ask for the same question, only one of them will pass, the others are put into a waiting queue. This is common when a cache entry expires, and can reduce unnecessary network traffic. Then based on the request and upstream metrics, the connection instructor either picks an open connection if available, or generates a set of parameters. With these parameters, the I/O executor is able to connect to the upstream directly, or even take a route via another Cloudflare data center using our Argo Smart Routing technology!

The cache

Caching in a recursive service is critical as a server can return a cached response in under one millisecond, while it will be hundreds of milliseconds to respond on a cache miss. As the memory is a finite resource (and also a shared resource in Cloudflare’s architecture), more efficient use of space for cache was one of the key areas we wanted to improve. The new cache is implemented with a cache replacement data structure (ARC), instead of a KV store. This makes good use of the space on a single node, as less popular entries are progressively evicted, and the data structure is resistant to scans.

Moreover, instead of duplicating the cache across the whole data center with multicast, as we did before, BigPineapple is aware of its peer nodes in the same data center, and relays queries from one node to another if it cannot find an entry in its own cache. This is done by consistent hashing the queries onto the healthy nodes in each data center. So, for example, queries for the same registered domain go through the same subset of nodes, which not only increases the cache hit ratio, but also helps the infrastructure cache, which stores information about performance and features of nameservers.

How Rust and Wasm power Cloudflare's 1.1.1.1
Figure 4 Updated data center layout

Async recursive library

The recursive library is the DNS brain of BigPineapple, as it knows how to find the answer to the question in the query. Starting from the root, it breaks down the client query into subqueries, and uses them to collect knowledge recursively from various authoritative nameservers on the internet. The product of this process is the answer. Thanks to the async/await it can be abstracted as a function like such:

async fn resolve(Request, Exchanger) → Result<Response>;

The function contains all the logic necessary to generate a response to a given request, but it doesn’t do any I/O on its own. Instead, we pass an Exchanger trait(Rust interface) that knows how to exchange DNS messages with upstream authoritative nameservers asynchronously. The exchanger is usually called at various await points – for example, when a recursion starts, one of the first things it does is that it looks up the closest cached delegation for the domain. If it doesn’t have the final delegation in cache, it needs to ask what nameservers are responsible for this domain and wait for the response, before it can proceed any further.

Thanks to this design, which decouples the “waiting for some responses” part from the recursive DNS logic, it is much easier to test by providing a mock implementation of the exchanger. In addition, it makes the recursive iteration code (and DNSSEC validation logic in particular) much more readable, as it’s written sequentially instead of being scattered across many callbacks.

Fun fact: writing a DNS recursive resolver from scratch is not fun at all!

Not only because of the complexity of DNSSEC validation, but also because of the necessary “workarounds” needed for various RFC incompatible servers, forwarders, firewalls, etc. So we ported deckard into Rust to help test it. Additionally, when we started migrating over to this new async recursive library, we first ran it in “shadow” mode: processing real world query samples from the production service, and comparing differences. We’ve done this in the past on Cloudflare’s authoritative DNS service as well. It is slightly more difficult for a recursive service due to the fact that a recursive service has to look up all the data over the Internet, and authoritative nameservers often give different answers for the same query due to localization, load balancing and such, leading to many false positives.

In December 2019, we finally enabled the new service on a public test endpoint (see the announcement) to iron out remaining issues before slowly migrating the production endpoints to the new service. Even after all that, we continued to find edge cases with the DNS recursion (and DNSSEC validation in particular), but fixing and reproducing these issues has become much easier due to the new architecture of the library.

Sandboxed plugins

Having the ability to extend the core DNS functionality on the fly is important for us, thus BigPineapple has its redesigned plugin system. Before, the Lua plugins run in the same memory space as the service itself, and are generally free to do what they want. This is convenient, as we can freely pass memory references between the service and modules using C/FFI. For example, to read a response directly from cache without having to copy to a buffer first. But it is also dangerous, as the module can read uninitialized memory, call a host ABI using a wrong function signature, block on a local socket, or do other undesirable things, in addition the service doesn’t have a way to restrict these behaviors.

So we looked at replacing the embedded Lua runtime with JavaScript, or native modules, but around the same time, embedded runtimes for WebAssembly (Wasm for short) started to appear. Two nice properties of WebAssembly programs are that it allows us to write them in the same language as the rest of the service, and that they run in an isolated memory space. So we started modeling the guest/host interface around the limitations of WebAssembly modules, to see how that would work.

BigPineapple’s Wasm runtime is currently powered by Wasmer. We tried several runtimes over time like Wasmtime, WAVM in the beginning, and found Wasmer was simpler to use in our case. The runtime allows each module to run in its own instance, with an isolated memory and a signal trap, which naturally solved the module isolation problem we described before. In addition to this, we can have multiple instances of the same module running at the same time. Being controlled carefully, the apps can be hot swapped from one instance to another without missing a single request! This is great because the apps can be upgraded on the fly without a server restart. Given that the Wasm programs are distributed via Quicksilver, BigPineapple’s functionality can be safely changed worldwide within a few seconds!

To better understand the WebAssembly sandbox, several terms need to be introduced first:

  • Host: the program which runs the Wasm runtime. Similar to a kernel, it has full control through the runtime over the guest applications.
  • Guest application: the Wasm program inside the sandbox. Within a restricted environment, it can only access its own memory space, which is provided by the runtime, and call the imported Host calls. We call it an app for short.
  • Host call: the functions defined in the host that can be imported by the guest. Comparable to syscall, it’s the only way guest apps can access the resources outside the sandbox.
  • Guest runtime: a library for guest applications to easily interact with the host. It implements some common interfaces, so an app can just use async, socket, log and tracing without knowing the underlying details.

Now it’s time to dive into the sandbox, so stay awhile and listen. First let’s start from the guest side, and see what a common app lifespan looks like. With the help of the guest runtime, guest apps can be written similar to regular programs. So like other executables, an app begins with a start function as an entrypoint, which is called by the host upon loading. It is also provided with arguments as from the command line. At this point, the instance normally does some initialization, and more importantly, registers callback functions for different query phases. This is because in a recursive resolver, a query has to go through several phases before it gathers enough information to produce a response, for example a cache lookup, or making subrequests to resolve a delegation chain for the domain, so being able to tie into these phases is necessary for the apps to be useful for different use cases. The start function can also run some background tasks to supplement the phase callbacks, and store global state. For example – report metrics, or pre-fetch shared data from external sources, etc. Again, just like how we write a normal program.

But where do the program arguments come from? How could a guest app send log and metrics? The answer is, external functions.

How Rust and Wasm power Cloudflare's 1.1.1.1
Figure 5 Wasm based Sandbox

In figure 5, we can see a barrier in the middle, which is the sandbox boundary, that separates the guest from the host. The only way one side can reach out to the other, is via a set of functions exported by the peer beforehand. As in the picture, the “hostcalls” are exported by the host, imported and called by the guest; while the “trampoline” are guest functions that the host has knowledge of.

It is called trampoline because it is used to invoke a function or a closure inside a guest instance that’s not exported. The phase callbacks are one example of why we need a trampoline function: each callback returns a closure, and therefore can’t be exported on instantiation. So a guest app wants to register a callback, it calls a host call with the callback address “hostcall_register_callback(pre_cache, #30987)”, when the callback needs to be invoked, the host cannot just call that pointer as it’s pointing to the guest’s memory space. What it can do instead is, to leverage one of the aforementioned trampolines, and give it the address of the callback closure: “trampoline_call(#30987)”.

Isolation overhead
Like a coin that has two sides, the new sandbox does come with some additional overhead. The portability and isolation that WebAssembly offers bring extra cost. Here, we’ll list two examples.

Firstly, guest apps are not allowed to read host memory. The way it works is the guest provides a memory region via a host call, then the host writes the data into the guest memory space. This introduces a memory copy that would not be needed if we were outside the sandbox. The bad news is, in our use case, the guest apps are supposed to do something on the query and/or the response, so they almost always need to read data from the host on every single request. The good news, on the other hand, is that during a request life cycle, the data won’t change. So we pre-allocate a bulk of memory in the guest memory space right after the guest app instantiates. The allocated memory is not going to be used, but instead serves to occupy a hole in the address space. Once the host gets the address details, it maps a shared memory region with the common data needed by the guest into the guest’s space. When the guest code starts to execute, it can just access the data in the shared memory overlay, and no copy is needed.

Another issue we ran into was when we wanted to add support for a modern protocol, oDoH, into BigPineapple. The main job of it is to decrypt the client query, resolve it, then encrypt the answers before sending it back. By design, this doesn’t belong to core DNS, and should instead be extended with a Wasm app. However, the WebAssembly instruction set doesn’t provide some crypto primitives, such as AES and SHA-2, which prevents it from getting the benefit of host hardware. There is ongoing work to bring this functionality to Wasm with WASI-crypto. Until then, our solution for this is to simply delegate the HPKE to the host via host calls, and we already saw 4x performance improvements, compared to doing it inside Wasm.

Async in Wasm
Remember the problem we talked about before that the callbacks could block the event loop? Essentially, the problem is how to run the sandboxed code asynchronously. Because no matter how complex the request processing callback is, if it can yield, we can put an upper bound on how long it is allowed to block. Luckily, Rust’s async framework is both elegant and lightweight. It gives us the opportunity to use a set of guest calls to implement the “Future”s.

In Rust, a Future is a building block for asynchronous computations. From the user’s perspective, in order to make an asynchronous program, one has to take care of two things: implement a pollable function that drives the state transition, and place a waker as a callback to wake itself up, when the pollable function should be called again due to some external event (e.g. time passes, socket becomes readable, and so on). The former is to be able to progress the program gradually, e.g. read buffered data from I/O and return a new state indicating the status of the task: either finished, or yielded. The latter is useful in case of task yielding, as it will trigger the Future to be polled when the conditions that the task was waiting for are fulfilled, instead of busy looping until it’s complete.

Let’s see how this is implemented in our sandbox. For a scenario when the guest needs to do some I/O, it has to do so via the host calls, as it is inside a restricted environment. Assuming the host provides a set of simplified host calls which mirror the basic socket operations: open, read, write, and close, the guest can have its pseudo poller defined as below:

fn poll(&mut self, wake: fn()) -> Poll {
	match hostcall_socket_read(self.sock, self.buffer) {
    	    HostOk  => Poll::Ready,
    	    HostEof => Poll::Pending,
	}
}

Here the host call reads data from a socket into a buffer, depending on its return value, the function can move itself to one of the states we mentioned above: finished(Ready), or yielded(Pending). The magic happens inside the host call. Remember in figure 5, that it is the only way to access resources? The guest app doesn’t own the socket, but it can acquire a “handle” via “hostcall_socket_open”, which will in turn create a socket on the host side, and return a handle. The handle can be anything in theory, but in practice using integer socket handles map well to file descriptors on the host side, or indices in a vector or slab. By referencing the returned handle, the guest app is able to remotely control the real socket. As the host side is fully asynchronous, it can simply relay the socket state to the guest. If you noticed that the waker function isn’t used above, well done! That’s because when the host call is called, it not only starts opening a socket, but also registers the current waker to be called then the socket is opened (or fails to do so). So when the socket becomes ready, the host task will be woken up, it will find the corresponding guest task from its context, and wakes it up using the trampoline function as shown in figure 5. There are other cases where a guest task needs to wait for another guest task, an async mutex for example. The mechanism here is similar: using host calls to register wakers.

All of these complicated things are encapsulated in our guest async runtime, with easy to use API, so the guest apps get access to regular async functions without having to worry about the underlying details.

(Not) The End

Hopefully, this blog post gave you a general idea of the innovative platform that powers 1.1.1.1. It is still evolving. As of today, several of our products, such as 1.1.1.1 for Families, AS112, and Gateway DNS, are supported by guest apps running on BigPineapple. We are looking forward to bringing new technologies into it. If you have any ideas, please let us know in the community or via email.

One of our most requested features is here: DNS record comments and tags

Post Syndicated from Hannes Gerhart original https://blog.cloudflare.com/dns-record-comments/

One of our most requested features is here: DNS record comments and tags

One of our most requested features is here: DNS record comments and tags

Starting today, we’re adding support on all zone plans to add custom comments on your DNS records. Users on the Pro, Business and Enterprise plan will also be able to tag DNS records.

DNS records are important

DNS records play an essential role when it comes to operating a website or a web application. In general, they are used to mapping human-readable hostnames to machine-readable information, most commonly IP addresses. Besides mapping hostnames to IP addresses they also fulfill many other use cases like:

  • Ensuring emails can reach your inbox, by setting up MX records.
  • Avoiding email spoofing and phishing by configuring SPF, DMARC and DKIM policies as TXT records.
  • Validating a TLS certificate by adding a TXT (or CNAME) record.
  • Specifying allowed certificate authorities that can issue certificates on behalf of your domain by creating a CAA record.
  • Validating ownership of your domain for other web services (website hosting, email hosting, web storage, etc.) – usually by creating a TXT record.
  • And many more.

With all these different use cases, it is easy to forget what a particular DNS record is for and it is not always possible to derive the purpose from the name, type and content of a record. Validation TXT records tend to be on seemingly arbitrary names with rather cryptic content. When you then also throw multiple people or teams into the mix who have access to the same domain, all creating and updating DNS records, it can quickly happen that someone modifies or even deletes a record causing the on-call person to get paged in the middle of the night.

Enter: DNS record comments & tags 📝

Starting today, everyone with a zone on Cloudflare can add custom comments on each of their DNS records via the API and through the Cloudflare dashboard.

One of our most requested features is here: DNS record comments and tags

To add a comment, just click on the Edit action of the respective DNS record and fill out the Comment field. Once you hit Save, a small icon will appear next to the record name to remind you that this record has a comment. Hovering over the icon will allow you to take a quick glance at it without having to open the edit panel.

One of our most requested features is here: DNS record comments and tags

What you also can see in the screenshot above is the new Tags field. All users on the Pro, Business, or Enterprise plans now have the option to add custom tags to their records. These tags can be just a key like “important” or a key-value pair like “team:DNS” which is separated by a colon. Neither comments nor tags have any impact on the resolution or propagation of the particular DNS record, and they’re only visible to people with access to the zone.

Now we know that some of our users love automation by using our API. So if you want to create a number of zones and populate all their DNS records by uploading a zone file as part of your script, you can also directly include the DNS record comments and tags in that zone file. And when you export a zone file, either to back up all records of your zone or to easily move your zone to another account on Cloudflare, it will also contain comments and tags. Learn more about importing and exporting comments and tags on our developer documentation.

;; A Records
*.mycoolwebpage.xyz.     1      IN  A    192.0.2.3
mycoolwebpage.xyz.       1      IN  A    203.0.113.1 ; Contact Hannes for details.
sub1.mycoolwebpage.xyz.  1      IN  A    192.0.2.2 ; Test origin server. Can be deleted eventually. cf_tags=testing
sub1.mycoolwebpage.xyz.  1      IN  A    192.0.2.1 ; Production origin server. cf_tags=important,prod,team:DNS

;; MX Records
mycoolwebpage.xyz.       1      IN  MX   1 mailserver1.example.
mycoolwebpage.xyz.       1      IN  MX   2 mailserver2.example.

;; TXT Records
mycoolwebpage.xyz.       86400	IN  TXT  "v=spf1 ip4:192.0.2.0/24 -all" ; cf_tags=important,team:EMAIL
sub1.mycoolwebpage.xyz.  86400  IN  TXT  "hBeFxN3qZT40" ; Verification record for service XYZ. cf_tags=team:API

New filters

It might be that your zone has hundreds or thousands of DNS records, so how on earth would you find all the records that belong to the same team or that are needed for one particular application?

For this we created a new filter option in the dashboard. This allows you to not only filter for comments or tags but also for other record data like name, type, content, or proxy status. The general search bar for a quick and broader search will still be available, but it cannot (yet) be used in conjunction with the new filters.

One of our most requested features is here: DNS record comments and tags

By clicking on the “Add filter” button, you can select individual filters that are connected with a logical AND. So if I wanted to only look at TXT records that are tagged as important, I would add these filters:

One more thing (or two)

Another change we made is to replace the Advanced button with two individual actions: Import and Export, and Dashboard Display Settings.

You can find them in the top right corner under DNS management. When you click on Import and Export you have the option to either export all existing DNS records (including their comments and tags) into a zone file or import new DNS records to your zone by uploading a zone file.

The action Dashboard Display Settings allows you to select which special record types are shown in the UI. And there is an option to toggle showing the record tags inline under the respective DNS record or just showing an icon if there are tags present on the record.

And last but not least, we increased the width of the DNS record table as part of this release. The new table makes better use of the existing horizontal space and allows you to see more details of your DNS records, especially if you have longer subdomain names or content.

Try it now

DNS record comments and tags are available today. Just navigate to the DNS tab of your zone in the Cloudflare dashboard and create your first comment or tag. If you are not yet using Cloudflare DNS, sign up for free in just a few minutes.

Learn more about DNS record comments and tags on our developer documentation.

Cloudflare is joining the AS112 project to help the Internet deal with misdirected DNS queries

Post Syndicated from Hunts Chen original https://blog.cloudflare.com/the-as112-project/

Cloudflare is joining the AS112 project to help the Internet deal with misdirected DNS queries

Cloudflare is joining the AS112 project to help the Internet deal with misdirected DNS queries

Today, we’re excited to announce that Cloudflare is participating in the AS112 project, becoming an operator of this community-operated, loosely-coordinated anycast deployment of DNS servers that primarily answer reverse DNS lookup queries that are misdirected and create significant, unwanted load on the Internet.

With the addition of Cloudflare global network, we can make huge improvements to the stability, reliability and performance of this distributed public service.

What is AS112 project

The AS112 project is a community effort to run an important network service intended to handle reverse DNS lookup queries for private-only use addresses that should never appear in the public DNS system. In the seven days leading up to publication of this blog post, for example, Cloudflare’s 1.1.1.1 resolver received more than 98 billion of these queries — all of which have no useful answer in the Domain Name System.

Some history is useful for context. Internet Protocol (IP) addresses are essential to network communication. Many networks make use of IPv4 addresses that are reserved for private use, and devices in the network are able to connect to the Internet with the use of network address translation (NAT), a process that maps one or more local private addresses to one or more global IP addresses and vice versa before transferring the information.

Your home Internet router most likely does this for you. You will likely find that, when at home, your computer has an IP address like 192.168.1.42. That’s an example of a private use address that is fine to use at home, but can’t be used on the public Internet. Your home router translates it, through NAT, to an address your ISP assigned to your home and that can be used on the Internet.

Here are the reserved “private use” addresses designated in RFC 1918.

Address block Address range Number of addresses
10.0.0.0/8 10.0.0.0 – 10.255.255.255 16,777,216
172.16.0.0/12 172.16.0.0 – 172.31.255.255 1,048,576
192.168.0.0/16 192.168.0.0 – 192.168.255.255 65,536

(Reserved private IPv4 network ranges)

Although the reserved addresses themselves are blocked from ever appearing on the public Internet, devices and programs in private environments may occasionally originate DNS queries corresponding to those addresses. These are called “reverse lookups” because they ask the DNS if there is a name associated with an address.

Reverse DNS lookup

A reverse DNS lookup is an opposite process of the more commonly used DNS lookup (which is used every day to translate a name like www.cloudflare.com to its corresponding IP address). It is a query to look up the domain name associated with a given IP address, in particular those addresses associated with routers and switches. For example, network administrators and researchers use reverse lookups to help understand paths being taken by data packets in the network, and it’s much easier to understand meaningful names than a meaningless number.

A reverse lookup is accomplished by querying DNS servers for a pointer record (PTR). PTR records store IP addresses with their segments reversed, and by appending “.in-addr.arpa” to the end. For example, the IP address 192.0.2.1 will have the PTR record stored as 1.2.0.192.in-addr.arpa. In IPv6, PTR records are stored within the “.ip6.arpa” domain instead of “.in-addr.arpa.”. Below are some query examples using the dig command line tool.

# Lookup the domain name associated with IPv4 address 172.64.35.46
# “+short” option make it output the short form of answers only
$ dig @1.1.1.1 PTR 46.35.64.172.in-addr.arpa +short
hunts.ns.cloudflare.com.

# Or use the shortcut “-x” for reverse lookups
$ dig @1.1.1.1 -x 172.64.35.46 +short
hunts.ns.cloudflare.com.

# Lookup the domain name associated with IPv6 address 2606:4700:58::a29f:2c2e
$ dig @1.1.1.1 PTR e.2.c.2.f.9.2.a.0.0.0.0.0.0.0.0.0.0.0.0.8.5.0.0.0.0.7.4.6.0.6.2.ip6.arpa. +short
hunts.ns.cloudflare.com.

# Or use the shortcut “-x” for reverse lookups
$ dig @1.1.1.1 -x 2606:4700:58::a29f:2c2e +short  
hunts.ns.cloudflare.com.

The problem that private use addresses cause for DNS

The private use addresses concerned have only local significance and cannot be resolved by the public DNS. In other words, there is no way for the public DNS to provide a useful answer to a question that has no global meaning. It is therefore a good practice for network administrators to ensure that queries for private use addresses are answered locally. However, it is not uncommon for such queries to follow the normal delegation path in the public DNS instead of being answered within the network. That creates unnecessary load.

By definition of being private use, they have no ownership in the public sphere, so there are no authoritative DNS servers to answer the queries. At the very beginning, root servers respond to all these types of queries since they serve the IN-ADDR.ARPA zone.

Over time, due to the wide deployment of private use addresses and the continuing growth of the Internet, traffic on the IN-ADDR.ARPA DNS infrastructure grew and the load due to these junk queries started to cause some concern. Therefore, the idea of offloading IN-ADDR.ARPA queries related to private use addresses was proposed. Following that, the use of anycast for distributing authoritative DNS service for that idea was subsequently proposed at a private meeting of root server operators. And eventually the AS112 service was launched to provide an alternative target for the junk.

The AS112 project is born

To deal with this problem, the Internet community set up special DNS servers called “blackhole servers” as the authoritative name servers that respond to the reverse lookup of the private use address blocks 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 and the link-local address block 169.254.0.0/16 (which also has only local significance). Since the relevant zones are directly delegated to the blackhole servers, this approach has come to be known as Direct Delegation.

The first two blackhole servers set up by the project are: blackhole-1.iana.org and blackhole-2.iana.org.

Any server, including DNS name server, needs an IP address to be reachable. The IP address must also be associated with an Autonomous System Number (ASN) so that networks can recognize other networks and route data packets to the IP address destination. To solve this problem, a new authoritative DNS service would be created but, to make it work, the community would have to designate IP addresses for the servers and, to facilitate their availability, an AS number that network operators could use to reach (or provide) the new service.

The selected AS number (provided by American Registry for Internet Numbers) and namesake of the project, was 112. It was started by a small subset of root server operators, later grown to a group of volunteer name server operators that include many other organizations. They run anycasted instances of the blackhole servers that, together, form a distributed sink for the reverse DNS lookups for private network and link-local addresses sent to the public Internet.

A reverse DNS lookup for a private use address would see responses like in the example below, where the name server blackhole-1.iana.org is authoritative for it and says the name does not exist, represented in DNS responses by NXDOMAIN.

$ dig @blackhole-1.iana.org -x 192.168.1.1 +nord

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23870
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;1.1.168.192.in-addr.arpa.	IN	PTR

;; AUTHORITY SECTION:
168.192.in-addr.arpa.	10800	IN	SOA	168.192.in-addr.arpa. nobody.localhost. 42 86400 43200 604800 10800

At the beginning of the project, node operators set up the service in the direct delegation fashion (RFC 7534). However, adding delegations to this service requires all AS112 servers to be updated, which is difficult to ensure in a system that is only loosely-coordinated. An alternative approach using DNAME redirection was subsequently introduced by RFC 7535 to allow new zones to be added to the system without reconfiguring the blackhole servers.

Direct delegation

DNS zones are directly delegated to the blackhole servers in this approach.

RFC 7534 defines the static set of reverse lookup zones for which AS112 name servers should answer authoritatively. They are as follows:

  • 10.in-addr-arpa
  • 16.172.in-addr.arpa
  • 17.172.in-addr.arpa
  • 18.172.in-addr.arpa
  • 19.172.in-addr.arpa
  • 20.172.in-addr.arpa
  • 21.172.in-addr.arpa
  • 22.172.in-addr.arpa
  • 23.172.in-addr.arpa
  • 24.172.in-addr.arpa
  • 25.172.in-addr.arpa
  • 26.172.in-addr.arpa
  • 27.172.in-addr.arpa
  • 28.172.in-addr.arpa
  • 29.172.in-addr.arpa
  • 30.172.in-addr.arpa
  • 31.172.in-addr.arpa
  • 168.192.in-addr.arpa
  • 254.169.in-addr.arpa (corresponding to the IPv4 link-local address block)

Zone files for these zones are quite simple because essentially they are empty apart from the required  SOA and NS records. A template of the zone file is defined as:

  ; db.dd-empty
   ;
   ; Empty zone for direct delegation AS112 service.
   ;
   $TTL    1W
   @  IN  SOA  prisoner.iana.org. hostmaster.root-servers.org. (
                                  1         ; serial number
                                  1W      ; refresh
                                  1M      ; retry
                                  1W      ; expire
                                  1W )    ; negative caching TTL
   ;
          NS     blackhole-1.iana.org.
          NS     blackhole-2.iana.org.

IP addresses of the direct delegation name servers are covered by the single IPv4 prefix 192.175.48.0/24 and the IPv6 prefix 2620:4f:8000::/48.

Name server IPv4 address IPv6 address
blackhole-1.iana.org 192.175.48.6 2620:4f:8000::6
blackhole-2.iana.org 192.175.48.42 2620:4f:8000::42

DNAME redirection

Firstly, what is DNAME? Introduced by RFC 6672, a DNAME record or Delegation Name Record creates an alias for an entire subtree of the domain name tree. In contrast, the CNAME record creates an alias for a single name and not its subdomains. For a received DNS query, the DNAME record instructs the name server to substitute all those appearing in the left hand (owner name) with the right hand (alias name). The substituted query name, like the CNAME, may live within the zone or may live outside the zone.

Like the CNAME record, the DNS lookup will continue by retrying the lookup with the substituted name. For example, if there are two DNS zone as follows:

# zone: example.com
www.example.com.	A		203.0.113.1
foo.example.com.	DNAME	example.net.

# zone: example.net
example.net.		A		203.0.113.2
bar.example.net.	A		203.0.113.3

The query resolution scenarios would look like this:

Query (Type + Name) Substitution Final result
A www.example.com (no DNAME, don’t apply) 203.0.113.1
DNAME foo.example.com (don’t apply to the owner name itself) example.net
A foo.example.com (don’t apply to the owner name itself) <NXDOMAIN>
A bar.foo.example.com bar.example.net 203.0.113.2

RFC 7535 specifies adding another special zone, empty.as112.arpa, to support DNAME redirection for AS112 nodes. When there are new zones to be added, there is no need for AS112 node operators to update their configuration: instead, the zones’ parents will set up DNAME records for the new domains with the target domain empty.as112.arpa. The redirection (which can be cached and reused) causes clients to send future queries to the blackhole server that is authoritative for the target zone.

Note that blackhole servers do not have to support DNAME records themselves, but they do need to configure the new zone to which root servers will redirect queries at. Considering there may be existing node operators that do not update their name server configuration for some reasons and in order to not cause interruption to the service, the zone was delegated to a new blackhole server instead – blackhole.as112.arpa.

This name server uses a new pair of IPv4 and IPv6 addresses, 192.31.196.1 and 2001:4:112::1, so queries involving DNAME redirection will only land on those nodes operated by entities that also set up the new name server. Since it is not necessary for all AS112 participants to reconfigure their servers to serve empty.as112.arpa from this new server for this system to work, it is compatible with the loose coordination of the system as a whole.

The zone file for empty.as112.arpa is defined as:

   ; db.dr-empty
   ;
   ; Empty zone for DNAME redirection AS112 service.
   ;
   $TTL    1W
   @  IN  SOA  blackhole.as112.arpa. noc.dns.icann.org. (
                                  1         ; serial number
                                  1W      ; refresh
                                  1M      ; retry
                                  1W      ; expire
                                  1W )    ; negative caching TTL
   ;
          NS     blackhole.as112.arpa.

The addresses of the new DNAME redirection name server are covered by the single IPv4 prefix 192.31.196.0/24 and the IPv6 prefix 2001:4:112::/48.

Name server IPv4 address IPv6 address
blackhole.as112.arpa 192.31.196.1 2001:4:112::1

Node identification

RFC 7534 recommends every AS112 node also to host the following metadata zones as well: hostname.as112.net and hostname.as112.arpa.

These zones only host TXT records and serve as identifiers for querying metadata information about an AS112 node. At Cloudflare nodes, the zone files look like this:

$ORIGIN hostname.as112.net.
;
$TTL    604800
;
@       IN  SOA     ns3.cloudflare.com. dns.cloudflare.com. (
                       1                ; serial number
                       604800           ; refresh
                       60               ; retry
                       604800           ; expire
                       604800 )         ; negative caching TTL
;
            NS      blackhole-1.iana.org.
            NS      blackhole-2.iana.org.
;
            TXT     "Cloudflare DNS, <DATA_CENTER_AIRPORT_CODE>"
            TXT     "See http://www.as112.net/ for more information."
;

$ORIGIN hostname.as112.arpa.
;
$TTL    604800
;
@       IN  SOA     ns3.cloudflare.com. dns.cloudflare.com. (
                       1                ; serial number
                       604800           ; refresh
                       60               ; retry
                       604800           ; expire
                       604800 )         ; negative caching TTL
;
            NS      blackhole.as112.arpa.
;
            TXT     "Cloudflare DNS, <DATA_CENTER_AIRPORT_CODE>"
            TXT     "See http://www.as112.net/ for more information."
;

Helping AS112 helps the Internet

As the AS112 project helps reduce the load on public DNS infrastructure, it plays a vital role in maintaining the stability and efficiency of the Internet. Being a part of this project aligns with Cloudflare’s mission to help build a better Internet.

Cloudflare is one of the fastest global anycast networks on the planet, and operates one of the largest, highly performant and reliable DNS services. We run authoritative DNS for millions of Internet properties globally. We also operate the privacy- and performance-focused public DNS resolver 1.1.1.1 service. Given our network presence and scale of operations, we believe we can make a meaningful contribution to the AS112 project.

How we built it

We’ve publicly talked about the Cloudflare in-house built authoritative DNS server software, rrDNS, several times in the past, but haven’t talked much about the software we built to power the Cloudflare public resolver – 1.1.1.1. This is an opportunity to shed some light on the technology we used to build 1.1.1.1, because this AS112 service is built on top of the same platform.

A platform for DNS workloads

Cloudflare is joining the AS112 project to help the Internet deal with misdirected DNS queries

We’ve created a platform to run DNS workloads. Today, it powers 1.1.1.1, 1.1.1.1 for Families, Oblivious DNS over HTTPS (ODoH), Cloudflare WARP and Cloudflare Gateway.

The core part of the platform is a non-traditional DNS server, which has a built-in DNS recursive resolver and a forwarder to forward queries to other servers. It consists of four key modules:

  1. A highly efficient listener module that accepts connections for incoming requests.
  2. A query router module that decides how a query should be resolved.
  3. A conductor module that figures out the best way of exchanging DNS messages with upstream servers.
  4. A sandbox environment to host guest applications.

The DNS server itself doesn’t include any business logic, instead the guest applications run in the sandbox environment can implement concrete business logic such as request filtering, query processing, logging, attack mitigation, cache purging, etc.

The server is written in Rust and the sandbox environment is built on top of a WebAssembly runtime. The combination of Rust and WebAssembly allow us to implement high efficient connection handling, request filtering and query dispatching modules, while having the flexibility of implementing custom business logic in a safe and efficient manner.

The host exposes a set of APIs, called hostcalls, for the guest applications to accomplish a variety of tasks. You can think of them like syscalls on Linux. Here are few examples functions provided by the hostcalls:

  • Obtain the current UNIX timestamp
  • Lookup geolocation data of IP addresses
  • Spawn async tasks
  • Create local sockets
  • Forward DNS queries to designated servers
  • Register callback functions of the sandbox hooks
  • Read current request information, and write responses
  • Emit application logs, metric data points and tracing spans/events

The DNS request lifecycle is broken down into phases. A request phase is a point in processing at which sandboxed apps can be called to change the course of request resolution. And each guest application can register callbacks for each phase.

Cloudflare is joining the AS112 project to help the Internet deal with misdirected DNS queries

AS112 guest application

The AS112 service is built as a guest application written in Rust and compiled to WebAssembly. The zones listed in RFC 7534 and RFC 7535 are loaded as static zones in memory and indexed as a tree data structure. Incoming queries are answered locally by looking up entries in the zone tree.

A router setting in the app manifest is added to tell the host what kind of DNS queries should be processed by the guest application, and a fallback_action setting is added to declare the expected fallback behavior.

# Declare what kind of queries the app handles.
router = [
    # The app is responsible for all the AS112 IP prefixes.
    "dst in { 192.31.196.0/24 192.175.48.0/24 2001:4:112::/48 2620:4f:8000::/48 }",
]

# If the app fails to handle the query, servfail should be returned.
fallback_action = "fail"

The guest application, along with its manifest, is then compiled and deployed through a deployment pipeline that leverages Quicksilver to store and replicate the assets worldwide.

The guest application is now up and running, but how does the DNS query traffic destined to the new IP prefixes reach the DNS server? Do we have to restart the DNS server every time we add a new guest application? Of course there is no need. We use software we developed and deployed earlier, called Tubular. It allows us to change the addresses of a service on the fly. With the help of Tubular, incoming packets destined to the AS112 service IP prefixes are dispatched to the right DNS server process without the need to make any change or release of the DNS server itself.

Meanwhile, in order to make the misdirected DNS queries land on the Cloudflare network in the first place, we use BYOIP (Bringing Your Own IPs to Cloudflare), a Cloudflare product that can announce customer’s own IP prefixes in all our locations. The four AS112 IP prefixes are boarded onto the BYOIP system, and will be announced by it globally.

Testing

How can we ensure the service we set up does the right thing before we announce it to the public Internet? 1.1.1.1 processes more than 13 billion of these misdirected queries every day, and it has logic in place to directly return NXDOMAIN for them locally, which is a recommended practice per RFC 7534.

However, we are able to use a dynamic rule to change how the misdirected queries are handled in Cloudflare testing locations. For example, a rule like following:

phase = post-cache and qtype in { PTR } and colo in { test1 test2 } and qname-suffix in { 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa 254.169.in-addr.arpa } forward 192.175.48.6:53

The rule instructs that in data center test1 and test2, when the DNS query type is PTR, and the query name ends with those in the list, forward the query to server 192.175.48.6 (one of the AS112 service IPs) on port 53.

Because we’ve provisioned the AS112 IP prefixes in the same node, the new AS112 service will receive the queries and respond to the resolver.

It’s worth mentioning that the above-mentioned dynamic rule that intercepts a query at the post-cache phase, and changes how the query gets processed, is executed by a guest application too, which is named override. This app loads all dynamic rules, parses the DSL texts and registers callback functions at phases declared by each rule. And when an incoming query matches the expressions, it executes the designated actions.

Public reports

We collect the following metrics to generate the public statistics that an AS112 operator is expected to share to the operator community:

  • Number of queries by query type
  • Number of queries by response code
  • Number of queries by protocol
  • Number of queries by IP versions
  • Number of queries with EDNS support
  • Number of queries with DNSSEC support
  • Number of queries by ASN/Data center

We’ll serve the public statistics page on the Cloudflare Radar website. We are still working on implementing the required backend API and frontend of the page – we’ll share the link to this page once it is available.

What’s next?

We are going to announce the AS112 prefixes starting December 15, 2022.

After the service is launched, you can run a dig command to check if you are hitting an AS112 node operated by Cloudflare, like:

$ dig @blackhole-1.iana.org TXT hostname.as112.arpa +short

"Cloudflare DNS, SFO"
"See http://www.as112.net/ for more information."

How to control non-HTTP and non-HTTPS traffic to a DNS domain with AWS Network Firewall and AWS Lambda

Post Syndicated from Tyler Applebaum original https://aws.amazon.com/blogs/security/how-to-control-non-http-and-non-https-traffic-to-a-dns-domain-with-aws-network-firewall-and-aws-lambda/

Security and network administrators can control outbound access from a virtual private cloud (VPC) to specific destinations by using a service like AWS Network Firewall. You can use stateful rule groups to control outbound access to domains for HTTP and HTTPS by default in Network Firewall. In this post, we’ll walk you through how to accomplish this access control for non-HTTP and non-HTTPS traffic, such as SSH (Secure Shell). This solution is extensible to other protocols with static port assignments.

In the example scenario in this post, the network administrator needs to permit outbound SSH access on port 22/tcp to a third-party domain, example.org, from a group of Amazon Elastic Compute Cloud (Amazon EC2) instances that sits inside of a protected VPC that restricts outbound SSH traffic with Network Firewall. Non-HTTP traffic can’t currently be controlled with a domain rule in Network Firewall.

This solution allows administrators to control outbound access to a given domain in a granular way, by resolving the domain name inside of an AWS Lambda function, and updating a Network Firewall rule variable with the results of the DNS query. This solution further restricts specific non-HTTP and non-HTTPS traffic to those allowed domains to only what is explicitly specified by the administrator.

Solution overview

Figure 1 provides an overview of the solution and the resulting traffic flow.

Figure 1: Overview of the solution and the resulting traffic flow

Figure 1: Overview of the solution and the resulting traffic flow

The solution workflow is as follows:

  1. An Amazon EventBridge rule invokes the Lambda function every 10 minutes. You can modify this frequency to meet your needs. You should consider the time-to-live (TTL) record of the DNS record that you are configuring when choosing this interval.
  2. The Lambda function performs the DNS lookup for the provided domain, and updates a variable in an existing Network Firewall rule group. The rule group changes take a few seconds to fully apply to the nodes in your Network Firewall deployment.
  3. The newly created Network Firewall rule group is associated with the Network Firewall policy to control traffic.
  4. Traffic from the instances in your VPC flows through the Network Firewall endpoint, and if allowed, is routed through an internet gateway to the target server.

Prerequisites

This solution has the following prerequisites:

  1. An AWS account. If you don’t have an AWS account, create and activate one.
  2. An existing VPC with default routing to an internet gateway through a network firewall that has a firewall policy attached to it. The example rule included in the solution’s AWS CloudFormation template expects the firewall policy to use the default action order for stateful rule groups. If you don’t have an existing network firewall associated with your VPC, see the AWS Network Firewall Developer Guide to get started. For a walkthrough of the Network Firewall configuration and rules engine, see the blog post Hands-on walkthrough of the AWS Network Firewall flexible rules engine – Part 1.
  3. A DNS domain that you provide, which allows traffic for the protocol and port (or ports) that you plan to allow traffic to. This DNS domain needs to resolve to an IPv4 address or set of addresses; IPv6 is not supported, at this point.

Deploy the solution

We’ve provided a CloudFormation template to deploy this solution, which is located in the GitHub repository that accompanies this blog post.

To deploy the solution

  1. Download the CloudFormation template from our GitHub repository.
  2. Sign in to your AWS account and select the AWS Region where your Network Firewall is deployed.
  3. Navigate to the CloudFormation service.
  4. Choose Stacks > Create Stack > With new resources (standard).
  5. In the Specify template section, choose Upload a template file.
  6. Choose Choose file, navigate to where you saved the CloudFormation template, and upload it. Then choose Next.
  7. Specify a stack name for your CloudFormation stack.
  8. In the Parameters section, for the Domain parameter, specify the name of the domain to which you will control access. The default value is set to example.org; however, note that the actual example.org doesn’t allow SSH traffic.
  9. The remaining parameters have defaults to allow outbound SSH traffic to the specified domain. Adjust the LambdaJobFrequency variable so that it corresponds with the TTL of the DNS record that it will resolve. This allows the Lambda function to keep the IP address of the DNS record up to date, in the event that it changes. After you’ve configured the parameters, choose Next.
    Figure 2: CloudFormation stack parameters

    Figure 2: CloudFormation stack parameters

  10. On the Configure stack options page, specify any further options needed or keep the default options, and then choose Next.
  11. On the Review page, review the stack and parameters and select the check box to acknowledge that this template will create IAM resources. Choose Create Stack.
  12. Check the stack creation status. Upon successful completion, the status shows CREATE_COMPLETE.
    Figure 3: The successful creation of the CloudFormation stack

    Figure 3: The successful creation of the CloudFormation stack

Test the solution

Before you test the newly created rule, make sure that the Lambda function has been invoked at least once from the EventBridge rule.

To verify the Lambda function results

  1. In the AWS Management Console, navigate to the Lambda function Network-Firewall-Resolver-Function, and on the Monitor tab, choose View logs in CloudWatch.
    Figure 4: Navigating to view logs in CloudWatch

    Figure 4: Navigating to view logs in CloudWatch

  2. Select the most recent log stream.
  3. Verify that that a log line contains the entry StatefulRuleGroup updated successfully.
    Figure 5: Examining the CloudWatch logs to verify that the Lambda function ran successfully

    Figure 5: Examining the CloudWatch logs to verify that the Lambda function ran successfully

  4. Associate the stateful rule group that was created by the stack, Lambda-Managed-Stateful-Rule with the existing Network Firewall policy that is attached to your VPC. To do this:
    1. Navigate to VPC > Network Firewall > Firewall Policies and select your existing firewall policy.
    2. In the Stateful rule groups section, for Actions, choose Add unmanaged stateful rule groups.
  5. Select the check box for Lambda-Managed-Stateful-Rule, and then choose Add stateful rule group.
  6. When the newly provisioned Lambda function runs successfully, it will resolve the IPv4 address for the domain (example.org) and associate the address with the stateful rule variable IP_NET. To validate that this has happened, do the following:
    1. Navigate to VPC > Network Firewall > Network Firewall rule groups.
    2. Choose the Lambda-Managed-Stateful-Rule rule group.
    3. Navigate to the rule variable section, and choose IP_NET. If the Lambda function successfully resolved the provided domain name, the variable will contain the IPv4 addresses for the domain you provided, as shown in Figure 6.
      Figure 6: Validating the rule variable details

      Figure 6: Validating the rule variable details

  7. Test the rule by attempting to connect to the domain that you specified in the CloudFormation template. Use an EC2 instance within the VPC that the network firewall rule is associated with, and attempt to establish an SSH connection to the domain that you specified. As shown by the SSH key negotiation in Figure 7, traffic is allowed through the network firewall, as intended.
    Figure 7: SSH connectivity to the domain was successful

    Figure 7: SSH connectivity to the domain was successful

    You can also configure the rule to drop the SSH connection, rather than permit it. To do this:

    1. Navigate to VPC > Network Firewall > Network Firewall rule groups.
    2. Choose the Lambda-Managed-Stateful-Rule rule group. In the Rules section, choose Edit Rules.
    3. Modify the rule to take the Drop action, and save the rule group.

    As shown by the lack of response from the host in Figure 8, the SSH connection cannot be established anymore.

    Figure 8: An SSH connection cannot be established, due to the connection timing out

    Figure 8: An SSH connection cannot be established, due to the connection timing out

Cleanup

Follow the steps in this section to remove the resources created by this solution.

To remove the resources

  1. Sign in to your AWS account where you deployed the CloudFormation stack and navigate to the Network Firewall console.
  2. In the Stateful rule groups section, select the check box for Lambda-Managed-Stateful-Rule. For Actions, choose Disassociate from policy.
    Figure 9: Disassociating the stateful rule from the existing policy

    Figure 9: Disassociating the stateful rule from the existing policy

  3. Navigate to the CloudFormation console, select the stack that you created, and then choose Delete. Upon successful deletion, the resources created by the stack will be deleted.

Conclusion

In this post, we’ve demonstrated how security and network administrators have the ability to permit or restrict non-HTTP and non-HTTPS traffic to a given domain by using Network Firewall. With this solution, administrators can enforce granular port- and protocol-level control to third-party domains. To learn more about rule group configuration in AWS Network Firewall, see Managing your own rule groups in the Developer Guide.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support. You can also start a new thread on AWS Network Firewall re:Post to get answers from the community.

Want more AWS Security news? Follow us on Twitter.

Tyler Applebaum

Tyler Applebaum

Tyler is a Sr. Solutions Architect in the Charlotte, NC area helping customers migrate to AWS and modernize their applications. He has previous experience as a network engineer working in healthcare and finance.

Bhavin Lakhani

Bhavin is a Technical Account Manager in the US West, Northern California, helping customers with their AWS Enterprise Support, Operational, and Architectural needs. He has cloud infrastructure and database platform leadership experience in his previous roles. He loves soccer and is an ardent “Arsenal” fan.

Goodbye, Alexa. Hello, Cloudflare Radar Domain Rankings

Post Syndicated from Celso Martinho original https://blog.cloudflare.com/radar-domain-rankings/

Goodbye, Alexa. Hello, Cloudflare Radar Domain Rankings

Goodbye, Alexa. Hello, Cloudflare Radar Domain Rankings

The Internet is a living organism. Technology changes, shifts in human behavior, social events, intentional disruptions, and other occurrences change the Internet in unpredictable ways, even to the trained eye.

Cloudflare Radar has long been the place to visit for accessing data and getting unique insights into how people and organizations are using the Internet across the globe, as well as those unpredictable changes to the Internet.

One of the most popular features on Radar has always been the “Most Popular Domains,” with both global and country-level perspectives. Domain usage signals provide a proxy for user behavior over time and are a good representation of what people are doing on the Internet.

Today, we’re going one step further and launching a new dataset called Radar Domain Rankings (Beta). Domain Rankings is based on aggregated 1.1.1.1 resolver data that is anonymized in accordance with our privacy commitments. The dataset aims to identify the top most popular domains based on how people use the Internet globally, without tracking individuals’ Internet use.

There are a few reasons why we’re doing this now. One is obviously to improve our Radar features with better data and incorporate new learnings. But also, ranking lists are used all over the Internet in all sorts of systems. One of the most used and trusted sources of domain rankings was Alexa, but that service was recently deprecated. We believe we are in a good position to provide a strong alternative.

Let’s see how we built it.

Differences in domain names

Before we dig into the data science behind Domain Rankings, it’s important to understand what a domain and DNS are. Internet domain names are human-readable dot-separated letters, digits and hyphens that correspond to a network resource, like a server or a website. However, your computer and applications don’t know what to do with a domain name; they need IP addresses to send and receive information over the network. DNS is the system that converts, or resolves, a domain name into an IP address. Think of it as an Internet phonebook for domain names.

Note: This is a simplification. A new standard called Internationalized Domain Names, or IDN, allows using Unicode strings in domain names.

Each dot defines a new hierarchy level, reading right to left. Domains can have multiple levels of depth. The highest level corresponds to country code top-level domains (ccTLDs) like .uk, .fr or .pt, or generic top-level domains (gTLDs) like .com, .org, or .net. These are normally assigned to and managed by either country-level entities or administrative organizations operating a registry.

Then there are the second-level domains like cloudflare.com or google.com. These are normally purchased and registered by individuals or organizations, which are then free to create and manage as many hostnames and hierarchy levels as they want.

Unfortunately, however, there are exceptions. For instance, many countries use second-level domain registration. One such example is the United Kingdom, where commercial domains can only be registered under the .co.uk hierarchy. That’s why Google in the UK isn’t google.uk, but rather google.co.uk.

But that’s not all. Some countries use 3rd level domain registrations. One example is Japan, which offers Regional Domain registration under cities like *.aisai.aichi.jp.

Projects like the Public Suffix List are a good starting point for understanding the variations involved, and how they affect validations and assumptions in other systems, such as cookies in web browsers.

Domain Rankings takes some of this nuance into account to inform the definition of our current ruleset:

  • We boil everything down to second-level domains, such as cloudflare.com or google.com.
  • However, if the second level is .edu, .com, .org, .gov, .net, .gov, .net, .co or .mil, then we use third-level domains.
  • We don’t distinguish between what we think is a website or an infrastructure system. A domain represents an Internet-available resource.
  • We will also semi-automate, curate and maintain a list of domains that map to popular platforms and services in the future. Example: fb.audio, fb.com, fb.watch, all map to a “facebook” platform.

Defining popularity

Definitions are important. We established what we consider a domain, but what does domain popularity mean exactly? Our research showed that the volume of traffic generated to a given domain doesn’t really work as a proxy for what we perceive as popular. Instead, Domain Rankings looks at the size of the population of users that look up a domain per unit of time. The more people who are interested in a domain, the more popular it is.

Sounds pretty straightforward, right? Well, it’s not. Our databases don’t have cookies, IPs, or other tracking artifacts, and we strip information that leads to identifying an individual from all of our data, by design.

The good news, however, is that we do a very good job at identifying automated traffic (for instance, you can read about Bot Management and how we use Machine Learning to detect bots in HTTP traffic in our blog) and we found we could develop a reasonable proxy for the unique users metric without sacrificing privacy (using other data points that we store for a limited period of time, like the ASN and high-level geolocation information of the request or the Cloudflare data center that served it).

Domain Rankings’ popularity metric is best described as the estimated relative size of the user population that accesses a domain over some period of time.

Our approach

We announced 1.1.1.1, our privacy-first consumer DNS resolver in 2018, and over the years it’s grown to become one of the top DNS services in the world. 1.1.1.1 is also part of a Research Agreement with APNIC in which we collaborate with them doing public research and DNS data insights.

The data we collect from it honors our privacy commitments, and is aggregated and stripped of any information that could lead to identifying or tracking users. We conducted a privacy examination by a Big Four accounting firm to determine whether the 1.1.1.1 resolver was effectively configured to meet our privacy commitments. You can read more about it in this blog, and the full report is publicly available on our compliance page.

Even without this personally identifying information, the resulting collection is vast and representative of Internet activity.

The 1.1.1.1 service is used in many ways. Regular (human) Internet users use it as their DNS resolver, either because they explicitly configured it in their devices, or their ISP did, or because they use WARP, or their browser uses 1.1.1.1 under the hood. However, servers and cloud infrastructure, IoT devices, home routers, and bots also use 1.1.1.1 extensively, which creates a lot of challenges for us when trying to identify human traffic.

We’ve been using DNS data to calculate the top and trending domains found on both the global and country pages on Cloudflare Radar. It’s been quite a learning experience trying to improve these lists. We have implemented aggregations, counts, filters, handling exceptions, and tried reducing noise, and yet they’re far from perfect. We felt that there had to be a better way.

We’ve spent the last six months building a variety of machine learning models to help us predict the rank of a domain.

Building the model was no easy feat. We experimented with multiple regression types first, to know exactly what the model was doing, and then more complex algorithms to get better performance. We played with different datasets, changed the population groups, variables (features), and combinations of variables, and used synthetic data.

After evaluation, one of our first conclusions was that building a model that could produce good results for the highest ranked domains and the long tail would be difficult.

The paper “A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists” describes this problem well. “The ranking of domains in the long tail should be based on significantly smaller and hence less reliable numbers.” Talking to our Research Team who submitted the collaboration paper “Toppling Top Lists: Evaluating the Accuracy of Popular Website Lists” to IMC 2022, got us to the same conclusion: the most popular domains (like google.com and facebook.com) have feature values disproportionally higher than the lower-ranked domains.

Therefore, we selected the two models that performed best. One model was trained on the population with the highest feature values, uses more features, and is used to generate the ordered top 100 domain list. A second model was trained on a more general group of domains, uses fewer features, and is used to get the top one million most popular domains, which we then divide into ranking buckets.

These buckets are ranked, but each bucket’s contents are intentionally unordered. For example, the second bucket of 10,000 most popular domains includes the set of domains that rank from 10,001 to 20,000, but give no further indication of the individual ranking of domains in that bucket. Given the size of some of these buckets and the window of time we use to populate them, they will inherently be exposed to more instability, too. We feel this is a good compromise between the described natural uncertainties of our long tail model and providing a reasonable idea of how close to the top a domain is.

Results

It’s important to mention there is no global view that can establish the perfect rank, and there’s no easy mechanism to confirm if a ranking is, ultimately, good. Data-driven results are always subject to some bias and skewing, related to the context of the organizations and systems that collect them. Sometimes all that can be done is to be transparent about potential sources of bias. The geographical distribution of customers and users, product characteristics, platform features, and behavioral diversity play an essential role in the final result. We are presenting the Cloudflare view, what we see.

Having said this, Cloudflare sits in a privileged position and handles a significant amount of Internet traffic. We have plenty of signals we can extract from our aggregated data, and believe that makes it possible to generate high quality domain rankings.

Domain Rankings are available today. You can head up to the Domains page and check it out:

  • Ordered list of the top 100 most popular domains globally and per country, based on our first model. Last 24 hours, updated daily.
  • Unordered global most popular domains datasets divided into buckets of the following sizes: 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000. Last 7 days, updated weekly.
Goodbye, Alexa. Hello, Cloudflare Radar Domain Rankings

Next steps

We will keep improving Domain Rankings and monitoring the results. Anyone can access them on Cloudflare Radar, read the results, and download the CSV files.

Feel free to explore our Domain Rankings and share feedback with us.

Dig through SERVFAILs with EDE

Post Syndicated from Stanley Chiang original https://blog.cloudflare.com/dig-through-servfails-with-ede/

Dig through SERVFAILs with EDE

Dig through SERVFAILs with EDE

It can be frustrating to get errors (SERVFAIL response codes) returned from your DNS queries. It can be even more frustrating if you don’t get enough information to understand why the error is occurring or what to do next. That’s why back in 2020, we launched support for Extended DNS Error (EDE) Codes to 1.1.1.1.

As a quick refresher, EDE codes are a proposed IETF standard enabled by the Extension Mechanisms for DNS (EDNS) spec. The codes return extra information about DNS or DNSSEC issues without touching the RCODE so that debugging is easier.

Now we’re happy to announce we will return more error code types and include additional helpful information to further improve your debugging experience. Let’s run through some examples of how these error codes can help you better understand the issues you may face.

To try for yourself, you’ll need to run the dig or kdig command in the terminal. For dig, please ensure you have v9.11.20 or above. If you are on macOS 12.1, by default you only have dig 9.10.6. Install an updated version of BIND to fix that.

Let’s start with the output of an example dig command without EDE support.

% dig @1.1.1.1 dnssec-failed.org +noedns

; <<>> DiG 9.18.0 <<>> @1.1.1.1 dnssec-failed.org +noedns
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8054
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; Query time: 23 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Thu Mar 17 10:12:57 PDT 2022
;; MSG SIZE  rcvd: 35

In the output above, we tried to do DNSSEC validation on dnssec-failed.org. It returns a SERVFAIL, but we don’t have context as to why.

Now let’s try that again with 1.1.1.1’s EDE support.

% dig @1.1.1.1 dnssec-failed.org +dnssec

; <<>> DiG 9.18.0 <<>> @1.1.1.1 dnssec-failed.org +dnssec
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34492
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; EDE: 9 (DNSKEY Missing): (no SEP matching the DS found for dnssec-failed.org.)
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; Query time: 15 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Fri Mar 04 12:53:45 PST 2022
;; MSG SIZE  rcvd: 103

We can see there is still a SERVFAIL. However, this time there is also an EDE Code 9 which stands for “DNSKey Missing”. Accompanying that, we also have additional information saying “no SEP matching the DS found” for dnssec-failed.org. That’s better!

Another nifty feature is that we will return multiple errors when appropriate, so you can debug each one separately. In the example below, we returned a SERVFAIL with three different error codes: “Unsupported DNSKEY Algorithm”, “No Reachable Authority”, and “Network Error”.

dig @1.1.1.1 [domain] +dnssec

; <<>> DiG 9.18.0 <<>> @1.1.1.1 [domain] +dnssec
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55957
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; EDE: 1 (Unsupported DNSKEY Algorithm): (no supported DNSKEY algorithm for [domain].)
; EDE: 22 (No Reachable Authority): (at delegation [domain].)
; EDE: 23 (Network Error): (135.181.58.79:53 rcode=REFUSED for [domain] A)
;; QUESTION SECTION:
;[domain].		IN	A

;; Query time: 1197 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Wed Mar 02 13:41:30 PST 2022
;; MSG SIZE  rcvd: 202

Here’s a list of the additional codes we now support:

Error Code Number Error Code Name
1 Unsupported DNSKEY Algorithm
2 Unsupported DS Digest Type
5 DNSSEC Indeterminate
7 Signature Expired
8 Signature Not Yet Valid
9 DNSKEY Missing
10 RRSIGs Missing
11 No Zone Key Bit Set
12 NSEC Missing

We have documented all the error codes we currently support with additional information you may find helpful. Refer to our dev docs for more information.

How we improved DNS record build speed by more than 4,000x

Post Syndicated from Alex Fattouche original https://blog.cloudflare.com/dns-build-improvement/

How we improved DNS record build speed by more than 4,000x

How we improved DNS record build speed by more than 4,000x

Since my previous blog about Secondary DNS, Cloudflare’s DNS traffic has more than doubled from 15.8 trillion DNS queries per month to 38.7 trillion. Our network now spans over 270 cities in over 100 countries, interconnecting with more than 10,000 networks globally. According to w3 stats, “Cloudflare is used as a DNS server provider by 15.3% of all the websites.” This means we have an enormous responsibility to serve DNS in the fastest and most reliable way possible.

Although the response time we have on DNS queries is the most important performance metric, there is another metric that sometimes goes unnoticed. DNS Record Propagation time is how long it takes changes submitted to our API to be reflected in our DNS query responses. Every millisecond counts here as it allows customers to quickly change configuration, making their systems much more agile. Although our DNS propagation pipeline was already known to be very fast, we had identified several improvements that, if implemented, would massively improve performance. In this blog post I’ll explain how we managed to drastically improve our DNS record propagation speed, and the impact it has on our customers.

How DNS records are propagated

Cloudflare uses a multi-stage pipeline that takes our customers’ DNS record changes and pushes them to our global network, so they are available all over the world.

How we improved DNS record build speed by more than 4,000x

The steps shown in the diagram above are:

  1. Customer makes a change to a record via our DNS Records API (or UI).
  2. The change is persisted to the database.
  3. The database event triggers a Kafka message which is consumed by the Zone Builder.
  4. The Zone Builder takes the message, collects the contents of the zone from the database and pushes it to Quicksilver, our distributed KV store.
  5. Quicksilver then propagates this information to the network.

Of course, this is a simplified version of what is happening. In reality, our API receives thousands of requests per second. All POST/PUT/PATCH/DELETE requests ultimately result in a DNS record change. Each of these changes needs to be actioned so that the information we show through our API and in the Cloudflare dashboard is eventually consistent with the information we use to respond to DNS queries.

Historically, one of the largest bottlenecks in the DNS propagation pipeline was the Zone Builder, shown in step 4 above. Responsible for collecting and organizing records to be written to our global network, our Zone Builder often ate up most of the propagation time, especially for larger zones. As we continue to scale, it is important for us to remove any bottlenecks that may exist in our systems, and this was clearly identified as one such bottleneck.

Growing pains

When the pipeline shown above was first announced, the Zone Builder received somewhere between 5 and 10 DNS record changes per second. Although the Zone Builder at the time was a massive improvement on the previous system, it was not going to last long given the growth that Cloudflare was and still is experiencing. Fast-forward to today, we receive on average 250 DNS record changes per second, a staggering 25x growth from when the Zone Builder was first announced.

How we improved DNS record build speed by more than 4,000x

The way that the Zone Builder was initially designed was quite simple. When a zone changed, the Zone Builder would grab all the records from the database for that zone and compare them with the records stored in Quicksilver. Any differences were fixed to maintain consistency between the database and Quicksilver.

This is known as a full build. Full builds work great because each DNS record change corresponds to one zone change event. This means that multiple events can be batched and subsequently dropped if needed. For example, if a user makes 10 changes to their zone, this will result in 10 events. Since the Zone Builder grabs all the records for the zone anyway, there is no need to build the zone 10 times. We just need to build it once after the final change has been submitted.

What happens if the zone contains one million records or 10 million records? This is a very real problem, because not only is Cloudflare scaling, but our customers are scaling with us. Today our largest zone currently has millions of records. Although our database is optimized for performance, even one full build containing one million records took up to 35 seconds, largely caused by database query latency. In addition, when the Zone Builder compares the zone contents with the records stored in Quicksilver, we need to fetch all the records from Quicksilver for the zone, adding time. However, the impact doesn’t just stop at the single customer. This also eats up more resources from other services reading from the database and slows down the rate at which our Zone Builder can build other zones.

Per-record build: a new build type

Many of you might already have the solution to this problem in your head:

Why doesn’t the Zone Builder just query the database for the record that has changed and propagate just the single record?

Of course this is the correct solution, and the one we eventually ended up at. However, the road to get there was not as simple as it might seem.

Firstly, our database uses a series of functions that, at zone touch time, create a PostgreSQL Queue (PGQ) event that ultimately gets turned into a Kafka event. Initially, we had no distinction for individual record events, which meant our Zone Builder had no idea what had actually changed until it queried the database.

Next, the Zone Builder is still responsible for DNS zone settings in addition to records. Some examples of DNS zone settings include custom nameserver control and DNSSEC control. As a result, our Zone Builder needed to be aware of specific build types to ensure that they don’t step on each other. Furthermore, per-record builds cannot be batched in the same way that zone builds can because each event needs to be actioned separately.

As a result, a brand new scheduling system needed to be written. Lastly, Quicksilver interaction needed to be re-written to account for the different types of schedulers. These issues can be broken down as follows:

  1. Create a new Kafka event pipeline for record changes that contain information about the changed record.
  2. Separate the Zone Builder into a new type of scheduler that implements some defined scheduler interface.
  3. Implement the per-record scheduler to read events one by one in the correct order.
  4. Implement the new Quicksilver interface for the per-record scheduler.

Below is a high level diagram of how the new Zone Builder looks internally with the new scheduler types.

How we improved DNS record build speed by more than 4,000x

It is critically important that we lock between these two schedulers because it would otherwise be possible for the full build scheduler to overwrite the per-record scheduler’s changes with stale data.

It is important to note that none of this per-record architecture would be possible without the use of Cloudflare’s black lie approach to negative answers with DNSSEC. Normally, in order to properly serve negative answers with DNSSEC, all the records within the zone must be canonically sorted. This is needed in order to maintain a list of references from the apex record through all the records in the zone. With this normal approach to negative answers, a single record that has been added to the zone requires collecting all records to determine its insertion point within this sorted list of names.

Bugs

I would love to be able to write a Cloudflare blog where everything went smoothly; however, that is never the case. Bugs happen, but we need to be ready to react to them and set ourselves up so that next time this specific bug cannot happen.

In this case, the major bug we discovered was related to the cleanup of old records in Quicksilver. With the full Zone Builder, we have the luxury of knowing exactly what records exist in both the database and in Quicksilver. This makes writing and cleaning up a fairly simple task.

When the per-record builds were introduced, record events such as creates, updates, and deletes all needed to be treated differently. Creates and deletes are fairly simple because you are either adding or removing a record from Quicksilver. Updates introduced an unforeseen issue due to the way that our PGQ was producing Kafka events. Record updates only contained the new record information, which meant that when the record name was changed, we had no way of knowing what to query for in Quicksilver in order to clean up the old record. This meant that any time a customer changed the name of a record in the DNS Records API, the old record would not be deleted. Ultimately, this was fixed by replacing those specific update events with both a creation and a deletion event so that the Zone Builder had the necessary information to clean up the stale records.

None of this is rocket surgery, but we spend engineering effort to continuously improve our software so that it grows with the scaling of Cloudflare. And it’s challenging to change such a fundamental low-level part of Cloudflare when millions of domains depend on us.

Results

Today, all DNS Records API record changes are treated as per-record builds by the Zone Builder. As I previously mentioned, we have not been able to get rid of full builds entirely; however, they now represent about 13% of total DNS builds. This 13% corresponds to changes made to DNS settings that require knowledge of the entire zone’s contents.

How we improved DNS record build speed by more than 4,000x

When we compare the two build types as shown below we can see that per-record builds are on average 150x faster than full builds. The build time below includes both database query time and Quicksilver write time.

How we improved DNS record build speed by more than 4,000x

From there, our records are propagated to our global network through Quicksilver.

The 150x improvement above is with respect to averages, but what about that 4000x that I mentioned at the start? As you can imagine, as the size of the zone increases, the difference between full build time and per-record build time also increases. I used a test zone of one million records and ran several per-record builds, followed by several full builds. The results are shown in the table below:

Build Type Build Time (ms)
Per Record #1 6
Per Record #2 7
Per Record #3 6
Per Record #4 8
Per Record #5 6
Full #1 34032
Full #2 33953
Full #3 34271
Full #4 34121
Full #5 34093

We can see that, given five per-record builds, the build time was no more than 8ms. When running a full build however, the build time lasted on average 34 seconds. That is a build time reduction of 4250x!

Given the full build times for both average-sized zones and large zones, it is apparent that all Cloudflare customers are benefitting from this improved performance, and the benefits only improve as the size of the zone increases. In addition, our Zone Builder uses less database and Quicksilver resources meaning other Cloudflare systems are able to operate at increased capacity.

Next Steps

The results here have been very impactful, though we think that we can do even better. In the future, we plan to get rid of full builds altogether by replacing them with zone setting builds. Instead of fetching the zone settings in addition to all the records, the zone setting builder would just fetch the settings for the zone and propagate that to our global network via Quicksilver. Similar to the per-record builds, this is a difficult challenge due to the complexity of zone settings and the number of actors that touch it. Ultimately if this can be accomplished, we can officially retire the full builds and leave it as a reminder in our git history of the scale at which we have grown over the years.

In addition, we plan to introduce a batching system that will collect record changes into groups to minimize the number of queries we make to our database and Quicksilver.

Does solving these kinds of technical and operational challenges excite you? Cloudflare is always hiring for talented specialists and generalists within our Engineering and other teams.

Wildcard proxy for everyone

Post Syndicated from Hannes Gerhart original https://blog.cloudflare.com/wildcard-proxy-for-everyone/

Wildcard proxy for everyone

Wildcard proxy for everyone

Today, I have the pleasure to announce that we’re giving everyone the ability to proxy DNS wildcard records. Previously, this feature was only available to our Enterprise customers. After many of our free and pay-as-you-go users reached out, we decided that this feature should be available to everyone.

What is a wildcard DNS record?

A DNS record usually maps a domain name to one or multiple IP addresses or another resource associated with that name, so it’s a one-to-many mapping. Let’s look at an example:

Wildcard proxy for everyone

When I do a DNS lookup for the IP address of subdomain1.mycoolwebpage.xyz, I get two IP addresses back, because I have added two A records on that subdomain:

$ dig subdomain1.mycoolwebpage.xyz -t a +short
192.0.2.1
192.0.2.2

I could specify the target of all subdomains like this, with one or multiple DNS records per subdomain. But what if I have hundreds or even thousands of subdomains that I all want to point to the same resource?

This is where a wildcard DNS record comes in. By using the asterisk symbol "*" in the Name field, I can create one or multiple DNS records that are used as the response for all subdomains that are not specifically covered by another DNS record (more on this later). So the wildcard record you can see in the screenshot above is covering *.mycoolwebpage.xyz, meaning all subdomains of mycoolwebpage.xyz. This can also be done on deeper levels, like on *.www.mycoolwebpage.xyz.

If I perform a lookup for subdomain2.mycoolwebpage.xyz, the target I specified in the wildcard record will be used as the response. Again, this is only happening because there is no DNS record specifically for this subdomain.

$ dig subdomain2.mycoolwebpage.xyz -t a +short
192.0.2.3

And it is often overlooked that a wildcard record does not only cover the level it is set on directly, but deeper levels, as well:

$ dig some.deep.label.subdomain2.mycoolwebpage.xyz -t a +short 
192.0.2.3

Also, a wildcard DNS record does not cover the apex of the zone (in this example the apex is mycoolwebpage.xyz).

A few more things to know about wildcard records

Below you can find additional rules that apply to wildcard DNS records you should be aware of:

Wildcards are only supported on the first label. Meaning something like subdomain.*.mycoolwebpage.xyz is not a wildcard on the level of the asterisk character. If you create a DNS record with that name, the asterisk is interpreted as the literal character and not as the wildcard operator.

You cannot create wildcards on multiple levels. So if you create a DNS record on *.*.mycoolwebpage.xyz, only the first asterisk is interpreted as a wildcard while the second one is interpreted as the literal “*” character.

Wildcards will be applied for multiple levels. But a specific record on any equal or lower level will terminate anything on or below this specific record — independent of the type of that specific record. Here is an example. If you have only these two records on your domain

subdomain1.mycoolwebpage.xyz  TXT  “some text”
*.mycoolwebpage.xyz  	A  192.0.2.3

the wildcard record will be used for queries going to any subdomain of mycoolwebpage.xyz except subdomain1.mycoolwebpage.xyz or anything below that specific label, like deeper.label.subdomain1.mycoolwebpage.xyz — simply because there already exists a record on subdomain1.mycoolwebpage.xyz. However, the wildcard will be used for deeper labels that are not below the specific record on subdomain1 — for example, deeper.label.subdomain2.mycoolwebpage.xyz.

To expand on this rule: if you think of DNS as a tree starting from the root zone (see the diagram below), simply the existence of a branch terminates the wildcard for all records on that branch. In the example above the wildcard was terminated for anything on the label subdomain1 and below, but even if there only exists a record on a deeper level, anything above will also be terminating the wildcard. This example should make it clear. If you only have the following two records on your domain, as shown in the diagram below

some.deep.label.subdomain1.mycoolwebpage.xyz  TXT  “some other text”
*.mycoolwebpage.xyz  	A  192.0.2.3

a query to label.subdomain1.mycoolwebpage.xyz for an A record is not covered by the wildcard because it is a node on the existing branch ending in the TXT record above.

Wildcard proxy for everyone

Wildcard records only cover the record type they are specified for. If you add a wildcard A record for *.mycoolwebpage.xyz it will not cover queries specifying AAAA records (or any other type). But as mentioned in the previous point, a record on a specific label will terminate the wildcard for this label and everything below even if it’s a different record type.

All the above and more can be found in RFC4592. Not the type to read through complex RFCs but still generally interested in how DNS works, go check out Julia Evans’ wizard zines about DNS, she did a great job explaining all the complexities about DNS in an easy to digest way.

What is a proxied wildcard DNS record?

Cloudflare provides a range of features (including Caching, Firewall, or Workers) that require you to proxy the specific hostname you want to use these features on. You can proxy DNS records of the type A, AAAA, and CNAME. These record types are used to specify the origin server of a hostname which expects traffic via HTTP/S.

Proxying a wildcard DNS record works exactly as proxying a specific record. In the Cloudflare dashboard, navigate to the DNS app and either create a new wildcard record or edit an existing record and toggle the proxy status to Proxied. Previously, we only allowed this on wildcard records if the domain was upgraded to the Enterprise plan, but this feature is now available on all plan levels!

Once you have enabled the proxy status of your wildcard DNS record, Cloudflare nameservers will respond with two Cloudflare anycast IPs instead of the origin IP(s) you have specified for that record. These Cloudflare IPs are advertised on our global network from more than 275 locations in more than 100 countries.

$ dig subdomain2.mycoolwebpage.xyz -t a +short
104.18.35.126
172.64.152.130

In the example above, this will ensure that all HTTP/S requests sent to subdomain2.mycoolwebpage.xyz or any other subdomain that is covered by the proxied wildcard DNS record are proxied by Cloudflare’s network, specifically the closest Cloudflare data center. Go see for yourself and pick a random subdomain of mycoolwebpage.xyz. You will see a simple page that is generated using Cloudflare Workers:

Wildcard proxy for everyone

And the cool thing is that you don’t even have to think about creating a TLS certificate. By default, Cloudflare will issue and automatically renew a certificate for your zone apex (mycoolwebpage.xyz) and all subdomains on the next level (*.mycoolwebpage.xyz).

Wildcard proxy for everyone

If you want to proxy a wildcard DNS record on a deeper level like *.www.mycoolwebpage.xyz you can subscribe to Cloudflare Advanced Certificate Manager and get a certificate that is covering that wildcard like this:

Wildcard proxy for everyone

Try it yourself on your domain

If you are not already using Cloudflare DNS for your domain, it is very easy to move from your existing DNS provider and can be done in a few minutes. Head over to our developer documentation for detailed instructions on how to change your authoritative nameservers.

Announcing experimental DDR in 1.1.1.1

Post Syndicated from Christopher Wood original https://blog.cloudflare.com/announcing-ddr-support/

Announcing experimental DDR in 1.1.1.1

Announcing experimental DDR in 1.1.1.1

1.1.1.1 sees approximately 600 billion queries per day. However, proportionally, most queries sent to this resolver are over cleartext: 89% over UDP and TCP combined, and the remaining 11% are encrypted. We care about end-user privacy and would prefer to see all of these queries sent to us over an encrypted transport using DNS-over-TLS or DNS-over-HTTPS. Having a mechanism by which clients could discover support for encrypted protocols such as DoH or DoT will help drive this number up and lead to more name encryption on the Internet. That’s where DDR – or Discovery of Designated Resolvers – comes into play. As of today, 1.1.1.1 supports the latest version of DDR so clients can automatically upgrade non-secure UDP and TCP connections to secure connections. In this post, we’ll describe the motivations for DDR, how the mechanism works, and, importantly, how you can test it out as a client.

DNS transports and public resolvers

We initially launched our public recursive resolver service 1.1.1.1 over three years ago, and have since seen its usage steadily grow. Today, it is one of the fastest public recursive resolvers available to end-users, supporting the latest security and privacy DNS transports such as HTTP/3 for DNS-over-HTTPS (DoH), as well as Oblivious DoH.

As a public resolver, all clients, regardless of type, are typically manually configured based on a user’s desired performance, security, and privacy requirements. This choice reflects answers to two separate but related types of questions:

  1. What recursive resolver should be used to answer my DNS queries? Does the resolver perform well? Does the recursive resolver respect my privacy?
  2. What protocol should be used to speak to this particular recursive resolver? How can I keep my DNS data safe from eavesdroppers that should otherwise not have access to it?

The second question primarily concerns technical matters. In particular, whether or not a recursive resolver supports DoH is simple enough to answer. Either the recursive resolver does or does not support it!

In contrast, the first question is primarily a matter of policy. For example, consider the question of choosing between a local network-provided DNS recursive resolver and a public recursive resolver. How do resolver features (including DoH support, for example) influence this decision? How does the resolver’s privacy policy regarding data use and retention influence this decision? More generally, what information about recursive resolver capabilities is available to clients in making this decision and how is this information delivered to clients?

These policy questions have been the topic of substantial debate in the Internet Engineering Task Force (IETF), the standards body where DoH was standardized, and is the one facet of the Adaptive DNS Discovery (ADD) Working Group, which is chartered to work on the following items (among others):

– Define a mechanism that allows clients to discover DNS resolvers that support encryption and that are available to the client either on the public Internet or on private or local networks.

– Define a mechanism that allows communication of DNS resolver information to clients for use in selection decisions. This could be part of the mechanism used for discovery, above.

In other words, the ADD Working Group aims to specify mechanisms by which clients can obtain the information they need to answer question (1). Critically, one of those pieces of information is what encrypted transport protocols the recursive resolver supports, which would answer question (2).

As the answer to question (2) is purely technical and not a matter of policy, the ADD Working Group was able to specify a workable solution that we’ve implemented and tested with existing clients. Before getting into the details of how it works, let’s dig into the problem statement here and see what’s required to address it.

Threat model and problem statement

The DDR problem is relatively straightforward: given the IP address of a DNS recursive resolver, how can one discover parameters necessary for speaking to the same resolver using an encrypted transport? (As above, discovering parameters for a different resolver is a distinctly different problem that pertains to policy and is therefore out of scope.)

This question is only meaningful insofar as using encryption helps protect against some attacker. Otherwise, if the network was trusted, encryption would add no value! A direct consequence is that this question assumes the network – for some definition of “the network” – is untrusted and encryption helps protect against this network.

But what exactly is the network here? In practice, the topology typically looks like the following:

Announcing experimental DDR in 1.1.1.1
Typical DNS configuration from DHCP

Again, for DNS discovery to have any meaning, we assume that either the ISP or home network – or both – is untrusted and malicious. The setting here depends on the client and the network they are attached to, but it’s generally simplest to assume the ISP network is untrusted.

This question also makes one important assumption: clients know the desired recursive resolver address. Why is this important? Typically, the IP address of a DNS recursive resolver is provided via Dynamic Host Configuration Protocol (DHCP). When a client joins a network, it uses DHCP to learn information about the network, including the default DNS recursive resolver. However, DHCP is a famously unauthenticated protocol, which means that any active attacker on the network can spoof the information, as shown below.

Announcing experimental DDR in 1.1.1.1
Unauthenticated DHCP discovery

One obvious attacker vector would be for the attacker to redirect DNS traffic from the network’s desired recursive resolver to an attacker-controlled recursive resolver. This has important implications on the threat model for discovery.

First, there is currently no known mechanism for encrypted DNS discovery in the presence of an active attacker that can influence the client’s view of the recursive resolver’s address. In other words, to make any meaningful improvement, DNS discovery assumes the client’s view of the DNS recursive resolver address is correct (and obtained through some secure mechanism). A second implication is that the attacker can simply block any attempt of client discovery, preventing upgrade to encrypted transports. This seems true of any interactive discovery mechanism. As a result, DNS discovery must relax this attacker’s capabilities somewhat: rather than add, drop, or modify packets, the attacker can only add or modify packets.

Altogether, this threat model lets us sharpen the DNS discovery problem statement: given the IP address of a DNS recursive resolver, how can one securely discover parameters necessary for speaking to the same resolver using an encrypted transport in the presence of an active attacker that can add or modify packets? It should be infeasible, for example, for the attacker to redirect the client from the resolver that it knows at the outset to one the attacker controls.

So how does this work, exactly?

DDR mechanics

DDR depends on two mechanisms:

  1. Certificate-based authentication of encrypted DNS resolvers.
  2. SVCB records for encoding and communicating DNS parameters.

Certificates allow resolvers to prove authority for IP addresses. For example, if you view the certificate for one.one.one.one, you’ll see several IP addresses listed under the SubjectAlternativeName extension, including 1.1.1.1.

Announcing experimental DDR in 1.1.1.1
SubjectAltName list of the one.one.one.one certificate

SVCB records are extensible key-value stores that can be used for conveying information about services to clients. Example information includes the supported application protocols, including HTTP/3, as well as keying material like that used for TLS Encrypted Client Hello.

How does DDR combine these two to solve the discovery problem above? In three simple steps:

  1. Clients query the expected DNS resolver for its designations and their parameters with a special-purpose SVCB record.
  2. Clients open a secure connection to the designated resolver, for example, one.one.one.one, authenticating the resolver against the one.one.one.one name.
  3. Clients check that the designated resolver is additionally authenticated for the IP address of the origin resolver. That is, the certificate for one.one.one.one, the designated resolver, must include the IP address 1.1.1.1, the original designator resolver.

If this validation completes, clients can then use the secure connection to the designated resolver. In pictures, this is as follows:

Announcing experimental DDR in 1.1.1.1
DDR discovery process

This demonstrates that the encrypted DNS resolver is authoritative for the client’s original DNS resolver. Or, in other words, that the original resolver and the encrypted resolver are effectively  “the same.” An encrypted resolver that does not include the originally requested resolver IP address on its certificate would fail the validation, and clients are not expected to follow the designated upgrade path. This entire process is referred to as “Verified Discovery” in the DDR specification.

Experimental deployment and next steps

To enable more encrypted DNS on the Internet and help the standardization process, 1.1.1.1 now has experimental support for DDR. You can query it directly to find out:

$ dig +short @1.1.1.1 _dns.resolver.arpa type64

QUESTION SECTION
_dns.resolver.arpa.               IN SVCB 

ANSWER SECTION
_dns.resolver.arpa.                           300    IN SVCB  1 one.one.one.one. alpn="h2,h3" port="443" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001" key7="/dns-query{?name}"
_dns.resolver.arpa.                           300    IN SVCB  2 one.one.one.one. alpn="dot" port="853" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001"

ADDITIONAL SECTION
one.one.one.one.                              300    IN AAAA  2606:4700:4700::1111
one.one.one.one.                              300    IN AAAA  2606:4700:4700::1001
one.one.one.one.                              300    IN A     1.1.1.1
one.one.one.one.                              300    IN A     1.0.0.1

This command sends a SVCB query (type64) for the reserved name _dns.resolver.arpa to 1.1.1.1. The output lists the contents of this record, including the DoH and DoT designation parameters. Let’s walk through the contents of this record:

_dns.resolver.arpa.                           300    IN SVCB  1 one.one.one.one. alpn="h2,h3" port="443" ipv4hint="1.1.1.1,1.0.0.1" ipv6hint="2606:4700:4700::1111,2606:4700:4700::1001" key7="/dns-query{?name}"

This says that the DoH target one.one.one.one is accessible over port 443 (port=”443”) using either HTTP/2 or HTTP/3 (alpn=”h2,h3”), and the DoH path (key7) for queries is “/dns-query{?name}”.

Moving forward

DDR is a simple mechanism that lets clients automatically upgrade to encrypted transport protocols for DNS queries without any manual configuration. At the end of the day, users running compatible clients will enjoy a more private Internet experience. Happily, both Microsoft and Apple recently announced experimental support for this emerging standard, and we’re pleased to help them and other clients test support.
Going forward, we hope to help add support for DDR to open source DNS resolver software such as dnscrypt-proxy and Bind. If you’re interested in helping us continue to drive adoption of encrypted DNS and related protocols to help build a better Internet, we’re hiring!

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Post Syndicated from Hannes Gerhart original https://blog.cloudflare.com/foundation-dns/

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Today, we’re announcing Foundation DNS, Cloudflare’s new premium DNS offering that provides unparalleled reliability, supreme performance and is able to meet the most complex requirements of infrastructure teams.

Let’s talk money first

When you’re signing an enterprise DNS deal, usually DNS providers request three inputs from you in order to generate a quote:

  • Number of zones
  • Total DNS queries per month
  • Total DNS records across all zones

Some are considerably more complicated and many have pricing calculators or opaque “Contact Us” pricing. Planning a budget around how you may grow brings unnecessary complexity, and we think we can do better. Why not make this even simpler? Here you go: We decided to charge Foundation DNS based on a single input for our enterprise customers: Total DNS queries per month. This way, we expect to save companies money and even more importantly, remove complexity from their DNS bill.

And don’t worry, just like the rest of our products, DDoS mitigation is still unmetered. There won’t be any hidden overage fees in case your nameservers are DDoS’d or the number of DNS queries exceeds your quota for a month or two.

Why is DNS so important?

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

The Domain Name System (DNS) is nearly as old as the Internet itself. It was originally defined in RFC882 and RFC883 in 1983 out of the need to create a mapping between hostnames and IP addresses. Back then, the authors wisely stated: “[The Internet] is a large system and is likely to grow much larger.” [RFC882]. Today there are almost 160 Million domain names just under the .com, one of the largest Top Level Domains (TLD) [source].

By design, DNS is a hierarchical and highly distributed system, but as an end user you usually only communicate with a resolver (1) that is either assigned or operated by your Internet Service Provider (ISP) or directly configured by your employer or yourself. The resolver communicates with one of the root servers (2), the responsible TLD server (3) and the authoritative nameserver (4) of the domain in question. In many cases all of these four parties are operated by a different entity and located in different regions, maybe even continents.

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

As we have seen in the recent past, if your DNS infrastructure goes down you are in serious trouble, and it likely will cost you a lot of money and potentially damage your reputation. So as a domain owner you want that DNS lookups to your domain are answered 100% of the time and ideally as quickly as possible. So what can you do? You cannot influence which resolver your users have configured. You cannot influence the root server. You can choose which TLD server is involved by picking a domain name with the respective TLD. But if you are bound to a certain TLD for other reasons then that is out of your control as well. What you can easily influence is the provider for your authoritative nameservers. So let’s take a closer look at Cloudflare’s authoritative DNS offering.

A look at Cloudflare’s Authoritative DNS

Authoritative DNS is one of our oldest products, and we have spent a lot of time making it great. All DNS queries are answered from our global anycast network with a presence in more than 250 cities. This way we can deliver supreme performance while always guaranteeing global availability. And of course, we leverage our extensive experience in mitigating DDoS attacks to prevent anyone from knocking down our nameservers and with that the domains of our customers.

DNS is critically important to Cloudflare because up until the release of Magic Transit, DNS was how every user on the Internet was directed to Cloudflare to protect and accelerate our customer’s applications. If our DNS answers were slow, Cloudflare was slow. If our DNS answers were unavailable, Cloudflare was unavailable. Speed and reliability of our authoritative DNS is paramount to the speed and reliability of Cloudflare, as it is to our customers. We have also had our customers push our DNS infrastructure as they’ve grown with Cloudflare. Today our largest customer zone has more than 3 million records and the top 5 are reaching almost 10 million records combined. Those customers rely on Cloudflare to push new DNS record updates to our edge in seconds, not minutes. Due to this importance and our customer’s needs, over the years we have grown our dedicated DNS engineering team focused on keeping our DNS stack fast and reliable.

The security of the DNS ecosystem is also important. Cloudflare has always been a proponent of DNSSEC. Signing and validating DNS answers through DNSSEC ensures that an on-path attacker cannot hijack answers and redirect traffic. Cloudflare has always offered DNSSEC for free on all plan levels, and it will continue to be a no charge option for Foundation DNS. For customers who also choose to use Cloudflare as a registrar, simple one-click deployment of DNSSEC is another key feature that ensures our customers’ domains are not hijacked, and their users are protected. We support RFC 8078 for one-click deployment on external registrars as well.

But there are other issues that can bring parts of the Internet to a halt and these are mostly out of our control: route leaks or even worse route hijacking. While DNSSEC can help with mitigating route hijacks, unfortunately not all recursive resolvers will validate DNSSEC. And even if the resolver does validate, a route leak or hijack to your nameservers will still result in downtime. If all your nameserver IPs are affected by such an event, your domain becomes unresolvable.

With many providers each of your nameservers usually resolves to only one IPv4 and one IPv6 address. If that IP address is not reachable — for example because of network congestion or, even worse, a route leak — the entire nameserver becomes unavailable leading to your domain becoming unresolvable. Even worse, some providers even use the same IP subnet for all their nameservers. So if there is an issue with that subnet all nameservers are down.

Let’s take a look at an example:

$ dig aws.com ns +short              
ns-1500.awsdns-59.org.
ns-164.awsdns-20.com.
ns-2028.awsdns-61.co.uk.
ns-917.awsdns-50.net.

$ dig ns-1500.awsdns-59.org. +short
205.251.197.220
$ dig ns-164.awsdns-20.com. +short
205.251.192.164
$ dig ns-2028.awsdns-61.co.uk. +short
205.251.199.236
$ dig ns-917.awsdns-50.net. +short
205.251.195.149

All nameserver IPs are part of 205.251.192.0/21. Thankfully, AWS is now signing their ranges through RPKI and this makes it less likely to leak… provided that the resolver ISP is validating RPKI. But if the resolver ISP does not validate RPKI and should this subnet be leaked or hijacked, resolvers wouldn’t be able to reach any of the nameservers and aws.com would become unresolvable.

It goes without saying that Cloudflare signs all of our routes and are pushing the rest of the Internet to minimize the impact of route leaks, but what else can we do to ensure that our DNS systems remain resilient through route leaks while we wait for RPKI to be ubiquitously deployed?

Today, when you’re using Cloudflare DNS on the Free, Pro, Business or Enterprise plan, your domain gets two nameservers of the structure <name>.ns.cloudflare.com where <name> is a random first name.

$ dig isbgpsafeyet.com ns +short
tom.ns.cloudflare.com.
kami.ns.cloudflare.com.

Now, as we learned before, in order for a domain to be available, its nameservers have to be available. This is why each of these nameservers resolves to 3 anycast IPv4 and 3 anycast IPv6 addresses.

$ dig tom.ns.cloudflare.com a +short
173.245.59.147
108.162.193.147
172.64.33.147

$ dig tom.ns.cloudflare.com aaaa +short
2606:4700:58::adf5:3b93
2803:f800:50::6ca2:c193
2a06:98c1:50::ac40:2193

The essential detail to notice here is that each of the 3 IPv4 and 3 IPv6 addresses is from a different /8 IPv4 (/45 for IPv6) block. So in order for your nameservers to become unavailable via IPv4, the route leak would have to affect exactly the corresponding subnets across all three /8 IPv4 blocks. This type of event, while theoretically is possible, is virtually impossible in practical terms.

How can this be further improved?

Customers using Foundation DNS will be assigned a new set of advanced nameservers hosted on foundationdns.com and foundationdns.net. These nameservers will be even more resilient than the default Cloudflare nameservers. We will be announcing more details about how we’re achieving this early next year, so stay tuned. All external Cloudflare domains (such as cloudflare.com) will transition to these nameservers in the new year.

There is even more

We’re glad to announce that we are launching two highly requested features:

  • Support for outgoing zone transfers for Secondary DNS
  • Logpush for authoritative and secondary DNS queries

Both of them will be available as part of Foundation DNS and to enterprise customers without any additional costs. Let’s take a closer look at each of these and see how they make our DNS offering even better.

Support for outgoing zone transfers for Secondary DNS

What is Secondary DNS, and why is it important? Many large enterprises have requirements to use more than one DNS provider for redundancy in case one provider becomes unavailable. They can achieve this by adding their domain’s DNS records on two independent platforms and manually keeping the zone files in sync — this is referred to as “multi-primary” setup. With Secondary DNS there are two mechanisms how this can be automated using a “primary-secondary” setup:

  • DNS NOTIFY: The primary nameserver notifies the secondary on every change on the zone. Once the secondary receives the NOTIFY, it sends a zone transfer request to the primary to get in sync with it.
  • SOA query: Here, the secondary nameserver regularly queries the SOA record of the zone and checks if the serial number that can be found on the SOA record is the same with the latest serial number the secondary has stored in it’s SOA record of the zone. If there is a new version of the zone available, it sends a zone transfer request to the primary to get those changes.

Alex Fattouche has written a very insightful blog post about how Secondary DNS works behind the scenes if you want to learn more about it. Another flavor of the primary-secondary setup is to hide the primary, thus referred to as “hidden primary”. The difference of this setup is that only the secondary nameservers are authoritative — in other words configured at the domain’s registrar. The diagram below illustrates the different setups.

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Since 2018, we have been supporting primary-secondary setups where Cloudflare takes the role of the secondary nameserver. This means from our perspective that we are accepting incoming zone transfers from the primary nameservers.

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Starting today, we are now also supporting outgoing zone transfers, meaning taking the role of the primary nameserver with one or multiple external secondary nameservers receiving zone transfers from Cloudflare. Exactly as for incoming transfers, we are supporting

  • zone transfers via AXFR and IXFR
  • automatic notifications via DNS NOTIFY to trigger zone transfers on every change
  • signed transfers using TSIG to ensure zone files are authenticated during transfer

Logpush for authoritative and secondary DNS

Here at Cloudflare we love logs. In Q3 2021, we processed 28 Million HTTP requests per second and 13.6 Million DNS queries per second on average and blocked 76 Billion threats each day. All these events are stored as logs for a limited time frame in order to provide our users near real-time analytics in the dashboard. For those customers who want to — or have to — permanently store these logs we’ve built Logpush back in 2019. Logpush allows you to stream logs in near real time to one of our analytics partners Microsoft Azure Sentinel, Splunk, Datadog and Sumo Logic or to any cloud storage destination with R2-compatible API.

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Today, we’re adding one additional data set for Logpush: DNS logs. In order to configure Logpush and stream DNS logs for your domain, just head over to the Cloudflare dashboard, create a new Logpush job, select DNS logs and configure the log fields you’re interested in:

Announcing Foundation DNS — Cloudflare’s new premium DNS offering

Check out our developer documentation for detailed instructions on how to do this through the API and for a thorough description of the new DNS log fields.

One more thing (or two…)

When looking at the entirety of DNS within your infrastructure, it’s important to review how your traffic is flowing through your systems and how that traffic is behaving. At the end of the day, there is only so much processing power, memory, server capacity, and overall compute resources available. One of the best and important tools we have available is Load Balancing and Health Monitoring!

Cloudflare has provided a Load Balancing solution since 2016, supporting customers to leverage their existing resources in a scalable and intelligent manner. But our Load Balancer was limited to A, AAAA, and CNAME records. This covered a lot of major use cases required by customers, but did not cover all of them. Many customers have more needs such as load balancing MX or email server traffic, SRV records to declare which ports and weight that respective traffic should travel across for a specific service, HTTPS records to ensure the respective traffic uses the secure protocol regardless of port and many more. We want to ensure that our customers’ needs are covered and support their ability to align business goals with technical implementation.

We are happy to announce that we have added additional Health Monitoring methods to support Load Balancing MX, SRV, HTTPS and TXT record traffic without any additional configuration necessary. Create your respective DNS records in Cloudflare and set your Load Balancer as the destination…it’s as easy as that! By leveraging ICMP Ping, SMTP, and UDP-ICMP methods, customers will always have a pulse on the health of their servers and be able to apply intelligent steering decisions based on the respective health information.

When thinking about intelligent steering, there is no one size fits all answer. Different businesses have different needs, especially when looking at where your servers are located around the globe, and where your customers are situated. A common rule of thumb to follow is to place servers where your customers are. This ensures they have the most performant and localized experience possible. One common scenario is to steer your traffic based on where your end-user request originates and create a mapping to the server closest to that area. Cloudflare’s geo steering capability allows our customers to do just that — easily create a mapping of regions to pools, ensuring if we see a request originate from Eastern Europe, to send that request to the proper server to suffice that request. But sometimes, regions can be quite large and lend to issues around not being able to tightly couple together that mapping as closely as one might like.

Today, we are very excited to announce country support within our Geo Steering functionality. Now, customers will be able to choose either one of our thirteen regions, or a specific country to map against their pools to give further granularity and control to how customers traffic should behave as it travels through their system. Both country-level steering and our new health monitoring methods to support load balancing more DNS records will be available in January 2022!

Advancing the DNS Ecosystem

Furthermore, we have some other exciting news to share: We’re finishing the work on Multi-Signer DNSSEC (RFC8901) and plan to roll this out in Q1 2022. Why is this important? Two common requirements of large enterprises are:

  • Redundancy: Having multiple DNS providers responding authoritatively for their domains
  • Authenticity: Deploying DNSSEC to ensure DNS responses can be properly authenticated

Both can be achieved by having the primary nameserver sign the domain and transfer its DNS records plus the record signatures to the secondary nameserver which will serve both as is. This setup is supported with Cloudflare Secondary DNS today. What cannot be supported when transferring pre-signed zones are non-standard DNS features like country-level steering. This is where Multi-Signer DNSSEC comes in. Both DNS providers need to know the signing keys of the other provider and perform their own online (or on-the-fly) signing. If you’re curious to learn more about how Multi-Signer DNSSEC works, go check out this excellent blog post published by APNIC.

Last but not least, Cloudflare is joining the DNS Operations, Analysis, and Research Center (DNS-OARC) as a gold member. Together with other researchers and operators of DNS infrastructure we want to tackle the most challenging problems and continuously work on implementing new standards and features.

While we’ve been at DNS since day one of Cloudflare, we’re still just getting started. We know there are more granular and specific features our future customers will ask of us and the launch of Foundation DNS is our stake in the ground that we will continue to invest in all levels of DNS while building the most feature rich enterprise DNS platform on the planet. If you have ideas, let us know what you’ve always dreamed your DNS provider would do. If you want to help build these features, we are hiring.