October 1 was this year’s DNS Flag Day. Read on to find out all about DNS Flag Day and how it affects Cloudflare’s DNS services (hint: it doesn’t, we already did the work to be compliant).
What is DNS Flag Day?
DNS Flag Day is an initiative by several DNS vendors and operators to increase the compliance of implementations with DNS standards. The goal is to make DNS more secure, reliable and robust. Rather than a push for new features, DNS flag day is meant to ensure that workarounds for non-compliance can be reduced and a common set of functionalities can be established and relied upon.
Last year’s flag day was February 1, and it set forth that servers and clients must be able to properly handle the Extensions to DNS (EDNS0) protocol (first RFC about EDNS0 are from 1999 – RFC 2671). This way, by assuming clients have a working implementation of EDNS0, servers can resort to always sending messages as EDNS0. This is needed to support DNSSEC, the DNS security extensions. We were, of course, more than thrilled to support the effort, as we’re keen to push DNSSECadoption forward .
DNS Flag Day 2020
The goal for this year’s flag day is to increase DNS messaging reliability by focusing on problems around IP fragmentation of DNS packets. The intention is to reduce DNS message fragmentation which continues to be a problem. We can do that by ensuring cleartext DNS messages sent over UDP are not too large, as large messages risk being fragmented during the transport. Additionally, when sending or receiving large DNS messages, we have the ability to do so over TCP.
Problem with DNS transport over UDP
A potential issue with sending DNS messages over UDP is that the sender has no indication of the recipient actually receiving the message. When using TCP, each packet being sent is acknowledged (ACKed) by the recipient, and the sender will attempt to resend any packets not being ACKed. UDP, although it may be faster than TCP, does not have the same mechanism of messaging reliability. Anyone still wishing to use UDP as their transport protocol of choice will have to implement this reliability mechanism in higher layers of the network stack. For instance, this is what is being done in QUIC, the new Internet transport protocol used by HTTP/3 that is built on top of UDP.
Even the earliest DNS standards (RFC 1035) specified the use of sending DNS messages over TCP as well as over UDP. Unfortunately, the choice of supporting TCP or not was up to the implementer/operator, and then firewalls were sometimes set to block DNS over TCP. More recent updates to RFC 1035, on the other hand, require that the DNS server is available to query using DNS over TCP.
DNS message fragmentation
Sending data over networks and the Internet is restricted to the limitation of how large each packet can be. Data is chopped up into a stream of packets, and sized to adhere to the Maximum Transmission Unit (MTU) of the network. MTU is typically 1500 bytes for IPv4 and, in the case of IPv6, the minimum is 1280 bytes. Subtracting both the IP header size (IPv4 20 bytes/IPv6 40 bytes) and the UDP protocol header size (8 bytes) from the MTU, we end up with a maximum DNS message size of 1472 bytes for IPv4 and 1232 bytes in order for a message to fit within a single packet. If the message is any larger than that, it will have to be fragmented into more packets.
Sending large messages causes them to get fragmented into more than one pack. This is not a problem with TCP transports since each packet is ACK:ed to ensure proper delivery. However, the same does not hold true when sending large DNS messages over UDP. For many intents and purposes, UDP has been treated as a second-class citizen to TCP as far as network routing is concerned. It is quite common to see UDP packet fragments being dropped by routers and firewalls, potentially causing parts of a message to be lost. To avoid fragmentation over UDP it is better to truncate the DNS message and set the Truncation Flag in the DNS response. This tells the recipient that more data is available if the query is retried over TCP.
DNS Flag Day 2020 wants to ensure that DNS message fragmentation does not happen. When larger DNS messages need to be sent, we need to ensure it can be done reliably over TCP.
DNS servers need to support DNS message transport over TCP in order to be compliant with this year’s flag day. Also, DNS messages sent over UDP must never exceed the limit over which they risk being fragmented.
Cloudflare authoritative DNS and 126.96.36.199
We fully support the DNS Flag Day initiative, as it aims to make DNS more reliable and robust, and it ensures a common set of features for the DNS community to evolve on. In the DNS ecosystem, we are as much a client as we are a provider. When we perform DNS lookups on behalf of our customers and users, we rely on other providers to follow standards and be compliant. When they are not, and we can’t work around the issues, it leads to problems resolving names and reaching resources.
Both our public resolver 188.8.131.52 as well as our authoritative DNS service, set and enforce reasonable limits on DNS message sizes when sent over UDP. Of course, both services are available over TCP. If you’re already using Cloudflare, there is nothing you need to do but to keep using our DNS services! We will continually work on improving DNS.
If you already understand how Secondary DNS works, please feel free to skip this section. It does not provide any Cloudflare-specific information.
Secondary DNS has many use cases across the Internet; however, traditionally, it was used as a synchronized backup for when the primary DNS server was unable to respond to queries. A more modern approach involves focusing on redundancy across many different nameservers, which in many cases broadcast the same anycasted IP address.
Secondary DNS involves the unidirectional transfer of DNS zones from the primary to the Secondary DNS server(s). One primary can have any number of Secondary DNS servers that it must communicate with in order to keep track of any zone updates. A zone update is considered a change in the contents of a zone, which ultimately leads to a Start of Authority (SOA) serial number increase. The zone’s SOA serial is one of the key elements of Secondary DNS; it is how primary and secondary servers synchronize zones. Below is an example of what an SOA record might look like during a dig query.
example.com 3600 IN SOA ashley.ns.cloudflare.com. dns.cloudflare.com.
2034097105 // Serial
10000 // Refresh
2400 // Retry
604800 // Expire
3600 // Minimum TTL
Each of the numbers is used in the following way:
Serial – Used to keep track of the status of the zone, must be incremented at every change.
Refresh – The maximum number of seconds that can elapse before a Secondary DNS server must check for a SOA serial change.
Retry – The maximum number of seconds that can elapse before a Secondary DNS server must check for a SOA serial change, after previously failing to contact the primary.
Expire – The maximum number of seconds that a Secondary DNS server can serve stale information, in the event the primary cannot be contacted.
Minimum TTL – Per RFC 2308, the number of seconds that a DNS negative response should be cached for.
Using the above information, the Secondary DNS server stores an SOA record for each of the zones it is tracking. When the serial increases, it knows that the zone must have changed, and that a zone transfer must be initiated.
Serial increases can be detected in the following ways:
The fastest way for the Secondary DNS server to keep track of a serial change is to have the primary server NOTIFY them any time a zone has changed using the DNS protocol as specified in RFC 1996, Secondary DNS servers will instantly be able to initiate a zone transfer.
Another way is for the Secondary DNS server to simply poll the primary every “Refresh” seconds. This isn’t as fast as the NOTIFY approach, but it is a good fallback in case the notifies have failed.
One of the issues with the basic NOTIFY protocol is that anyone on the Internet could potentially notify the Secondary DNS server of a zone update. If an initial SOA query is not performed by the Secondary DNS server before initiating a zone transfer, this is an easy way to perform an amplification attack. There is two common ways to prevent anyone on the Internet from being able to NOTIFY Secondary DNS servers:
Using transaction signatures (TSIG) as per RFC 2845. These are to be placed as the last record in the extra records section of the DNS message. Usually the number of extra records (or ARCOUNT) should be no more than two in this case.
Using IP based access control lists (ACL). This increases security but also prevents flexibility in server location and IP address allocation.
Generally NOTIFY messages are sent over UDP, however TCP can be used in the event the primary server has reason to believe that TCP is necessary (i.e. firewall issues).
In addition to serial tracking, it is important to ensure that a standard protocol is used between primary and Secondary DNS server(s), to efficiently transfer the zone. DNS zone transfer protocols do not attempt to solve the confidentiality, authentication and integrity triad (CIA); however, the use of TSIG on top of the basic zone transfer protocols can provide integrity and authentication. As a result of DNS being a public protocol, confidentiality during the zone transfer process is generally not a concern.
Authoritative Zone Transfer (AXFR)
AXFR is the original zone transfer protocol that was specified in RFC 1034 and RFC 1035 and later further explained in RFC 5936. AXFR is done over a TCP connection because a reliable protocol is needed to ensure packets are not lost during the transfer. Using this protocol, the primary DNS server will transfer all of the zone contents to the Secondary DNS server, in one connection, regardless of the serial number. AXFR is recommended to be used for the first zone transfer, when none of the records are propagated, and IXFR is recommended after that.
Incremental Zone Transfer (IXFR)
IXFR is the more sophisticated zone transfer protocol that was specified in RFC 1995. Unlike the AXFR protocol, during an IXFR, the primary server will only send the secondary server the records that have changed since its current version of the zone (based on the serial number). This means that when a Secondary DNS server wants to initiate an IXFR, it sends its current serial number to the primary DNS server. The primary DNS server will then format its response based on previous versions of changes made to the zone. IXFR messages must obey the following pattern:
Current latest SOA
Secondary server current SOA
DNS record deletions
Secondary server current SOA + changes
DNS record additions
Current latest SOA
Steps 2,3,4,5,6 can be repeated any number of times, as each of those represents one change set of deletions and additions, ultimately leading to a new serial.
IXFR can be done over UDP or TCP, but again TCP is generally recommended to avoid packet loss.
How Does Secondary DNS Work at Cloudflare?
The DNS team loves microservice architecture! When we initially implemented Secondary DNS at Cloudflare, it was done using Mesos Marathon. This allowed us to separate each of our services into several different marathon apps, individually scaling apps as needed. All of these services live in our core data centers. The following services were created:
Zone Transferer – responsible for attempting IXFR, followed by AXFR if IXFR fails.
Zone Transfer Scheduler – responsible for periodically checking zone SOA serials for changes.
Rest API – responsible for registering new zones and primary nameservers.
In addition to the marathon apps, we also had an app external to the cluster:
Notify Listener – responsible for listening for notifies from primary servers and telling the Zone Transferer to initiate an AXFR/IXFR.
Each of these microservices communicates with the others through Kafka.
Once the zone transferer completes the AXFR/IXFR, it then passes the zone through to our zone builder, and finally gets pushed out to our edge at each of our 200 locations.
Although this current architecture worked great in the beginning, it left us open to many vulnerabilities and scalability issues down the road. As our Secondary DNS product became more popular, it was important that we proactively scaled and reduced the technical debt as much as possible. As with many companies in the industry, Cloudflare has recently migrated all of our core data center services to Kubernetes, moving away from individually managed apps and Marathon clusters.
What this meant for Secondary DNS is that all of our Marathon-based services, as well as our NOTIFY Listener, had to be migrated to Kubernetes. Although this long migration ended up paying off, many difficult challenges arose along the way that required us to come up with unique solutions in order to have a seamless, zero downtime migration.
Challenges When Migrating to Kubernetes
Although the entire DNS team agreed that kubernetes was the way forward for Secondary DNS, it also introduced several challenges. These challenges arose from a need to properly scale up across many distributed locations while also protecting each of our individual data centers. Since our core does not rely on anycast to automatically distribute requests, as we introduce more customers, it opens us up to denial-of-service attacks.
The two main issues we ran into during the migration were:
How do we create a distributed and reliable system that makes use of kubernetes principles while also making sure our customers know which IPs we will be communicating from?
When opening up a public-facing UDP socket to the Internet, how do we protect ourselves while also preventing unnecessary spam towards primary nameservers?.
As was previously mentioned, one form of protection in the Secondary DNS protocol is to only allow certain IPs to initiate zone transfers. There is a fine line between primary servers allow listing too many IPs and them having to frequently update their IP ACLs. We considered several solutions:
Allowlist all Cloudflare IPs and dynamically update
Proxy egress traffic
Ultimately we decided to proxy our egress traffic from k8s, to the DNS primary servers, using static proxy addresses. Shadowsocks-libev was chosen as the SOCKS5 implementation because it is fast, secure and known to scale. In addition, it can handle both UDP/TCP and IPv4/IPv6.
The partnership of k8s and Shadowsocks combined with a large enough IP range brings many benefits:
Efficient load balancing
Primary server ACLs only need to be updated once
It allows us to make use of kubernetes for both the Zone Transferer and the Local ShadowSocks Proxy.
Shadowsocks proxy can be reused by many different Cloudflare services.
The Notify Listener requires listening on static IPs for NOTIFY Messages coming from primary DNS servers. This is mostly a solved problem through the use of k8s services of type loadbalancer, however exposing this service directly to the Internet makes us uneasy because of its susceptibility to attacks. Fortunately DDoS protection is one of Cloudflare’s strengths, which lead us to the likely solution of dogfooding one of our own products, Spectrum.
Spectrum provides the following features to our service:
Figure 3 shows two interesting attributes of the system:
Spectrum <-> k8s IPv4 only:
This is because our custom k8s load balancer currently only supports IPv4; however, Spectrum has no issue terminating the IPv6 connection and establishing a new IPv4 connection.
Spectrum <-> k8s routing decisions based of L4 protocol:
This is because k8s only supports one of TCP/UDP/SCTP per service of type load balancer. Once again, spectrum has no issues proxying this correctly.
One of the problems with using a L4 proxy in between services is that source IP addresses get changed to the source IP address of the proxy (Spectrum in this case). Not knowing the source IP address means we have no idea who sent the NOTIFY message, opening us up to attack vectors. Fortunately, Spectrum’s proxy protocol feature is capable of adding custom headers to TCP/UDP packets which contain source IP/Port information.
As we are using miekg/dns for our Notify Listener, adding proxy headers to the DNS NOTIFY messages would cause failures in validation at the DNS server level. Alternatively, we were able to implement custom read and write decorators that do the following:
Reader: Extract source address information on inbound NOTIFY messages. Place extracted information into new DNS records located in the additional section of the message.
Writer: Remove additional records from the DNS message on outbound NOTIFY replies. Generate a new reply using proxy protocol headers.
There is no way to spoof these records, because the server only permits two extra records, one of which is the optional TSIG. Any other records will be overwritten.
This custom decorator approach abstracts the proxying away from the Notify Listener through the use of the DNS protocol.
Although knowing the source IP will block a significant amount of bad traffic, since NOTIFY messages can use both UDP and TCP, it is prone to IP spoofing. To ensure that the primary servers do not get spammed, we have made the following additions to the Zone Transferer:
Always ensure that the SOA has actually been updated before initiating a zone transfer.
Only allow at most one working transfer and one scheduled transfer per zone.
Additional Technical Challenges
Zone Transferer Scheduling
As shown in figure 1, there are several ways of sending Kafka messages to the Zone Transferer in order to initiate a zone transfer. There is no benefit in having a large backlog of zone transfers for the same zone. Once a zone has been transferred, assuming no more changes, it does not need to be transferred again. This means that we should only have at most one transfer ongoing, and one scheduled transfer at the same time, for any zone.
If we want to limit our number of scheduled messages to one per zone, this involves ignoring Kafka messages that get sent to the Zone Transferer. This is not as simple as ignoring specific messages in any random order. One of the benefits of Kafka is that it holds on to messages until the user actually decides to acknowledge them, by committing that messages offset. Since Kafka is just a queue of messages, it has no concept of order other than first in first out (FIFO). If a user is capable of reading from the Kafka topic concurrently, it is entirely possible that a message in the middle of the queue be committed before a message at the end of the queue.
Most of the time this isn’t an issue, because we know that one of the concurrent readers has read the message from the end of the queue and is processing it. There is one Kubernetes-related catch to this issue, though: pods are ephemeral. The kube master doesn’t care what your concurrent reader is doing, it will kill the pod and it’s up to your application to handle it.
Consider the following problem:
Read offset 1. Start transferring zone 1.
Read offset 2. Start transferring zone 2.
Zone 2 transfer finishes. Commit offset 2, essentially also marking offset 1.
Read offset 3 Start transferring zone 3.
If these events happen, zone 1 will never be transferred. It is important that zones stay up to date with the primary servers, otherwise stale data will be served from the Secondary DNS server. The solution to this problem involves the use of a list to track which messages have been read and completely processed. In this case, when a zone transfer has finished, it does not necessarily mean that the kafka message should be immediately committed. The solution is as follows:
Keep a list of Kafka messages, sorted based on offset.
If finished transfer, remove from list:
If the message is the oldest in the list, commit the messages offset.
This solution is essentially soft committing Kafka messages, until we can confidently say that all other messages have been acknowledged. It’s important to note that this only truly works in a distributed manner if the Kafka messages are keyed by zone id, this will ensure the same zone will always be processed by the same Kafka consumer.
Life of a Secondary DNS Request
Although Cloudflare has a large global network, as shown above, the zone transferring process does not take place at each of the edge datacenter locations (which would surely overwhelm many primary servers), but rather in our core data centers. In this case, how do we propagate to our edge in seconds? After transferring the zone, there are a couple more steps that need to be taken before the change can be seen at the edge.
Zone Builder – This interacts with the Zone Transferer to build the zone according to what Cloudflare edge understands. This then writes to Quicksilver, our super fast, distributed KV store.
Authoritative Server – This reads from Quicksilver and serves the built zone.
What About Performance?
At the time of writing this post, according to dnsperf.com, Cloudflare leads in global performance for both Authoritative and Resolver DNS. Here, Secondary DNS falls under the authoritative DNS category here. Let’s break down the performance of each of the different parts of the Secondary DNS pipeline, from the primary server updating its records, to them being present at the Cloudflare edge.
Primary Server to Notify Listener – Our most accurate measurement is only precise to the second, but we know UDP/TCP communication is likely much faster than that.
NOTIFY to Zone Transferer – This is negligible
Zone Transferer to Primary Server – 99% of the time we see ~800ms as the average latency for a zone transfer.
4. Zone Transferer to Zone Builder – 99% of the time we see ~10ms to build a zone.
5. Zone Builder to Quicksilver edge: 95% of the time we see less than 1s propagation.
End to End latency: less than 5 seconds on average. Although we have several external probes running around the world to test propagation latencies, they lack precision due to their sleep intervals, location, provider and number of zones that need to run. The actual propagation latency is likely much lower than what is shown in figure 10. Each of the different colored dots is a separate data center location around the world.
An additional test was performed manually to get a real world estimate, the test had the following attributes:
Primary server: NS1 Number of records changed: 1 Start test timer event: Change record on NS1 Stop test timer event: Observe record change at Cloudflare edge using dig Recorded timer value: 6 seconds
Cloudflare serves 15.8 trillion DNS queries per month, operating within 100ms of 99% of the Internet-connected population. The goal of Cloudflare operated Secondary DNS is to allow our customers with custom DNS solutions, be it on-premise or some other DNS provider, to be able to take advantage of Cloudflare’s DNS performance and more recently, through Secondary Override, our proxying and security capabilities too. Secondary DNS is currently available on the Enterprise plan, if you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS, please refer to our support article.
In a traditional sense, secondary DNS servers act as a backup to the primary authoritative DNS server. When a change is made to the records on the primary server, a zone transfer occurs, synchronizing the secondary DNS servers with the primary server. The secondary servers can then serve the records as if they were the primary server, however changes can only be made by the primary server, not the secondary servers. This creates redundancy across many different servers that can be distributed as necessary.
There are many common ways to take advantage of Secondary DNS, some of which are:
Secondary DNS as passive backup – The secondary DNS server sits idle until the primary server goes down, at which point a failover can occur and the secondary can start serving records.
Secondary DNS as active backup – The secondary DNS server works alongside the primary server to serve records.
Secondary DNS with a hidden primary – The nameserver records at the registrar point towards the secondary servers only, essentially treating them as the primary nameservers.
What is secondary DNS Override?
Secondary DNS Override builds on the Secondary DNS with a hidden primary model by allowing our customers to not only serve records as they tell us to, but also enable them to proxy any A/AAAA/CNAME records through Cloudflare’s network. This is similar to how Cloudflare as a primary DNS provider currently works.
Consider the following example:
example.com Cloudflare IP – 192.0.2.0 example.com origin IP – 203.0.113.0
In order to take advantage of Cloudflare’s security and performance services, we need to make sure that the origin IP stays hidden from the Internet.
Figure 1 shows that without a hidden primary nameserver, the resolver can choose to query either one. This opens up two issues:
Violates RFC 1034 and RFC 2182 because the Cloudflare server will be responding differently than the primary nameserver.
The origin IP will be exposed to the Internet.
Figure 2 shows the resolver always querying the Cloudflare Secondary DNS server.
How Does Secondary DNS Override work
The Secondary DNS Override UI looks similar to the primary UI, the only difference is that records cannot be edited.
In figure 3, all of the records have been transferred from the primary DNS server. test-orange and test-aaaa-orange have been overridden to proxy through the cloudflare network, while test-grey and test-mx are treated as regular DNS records.
Behind the scenes we store override records that pair with transferred records based on the name. For secondary override we don’t care about the type when overriding records, because of two things:
According to RFC 1912 you cannot have a CNAME record with the same name as any other record. (This does not apply to some DNSSEC records, see RFC 2181)
A and AAAA records are both address type records which should be either all proxied or all not proxied under the same name.
This means if you have several A and several AAAA records all with the name “example.com”, if one of them is proxied, all of them will be proxied. The UI helps abstract the idea that we are storing additional override records through the “orange cloud” button, which when clicked, will create an override record which applies to all A/AAAA or CNAME records with that name.
CNAME at the Apex
Normally, putting a CNAME at the apex of a zone is not allowed. For example:
example.com CNAME other-domain.com
Is not allowed because this means that there will be at least one other SOA record and one other NS record with the same name, disobeying RFC 1912 as mentioned above. Cloudflare can overcome this through the use of CNAME Flattening, which is a common technique used within the primary DNS product today. CNAME flattening allows us to return address records instead of the CNAME record when a query comes into our authoritative server.
Contrary to what was said above regarding the prevention of editing records through the Secondary DNS Override UI, the CNAME at the apex is the one exception to this rule. Users are able to create a CNAME at the apex in addition to the regular secondary DNS records, however the same rules defined in RFC 1912 also apply here. What this means is that the CNAME at the apex record can be treated as a regular DNS record or a proxied record, depending on what the user decides. Regardless of the proxy status of the CNAME at the apex record, it will override any other A/AAAA records that have been transferred from the primary DNS server.
Merging Secondary, Override and CNAME at Apex Records
At record edit time we do all of the merging of the secondary, override and CNAME at the apex records. This means that when a DNS request comes in to our authoritative server at the edge, we can still return the records in blazing fast times. The workflow is shown in figure 4.
Within the zone builder the steps are as follows:
Check if there is any CNAME at the apex, if so, override all other A/AAAA secondary records at the apex.
For each secondary record, check if there is a matching override record, if so, apply the proxy status of the override record to all secondary records with that name.
Leave all other secondary records as is.
Secondary DNS Override is a great option for any users that want to take advantage of the Cloudflare network, without transferring all of their zones to Cloudflare DNS as a primary provider. Security and access control can be managed on the primary side, without worrying about unauthorized edits of information on the Cloudflare side.
Secondary DNS Override is currently available on the Enterprise plan, if you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS Override, please refer to our support article.
Today we’re thrilled to announce general availability of Bring Your Own IP (BYOIP) across our Layer 7 products as well as Spectrum and Magic Transit services. When BYOIP is configured, the Cloudflare edge will announce a customer’s own IP prefixes and the prefixes can be used with our Layer 7 services, Spectrum, or Magic Transit. If you’re not familiar with the term, an IP prefix is a range of IP addresses. Routers create a table of reachable prefixes, known as a routing table, to ensure that packets are delivered correctly across the Internet.
As part of this announcement, we are listing BYOIP on the relevant product pages, developer documentation, and UI support for controlling your prefixes. Previous support was API only.
Customers choose BYOIP with Cloudflare for a number of reasons. It may be the case that your IP prefix is already allow-listed in many important places, and updating firewall rules to also allow Cloudflare address space may represent a large administrative hurdle. Additionally, you may have hundreds of thousands, or even millions, of end users pointed directly to your IPs via DNS, and it would be hugely time consuming to get them all to update their records to point to Cloudflare IPs.
Over the last several quarters we have been building tooling and processes to support customers bringing their own IPs at scale. At the time of writing this post we’ve successfully onboarded hundreds of customer IP prefixes. Of these, 84% have been for Magic Transit deployments, 14% for Layer 7 deployments, and 2% for Spectrum deployments.
When you BYOIP with Cloudflare, this means we announce your IP space in over 200 cities around the world and tie your IP prefix to the service (or services!) of your choosing. Your IP space will be protected and accelerated as if they were Cloudflare’s own IPs. We can support regional deployments for BYOIP prefixes as well if you have technical and/or legal requirements limiting where your prefixes can be announced, such as data sovereignty.
You can turn on advertisement of your IPs from the Cloudflare edge with a click of a button and be live across the world in a matter of minutes.
All BYOIP customers receive network analytics on their prefixes. Additionally all IPs in BYOIP prefixes can be considered static IPs. There are also benefits specific to the service you use with your IP prefix on Cloudflare.
Layer 7 + BYOIP:
Cloudflare has a robust Layer 7 product portfolio, including products like Bot Management, Rate Limiting, Web Application Firewall, and Content Delivery, to name just a few. You can choose to BYOIP with our Layer 7 products and receive all of their benefits on your IP addresses.
For Layer 7 services, we can support a variety of IP to domain mapping requests including sharing IPs between domains or putting domains on dedicated IPs, which can help meet requirements for things such as non-SNI support.
If you are also an SSL for SaaS customer, using BYOIP, you have increased flexibility to change IP address responses for custom_hostnames in the event an IP is unserviceable for some reason.
Spectrum + BYOIP:
Spectrum is Cloudflare’s solution to protect and accelerate applications that run any UDP or TCP protocol. The Spectrum API supports BYOIP today. Spectrum customers who use BYOIP can specify, through Spectrum’s API, which IPs they would like associated with a Spectrum application.
Magic Transit + BYOIP:
Magic Transit is a Layer 3 security service which processes all your network traffic by announcing your IP addresses and attracting that traffic to the Cloudflare edge for processing. Magic Transit supports sophisticated packet filtering and firewall configurations. BYOIP is a requirement for using the Magic Transit service. As Magic Transit is an IP level service, Cloudflare must be able to announce your IPs in order to provide this service
Bringing Your IPs to Cloudflare: What is Required?
Before Cloudflare can announce your prefix we require some documentation to get started. The first is something called a ‘Letter of Authorization’ (LOA), which details information about your prefix and how you want Cloudflare to announce it. We then share this document with our Tier 1 transit providers in advance of provisioning your prefix. This step is done to ensure that Tier 1s are aware we have authorization to announce your prefixes.
Secondly, we require that your Internet Routing Registry (IRR) records are up to date and reflect the data in the LOA. This typically means ensuring the entry in your regional registry is updated (i.e. ARIN, RIPE, APNIC).
Once the administrivia is out of the way, work with your account team to learn when your prefixes will be ready to announce.
We also encourage customers to use RPKI and can support this for customer prefixes. We have blogged and built extensive tooling to make adoption of this protocol easier. If you’re interested in BYOIP with RPKI support just let your account team know!
Each customer prefix can be announced via the ‘dynamic advertisement’ toggle in either the UI or API, which will cause the Cloudflare edge to either announce or withdraw a prefix on your behalf. This can be done as soon as your account team lets you know your prefixes are ready to go.
Once the IPs are ready to be announced, you may want to set up ‘delegations’ for your prefixes. Delegations manage how the prefix can be used across multiple Cloudflare accounts and have slightly different implications depending on which service your prefix is bound to. A prefix is owned by a single account, but a delegation can extend some of the prefix functionality to other accounts. This is also captured on our developer docs. Today, delegations can affect Layer 7 and Spectrum BYOIP prefixes.
Layer 7: If you use BYOIP + Layer 7 and also use the SSL for SaaS service, a delegation to another account will allow that account to also use that prefix to validate custom hostnames in addition to the original account which owns the prefix. This means that multiple accounts can use the same IP prefix to serve up custom hostname traffic. Additionally, all of your IPs can serve traffic for custom hostnames, which means you can easily change IP addresses for these hostnames if an IP is blocked for any reason.
Spectrum: If you used BYOIP + Spectrum, via the Spectrum API, you can specify which IP in your prefix you want to create a Spectrum app with. If you create a delegation for prefix to another account, that second account will also be able to specify an IP from that prefix to create an app.
If you are interested in learning more about BYOIP across either Magic Transit, CDN, or Spectrum, please reach out to your account team if you’re an existing customer or contact [email protected] if you’re a new prospect.
In DNS, nameservers are responsible for serving DNS records for a zone. How the DNS records populate into the nameservers differs based on the type of nameserver.
A primary master is a nameserver that manages a zone’s DNS records. This is where the zone file is maintained and where DNS records are added, removed, and modified. However, relying on one DNS server can be risky. What if that server goes down, or your DNS provider has an outage? If you run a storefront, then your customers would have to wait until your DNS server is back up to access your site. If your website were a brick and mortar store, this would be effectively like boarding up the door while customers are trying to get in.This type of outage can be very costly.
Now imagine you have another DNS server that has a replica of your DNS records. Wouldn’t it be great to have it as a back-up if your primary nameserver went down? Or better yet, what if both served your DNS records at all times— this could help decrease the latency of DNS requests, distribute the load between DNS servers, and add resiliency to your infrastructure! And that’s precisely what Secondary DNS nameservers were built for.
As businesses grow, they often scale their DNS infrastructure. We’re seeing more customers move away from two or three on-premise DNS servers to using a managed DNS provider to having multiple DNS vendors—all to increase redundancy against the possibility of a DDoS attack taking down one of their providers. Cloudflare has data centers in over 200 cities, all of which run our DNS software allowing our authoritative DNS customers to benefit from DNS lookups averaging around 11ms globally. So we decided to expand this functionality to customers who want to use more than one DNS provider, or for those that find it too complicated to move away from their on-premise DNS server.
When we first built our secondary DNS product, our MVP was focused on functionality and not ease of use. We did this because we thought that this feature would be used by a small portion of our Enterprise customers and that they would be comfortable using the API. But the demand for secondary DNS was far greater than we initially imagined. Many customers are interested in the service, including those who aren’t comfortable managing DNS through the API.
Previously, setting up secondary DNS on a zone required a series of API calls: one for creating the zone, one for defining the IP address and settings of the master server, one for linking the master(s) to the zone, and one for initiating a zone transfer.
We heard from customers that this experience was frustrating. There were also a lot of places where the setup could go wrong: some customers would forget to link a master to their zone, others would forget a step when adding subsequent zones, and still others would have to spend hours debugging a typo in their API call. We believe secondary DNS customers should have as seamless an experience as our authoritative DNS customers, and shouldn’t be treated as secondary (pun intended) class citizens. When creating the onboarding UI, we asked ourselves, how can we simplify the experience to just a few input fields? How do we prevent customers from making easy, potentially messy mistakes, like forgetting to attach a master?
Enter: The new Secondary DNS Onboarding Experience
Starting today, enterprise customers who are entitled to secondary DNS will be able to configure their zone in the Cloudflare Dashboard. The time from when they type in their domain name to when they see their records in the dashboard is less than two minutes. We’ve added error prevention to stop customers from adding their zone until they’ve configured at least one master. Customers will also be able to review their transferred records before finishing the onboarding process, allowing them to see what was transferred, without juggling API calls and and switching back and forth between the dashboard and a support article.
How It Looks
The “Add Site” flow in the Cloudflare Dashboard gives customers two options: Authoritative or Secondary DNS. Next, they will need to fill out the IP address of their master server, attach a TSIG (Transactional Signature) to authenticate zone transfers, and voila! In just a few clicks, records populate to your DNS table.
The Intricacies of Secondary DNS
As mentioned above, primary nameservers are where DNS records are managed, and secondary nameservers are responsible for holding the read-only replica of those records. But how do they get there? The communication between a primary master and a secondary nameserver is known as a zone transfer.
Master servers use SOA (Start of Authority) records to keep track of zone updates. Every time a zone file changes (say you add or remove a DNS record), the serial number of the SOA record is incremented as a way to signal secondary nameservers that the zone updated, and it’s time to fetch a fresh copy.
Primary masters can send a NOTIFY message to a secondary master to signal a zone file change. Once the secondary receives the NOTIFY, it will do an SOA sanity check against the master and perform a zone transfer if it sees that the SOA value has increased. An AXFR or IXFR query can initiate the zone transfer. An AXFR query initiates a full zone transfer and is usually requested the very first time a zone is transferred. But AXFR transfers are not always necessary as most zone file changes are minute. This is why IXFR (incremental zone transfer) requests were created— they tell a master server which version of the zone a secondary currently holds and the master sends the difference between the new version and the one the secondary has— this way only the new changes are transferred.
Some masters, unfortunately, do not support NOTIFY queries. This means that instead of the master notifying the secondary of zone updates, the secondary needs to periodically check the SOA of the primary server to see if the value has changed.
Securing Zone Transfers
Zone transfers between a primary and secondary server are unauthenticated on their own. TSIGs (Transactional Signatures) were developed as a means to add authentication to the DNS protocol, and have mostly been used for zone transfers. They provide mutual authentication between a client and a server by using a shared secret between the two parties and a one-way keyed hash function, which is attached as a TSIG record to a DNS message. The TSIG record guarantees that only secondary nameservers with the TSIG can pull zone transfers from a master. And vice versa, secondary servers will only accept zone transfers from masters that have the proper TSIG attached. Additionally, TSIGs provide data integrity and ensure that the DNS message was not modified en route.
We support TSIGs and highly recommend that you add it when configuring your master.
Extending DNS Analytics to Secondary DNS
Setting up a secondary zone on Cloudflare is a simple process with the new onboarding UI. In just a few clicks, Cloudflare’s nameservers in all 200+ cities will begin responding to DNS queries. In addition to serving DNS records, secondary DNS customers will also be able to see the same DNS analytics that we provide to our authoritative DNS customers. The analytics show a breakdown of DNS traffic by record type, response code, and even geographical regions.
One of our customers, Big Cartel, runs an E-commerce platform that has helped people all over the world sell $2.5 billion of their work since 2005. As they grow, Cloudflare’s secondary DNS product helps keep their site fast and reliable:
“At Big Cartel, we provide an online storefront for our customers. We need to be always available and avoid any chances of downtime — eliminating all single points of failure is critical for us. With Cloudflare’s Secondary DNS, we can do just that! It keeps our DNS infrastructure more resilient while allowing our customers to benefit from fast query times. Additionally, using Cloudflare’s Secondary DNS analytics provides granular insights into how our traffic is balanced between our DNS providers” – Lee Jensen, Technical Director
Secondary DNS is currently available on the Enterprise plan, if you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS, please refer to our support article.
DNS is the very first step in accessing any website, API, or pretty much anything on the Internet, which makes it mission-critical to keeping your site up and running. This week, we are launching two significant changes that allow our customers to better maintain and update their DNS records. For customers who use Cloudflare as their authoritative DNS provider, we’ve added a much asked for feature: confirmation to DNS record edits. For our secondary DNS customers, we’re excited to provide a brand new onboarding experience.
Confirm and Commit
One of the benefits of using Cloudflare DNS is that changes quickly propagate to our 200+ data centers. And I mean very quickly: DNS propagation typically takes <5 seconds worldwide. Our UI was set up to allow customers to edit records, click out of the input box, and boom! The record has propagated!
There are a lot of advantages to fast DNS, but there’s also one clear downside – it leaves room for fat fingering. What if you accidentally toggle the proxy icon, or mistype the content of your DNS record? This could result in users not being able to access your website or API and could cause a significant outage. To protect customers from these kinds of mistakes, we’ve added a Save button for DNS record changes.
Now editing records in the DNS table allows you to take an extra look before committing the change.
The new confirmation layout applies to all record types and affects any content, TTL, or proxy status changes.
Let us know what you think by filling out the feedback survey linked at the top of the DNS tab in the dashboard.
Like many who are able, I am working remotely and in this post, I describe some of the ways to deploy Cloudflare Gateway directly from your home. Gateway’s DNS filtering protects networks from malware, phishing, ransomware and other security threats. It’s not only for corporate environments – it can be deployed on your browser or laptop to protect your computer or your home WiFi. Below you will learn how to deploy Gateway, including, but not limited to, DNS over HTTPS (DoH) using a Raspberry Pi, Pi-hole and DNSCrypt.
We recently launched Cloudflare Gateway and shortly thereafter, offered it for free until at least September to any company in need. Cloudflare leadership asked the global Solutions Engineering (SE) team, amongst others, to assist with the incoming onboarding calls. As an SE at Cloudflare, our role is to learn new products, such as Gateway, to educate, and to ensure the success of our prospects and customers. We talk to our customers daily, understand the challenges they face and consult on best practices. We were ready to help!
One way we stay on top of all the services that Cloudflare provides, is by using them ourselves. In this blog, I’ll talk about my experience setting up Cloudflare Gateway.
Gateway sits between your users, device or network and the public Internet. Once you setup Cloudflare Gateway, the service will inspect and manage all Internet-bound DNS queries. In simple terms, you point your recursive DNS to Cloudflare and we enforce policies you create, such as activating SafeSearch, an automated filter for adult and offensive content that’s built into popular search engines like Google, Bing, DuckDuckGo, Yandex and others.
There are various methods and locations DNS filtering can be enabled, whether it’s on your entire laptop, each of your individual browsers and devices or your entire home network. With DNS filtering front of mind, including DoH, I explored each model. The model you choose ultimately depends on your objective.
Butfirst, let’s review what DNS and DNS over HTTPS are.
DNS is the protocol used to resolve hostnames (like www.cloudflare.com) into IP addresses so computers can talk to each other. DNS is an unencrypted clear text protocol, meaning that any eavesdropper or machine between the client and DNS server can see the contents of the DNS request. DNS over HTTPS adds security to DNS and encrypt DNS queries using HTTPS (the protocol we use to encrypt the web).
Configuring a Gateway location, shown below, is the first step.
Conceptually similar to HTTPS traffic, when our edge receives an HTTPS request, we match the incoming SNI header to the correct domain’s configuration (or for plain text HTTP the Host header). And when our edge receives a DNS query, we need a similar mapping to identify the correct configuration. We attempt to match configurations, in this order:
DNS over HTTPS check and lookup based on unique hostname
IPv4 check and lookup based on source IPv4 address
Lookup based on IPv6 destination address
Let’s discuss each option.
DNS over HTTPS
The first attempt to match DNS requests to a location is pointing your traffic to a unique DNS over HTTPS hostname. After you configure your first location, you are given a unique destination IPv6 address and a unique DoH endpoint as shown below. Take note of the hostname as you will need it shortly. I’ll first discuss deploying Gateway in a browser and then to your entire network.
DNS over HTTPS is my favorite method for deploying Gateway and securing DNS queries at the same time. Enabling DoH prevents anyone but the DNS server of your choosing from seeing your DNS queries.
Enabling DNS over HTTPS in browsers
By enabling it in a browser, only queries issued in that browser are affected. It’s available in most browsers and there are quite a few tutorials online to show you how to turn it on.
Using Chrome as an example on behalf of all the Chromium-based browsers, enabling DNS over HTTPS is straightforward, but as you can see in the table above, there is one issue: Chrome does not currently support custom servers. So while it is great that a user can protect their DNS queries, they cannot choose the provider, including Gateway.
Here is how to enable DoH in Chromium based browsers:
Navigate to chrome://flags and toggle the beta flag to enabled.
Firefox is the exception to the rule because they support both DNS over HTTPS and the ability to define a custom server. Mozilla provides detailed instructions about how to get started.
Once enabled, navigate to Preferences -> General -> Network Security and select ‘Settings’. Scroll to the section ‘Enable DNS over HTTPS’, select ‘Custom’ and input your Gateway DoH address, as shown below:
Optionally, you can enable Encrypted SNI (ESNI), which is an IETF draft for encrypting the SNI headers, by toggling the ‘network.security.esni.enabled’ preference in about:config to ‘true’. Read more about how Cloudflare is one of the few providers that supports ESNI by default.
Congratulations, you’ve configured Gateway using DNS over HTTPS! Keep in mind that only queries issued from the configured browser will be secured. Any other device connected to your network such as your mobile devices, gaming platforms, or smart TVs will still use your network’s default DNS server, likely assigned by your ISP.
Configuring Gateway for your entire home or business network
Deploying Gateway at the router level allows you to secure every device on your network without needing to configure each one individually.
Access to your router’s administrative portal
A router that supports DHCP forwarding
Raspberry Pi with WiFi or Ethernet connectivity
There aren’t any consumer routers on the market that natively support DoH custom servers and likely few that natively support DoH at all. A newer router I purchased, the Netgear R7800, does not support either, but it is one of the most popular routers for flashing dd-wrt or open-wrt, which both support DoH. Unfortunately, neither of these popular firmwares support custom servers.
While it’s rare to find a router that supports DoH out of the box, DoH with custom servers, or has potential to be flashed, it’s common for a router to support DHCP forwarding (dd-wrt and open-wrt both support DHCP forwarding). So, I installed Pi-hole on my Raspberry Pi and used it as my home network’s DNS and DHCP server.
Getting started with Pi-hole and dnscrypt-proxy
If your Raspberry Pi is new and hasn’t been configured yet, follow their guide to get started. (Note: by default, ssh is disabled, so you will need a keyboard and/or mouse to access your box in your terminal.)
Once your Raspberry Pi has been initialized, assign it a static IP address in the same network as your router. I hardcoded my router’s LAN address to 192.168.1.2
Using vim: sudo vi /etc/dhcpcd.conf
Restart the service. sudo /etc/init.d/dhcpcd restart
Check that your static IP is configured correctly. ip addr show dev eth0
Go to Settings -> DNS to modify the upstream DNS provider, which we’ve just configured to be dnscrypt-proxy.
Change the upstream server to 127.0.0.1#5054 and hit save. If you want to deploy redundancy, add in a secondary address in Custom 2, such as 184.108.40.206 or Custom 3, such as your IPv6 destination address.
In Settings->DHCP, enable the DHCP server:
At this point, your Pi-hole server is running in isolation and we need to deploy it to your network. The simplest way to ensure your Pi-hole is being used exclusively by every device is to use your Pi-hole as both a DNS server and a DHCP server. I’ve found that routers behave oddly if you outsource DNS but not DHCP, so I outsource both.
After I enabled the DHCP server on the Pi-hole, I set the router’s configuration to DHCP forwarding and defined the Pi-hole static address:
After applying the routers configuration, I confirmed it was working properly by forgetting the network in my network settings and re-joining. This results in a new IPv4 address (from our new DHCP server) and if all goes well, a new DNS server (the IP of Pi-hole).
Now that our entire network is using Gateway, we can configure Gateway Policies to secure our DNS queries!
IPv4 check and lookup based on source IPv4 address
For this method to work properly, Gateway requires that your network has a static IPv4 address. If your IP address does not change, then this is the quickest solution (but still does not prevent third-parties from seeing what domains you are going to). However, if you are configuring Gateway in your home, like I am, and you don’t explicitly pay for this service, then most likely you have a dynamic IP address. These addresses will always change when your router restarts, intentionally or not.
Lookup based on IPv6 destination address
Another option for matching requests in Gateway is to configure your DNS server to point to a unique IPv6 address provided to you by Cloudflare. Any DNS query pointed to this address will be matched properly on our edge.
This might be a good option if you want to use Cloudflare Gateway on your entire laptop by setting your local DNS resolution to this address. However, if your home router or ISP does not support IPv6, DNS resolution won’t work.
In this blog post, we’ve discussed the various ways Gateway can be deployed and how DNS over HTTPS is one of the next big Internet privacy improvements. Deploying Gateway can be done on a per device basis, on your router or even with a Raspberry Pi.
A few months ago, Brian Krebs told the story of the domain corp.com, and how it is basically a security nightmare:
At issue is a problem known as “namespace collision,” a situation where domain names intended to be used exclusively on an internal company network end up overlapping with domains that can resolve normally on the open Internet.
Windows computers on an internal corporate network validate other things on that network using a Microsoft innovation called Active Directory, which is the umbrella term for a broad range of identity-related services in Windows environments. A core part of the way these things find each other involves a Windows feature called “DNS name devolution,” which is a kind of network shorthand that makes it easier to find other computers or servers without having to specify a full, legitimate domain name for those resources.
For instance, if a company runs an internal network with the name internalnetwork.example.com, and an employee on that network wishes to access a shared drive called “drive1,” there’s no need to type “drive1.internalnetwork.example.com” into Windows Explorer; typing “\\drive1\” alone will suffice, and Windows takes care of the rest.
But things can get far trickier with an internal Windows domain that does not map back to a second-level domain the organization actually owns and controls. And unfortunately, in early versions of Windows that supported Active Directory — Windows 2000 Server, for example — the default or example Active Directory path was given as “corp,” and many companies apparently adopted this setting without modifying it to include a domain they controlled.
Compounding things further, some companies then went on to build (and/or assimilate) vast networks of networks on top of this erroneous setting.
Now, none of this was much of a security concern back in the day when it was impractical for employees to lug their bulky desktop computers and monitors outside of the corporate network. But what happens when an employee working at a company with an Active Directory network path called “corp” takes a company laptop to the local Starbucks?
Chances are good that at least some resources on the employee’s laptop will still try to access that internal “corp” domain. And because of the way DNS name devolution works on Windows, that company laptop online via the Starbucks wireless connection is likely to then seek those same resources at “corp.com.”
In practical terms, this means that whoever controls corp.com can passively intercept private communications from hundreds of thousands of computers that end up being taken outside of a corporate environment which uses this “corp” designation for its Active Directory domain.
Microsoft just bought it, so it wouldn’t fall into the hands of any bad actors:
In a written statement, Microsoft said it acquired the domain to protect its customers.
“To help in keeping systems protected we encourage customers to practice safe security habits when planning for internal domain and network names,” the statement reads. “We released a security advisory in June of 2009 and a security update that helps keep customers safe. In our ongoing commitment to customer security, we also acquired the Corp.com domain.”
Today we made a mistake. The mistake caused a number of LGBTQIA+ sites to inadvertently be blocked by the new 220.127.116.11 for Families service. I wanted to walk through what happened, why, and what we’ve done to fix it.
As is our tradition for the last three years, we roll out new products for the general public that uses the Internet on April 1. This year, one of those products was a filtered DNS service, 18.104.22.168 for Families. The service allows anyone who chooses to use it to restrict certain categories of sites.
Filtered vs Unfiltered DNS
Nothing about our new filtered DNS service changes the unfiltered nature of our original 22.214.171.124 service. However, we recognized that some people want a way to control what content is in their home. For instance, I block social media sites from resolving while I am trying to get work done because it makes me more productive. The number one request from users of 126.96.36.199 was that we create a version of the service for home use to block certain categories of sites. And so, earlier today, we launched 188.8.131.52 for Families.
Over time, we’ll provide the ability for users of 184.108.40.206 for Families to customize exactly what categories they block (e.g., do what I do with social media sites to stay productive). But, initially, we created two default settings that were the most requested types of content people wanted to block: Malware (which you can block by setting 220.127.116.11 and 18.104.22.168 as your DNS resolvers) and Malware + Adult Content (which you can block by setting 22.214.171.124 and 126.96.36.199 as your DNS resolvers).
Licensed Categorization Data
To get data for 188.8.131.52 for Families we licensed feeds from multiple different providers who specialize in site categorization. We spent the last several months reviewing classification providers to choose the ones that had the highest accuracy and lowest false positives.
Malware, encompassing a range of widely agreed upon cyber security threats, was the easier of the two categories to define. For Adult Content, we aimed to mirror the Google SafeSearch criteria. Google has been thoughtful in this area and their SafeSearch tool is designed to limit search results for “sexually explicit content.” The definition is focused on pornography and largely follows the requirements of the US Children’s Internet Protection Act (CIPA), which schools and libraries in the United States are required to follow.
Because it was the default for the 184.108.40.206 service, and because we planned in the future to allow individuals to set their own specifications beyond the default, we intended the Adult Content category to be narrow. What we did not intend to include in the Adult Content category was LGBTQIA+ content. And yet, when it launched, we were horrified to receive reports that those sites were being filtered.
Choosing the Wrong Feed
So what went wrong? The data providers that we license content from have different categorizations; those categorizations do not line up perfectly between different providers. One of the providers has multiple “Adult Content” categories. One “Adult Content” category includes content that mirrors the Google SafeSearch/CIPA definition. Another “Adult Content” content category includes a broader set of topics, including LGBTQIA+ sites.
While we had specifically reviewed the Adult Content category to ensure that it was narrowly tailored to mirror the Google SafeSearch/CIPA definition, when we released the production version this morning we included the wrong “Adult Content” category from the provider in the build. As a result, the first users who tried 220.127.116.11 saw a broader set of sites being filtered than was intended, including LGBTQIA+ content. We immediately worked to fix the issue.
Slow to Update Data Structures
In order to distribute the list of sites quickly to all our data centers we use a compact data structure. The upside is that we can replicate the data structure worldwide very efficiently. The downside is that generating a new version of the data structure takes several hours. The minute we saw that we’d made a mistake we pulled the incorrect data provider and began recreating the new data structure.
While the new data structure replicated across our network we pushed individual sites to an allow list immediately. We began compiling lists both from user reports as well as from other LGBTQIA+ resources. These updates went out instantly. We continuously added sites to the allow list as they were reported or we discovered them.
By 16:51 UTC, approximately two hours after we’d received the first report of the mistaken blocking, the data structure with the intended definition of Adult Content had been generated and we pushed it out live. The only users that would have seen over-broad blocking are those that had already switched to the 18.104.22.168 service. Users of 22.214.171.124 — which will remain unfiltered — and 126.96.36.199 would not have experienced this inadvertent blocking.
As of now, the filtering provided by the default setting of 188.8.131.52 is what we intended it to be, and should roughly match what you find if you use Google SafeSearch and LGBTQIA+ sites are not being blocked. If you see site being blocked that should not be, please report them to us here.
Going forward, we’ve set up a number of checks of known sites that should fall outside the intended categories, including many that we mistakenly listed today. Before defaults are updated in the future, our build system will confirm that none of these sites are listed. We hope this will help catch mistakes like this in the future.
I’m sorry for the error. While I understand how it happened, it should never have happened. I appreciate our team responding quickly to fix the mistake we made.
Two years ago today we announced 184.108.40.206, a secure, fast, privacy-first DNS resolver free for anyone to use. In those two years, 220.127.116.11 has grown beyond our wildest imagination. Today, we process more than 200 billion DNS requests per day making us the second largest public DNS resolver in the world behind only Google.
Yesterday, we announced the results of the 18.104.22.168 privacy examination. Cloudflare’s business has never involved selling user data or targeted advertising, so it was easy for us to commit to strong privacy protections for 22.214.171.124. We’ve also led the way supporting encrypted DNS technologies including DNS over TLS and DNS over HTTPS. It is long past time to stop transmitting DNS in plaintext and we’re excited that we see more and more encrypted DNS traffic every day.
126.96.36.199 for Families
Since launching 188.8.131.52, the number one request we have received is to provide a version of the product that automatically filters out bad sites. While 184.108.40.206 can safeguard user privacy and optimize efficiency, it is designed for direct, fast DNS resolution, not for blocking or filtering content. The requests we’ve received largely come from home users who want to ensure that they have a measure of protection from security threats and can keep adult content from being accessed by their kids. Today, we’re happy to answer those requests.
Introducing 220.127.116.11 for Families — the easiest way to add a layer of protection to your home network and protect it from malware and adult content. 18.104.22.168 for Families leverages Cloudflare’s global network to ensure that it is fast and secure around the world. And it includes the same strong privacy guarantees that we committed to when we launched 22.214.171.124 two years ago. And, just like 126.96.36.199, we’re providing it for free and it’s for any home anywhere in the world.
Two Flavors: 188.8.131.52 (No Malware) & 184.108.40.206 (No Malware or Adult Content)
220.127.116.11 for Families is easy to set up and install, requiring just changing two numbers in the settings of your home devices or network router: your primary DNS and your secondary DNS. Setting up 18.104.22.168 for Families usually takes less than a minute and we’ve provided instructions for common devices and routers through the installation guide.
22.214.171.124 for Families has two default options: one that blocks malware and the other that blocks malware and adult content. You choose which setting you want depending on which IP address you configure.
Malware Blocking Only Primary DNS: 126.96.36.199 Secondary DNS: 188.8.131.52
Malware and Adult Content Primary DNS: 184.108.40.206 Secondary DNS: 220.127.116.11
In the coming months, we will provide the ability to define additional configuration settings for 18.104.22.168 for Families. This will include options to create specific whitelists and blacklists of certain sites. You will be able to set the times of the day when categories, such as social media, are blocked and get reports on your household’s Internet usage.
22.214.171.124 for Families is built on top of the same site categorization and filtering technology that powers Cloudflare’s Gateway product. With the success of Gateway, we wanted to provide an easy-to-use service that can help any home network be fast, reliable, secure, and protected from potentially harmful content.
Not A Joke
Most of Cloudflare’s business involves selling services to businesses. However, we’ve made it a tradition every April 1 to launch a new consumer product that leverages our network to bring more speed, reliability, and security to every Internet user. While we make money selling to businesses, the products we launch at this time of the year are close to our hearts because of the broad impact they have for every Internet user.
This year, while many of us are confined to our homes, protecting our communities from COVID-19, and relying on our home networks more than ever it seemed especially important to launch 126.96.36.199 for Families. We hope during these troubled times it will help provide a bit of peace of mind for households everywhere.
Last April 1 we announced WARP — an option within the 188.8.131.52 iOS and Android app to secure and speed up Internet connections. Today, millions of users have secured their mobile Internet connections with WARP.
While WARP started as an option within the 184.108.40.206 app, it’s really a technology that can benefit any device connected to the Internet. In fact, one of the most common requests we’ve gotten over the last year is support for WARP for macOS and Windows. Today we’re announcing exactly that: the start of the WARP beta for macOS and Windows.
What’s The Same: Fast, Secure, and Free
We always wanted to build a WARP client for macOS and Windows. We started with mobile because it was the hardest challenge. And it turned out to be a lot harder than we anticipated. While we announced the beta of 220.127.116.11 with WARP on April 1, 2019 it took us until late September before we were able to open it up to general availability. We don’t expect the wait for macOS and Windows WARP to be nearly as long.
The WARP client for macOS and Windows relies on the same fast, efficient Wireguard protocol to secure Internet connections and keep them safe from being spied on by your ISP. Also, just like WARP on the 18.104.22.168 mobile app, the basic service will be free on macOS and Windows.
WARP+ Gets You There Faster
We plan to add WARP+ support in the coming months to allow you to leverage Cloudflare’s Argo network for even faster Internet performance. We will provide a plan option for existing WARP+ subscribers to add additional devices at a discount. In the meantime, existing WARP+ users will be among the first to be invited to try WARP for macOS and Windows. If you are a WARP+ subscriber, check your 22.214.171.124 app over the coming weeks for a link to an invitation to try the new WARP for macOS and Windows clients.
If you’re not a WARP+ subscriber, you can add yourself to the waitlist by signing up on the page linked below. We’ll email as soon as it’s ready for you to try.
We haven’t forgotten about Linux. About 10% of Cloudflare’s employees run Linux on their desktops. As soon as we get the macOS and Windows clients out we’ll turn our attention to building a WARP client for Linux.
Thank you to everyone who helped us make WARP fast, efficient, and reliable on mobile. It’s incredible how far it’s come over the last year. If you tried it early in the beta last year but aren’t using it now, I encourage you to give it another try. We’re looking forward to bringing WARP speed and security to even more devices.
On April 1, 2018, we took a big step toward improving Internet privacy and security with the launch of the 126.96.36.199 public DNS resolver — the Internet’s fastest, privacy-first public DNS resolver. And we really meant privacy first. We were not satisfied with the status quo and believed that secure DNS resolution with transparent privacy practices should be the new normal. So we committed to our public resolver users that we would not retain any personal data about requests made using our 188.8.131.52 resolver. We also built in technical measures to facilitate DNS over HTTPS to help keep your DNS queries secure. We’ve never wanted to know what individuals do on the Internet, and we took technical steps to ensure we can’t know.
We knew there would be skeptics. Many consumers believe that if they aren’t paying for a product, then they are the product. We don’t believe that has to be the case. So we committed to retaining a Big 4 accounting firm to perform an examination of our 184.108.40.206 resolver privacy commitments.
Today we’re excited to announce that the 220.127.116.11 resolver examination has been completed and a copy of the independent accountants’ report can be obtained from our compliance page.
The examination process
We gained a number of observations and lessons from the privacy examination of the 18.104.22.168 resolver. First, we learned that it takes much longer to agree on terms and complete an examination when you ask an accounting firm to do what we believe is the first of its kind examination of custom privacy commitments for a recursive resolver.
We also observed that privacy by design works. Not that we were surprised — we use privacy by design principles in all our products and services. Because we baked anonymization best practices into the 22.214.171.124 resolver when we built it, we were able to demonstrate that we didn’t have any personal data to sell. More specifically, in accordance with RFC 6235, we decided to truncate the client/source IP at our edge data centers so that we never store in non-volatile storage the full IP address of the 126.96.36.199 resolver user.
We knew that a truncated IP address would be enough to help us understand general Internet trends and where traffic is coming from. In addition, we also further improved our privacy-first approach by replacing the truncated IP address with the network number (the ASN) for our internal logs. On top of that, we committed to only retaining those anonymized logs for a limited period of time. It’s the privacy version of belt plus suspenders plus another belt.
Finally, we learned that aligning our examination of the 188.8.131.52 resolver with our SOC 2 report most efficiently demonstrated that we had the appropriate change control procedures and audit logs in place to confirm that our IP truncation logic and limited data retention periods were in effect during the examination period. The 184.108.40.206 resolver examination period of February 1, 2019, through October 31, 2019, was the earliest we could go back to while relying on our SOC 2 report.
Details on the examination
When we launched the 220.127.116.11 resolver, we committed that we would not track what individual users of our 18.104.22.168 resolver are searching for online. The examination validated that our system is configured to achieve what we think is the most important part of this commitment — we never write the querying IP addresses together with the DNS query to disk and therefore have no idea who is making a specific request using the 22.214.171.124 resolver. This means we don’t track which sites any individual visits, and we won’t sell your personal data, ever.
We want to be fully transparent that during the examination we uncovered that our routers randomly capture up to 0.05% of all requests that pass through them, including the querying IP address of resolver users. We do this separately from the 126.96.36.199 service for all traffic passing into our network and we retain such data for a limited period of time for use in connection with network troubleshooting and mitigating denial of service attacks.
To explain — if a specific IP address is flowing through one of our data centers a large number of times, then it is often associated with malicious requests or a botnet. We need to keep that information to mitigate attacks against our network and to prevent our network from being used as an attack vector itself. This limited subsample of data is not linked up with DNS queries handled by the 188.8.131.52 service and does not have any impact on user privacy.
We also want to acknowledge that when we made our privacy promises about how we would handle non-personally identifiable log data for 184.108.40.206 resolver requests, we made what we now see were some confusing statements about how we would handle those anonymous logs.
For example, we learned that our blog post commitment about retention of anonymous log data was not written clearly enough and our previous statements were not as clear because we referred to temporary logs, transactional logs, and permanent logs in ways that could have been better defined. For example, our 220.127.116.11 resolver privacy FAQs stated that we would not retain transactional logs for more than 24 hours but that some anonymous logs would be retained indefinitely. However, our blog post announcing the public resolver didn’t capture that distinction. You can see a clearer statement about our handling of anonymous logs on our privacy commitments page mentioned below.
With this in mind, we updated and clarified our privacy commitments for the 18.104.22.168 resolver as outlined below. The most critical part of these commitments remains unchanged: We don’t want to know what you do on the Internet — it’s none of our business — and we’ve taken the technical steps to ensure we can’t.
Our 22.214.171.124 public DNS resolver commitments
We have refined our commitments to 126.96.36.199 resolver privacy as part of our examination effort. The nature and intent of our commitments remain consistent with our original commitments. These updated commitments are what was included in the examination:
Cloudflare will not sell or share public resolver users’ personal data with third parties or use personal data from the public resolver to target any user with advertisements.
Cloudflare will only retain or use what is being asked, not information that will identify who is asking it. Except for randomly sampled network packets captured from at most 0.05% of all traffic sent to Cloudflare’s network infrastructure, Cloudflare will not retain the source IP from DNS queries to the public resolver in non-volatile storage (more on that below). The randomly sampled packets are solely used for network troubleshooting and DoS mitigation purposes.
A public resolver user’s IP address (referred to as the client or source IP address) will not be stored in non-volatile storage. Cloudflare will anonymize source IP addresses via IP truncation methods (last octet for IPv4 and last 80 bits for IPv6). Cloudflare will delete the truncated IP address within 25 hours.
Cloudflare will retain only the limited transaction and debug log data (“Public Resolver Logs”) for the legitimate operation of our Public Resolver and research purposes, and Cloudflare will delete the Public Resolver Logs within 25 hours.
Cloudflare will not share the Public Resolver Logs with any third parties except for APNIC pursuant to a Research Cooperative Agreement. APNIC will only have limited access to query the anonymized data in the Public Resolver Logs and conduct research related to the operation of the DNS system.
Proving privacy commitments
We created the 188.8.131.52 resolver because we recognized significant privacy problems: ISPs, WiFi networks you connect to, your mobile network provider, and anyone else listening in on the Internet can see every site you visit and every app you use — even if the content is encrypted. Some DNS providers even sell data about your Internet activity or use it to target you with ads. DNS can also be used as a tool of censorship against many of the groups we protect through our Project Galileo.
If you use DNS-over-HTTPS or DNS-over-TLS to our 184.108.40.206 resolver, your DNS lookup request will be sent over a secure channel. This means that if you use the 220.127.116.11 resolver then in addition to our privacy guarantees an eavesdropper can’t see your DNS requests. We promise we won’t be looking at what you’re doing.
We strongly believe that consumers should expect their service providers to be able to show proof that they are actually abiding by their privacy commitments. If we were able to have our 18.104.22.168 resolver privacy commitments examined by an independent accounting firm, we think other organizations can do the same. We encourage other providers to follow suit and help improve privacy and transparency for Internet users globally. And for our part, we will continue to engage well-respected auditing firms to audit our 22.214.171.124 resolver privacy commitments. We also appreciate the work that Mozilla has undertaken to encourage entities that operate recursive resolvers to adopt data handling practices that protect the privacy of user data.
On January 7th, we announced Cloudflare for Teams, a new way to protect organizations and their employees globally, without sacrificing performance. Cloudflare for Teams centers around two core products – Cloudflare Access and Cloudflare Gateway. Cloudflare Access is already available and used by thousands of teams around the world to secure internal applications. Cloudflare Gateway solves the other end of the problem by protecting those teams from security threats without sacrificing performance.
Today, we’re excited to announce new secure DNS filtering capabilities in Cloudflare Gateway. Cloudflare Gateway protects teams from threats like malware, phishing, ransomware, crypto-mining and other security threats. You can start using Cloudflare Gateway at dash.teams.cloudflare.com. Getting started takes less than five minutes.
Why Cloudflare Gateway?
We built Cloudflare Gateway to address key challenges our customers experience with managing and securing global networks. The root cause of these challenges is architecture and inability to scale. Legacy network security models solved problems in the 1990s, but teams have continued to attempt to force the Internet of the 2020s through them.
Historically, branch offices sent all of their Internet-bound traffic to one centralized data center at or near corporate headquarters. Administrators configured that to make sure all requests passed through a secure hardware firewall. The hardware firewall observed each request, performed inline SSL inspection, applied DNS filtering and made sure that the corporate network was safe from security threats. This solution worked when employees accessed business critical applications from the office, and when applications were not on the cloud.
SaaS broke this model when cloud-delivered applications became the new normal for workforce applications. As business critical applications moved to the cloud, the number of Internet bound requests from all the offices went up. Costs went up, too. In the last 10 years, SaaS spending across all company size segments grew by more than 1615%. The legacy model of backhauling all Internet traffic through centralized locations could not keep up with the digital transformation that all businesses are still going through.
The challenge of backhauling traffic for a global workforce
Expensive and slow
SaaS adoption is only one element that is breaking traditional network models. Geographically distributed offices and remote workers are playing a role, too.
Cloudflare Gateway has been in beta use for some of our customers over the last few months. One of those customers had more than 50 branch offices, and sent all of their DNS traffic through one location. The customer’s headquarters is in New York, but they have offices all over the world, including in India. When someone from the office in India visits google.com, DNS requests travel all the way to New York.
As a result, employees in India have a terrible experience using the Internet. The legacy approach to solve this problem is to add MPLS links from branch offices to the headquarters. But MPLS links are expensive, and can take a long time to configure and deploy. Businesses end up spending millions of dollars on legacy solutions, or they remain slow, driving down employee productivity.
Slow to react to security threats
When businesses backhaul traffic to a single location to inspect and filter malicious traffic using a hardware firewall. But, the legacy hardware appliances were not built for the modern Internet. The threat landscape for the Internet is constantly changing.
For example: about 84% of phishing sites exist for less than 24 hours (source) and legacy hardware firewalls are not fast enough to update their static rules to thwart phishing attacks. When security threats on the Internet act like moving targets, legacy hardware appliances that rely on static models to filter malicious traffic cannot keep up. As a result, employees remain vulnerable to new threats even when businesses backhaul Internet bound traffic to a single location.
Starting today, businesses of all sizes can secure all their Internet-bound traffic and make it faster with Cloudflare Gateway. Cloudflare has data centers in more than 200 cities around the world and all of our services run in every single data center. Therefore, when a business uses Cloudflare Gateway, instead of backhauling traffic to a single location (slow), all Internet-bound requests travel to the nearest data center (fast) from the end user where Cloudflare Gateway applies security policies to protect businesses from security threats. All of this is done without the need for expensive MPLS links.
Gateway’s secure DNS filtering capabilities are built on top of 126.96.36.199, the fastest public DNS resolver in the world. We took the pieces that made the 188.8.131.52 public DNS resolver the fastest and built Cloudflare Gateway’s secure DNS filtering capabilities for customers who want to secure their connection to the Internet. Combined with Cloudflare’s global presence of data centers in more than 200 cities and the fastest public DNS resolver in the world, Cloudflare Gateway secures every connection from every device to every destination on the Internet without sacrificing performance.
Why Secure DNS Filtering?
More than 90% of malware use DNS to perform command & control attacks and exfiltrate sensitive data. Here’s an example of how a malware can infect a device or a data center and perform a command & control (also known as C2C or C&C) attack:
Imagine Bob receives an email from someone impersonating his manager with a link to ‘Box’ that looks harmless. The email looks legitimate but in reality it is a phishing email intended to steal valuable information from Bob’s computer or infected with malware.
When Bob clicks on the link, the website phishing ‘Box’ delivers an exploit and installs malware onto Bob’s computer.
The downloaded malware sends a request to the Command & Control server signaling that the malware is ready to receive instructions from the server.
Once the connection between the malware and Command & Control server is established, the server sends instructions to the malware to steal proprietary data, control the state of the machine to reboot it, shut it down or perform DDoS attacks against other websites.
If Bob’s computer was using DNS filtering, it could have prevented the attack in two places.
First, when Bob clicked on the phishing link (2). The browser sends a DNS request to resolve the domain of the phishing link. If that domain was identified by DNS filtering as a phishing domain, it would have blocked it right away.
Second, when malware initiated the connection with the Command & Control server, the malware also needed to make a DNS request to learn about the Command & Control server’s IP address. This is another place where a secure DNS filtering service can detect the domain as malware and block access to it.
Secure DNS filtering acts as the first layer of defence against most security threats and prevents corporate networks and devices from getting infected by malicious software in the first place. According to a security report by Global Cyber Alliance, companies could have prevented losses of more than $200B using DNS filtering.
How does Gateway’s secure DNS filtering work?
The primary difference between the 184.108.40.206 public DNS resolver and Gateway’s secure DNS filtering is that the 220.127.116.11 public DNS resolver does not block any DNS queries. When a browser requests example.com, the 18.104.22.168 public DNS resolver simply looks up the answer for the DNS query either in cache or by performing a full recursive query.
Cloudflare Gateway adds one new step to introduce security into this flow. Instead of allowing all DNS queries, Gateway first checks the name being queried against the intelligence Cloudflare has about threats on the Internet. If that query matches a known threat, or is requesting a blocked category, Gateway stops it before the site could load for the user – and potentially execute code or phish that team member.
For example, if a customer is using Cloudflare Gateway, and sends a DNS query to example.com, first, Gateway checks if the DNS query is coming from a customer. Second, if it is coming from a customer Gateway checks if the DNS query matches with any of the policies setup by the customer. The policy could be a domain that the customer is manually blocking or it could be part of a broader security category that the customer enabled. If the domain matches one of those cases, Cloudflare Gateway will block access to the domain. This will prevent the end user from going to example.com.
Encrypted DNS from day one
Gateway supports DNS over HTTPS today and will also support DNS over TLS in the future. You can use Firefox to start sending DNS queries to Gateway in an encrypted fashion. It will also support other DNS over HTTPS clients as long as you can change the hostname in your preferred DNS over HTTPS client.
Here’s how DNS over HTTPS for Cloudflare Gateway works:
The DNS over HTTPS client encrypts the DNS request and sends it to the closest Cloudflare’s data center. Upon receiving the encrypted DNS request, it will decrypt it and send it to Cloudflare Gateway. Cloudflare Gateway will apply the required security policies and return the response to our edge. Our edge will encrypt the response and send it back to the DNS over HTTPS client.
By encrypting your DNS queries you will make sure that ISPs cannot snoop on your DNS queries and at the same time filter DNS requests that are malicious.
Cloudflare Gateway is for everyone
One of our customers, Algolia, is a fast growing startup. Algolia grew by 1005% in 2019 (source). As the company experienced rapid growth, Cloudflare Gateway helped maintain their corporate security without slowing them down:
“Algolia is growing pretty fast. At Algolia, we needed a way to have visibility across our corporate network without slowing things down for our employees. Cloudflare Gateway gave us a simple way to do that” Adam Surak (Director of Infrastructure & Security Algolia)
But Gateway isn’t just for fast growing startups. Anyone with a Cloudflare account can start using Cloudflare Gateway today. Gateway has a free tier where we wanted to make sure even small businesses, teams and households who cannot afford expensive security solutions can use Cloudflare Gateway to protect themselves from security threats on the Internet. We offer a free plan to our customers because we have a paid tier for this product with additional functionality that are more suited towards super users. Features like longer data retention for analytics, more granular security and content categories, individual DNS query logs, logpush to a cloud storage bucket etc. are features that are only available to our paid customers. You can learn more about Gateway in our product page.
How can you get started?
If you already have a Cloudflare account get started by visiting the Teams dashboard.
The onboarding will walk you through how to configure your router, or device to send DNS queries to Gateway. The onboarding will help you setup a location. A location is usually a physical entity like your office, retail location, data center or home.
Once you finish onboarding, start by configuring a policy. A policy will allow you to block access to malicious websites when anyone is using the Internet from the location that you just created.
You can choose from the categories of policy that we have created. You can also manually add a domain to block it using Gateway.
Once you start sending DNS queries to Gateway, you will see analytics on the team’s dashboard. The analytics dashboard will help you understand if there are any anomalies in your network.
Cloudflare’s mission is to help create a better Internet. We have achieved this by protecting millions of websites around the world and securing millions of devices using WARP. With Cloudflare Access, we helped secure and protect internal applications. Today, with Cloudflare Gateway’s secure DNS filtering capabilities we have extended our mission to also protect the people who use the Internet every day. The product you are seeing today is a glimpse of what we are building for the future. Our team is incredibly proud of what we have built and we are just getting started.
Whenever you visit a website — even if it’s HTTPS enabled — the DNS query that converts the web address into an IP address that computers can read is usually unencrypted. DNS-over-HTTPS, or DoH, encrypts the request so that it can’t be intercepted or hijacked in order to send a user to a malicious site.
But the move is not without controversy. Last year, an internet industry group branded Mozilla an “internet villain” for pressing ahead the security feature. The trade group claimed it would make it harder to spot terrorist materials and child abuse imagery. But even some in the security community are split, amid warnings that it could make incident response and malware detection more difficult.
The move to enable DoH by default will no doubt face resistance, but browser makers have argued it’s not a technology that browser makers have shied away from. Firefox became the first browser to implement DoH — with others, like Chrome, Edge, and Opera — quickly following suit.
Interesting collision of real-world and Internet security:
The ceremony sees several trusted internet engineers (a minimum of three and up to seven) from across the world descend on one of two secure locations — one in El Segundo, California, just south of Los Angeles, and the other in Culpeper, Virginia — both in America, every three months.
Once in place, they run through a lengthy series of steps and checks to cryptographically sign the digital key pairs used to secure the internet’s root zone. (Here’s Cloudflare‘s in-depth explanation, and IANA’s PDF step-by-step guide.)
Only specific named people are allowed to take part in the ceremony, and they have to pass through several layers of security — including doors that can only be opened through fingerprint and retinal scans — before getting in the room where the ceremony takes place.
Staff open up two safes, each roughly one-metre across. One contains a hardware security module that contains the private portion of the KSK. The module is activated, allowing the KSK private key to sign keys, using smart cards assigned to the ceremony participants. These credentials are stored in deposit boxes and tamper-proof bags in the second safe. Each step is checked by everyone else, and the event is livestreamed. Once the ceremony is complete — which takes a few hours — all the pieces are separated, sealed, and put back in the safes inside the secure facility, and everyone leaves.
But during what was apparently a check on the system on Tuesday night — the day before the ceremony planned for 1300 PST (2100 UTC) Wednesday — IANA staff discovered that they couldn’t open one of the two safes. One of the locking mechanisms wouldn’t retract and so the safe stayed stubbornly shut.
As soon as they discovered the problem, everyone involved, including those who had flown in for the occasion, were told that the ceremony was being postponed. Thanks to the complexity of the problem — a jammed safe with critical and sensitive equipment inside — they were told it wasn’t going to be possible to hold the ceremony on the back-up date of Thursday, either.
The Domain Name System (DNS) is the address book of the Internet. When you visit cloudflare.com or any other site, your browser will ask a DNS resolver for the IP address where the website can be found. Unfortunately, these DNS queries and answers are typically unprotected. Encrypting DNS would improve user privacy and security. In this post, we will look at two mechanisms for encrypting DNS, known as DNS over TLS (DoT) and DNS over HTTPS (DoH), and explain how they work.
Applications that want to resolve a domain name to an IP address typically use DNS. This is usually not done explicitly by the programmer who wrote the application. Instead, the programmer writes something such as fetch("https://example.com/news") and expects a software library to handle the translation of “example.com” to an IP address.
Behind the scenes, the software library is responsible for discovering and connecting to the external recursive DNS resolver and speaking the DNS protocol (see the figure below) in order to resolve the name requested by the application. The choice of the external DNS resolver and whether any privacy and security is provided at all is outside the control of the application. It depends on the software library in use, and the policies provided by the operating system of the device that runs the software.
The external DNS resolver
The operating system usually learns the resolver address from the local network using Dynamic Host Configuration Protocol (DHCP). In home and mobile networks, it typically ends up using the resolver from the Internet Service Provider (ISP). In corporate networks, the selected resolver is typically controlled by the network administrator. If desired, users with control over their devices can override the resolver with a specific address, such as the address of a public resolver like Google’s 22.214.171.124 or Cloudflare’s 126.96.36.199, but most users will likely not bother changing it when connecting to a public Wi-Fi hotspot at a coffee shop or airport.
The choice of external resolver has a direct impact on the end-user experience. Most users do not change their resolver settings and will likely end up using the DNS resolver from their network provider. The most obvious observable property is the speed and accuracy of name resolution. Features that improve privacy or security might not be immediately visible, but will help to prevent others from profiling or interfering with your browsing activity. This is especially important on public Wi-Fi networks where anyone in physical proximity can capture and decrypt wireless network traffic.
Ever since DNS was created in 1987, it has been largely unencrypted. Everyone between your device and the resolver is able to snoop on or even modify your DNS queries and responses. This includes anyone in your local Wi-Fi network, your Internet Service Provider (ISP), and transit providers. This may affect your privacy by revealing the domain names that are you are visiting.
What can they see? Well, consider this network packet capture taken from a laptop connected to a home network:
The following observations can be made:
The UDP source port is 53 which is the standard port number for unencrypted DNS. The UDP payload is therefore likely to be a DNS answer.
That suggests that the source IP address 192.168.2.254 is a DNS resolver while the destination IP 192.168.2.14 is the DNS client.
The UDP payload could indeed be parsed as a DNS answer, and reveals that the user was trying to visit twitter.com.
If there are any future connections to 188.8.131.52 or 184.108.40.206, then it is most likely traffic that is directed at “twitter.com”.
If there is some further encrypted HTTPS traffic to this IP, succeeded by more DNS queries, it could indicate that a web browser loaded additional resources from that page. That could potentially reveal the pages that a user was looking at while visiting twitter.com.
Since the DNS messages are unprotected, other attacks are possible:
Queries could be directed to a resolver that performs DNS hijacking. For example, in the UK, Virgin Media and BT return a fake response for domains that do not exist, redirecting users to a search page. This redirection is possible because the computer/phone blindly trusts the DNS resolver that was advertised using DHCP by the ISP-provided gateway router.
Firewalls can easily intercept, block or modify any unencrypted DNS traffic based on the port number alone. It is worth noting that plaintext inspection is not a silver bullet for achieving visibility goals, because the DNS resolver can be bypassed.
Encrypting DNS makes it much harder for snoopers to look into your DNS messages, or to corrupt them in transit. Just as the web moved from unencrypted HTTP to encrypted HTTPS there are now upgrades to the DNS protocol that encrypt DNS itself. Encrypting the web has made it possible for private and secure communications and commerce to flourish. Encrypting DNS will further enhance user privacy.
Two standardized mechanisms exist to secure the DNS transport between you and the resolver, DNS over TLS (2016) and DNS Queries over HTTPS (2018). Both are based on Transport Layer Security (TLS) which is also used to secure communication between you and a website using HTTPS. In TLS, the server (be it a web server or DNS resolver) authenticates itself to the client (your device) using a certificate. This ensures that no other party can impersonate the server (the resolver).
With DNS over TLS (DoT), the original DNS message is directly embedded into the secure TLS channel. From the outside, one can neither learn the name that was being queried nor modify it. The intended client application will be able to decrypt TLS, it looks like this:
In the packet trace for unencrypted DNS, it was clear that a DNS request can be sent directly by the client, followed by a DNS answer from the resolver. In the encrypted DoT case however, some TLS handshake messages are exchanged prior to sending encrypted DNS messages:
The client sends a Client Hello, advertising its supported TLS capabilities.
The server responds with a Server Hello, agreeing on TLS parameters that will be used to secure the connection. The Certificate message contains the identity of the server while the Certificate Verify message will contain a digital signature which can be verified by the client using the server Certificate. The client typically checks this certificate against its local list of trusted Certificate Authorities, but the DoT specification mentions alternative trust mechanisms such as public key pinning.
Once the TLS handshake is Finished by both the client and server, they can finally start exchanging encrypted messages.
While the above picture contains one DNS query and answer, in practice the secure TLS connection will remain open and will be reused for future DNS queries.
Securing unencrypted protocols by slapping TLS on top of a new port has been done before:
A problem with introducing a new port is that existing firewalls may block it. Either because they employ a whitelist approach where new services have to be explicitly enabled, or a blocklist approach where a network administrator explicitly blocks a service. If the secure option (DoT) is less likely to be available than its insecure option, then users and applications might be tempted to try to fall back to unencrypted DNS. This subsequently could allow attackers to force users to an insecure version.
Such fallback attacks are not theoretical. SSL stripping has previously been used to downgrade HTTPS websites to HTTP, allowing attackers to steal passwords or hijack accounts.
Another approach, DNS Queries over HTTPS (DoH), was designed to support two primary use cases:
Prevent the above problem where on-path devices interfere with DNS. This includes the port blocking problem above.
Enable web applications to access DNS through existing browser APIs. DoH is essentially HTTPS, the same encrypted standard the web uses, and reuses the same port number (tcp/443). Web browsers have already deprecated non-secure HTTP in favor of HTTPS. That makes HTTPS a great choice for securely transporting DNS messages. An example of such a DoH request can be found here.
Using HTTPS means that HTTP protocol improvements can also benefit DoH. For example, the in-development HTTP/3 protocol, built on top of QUIC, could offer additional performance improvements in the presence of packet loss due to lack of head-of-line blocking. This means that multiple DNS queries could be sent simultaneously over the secure channel without blocking each other when one packet is lost.
A draft for DNS over QUIC (DNS/QUIC) also exists and is similar to DoT, but without the head-of-line blocking problem due to the use of QUIC. Both HTTP/3 and DNS/QUIC, however, require a UDP port to be accessible. In theory, both could fall back to DoH over HTTP/2 and DoT respectively.
Deployment of DoT and DoH
As both DoT and DoH are relatively new, they are not universally deployed yet. On the server side, major public resolvers including Cloudflare’s 220.127.116.11 and Google DNS support it. Many ISP resolvers however still lack support for it. A small list of public resolvers supporting DoH can be found at DNS server sources, another list of public resolvers supporting DoT and DoH can be found on DNS Privacy Public Resolvers.
There are two methods to enable DoT or DoH on end-user devices:
Add support to applications, bypassing the resolver service from the operating system.
Add support to the operating system, transparently providing support to applications.
There are generally three configuration modes for DoT or DoH on the client side:
Off: DNS will not be encrypted.
Opportunistic mode: try to use a secure transport for DNS, but fallback to unencrypted DNS if the former is unavailable. This mode is vulnerable to downgrade attacks where an attacker can force a device to use unencrypted DNS. It aims to offer privacy when there are no on-path active attackers.
Strict mode: try to use DNS over a secure transport. If unavailable, fail hard and show an error to the user.
The current state for system-wide configuration of DNS over a secure transport:
Android 9: supports DoT through its “Private DNS” feature. Modes:
Opportunistic mode (“Automatic”) is used by default. The resolver from network settings (typically DHCP) will be used.
Strict mode can be configured by setting an explicit hostname. No IP address is allowed, the hostname is resolved using the default resolver and is also used for validating the certificate. (Relevant source code)
iOS and Android users can also install the 18.104.22.168 app to enable either DoH or DoT support in strict mode. Internally it uses the VPN programming interfaces to enable interception of unencrypted DNS traffic before it is forwarded over a secure channel.
Linux with systemd-resolved from systemd 239: DoT through the DNSOverTLS option.
Off is the default.
Opportunistic mode can be configured, but no certificate validation is performed.
Strict mode is available since systemd 243. Any certificate signed by a trusted certificate authority is accepted. However, there is no hostname validation with the GnuTLS backend while the OpenSSL backend expects an IP address.
In any case, no Server Name Indication (SNI) is sent. The certificate name is not validated, making a man-in-the-middle rather trivial.
Linux, macOS, and Windows can use a DoH client in strict mode. The cloudflared proxy-dns command uses the Cloudflare DNS resolver by default, but users can override it through the proxy-dns-upstream option.
Web browsers support DoH instead of DoT:
Firefox 62 supports DoH and provides several Trusted Recursive Resolver (TRR) settings. By default DoH is disabled, but Mozilla is running an experiment to enable DoH for some users in the USA. This experiment currently uses Cloudflare’s 22.214.171.124 resolver, since we are the only provider that currently satisfies the strict resolver policy required by Mozilla. Since many DNS resolvers still do not support an encrypted DNS transport, Mozilla’s approach will ensure that more users are protected using DoH.
When enabled through the experiment, or through the “Enable DNS over HTTPS” option at Network Settings, Firefox will use opportunistic mode (network.trr.mode=2 at about:config).
Strict mode can be enabled with network.trr.mode=3, but requires an explicit resolver IP to be specified (for example, network.trr.bootstrapAddress=126.96.36.199).
While Firefox ignores the default resolver from the system, it can be configured with alternative resolvers. Additionally, enterprise deployments who use a resolver that does not support DoH have the option to disable DoH.
Chrome 78 enables opportunistic DoH if the system resolver address matches one of the hard-coded DoH providers (source code change). This experiment is enabled for all platforms except Linux and iOS, and excludes enterprise deployments by default.
Opera 65 adds an option to enable DoH through Cloudflare’s 188.8.131.52 resolver. This feature is off by default. Once enabled, it appears to use opportunistic mode: if 184.108.40.206:443 (without SNI) is reachable, it will be used. Otherwise it falls back to the default resolver, unencrypted.
The DNS over HTTPS page from the curl project has a comprehensive list of DoH providers and additional implementations.
As an alternative to encrypting the full network path between the device and the external DNS resolver, one can take a middle ground: use unencrypted DNS between devices and the gateway of the local network, but encrypt all DNS traffic between the gateway router and the external DNS resolver. Assuming a secure wired or wireless network, this would protect all devices in the local network against a snooping ISP, or other adversaries on the Internet. As public Wi-Fi hotspots are not considered secure, this approach would not be safe on open Wi-Fi networks. Even if it is password-protected with WPA2-PSK, others will still be able to snoop and modify unencrypted DNS.
Other security considerations
The previous sections described secure DNS transports, DoH and DoT. These will only ensure that your client receives the untampered answer from the DNS resolver. It does not, however, protect the client against the resolver returning the wrong answer (through DNS hijacking or DNS cache poisoning attacks). The “true” answer is determined by the owner of a domain or zone as reported by the authoritative name server. DNSSEC allows clients to verify the integrity of the returned DNS answer and catch any unauthorized tampering along the path between the client and authoritative name server.
However deployment of DNSSEC is hindered by middleboxes that incorrectly forward DNS messages, and even if the information is available, stub resolvers used by applications might not even validate the results. A report from 2016 found that only 26% of users use DNSSEC-validating resolvers.
DoH and DoT protect the transport between the client and the public resolver. The public resolver may have to reach out to additional authoritative name servers in order to resolve a name. Traditionally, the path between any resolver and the authoritative name server uses unencrypted DNS. To protect these DNS messages as well, we did an experiment with Facebook, using DoT between 220.127.116.11 and Facebook’s authoritative name servers. While setting up a secure channel using TLS increases latency, it can be amortized over many queries.
Transport encryption ensures that resolver results and metadata are protected. For example, the EDNS Client Subnet (ECS) information included with DNS queries could reveal the original client address that started the DNS query. Hiding that information along the path improves privacy. It will also prevent broken middle-boxes from breaking DNSSEC due to issues in forwarding DNS.
Operational issues with DNS encryption
DNS encryption may bring challenges to individuals or organizations that rely on monitoring or modifying DNS traffic. Security appliances that rely on passive monitoring watch all incoming and outgoing network traffic on a machine or on the edge of a network. Based on unencrypted DNS queries, they could potentially identify machines which are infected with malware for example. If the DNS query is encrypted, then passive monitoring solutions will not be able to monitor domain names.
Some parties expect DNS resolvers to apply content filtering for purposes such as:
Blocking domains used for malware distribution.
Perform parental control filtering, blocking domains associated with adult content.
Block access to domains serving illegal content according to local regulations.
Offer a split-horizon DNS to provide different answers depending on the source network.
An advantage of blocking access to domains via the DNS resolver is that it can be centrally done, without reimplementing it in every single application. Unfortunately, it is also quite coarse. Suppose that a website hosts content for multiple users at example.com/videos/for-kids/ and example.com/videos/for-adults/. The DNS resolver will only be able to see “example.com” and can either choose to block it or not. In this case, application-specific controls such as browser extensions would be more effective since they can actually look into the URLs and selectively prevent content from being accessible.
DNS monitoring is not comprehensive. Malware could skip DNS and hardcode IP addresses, or use alternative methods to query an IP address. However, not all malware is that complicated, so DNS monitoring can still serve as a defence-in-depth tool.
All of these non-passive monitoring or DNS blocking use cases require support from the DNS resolver. Deployments that rely on opportunistic DoH/DoT upgrades of the current resolver will maintain the same feature set as usually provided over unencrypted DNS. Unfortunately this is vulnerable to downgrades, as mentioned before. To solve this, system administrators can point endpoints to a DoH/DoT resolver in strict mode. Ideally this is done through secure device management solutions (MDM, group policy on Windows, etc.).
One of the cornerstones of the Internet is mapping names to an address using DNS. DNS has traditionally used insecure, unencrypted transports. This has been abused by ISPs in the past for injecting advertisements, but also causes a privacy leak. Nosey visitors in the coffee shop can use unencrypted DNS to follow your activity. All of these issues can be solved by using DNS over TLS (DoT) or DNS over HTTPS (DoH). These techniques to protect the user are relatively new and are seeing increasing adoption.
From a technical perspective, DoH is very similar to HTTPS and follows the general industry trend to deprecate non-secure options. DoT is a simpler transport mode than DoH as the HTTP layer is removed, but that also makes it easier to be blocked, either deliberately or by accident.
Secondary to enabling a secure transport is the choice of a DNS resolver. Some vendors will use the locally configured DNS resolver, but try to opportunistically upgrade the unencrypted transport to a more secure transport (either DoT or DoH). Unfortunately, the DNS resolver usually defaults to one provided by the ISP which may not support secure transports.
Mozilla has adopted a different approach. Rather than relying on local resolvers that may not even support DoH, they allow the user to explicitly select a resolver. Resolvers recommended by Mozilla have to satisfy high standards to protect user privacy. To ensure that parental control features based on DNS remain functional, and to support the split-horizon use case, Mozilla has added a mechanism that allows private resolvers to disable DoH.
The DoT and DoH transport protocols are ready for us to move to a more secure Internet. As can be seen in previous packet traces, these protocols are similar to existing mechanisms to secure application traffic. Once this security and privacy hole is closed, there will be manymore to tackle.
Today, after a longer than expected wait, we’re opening WARP and WARP Plus to the general public. If you haven’t heard about it yet, WARP is a mobile app designed for everyone which uses our global network to secure all of your phone’s Internet traffic.
We announced WARP on April 1 of this year and expected to roll it out over the next few months at a fairly steady clip and get it released to everyone who wanted to use it by July. That didn’t happen. It turned out that building a next generation service to secure consumer mobile connections without slowing them down or burning battery was… harder than we originally thought.
Before today, there were approximately two million people on the waitlist to try WARP. That demand blew us away. It also embarrassed us. The common refrain is consumers don’t care about their security and privacy, but the attention WARP got proved to us how wrong that assumption actually is.
This post is an explanation of why releasing WARP took so long, what we’ve learned along the way, and an apology for those who have been eagerly waiting. It also talks briefly about the rationale for why we built WARP as well as the privacy principles we’ve committed to. However, if you want a deeper dive on those last two topics, I encourage you to read our original launch announcement.
And, if you just want to jump in and try it, you can download and start using WARP on your iOS or Android devices for free through the following links:
If you’ve already installed the 18.104.22.168 App on your device, you may need to update to the latest version in order to get the option to enable Warp.
Let me start with the apology. We are sorry making WARP available took far longer than we ever intended. As a way of hopefully making amends, for everyone who was on the waitlist before today, we’re giving 10 GB of WARP Plus — the even faster version of WARP that uses Cloudflare’s Argo network — to those of you who have been patiently waiting.
For people just signing up today, the basic WARP service is free without bandwidth caps or limitations. The unlimited version of WARP Plus is available for a monthly subscription fee. WARP Plus is the even faster version of WARP that you can optionally pay for. The fee for WARP Plus varies by region and is designed to approximate what a McDonald’s Big Mac would cost in the region. On iOS, the WARP Plus pricing as of the publication of this post is still being adjusted on a regional basis, but that should settle out in the next couple days.
WARP Plus uses Cloudflare’s virtual private backbone, known as Argo, to achieve higher speeds and ensure your connection is encrypted across the long haul of the Internet. We charge for it because it costs us more to provide. However, in order to help spread the word about WARP, you can earn 1GB of WARP Plus for every friend you refer to sign up for WARP. And everyone you refer gets 1GB of WARP Plus for free to get started as well.
Okay, Thanks, That’s Nice, But What Took You So Long?
So what took us so long?
WARP is an ambitious project. We set out to secure Internet connections from mobile devices to the edge of Cloudflare’s network. In doing so, however, we didn’t want to slow devices down or burn excess battery. We wanted it to just work. We also wanted to bet on the technology of the future, not the technology of the past. Specifically, we wanted to build not around legacy protocols like IPsec, but instead around the hyper-efficient WireGuard protocol.
At some level, we thought it would be easy. We already had the 22.214.171.124 App that was securing DNS requests running on millions of mobile devices. That worked great. How much harder could securing all the rest of the requests on a device be? Right??
It turns out, a lot. Zack Bloom has written up a great technical post describing many of the challenges we faced and the solutions we had to invent to deal with them. If you’re interested, I encourage you to check it out.
Apple threw us a curveball by releasing iOS 12.2 just days before the April 1 planned roll out. The new version of iOS significantly changed the underlying network stack implementation in a way that made some of what we were doing to implement WARP unstable. Ultimately we had to find work-arounds in our networking code, costing us valuable time.
We had a version of the WARP app that (kind of) worked on April 1. But, when we started to invite people from outside of Cloudflare to use it, we quickly realized that the mobile Internet around the world was far more wild and varied than we’d anticipated. The Internet is made up of diverse network components which do not always play nicely, we knew that. What we didn’t expect was how much more pain is introduced by the diversity of mobile carriers, mobile operating systems, and mobile device models.
And, while phones in our testbed were relatively stationary, phones in the real world move around — a lot. When they do, their network settings can change wildly. While that doesn’t matter much for stateless, simple DNS queries, for the rest of Internet traffic that makes things complex. Keeping WireGuard fast requires long-lived sessions between your phone and a server in our network, maintaining that for hours and days was very complex. Even beyond that, we use a technology called Anycast to route your traffic to our network. Anycast meant your traffic could move not just between machines, but between entire data centers. That made things very complex.
But there is a huge difference between hard and impossible. From long before the announcement, the team has been hard at work and I’m deeply proud of what they’ve accomplished. We changed our roll out plan to focus on iOS and solidify the shared underpinnings of the app to ensure it would work even with future network stack upgrades. We invited beta users not in the order of when they signed up, but instead based on networks where we didn’t yet have information to help us discover as many corner cases as possible. And we invented new technologies to keep session state even when the wild west of mobile networks and Anycast routing collide.
I’ve been running WARP on my phone since April 1. The first few months were… rough. Really rough. But, today, WARP has blended into the background of my mobile. And I sleep better knowing that my Internet connections from my phone are secure. Using my phone is as fast, and in some cases faster, than without WARP. In other words, WARP today does what we set out to accomplish: securing your mobile Internet connection and otherwise getting out of the way.
There Will Be Bugs
While WARP is a lot better than it was when we first announced it, we know there are still bugs. The most common bug we’re seeing these days is when WARP is significantly slower than using the mobile Internet without WARP. This is usually due to traffic being misrouted. For instance, we discovered a network in Turkey earlier this week that was being routed to London rather than our local Turkish facility. Once we’re aware of these routing issues we can typically fix them quickly.
Other common bugs involved captive portals — the pages where you have to enter information, for instance, when connecting to a hotel WiFI. We’ve fixed a lot of them but we haven’t had WARP users connecting to every hotel WiFi yet, so there will inevitably still be some that are broken.
We’ve made it easy to report issues that you discover. From the 126.96.36.199 App you can click on the little bug icon near the top of the screen, or just shake your phone with the app open, and quickly send us a report. We expect, over the weeks ahead, we’ll be squashing many of the bugs that you report.
Even Faster With Plus
WARP is not just a product, it’s a testbed for all of the Internet-improving technology we have spent years developing. One dream was to use our Argo routing technology to allow all of your Internet traffic to use faster, less-congested, routes through the Internet. When used by Cloudflare customers for the past several years Argo has improved the speed of their websites by an average of over 30%. Through some hard work of the team we are making that technology available to you as WARP Plus.
The WARP Plus technology is not without cost for us. Routing your traffic over our network often costs us more than if we release it directly to the Internet. To cover those costs we charge a monthly fee — $4.99/month or less — for WARP Plus. The fee depends on the region that you’re in and is intended to approximate what a Big Mac would cost in the same region.
Basic WARP is free. Our first priority is not to make money off of WARP however, we want to grow it to secure every single phone. To help make that happen, we wanted to give you an incentive to share WARP with your friends. You can earn 1GB of free WARP Plus for every person you share WARP with. And everyone you refer also gets 1GB of WARP Plus for free as well. There is no limit on how much WARP Plus data you can earn by sharing.
The free consumer security space has traditionally not been the most reputable. Many other companies that have promised to keep consumers’ data safe but instead built businesses around selling it or using it help target you with advertising. We think that’s disgusting. That is not Cloudflare’s business model and it never will be. WARP continues all the strong privacy protections that 188.8.131.52 launched with including:
We don’t write user-identifiable log data to disk;
We will never sell your browsing data or use it in any way to target you with advertising data;
Don’t need to provide any personal information — not your name, phone number, or email address — in order to use WARP or WARP Plus; and
We will regularly work with outside auditors to ensure we’re living up to these promises.
What WARP Is Not
From a technical perspective, WARP is a VPN. But it is designed for a very different audience than a traditional VPN. WARP is not designed to allow you to access geo-restricted content when you’re traveling. It will not hide your IP address from the websites you visit. If you’re looking for that kind of high-security protection then a traditional VPN or a service like Tor are likely better choices for you.
WARP, instead, is built for the average consumer. It’s built to ensure that your data is secured while it’s in transit. So the networks between you and the applications you’re using can’t spy on you. It will help protect you from people sniffing your data while you’re at a local coffee shop. It will also help ensure that your ISP isn’t hoovering up data on your browsing patterns to sell to advertisers.
WARP isn’t designed for the ultra-techie who wants to specify exactly what server their traffic will be routed through. There’s basically only one button in the WARP interface: ON or OFF. It’s simple on purpose. It’s designed for my mom and dad who ask me every holiday dinner what they can do to be a bit safer online. I’m excited this year to have something easy for them to do: install the 184.108.40.206 App, enable WARP, and rest a bit easier.
How Fast Is It?
Once we got WARP to a stable place, this was my first question. My initial inclination was to go to one of the many Speed Test sites and see the results. And the results were… weird. Sometimes much faster, sometimes much slower. Overall, they didn’t make a lot of sense. The reason why is that these sites are designed to measure the speed of your ISP. WARP is different, so these test sites don’t give particularly accurate readings.
The better test is to visit common sites around the Internet and see how they load, in real conditions, on WARP versus off. We’ve built a tool that does this. Generally, in our tests, WARP is around the same speed as non-WARP connections when you’re on a high performance network. As network conditions get worse, WARP will often improve performance more. But your experience will depend on the particular conditions of your network.
We plan, in the next few weeks, to expose the test tool within the 220.127.116.11 App so you can see how your device loads a set of popular sites without WARP, with WARP, and with WARP Plus. And, again, if you’re seeing particularly poor performance, please report it to us. Our goal is to provide security without slowing you down or burning excess battery. We can already do that for many networks and devices and we won’t rest until we can do it for everyone.
Here’s to a More Secure, Fast Internet
Cloudflare’s mission is to help build a better Internet. We’ve done that by securing and making more performance millions of Internet properties since we launched almost exactly 9 years ago. WARP furthers Cloudflare’s mission by extending our network to help make every consumer’s mobile device a bit more secure. Our team is proud of what we’ve built with WARP — albeit a bit embarrassed it took us so long to get into your hands. We hope you’ll forgive us for the delay, give WARP a try, and let us know what you think.
Controlling outbound communication from your Amazon Virtual Private Cloud (Amazon VPC) to the internet is an important part of your overall preventive security controls. By limiting outbound traffic to certain trusted domains (called “whitelisting”) you help prevent instances from downloading malware, communicating with bot networks, or attacking internet hosts. It’s not practical to prevent all outbound web traffic, though. Often, you want to allow access to certain well-known domains (for example, to communicate with partners, to download software updates, or to communicate with AWS API endpoints). In this post, I’ll show you how to limit outbound web connections from your VPC to the internet, using a web proxy with custom domain whitelists or DNS content filtering services. The solution is scalable, highly available, and deploys in a fully automated way.
Solution benefits and deliverables
This solution is based on the open source HTTP proxy Squid. The proxy can be used for all workloads running in the VPC, like Amazon Elastic Compute Cloud (EC2) and AWS Fargate. The solution provides you with the following benefits:
An outbound proxy that permit connections to whitelisted domains that you define, while presenting customizable error messages when connections are attempted to unapproved domains.
Optional domain content filtering based on DNS, delivered by external services like OpenDNS, Quad9, CleanBrowsing, Yandex.DNS or others. For this option, you do need to be a customer of these external services.
Transparent encryption handling, due to the extraction of the domain information from the Server Name Indication (SNI) extension in TLS. Encryption in transit is preserved and end-to-end encryption is maintained.
One Elastic IP address per proxy instance for internet communication. Sometimes the web sites that you’re communicating want to know your IP address so they can accept traffic from you. Giving the proxies’ elastic IP addresses allows you to know what IP addresses your web connections will come from.
CloudWatch Logs stores the Squid access log so that you can search and analyze it.
The list of allowed (whitelisted) domains is stored in AWS Secrets Manager. The Amazon EC2 instance retrieves the domain list every 5 minutes via cronjob and updates the proxy configuration if the list has changed. The values in Secrets Manager are provisioned by CloudFormation and can be read only by the proxy EC2 instances.
The client running on the EC2 instance must have proxy settings pointing toward the Network Load Balancer. The load balancer will forward the request to the fleet of proxies in the target group.
You need an already deployed VPC, with public and private subnets spreading over several Availability Zones (AZs). You can find a description of how to set up your VPC environment at Default VPC Setup.
You must have an internet gateway, with routing set up so that only traffic from a public subnet can reach the internet.
You don’t need to have a NAT (network translation address) gateway deployed since this function will be provided by the outbound proxy.
Integration with content filtering DNS services
If you require content filtering from an external company, like OpenDNS or Yandex.DNS, you must register and become a customer of that service. Many have free services, in addition to paid plans if you need advanced statistics and custom categories. This is your responsibility as the customer. (Learn more about the shared responsibility between AWS and the customer.)
Your DNS service provider will assign you a list of DNS IP addresses. You’ll need to enter the IP addresses when you provision (see Installation below).
If the DNS provider requires it, you may give them the source IPs of the proxies. There are four reserved IPs that you can find in the stack output (see Output parameters below).
Installation (one-time setup)
Select the Launch Stack button to launch the CloudFormation template:
Note: You must sign in your AWS Account in order to launch the stack in the required region. The stack content can also be downloaded here.
Provide the following proxy parameters, as shown in Figure 2:
Allowed domains: Enter your whitelisted domains. Use a leading dot (“.”) to indicate subdomains.
Custom DNS servers (optional): List any DNS servers that will be used by the proxy. Leave the default value to use the default Amazon DNS server.
Proxy Port: Enter the listener port of the proxy.
Instance Type: Enter the EC2 instance type that you want to use for the proxies. Instance type will affect vertical scaling capabilities and solution cost. For more information, see Amazon EC2 Instance Types.
AMI ID to be used: This field is prepopulated with the Amazon Machine Image (AMI) ID found in AWS Systems Manager Parameter Store. By default, it will point toward the latest Amazon Linux 2 image. You do not need to adjust this value.
SSH Key name (optional): Enter the name of the SSH key for your proxy EC2 instances. This is relevant only for debugging, or if you need to log in on the proxy servers. Consider using AWS Systems Manager Session Manager instead of SSH.
Next, provide the following network parameters, as shown in Figure 2:
VPC ID: The VPC where the solution will be deployed.
Public subnets: The subnets where the proxies will be deployed. Select between 2 and 3 subnets.
Private subnets: The subnets where the Network Load Balancer will be deployed. Select between 2 and 3 subnets.
Allowed client CIDR: The value you enter here will be added to the proxy security group. By default, the private IP range 172.31.0.0/16 is allowed. The allowed block size is between a /32 netmask and an /8 netmask. This prevents you from using an open IP range like 0.0.0.0/0. If you were to set an open IP range, your proxies would accept traffic from anywhere on the internet, which is a bad practice.
Figure 2: Launching the CloudFormation template
When you’ve entered all your proxy and network parameters, select Next. On the following wizard screens, you can keep the default values and select Next and Create Stack.
Find the output parameters
After the stack status has changed to “deployed,” you’ll need to note down the output parameters to configure your clients. Look for the following parameters in the Outputs tab of the stack:
The domain name of the proxy that should be configured on the client
The port of the proxy that should be configured on the client
4 Elastic IP addresses for the proxy’s instances. These are used for outbound connections to Internet.
The CloudWatch Log Group, for access logs.
The Security Group that is attached to the proxies.
The Linux command to set the proxy. You can copy and paste this to your shell.
Figure 3: Stack output parameters
Use the proxy
Proxy setting parameters are specific to every application. Most Linux application use the environment variables http_proxy and https_proxy.
Log in on the Linux EC2 instance that’s allowed to use the proxy.
To set the shell parameter temporarily (only for the current shell session), execute the following export commands:
You can add the proxy parameter permanently to interactive and non-interactive shells. If you do this, you won’t need to set them again after reloading. Execute the following commands in your application shell:
Replace <Proxy-DOMAIN> with the domain of the load balancer.
Replace <Proxy-Port> with the port of your proxy.
Customize the access denied page
An error page will display when a user’s access is blocked or if there’s an internal error. You can adjust the look and feel of this page (HTML or styles) according to the Squid error directory tag.
Use the proxy access log
The proxy access log is an important tool for troubleshooting. It contains the client IP address, the destination domain, the port, and errors with timestamps. The access logs from Squid are uploaded to CloudWatch. You can find them from the CloudWatch console under Log Groups, with the prefix Proxy, as shown in the figure below.
Figure 4: CloudWatch log with access group
You can use CloudWatch Insight to analyze and visualize your queries. See the following figure for an example of denied connections visualized on a timeline:
Figure 5: Access logs analysis with CloudWatch Insight
Monitor your metrics with CloudWatch
The main proxy metrics are upload every five minutes to CloudWatch Metrics in the proxy namespace:
client_http.errors /sec – errors in processing client requests per second
client_http.hits /sec – cache hits per second
client_http.kbytes_in /sec – client uploaded data per second
client_http.kbytes_out /sec – client downloaded data per second
client_http.requests /sec – number of requests per second
server.all.errors /sec – proxy server errors per second
server.all.kbytes_in /sec – proxy server uploaded data per second
server.all.kbytes_out /sec – proxy downloaded data per second
server.all.requests /sec – all requests sent by proxy server per second
In the figure below, you can see an example of metrics. For more information on metric use, see the Squid project information.
Figure 6: Example of CloudWatch metrics
Manage the proxy configuration
From time to time, you may want to add or remove domains from the whitelist. To change your whitelisted domains, you must update the input values in the CloudFormation stack. This will cause the values stored in Secrets Manager to update as well. Every five minutes, the proxies will pull the list from Secrets Manager and update as needed. This means it can take up to five minutes for your change to propagate. The change will be propagated to all instances without terminating or deploying them.
Note that when the whitelist is updated, the Squid proxy processes are restarted, which will interrupt ALL connections passing through them at that time. This can be disruptive, so be careful about when you choose to adjust the whitelist.
If you want to change other CloudFormation parameters, like DNS or Security Group settings, you can again update the CloudFormation stack with new values. The CloudFormation stack will launch a new instance and terminate legacy instances (a rolling update).
You can change the proxy Squid configuration by editing the CloudFormation template (section AWS::CloudFormation::Init) and updating the stack. However, you should not do this unless you have advanced AWS and Squid experience.
Update the instances
To update your AMI, you can update the stack. If the AMI has been updated with a newer version, then a rolling update will redeploy the EC2 instances and Squid software. This automates the process of patching managed instances with both security-related and other updates. If the AMI has not changed, no update will be performed.
Alternately, you can terminate the instance, and the auto scaling group will launch a new instance with the latest updates for Squid and the OS, starting from scratch. This approach may lead to a short service interruption for the clients served on this instance, during the time in which the load balancer is switching to an active instance.
I’ve summarized a few common problems and solutions below.
I receive timeout at client application.
Check that you’ve configured the client application to use the proxy. (See Using a proxy, above.)
Check that the Security Group allows access from the client instance.
Verify that your NACL and routing table allow communication to and from the Network Load Balancer.
I receive an error page that access was blocked by the administrator.
Check the stack input parameter for allowed domains. The domains must be comma separated. Included subdomains must start with dot. For example:
To include www.amazon.com, specify www.amazon.com
To include all subdomains of amazon.com as part of a list, specify .amazon.com
I received a 500 error page from the proxy.
Make sure that the proxy EC2 instance has internet access. The public subnets must have an Internet Gateway connected and set as the default route.
Check the DNS input parameter in the CloudFormation stack, if you use an external DNS service. Make sure the DNS provider has the correct proxy IPs (if you were required to provide them
The webpage doesn’t look as expected. There are fragments or styles missing.
Many pages download content from multiple domains. You need to whitelist all of these domains. Use the access logs in CloudWatch Log to determine which domains are blocked, then update the stack.
On the proxy error page, I receive “unknown certificate issuer.”
During the setup, a self-signed certificate for the squid error page is generated. If you need to add your own certificate, you can adapt the CloudFormation template. This requires moderate knowledge of Unix/Linux and AWS CloudFormation.
In this blog post, I showed you how you can configure an outbound proxy for controlling the internet communication from a VPC. If you need Squid support, you can find various offerings on the Squid Support page. AWS forums provides support for Amazon Elastic Compute Cloud (EC2). When you need AWS experts to help you plan, build, or optimise your infrastructure, consider engaging AWS Professional Services.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
Trust on the Internet is underpinned by the Public Key Infrastructure (PKI). PKI grants servers the ability to securely serve websites by issuing digital certificates, providing the foundation for encrypted and authentic communication.
Certificates make HTTPS encryption possible by using the public key in the certificate to verify server identity. HTTPS is especially important for websites that transmit sensitive data, such as banking credentials or private messages. Thankfully, modern browsers, such as Google Chrome, flag websites not secured using HTTPS by marking them “Not secure,” allowing users to be more security conscious of the websites they visit.
Now that we know what certificates are used for, let’s talk about where they come from.
Certificate Authorities (CAs) are the institutions responsible for issuing certificates.
When issuing a certificate for any given domain, they use Domain Control Validation (DCV) to verify that the entity requesting a certificate for the domain is the legitimate owner of the domain. With DCV the domain owner:
creates a DNS resource record for a domain;
uploads a document to the web server located at that domain; OR
proves ownership of the domain’s administrative email account.
The DCV process prevents adversaries from obtaining private-key and certificate pairs for domains not owned by the requestor.
Preventing adversaries from acquiring this pair is critical: if an incorrectly issued certificate and private-key pair wind up in an adversary’s hands, they could pose as the victim’s domain and serve sensitive HTTPS traffic. This violates our existing trust of the Internet, and compromises private data on a potentially massive scale.
For example, an adversary that tricks a CA into mis-issuing a certificate for gmail.com could then perform TLS handshakes while pretending to be Google, and exfiltrate cookies and login information to gain access to the victim’s Gmail account. The risks of certificate mis-issuance are clearly severe.
Domain Control Validation
To prevent attacks like this, CAs only issue a certificate after performing DCV. One way of validating domain ownership is through HTTP validation, done by uploading a text file to a specific HTTP endpoint on the webserver they want to secure. Another DCV method is done using email verification, where an email with a validation code link is sent to the administrative contact for the domain.
Suppose Alice buys the domain name aliceswonderland.com and wants to get a dedicated certificate for this domain. Alice chooses to use Let’s Encrypt as their certificate authority. First, Alice must generate their own private key and create a certificate signing request (CSR). She sends the CSR to Let’s Encrypt, but the CA won’t issue a certificate for that CSR and private key until they know Alice owns aliceswonderland.com. Alice can then choose to prove that she owns this domain through HTTP validation.
When Let’s Encrypt performs DCV over HTTP, they require Alice to place a randomly named file in the /.well-known/acme-challenge path for her website. The CA must retrieve the text file by sending an HTTP GET request to http://aliceswonderland.com/.well-known/acme-challenge/<random_filename>. An expected value must be present on this endpoint for DCV to succeed.
For HTTP validation, Alice would upload a file to http://aliceswonderland.com/.well-known/acme-challenge/YnV0dHNz
where the body contains:
HTTP/1.1 200 OK
The CA instructs them to use the Base64 token YnV0dHNz. TEST_CLIENT_KEY in an account-linked key that only the certificate requestor and the CA know. The CA uses this field combination to verify that the certificate requestor actually owns the domain. Afterwards, Alice can get her certificate for her website!
Another way users can validate domain ownership is to add a DNS TXT record containing a verification string or token from the CA to their domain’s resource records. For example, here’s a domain for an enterprise validating itself towards Google:
$ dig TXT aliceswonderland.com
aliceswonderland.com. 28 IN TXT "google-site-verification=COanvvo4CIfihirYW6C0jGMUt2zogbE_lC6YBsfvV-U"
Here, Alice chooses to create a TXT DNS resource record with a specific token value. A Google CA can verify the presence of this token to validate that Alice actually owns her website.
Types of BGP Hijacking Attacks
Certificate issuance is required for servers to securely communicate with clients. This is why it’s so important that the process responsible for issuing certificates is also secure. Unfortunately, this is not always the case.
Researchers at Princeton University recently discovered that common DCV methods are vulnerable to attacks executed by network-level adversaries. If Border Gateway Protocol (BGP) is the “postal service” of the Internet responsible for delivering data through the most efficient routes, then Autonomous Systems (AS) are individual post office branches that represent an Internet network run by a single organization. Sometimes network-level adversaries advertise false routes over BGP to steal traffic, especially if that traffic contains something important, like a domain’s certificate.
Bamboozling Certificate Authorities with BGP highlights five types of attacks that can be orchestrated during the DCV process to obtain a certificate for a domain the adversary does not own. After implementing these attacks, the authors were able to (ethically) obtain certificates for domains they did not own from the top five CAs: Let’s Encrypt, GoDaddy, Comodo, Symantec, and GlobalSign. But how did they do it?
Attacking the Domain Control Validation Process
There are two main approaches to attacking the DCV process with BGP hijacking:
These attacks create a vulnerability when an adversary sends a certificate signing request for a victim’s domain to a CA. When the CA verifies the network resources using an HTTP GET request (as discussed earlier), the adversary then uses BGP attacks to hijack traffic to the victim’s domain in a way that the CA’s request is rerouted to the adversary and not the domain owner. To understand how these attacks are conducted, we first need to do a little bit of math.
Every device on the Internet uses an IP (Internet Protocol) address as a numerical identifier. IPv4 addresses contain 32 bits and follow a slash notation to indicate the size of the prefix. So, in the network address 18.104.22.168/24, “/24” refers to how many bits the network contains. This means that there are 8 bits left that contain the host addresses, for a total of 256 host addresses. The smaller the prefix number, the more host addresses remain in the network. With this knowledge, let’s jump into the attacks!
Attack one: Sub-Prefix Attack
When BGP announces a route, the router always prefers to follow the more specific route. So if 22.214.171.124/8 and 126.96.36.199/24 are advertised, the router will use the latter as it is the more specific prefix. This becomes a problem when an adversary makes a BGP announcement to a specific IP address while using the victim’s domain IP address. Let’s say the IP address for our victim, leagueofentropy.com, is 188.8.131.52/8. If an adversary announces the prefix 184.108.40.206/24, then they will capture the victim’s traffic, launching a sub-prefix hijack attack.
For example, in an attack during April 2018, routes were announced with the more specific /24 vs. the existing /23. In the diagram below, /23 is Texas and /24 is the more specific Austin, Texas. The new (but nefarious) routes overrode the existing routes for portions of the Internet. The attacker then ran a nefarious DNS server on the normal IP addresses with DNS records pointing at some new nefarious web server instead of the existing server. This attracted the traffic destined for the victim’s domain within the area the nefarious routes were being propagated. The reason this attack was successful was because a more specific prefix is always preferred by the receiving routers.
Attack two: Equally-Specific-Prefix Attack
In the last attack, the adversary was able to hijack traffic by offering a more specific announcement, but what if the victim’s prefix is /24 and a sub-prefix attack is not viable? In this case, an attacker would launch an equally-specific-prefix hijack, where the attacker announces the same prefix as the victim. This means that the AS chooses the preferred route between the victim and the adversary’s announcements based on properties like path length. This attack only ever intercepts a portion of the traffic.
There are more advanced attacks that are covered in more depth in the paper. They are fundamentally similar attacks but are more stealthy.
Once an attacker has successfully obtained a bogus certificate for a domain that they do not own, they can perform a convincing attack where they pose as the victim’s domain and are able to decrypt and intercept the victim’s TLS traffic. The ability to decrypt the TLS traffic allows the adversary to completely Monster-in-the-Middle (MITM) encrypted TLS traffic and reroute Internet traffic destined for the victim’s domain to the adversary. To increase the stealthiness of the attack, the adversary will continue to forward traffic through the victim’s domain to perform the attack in an undetected manner.
Another way an adversary can gain control of a domain is by spoofing DNS traffic by using a source IP address that belongs to a DNS nameserver. Because anyone can modify their packets’ outbound IP addresses, an adversary can fake the IP address of any DNS nameserver involved in resolving the victim’s domain, and impersonate a nameserver when responding to a CA.
This attack is more sophisticated than simply spamming a CA with falsified DNS responses. Because each DNS query has its own randomized query identifiers and source port, a fake DNS response must match the DNS query’s identifiers to be convincing. Because these query identifiers are random, making a spoofed response with the correct identifiers is extremely difficult.
Adversaries can fragment User Datagram Protocol (UDP) DNS packets so that identifying DNS response information (like the random DNS query identifier) is delivered in one packet, while the actual answer section follows in another packet. This way, the adversary spoofs the DNS response to a legitimate DNS query.
Say an adversary wants to get a mis-issued certificate for victim.com by forcing packet fragmentation and spoofing DNS validation. The adversary sends a DNS nameserver for victim.com a DNS packet with a small Maximum Transmission Unit, or maximum byte size. This gets the nameserver to start fragmenting DNS responses. When the CA sends a DNS query to a nameserver for victim.com asking for victim.com’s TXT records, the nameserver will fragment the response into the two packets described above: the first contains the query ID and source port, which the adversary cannot spoof, and the second one contains the answer section, which the adversary can spoof. The adversary can continually send a spoofed answer to the CA throughout the DNS validation process, in the hopes of sliding their spoofed answer in before the CA receives the real answer from the nameserver.
In doing so, the answer section of a DNS response (the important part!) can be falsified, and an adversary can trick a CA into mis-issuing a certificate.
At first glance, one could think a Certificate Transparency log could expose a mis-issued certificate and allow a CA to quickly revoke it. CT logs, however, can take up to 24 hours to include newly issued certificates, and certificate revocation can be inconsistently followed among different browsers. We need a solution that allows CAs to proactively prevent this attacks, not retroactively address them.
We’re excited to announce that Cloudflare provides CAs a free API to leverage our global network to perform DCV from multiple vantage points around the world. This API bolsters the DCV process against BGP hijacking and off-path DNS attacks.
Given that Cloudflare runs 175+ datacenters around the world, we are in a unique position to perform DCV from multiple vantage points. Each datacenter has a unique path to DNS nameservers or HTTP endpoints, which means that successful hijacking of a BGP route can only affect a subset of DCV requests, further hampering BGP hijacks. And since we use RPKI, we actually sign and verify BGP routes.
This DCV checker additionally protects CAs against off-path, DNS spoofing attacks. An additional feature that we built into the service that helps protect against off-path attackers is DNS query source IP randomization. By making the source IP unpredictable to the attacker, it becomes more challenging to spoof the second fragment of the forged DNS response to the DCV validation agent.
By comparing multiple DCV results collected over multiple paths, our DCV API makes it virtually impossible for an adversary to mislead a CA into thinking they own a domain when they actually don’t. CAs can use our tool to ensure that they only issue certificates to rightful domain owners.
Our multipath DCV checker consists of two services:
DCV agents responsible for performing DCV out of a specific datacenter, and
a DCV orchestrator that handles multipath DCV requests from CAs and dispatches them to a subset of DCV agents.
When a CA wants to ensure that DCV occurred without being intercepted, it can send a request to our API specifying the type of DCV to perform and its parameters.
The DCV orchestrator then forwards each request to a random subset of over 20 DCV agents in different datacenters. Each DCV agent performs the DCV request and forwards the result to the DCV orchestrator, which aggregates what each agent observed and returns it to the CA.
This approach can also be generalized to performing multipath queries over DNS records, like Certificate Authority Authorization (CAA) records. CAA records authorize CAs to issue certificates for a domain, so spoofing them to trick unauthorized CAs into issuing certificates is another attack vector that multipath observation prevents.
As we were developing our multipath checker, we were in contact with the Princeton research group that introduced the proof-of-concept (PoC) of certificate mis-issuance through BGP hijacking attacks. Prateek Mittal, coauthor of the Bamboozling Certificate Authorities with BGP paper, wrote:
“Our analysis shows that domain validation from multiple vantage points significantly mitigates the impact of localized BGP attacks. We recommend that all certificate authorities adopt this approach to enhance web security. A particularly attractive feature of Cloudflare’s implementation of this defense is that Cloudflare has access to a vast number of vantage points on the Internet, which significantly enhances the robustness of domain control validation.”
Our DCV checker follows our belief that trust on the Internet must be distributed, and vetted through third-party analysis (like that provided by Cloudflare) to ensure consistency and security. This tool joins our pre-existing Certificate Transparency monitor as a set of services CAs are welcome to use in improving the accountability of certificate issuance.
An Opportunity to Dogfood
Building our multipath DCV checker also allowed us to dogfood multiple Cloudflare products.
The DCV orchestrator as a simple fetcher and aggregator was a fantastic candidate for Cloudflare Workers. We implemented the orchestrator in TypeScript using this post as a guide, and created a typed, reliable orchestrator service that was easy to deploy and iterate on. Hooray that we don’t have to maintain our own dcv-orchestrator server!
We use Argo Tunnel to allow Cloudflare Workers to contact DCV agents. Argo Tunnel allows us to easily and securely expose our DCV agents to the Workers environment. Since Cloudflare has approximately 175 datacenters running DCV agents, we expose many services through Argo Tunnel, and have had the opportunity to load test Argo Tunnel as a power user with a wide variety of origins. Argo Tunnel readily handled this influx of new origins!
Getting Access to the Multipath DCV Checker
If you and/or your organization are interested in trying our DCV checker, email [email protected] and let us know! We’d love to hear more about how multipath querying and validation bolsters the security of your certificate issuance.
As a new class of BGP and IP spoofing attacks threaten to undermine PKI fundamentals, it’s important that website owners advocate for multipath validation when they are issued certificates. We encourage all CAs to use multipath validation, whether it is Cloudflare’s or their own. Jacob Hoffman-Andrews, Tech Lead, Let’s Encrypt, wrote:
“BGP hijacking is one of the big challenges the web PKI still needs to solve, and we think multipath validation can be part of the solution. We’re testing out our own implementation and we encourage other CAs to pursue multipath as well”
Hopefully in the future, website owners will look at multipath validation support when selecting a CA.
The Internet is an extraordinarily complex and evolving ecosystem. Its constituent protocols range from the ancient and archaic (hello FTP) to the modern and sleek (meet WireGuard), with a fair bit of everything in between. This evolution is ongoing, and as one of the most connected networks on the Internet, Cloudflare has a duty to be a good steward of this ecosystem. We take this responsibility to heart: Cloudflare’s mission is to help build a better Internet. In this spirit, we are very proud to announce Crypto Week 2019.
Every day this week we’ll announce a new project or service that uses modern cryptography to build a more secure, trustworthy Internet. Everything we release this week will be free and immediately useful. This blog is a fun exploration of the themes of the week.
Monday: Coming Soon
Tuesday: Coming Soon
Wednesday: Coming Soon
Thursday: Coming Soon
Friday: Coming Soon
The Internet of the Future
Many pieces of the Internet in use today were designed in a different era with different assumptions. The Internet’s success is based on strong foundations that support constant reassessment and improvement. Sometimes these improvements require deploying new protocols.
Performing an upgrade on a system as large and decentralized as the Internet can’t be done by decree;
There are too many economic, cultural, political, and technological factors at play.
Changes must be compatible with existing systems and protocols to even be considered for adoption.
To gain traction, new protocols must provide tangible improvements for users. Nobody wants to install an update that doesn’t improve their experience!
The last time the Internet had a complete reboot and upgrade was during TCP/IP flag dayin 1983. Back then, the Internet (called ARPANET) had fewer than ten thousand hosts! To have an Internet-wide flag day today to switch over to a core new protocol is inconceivable; the scale and diversity of the components involved is way too massive. Too much would break. It’s challenging enough to deprecate outmoded functionality. In some ways, the open Internet is a victim of its own success. The bigger a system grows and the longer it stays the same, the harder it is to change. The Internet is like a massive barge: it takes forever to steer in a different direction and it’s carrying a lot of garbage.
As you would expect, many of the warts of the early Internet still remain. Both academic security researchers and real-life adversaries are still finding and exploiting vulnerabilities in the system. Many vulnerabilities are due to the fact that most of the protocols in use on the Internet have a weak notion of trust inherited from the early days. With 50 hosts online, it’s relatively easy to trust everyone, but in a world-scale system, that trust breaks down in fascinating ways. The primary tool to scale trust is cryptography, which helps provide some measure of accountability, though it has its own complexities.
In an ideal world, the Internet would provide a trustworthy substrate for human communication and commerce. Some people naïvely assume that this is the natural direction the evolution of the Internet will follow. However, constant improvement is not a given. It’s possible that the Internet of the future will actually be worse than the Internet today: less open, less secure, less private, less trustworthy. There are strong incentives to weaken the Internet on a fundamental level by Governments, by businesses such as ISPs, and even by the financial institutions entrusted with our personal data.
In a system with as many stakeholders as the Internet, real change requires principled commitment from all invested parties. At Cloudflare, we believe everyone is entitled to an Internet built on a solid foundation of trust. Crypto Week is our way of helping nudge the Internet’s evolution in a more trust-oriented direction. Each announcement this week helps bring the Internet of the future to the present in a tangible way.
Ongoing Internet Upgrades
Before we explore the Internet of the future, let’s explore some of the previous and ongoing attempts to upgrade the Internet’s fundamental protocols.
As we highlighted in last year’s Crypto Weekone of the weak links on the Internet is routing. Not all networks are directly connected.
To send data from one place to another, you might have to rely on intermediary networks to pass your data along. A packet sent from one host to another may have to be passed through up to a dozen of these intermediary networks.No single network knows the full path the data will have to take to get to its destination, it only knows which network to pass it to next.The protocol that determines how packets are routed is called the Border Gateway Protocol (BGP.) Generally speaking, networks use BGP to announce to each other which addresses they know how to route packets for and (dependent on a set of complex rules) these networks share what they learn with their neighbors.
Unfortunately, BGP is completely insecure:
Any network can announce any set of addresses to any other network, even addresses they don’t control. This leads to a phenomenon called BGP hijacking, where networks are tricked into sending data to the wrong network.
A BGP hijack ismost often caused by accidental misconfiguration, but can also be the result of malice on the network operator’s part.
During a BGP hijack, a network inappropriately announces a set of addresses to other networks, which results in packets destined for the announced addresses to be routed through the illegitimate network.
Understanding the risk
If the packets represent unencrypted data, this can be a big problemas it allows the hijacker to read or even change the data:
The Resource Public Key Infrastructure (RPKI) system helps bring some trust to BGP by enabling networks to utilize cryptography to digitally sign network routes with certificates, making BGP hijacking much more difficult.
This enables participants of the network to gain assurances about the authenticity of route advertisements. Certificate Transparency (CT) is a tool that enables additional trust for certificate-based systems. Cloudflare operates the Cirrus CT log to support RPKI.
Since we announced our support of RPKI last year, routing security has made big strides. More routes are signed, more networks validate RPKI, and the software ecosystem has matured, but this work is not complete. Most networks are still vulnerable to BGP hijacking. For example, Pakistan knocked YouTube offline with a BGP hijack back in 2008, and could likely do the same today. Adoption here is driven less by providing a benefit to users, but rather by reducing systemic risk, which is not the strongest motivating factor for adopting a complex new technology. Full routing security on the Internet could take decades.
The Domain Name System (DNS) is the phone book of the Internet. Or, for anyone under 25 who doesn’t remember phone books, it’s the system that takes hostnames (like cloudflare.com or facebook.com) and returns the Internet address where that host can be found. For example, as of this publication, www.cloudflare.com is 220.127.116.11 and 18.104.22.168 (IPv4) and 2606:4700::c629:d7a2, 2606:4700::c629:d6a2 (IPv6). Like BGP, DNS is completely insecure. Queries and responses sent unencrypted over the Internet are modifiable by anyone on the path.
There are many ongoing attempts to add security to DNS, such as:
DNSSEC that adds a chain of digital signatures to DNS responses
DoT/DoH that wraps DNS queries in the TLS encryption protocol (more on that later)
Both technologies are slowly gaining adoption, but have a long way to go.
Just like RPKI, securing DNS comes with a performance cost, making it less attractive to users. However,
This performance improvement makes it appealing for customers of privacy-conscious applications, like Firefox and Cloudflare’s 22.214.171.124 app, to adopt secure DNS.
Transport Layer Security (TLS) is a cryptographic protocol that gives two parties the ability to communicate over an encrypted and authenticated channel.TLS protects communications from eavesdroppers even in the event of a BGP hijack. TLS is what puts the “S” in HTTPS. TLS protects web browsing against multiple types of network adversaries.
The adoption of TLS on the web is partially driven by the fact that:
Browsers make TLS adoption appealing to website operators by only supporting new web features such as HTTP/2 over HTTPS.
This has led to the rapid adoption of HTTPS over the last five years.
To further that adoption, TLS recently got an upgrade in TLS 1.3, making it faster and more secure (a combination we love). It’s taking over the Internet!
Despite this fantastic progress in the adoption of security for routing, DNS, and the web, there are still gaps in the trust model of the Internet. There are other things needed to help build the Internet of the future. To find and identify these gaps, we lean on research experts.
Research Farm to Table
Cryptographic security on the Internet is a hot topic and there have been many flaws and issues recently pointed out in academic journals. Researchers often study the vulnerabilities of the past and ask:
What other critical components of the Internet have the same flaws?
What underlying assumptions can subvert trust in these existing systems?
The answers to these questions help us decide what to tackle next. Some recent research topics we’ve learned about include:
Attacks on Time Synchronization
DNS attacks affecting Certificate issuance
Scaling distributed trust
Cloudflare keeps abreast of these developments and we do what we can to bring these new ideas to the Internet at large. In this respect, we’re truly standing on the shoulders of giants.
Future-proofing Internet Cryptography
The new protocols we are currently deploying (RPKI, DNSSEC, DoT/DoH, TLS 1.3) use relatively modern cryptographic algorithms published in the 1970s and 1980s.
If you can solve the hard problem, you can crack the code. Using a bigger key makes the problem harder, making it more difficult to break, but also slows performance.
Modern Internet protocols typically pick keys large enough to make it infeasible to break with classical computers, but no larger. The sweet spot is around 128-bits of security;meaning a computer has to do approximately 2¹²⁸ operations to break it.
Arjen Lenstra and others created a useful measure of security levels by comparing the amount of energy it takes to break a key to the amount of water you can boil using that much energy. You can think of this as the electric bill you’d get if you run a computer long enough to crack the key.
35-bit securityis “Teaspoon security” — It takes about the same amount of energy to break a 35-bit key as it does to boil a teaspoon of water (pretty easy).
65 bits gets you up to “Pool security” – The energy needed to boil the average amount of water in a swimming pool.
105 bits is “Sea Security” – The energy needed to boil the Mediterranean Sea.
114-bits is “Global Security” – The energy needed to boil all water on Earth.
128-bit security is safely beyond thatof Global Security – Anything larger is overkill.
256-bit security corresponds to “Universal Security” – The estimated mass-energy of the observable universe. So, if you ever hear someone suggest 256-bit AES, you know they mean business.
Post-Quantum of Solace
As far as we know, the algorithms we use for cryptography are functionally uncrackable with all known algorithms that classical computers can run. Quantum computers change this calculus. Instead of transistors and bits, a quantum computer uses the effects of quantum mechanics to perform calculations that just aren’t possible with classical computers. As you can imagine, quantum computers are very difficult to build. However, despite large-scale quantum computers not existing quite yet, computer scientists have already developed algorithms that can only run efficiently on quantum computers. Surprisingly, it turns out that with a sufficiently powerful quantum computer, most of the hard mathematical problems we rely on for Internet security become easy!
Although there are still quantum-skeptics out there, some expertsestimate that within 15-30 years these large quantum computers will exist, which poses a risk to every security protocol online. Progress is moving quickly; every few months a more powerful quantum computer is announced.
Luckily, there are cryptography algorithms that rely on different hard math problems that seem to be resistant to attack from quantum computers. These math problems form the basis of so-called quantum-resistant (or post-quantum) cryptography algorithms that can run on classical computers. These algorithms can be used as substitutes for most of our current quantum-vulnerable algorithms.
Some quantum-resistant algorithms (such as McEliece and Lamport Signatures) were invented decades ago, but there’s a reason they aren’t in common use: they lack some of the nice properties of the algorithms we’re currently using, such as key size and efficiency.
Some quantum-resistant algorithms require much larger keys to provide 128-bit security
Some are very CPU intensive,
And some just haven’t been studied enough to know if they’re secure.
It is possible to swap our current set of quantum-vulnerable algorithms with new quantum-resistant algorithms, but it’s a daunting engineering task. With widely deployed protocols, it is hard to make the transition from something fast and small to something slower, bigger or more complicated without providing concrete user benefits. When exploring new quantum-resistant algorithms, minimizing user impact is of utmost importance to encourage adoption. This is a big deal, because almost all the protocols we use to protect the Internet are vulnerable to quantum computers.
Cryptography-breaking quantum computing is still in the distant future, but we must start the transition to ensure that today’s secure communications are safe from tomorrow’s quantum-powered onlookers; however, that’s not the most timely problem with the Internet. We haven’t addressed that…yet.
Just like DNS, BGP, and HTTP, the Network Time Protocol (NTP) is fundamental to how the Internet works. And like these other protocols, it is completely insecure.
Last year, Cloudflare introduced Roughtime as a mechanism for computers to access the current time from a trusted server in an authenticated way.
Roughtime is powerful because it provides a way to distribute trust among multiple time servers so that if one server attempts to lie about the time, it will be caught.
However, Roughtime is not exactly a secure drop-in replacement for NTP.
Roughtime lacks the complex mechanisms of NTP that allow it to compensate for network latency and yet maintain precise time, especially if the time servers are remote. This leads to imprecise time.
Roughtime also involves expensive cryptography that can further reduce precision. This lack of precision makes Roughtime useful for browsers and other systems that need coarse time to validate certificates (most certificates are valid for 3 months or more), but some systems (such as those used for financial trading) require precision to the millisecond or below.
With Roughtime we supported the time protocol of the future, but there are things we can do to help improve the health of security online today.
Some academic researchers, including Aanchal Malhotra of Boston University, have demonstrated a variety of attacks against NTP, including BGP hijacking and off-path User Datagram Protocol (UDP) attacks.
Some of these attacks can be avoided by connecting to an NTP server that is close to you on the Internet.
However, to bring cryptographic trust to time while maintaining precision, we need something in between NTP and Roughtime.
To solve this, it’s natural to turn to the same system of trust that enabled us to patch HTTP and DNS: Web PKI.
Attacking the Web PKI
The Web PKI is similar to the RPKI, but is more widely visible since it relates to websites rather than routing tables.
If you’ve ever clicked the lock icon on your browser’s address bar, you’ve interacted with it.
The PKI relies on a set of trusted organizations called Certificate Authorities (CAs) to issue certificates to websites and web services.
While we were all patting ourselves on the back for moving the web to HTTPS, someresearchers managed to find and exploit a weakness in the system: the process for getting HTTPS certificates.
Certificate Authorities (CAs) use a process called domain control validation (DCV) to ensure that they only issue certificates to websites owners who legitimately request them.
Some CAs do this validation manually, which is secure, but can’t scale to the total number of websites deployed today.
More progressive CAs have automated this validation process, but rely on insecure methods (HTTP and DNS) to validate domain ownership.
Without ubiquitous cryptography in place (DNSSEC may never reach 100% deployment), there is no completely secure way to bootstrap this system. So, let’s look at how to distribute trust using other methods.
One tool at our disposal is the distributed nature of the Cloudflare network.
Cloudflare is global. We have locations all over the world connected to dozens of networks. That means we have different vantage points, resulting in different ways to traverse networks. This diversity can prove an advantage when dealing with BGP hijacking, since an attacker would have to hijack multiple routes from multiple locations to affect all the traffic between Cloudflare and other distributed parts of the Internet. The natural diversity of the network raises the cost of the attacks.
A distributed set of connections to the Internet and using them as a quorum is a mighty paradigm to distribute trust, with or without cryptography.
This idea of distributing the source of trust is powerful. Last year we announced the Distributed Web Gateway that
Enables users to access content on the InterPlanetary File System (IPFS), a network structured to reduce the trust placed in any single party.
Even if a participant of the network is compromised, it can’t be used to distribute compromised content because the network is content-addressed.
However, using content-based addressing is not the only way to distribute trust between multiple independent parties.
Another way to distribute trust is to literally split authority between multiple independent parties. We’ve explored this topic before. In the context of Internet services, this means ensuring that no single server can authenticate itself to a client on its own. For example,
In HTTPS the server’s private key is the lynchpin of its security. Compromising the owner of the private key (by hook or by crook) gives an attacker the ability to impersonate (spoof) that service. This single point of failure puts services at risk. You can mitigate this risk by distributing the authority to authenticate the service between multiple independently-operated services.
The Internet barge is old and slow, and we’ve only been able to improve it through the meticulous process of patching it piece by piece. Another option is to build new secure systems on top of this insecure foundation. IPFS is doing this, and IPFS is not alone in its design. There has been more research into secure systems with decentralized trust in the last ten years than ever before.
The result is radical new protocols and designs that use exotic new algorithms. These protocols do not supplant those at the core of the Internet (like TCP/IP), but instead, they sit on top of the existing Internet infrastructure, enabling new applications, much like HTTP did for the web.
Some of the most innovative technical projects were considered failures because they couldn’t attract users. New technology has to bring tangible benefits to users to sustain it: useful functionality, content, and a decent user experience. Distributed projects, such as IPFS and others, are gaining popularity, but have not found mass adoption. This is a chicken-and-egg problem. New protocols have a high barrier to entry—users have to install new software—and because of the small audience, there is less incentive to create compelling content. Decentralization and distributed trust are nice security features to have, but they are not products. Users still need to get some benefit out of using the platform.
An example of a system to break this cycle is the web. In 1992 the web was hardly a cornucopia of awesomeness. What helped drive the dominance of the web was its users.
The growth of the user base meant more incentive for people to build services, and the availability of more services attracted more users. It was a virtuous cycle.
It’s hard for a platform to gain momentum, but once the cycle starts, a flywheel effect kicks in to help the platform grow.
The Distributed Web Gateway project Cloudflare launched last year in Crypto Week is our way of exploring what happens if we try to kickstart that flywheel. By providing a secure, reliable, and fast interface from the classic web with its two billion users to the content on the distributed web, we give the fledgling ecosystem an audience.
If the advantages provided by building on the distributed web are appealing to users, then the larger audience will help these services grow in popularity.
This is somewhat reminiscent of how IPv6 gained adoption. It started as a niche technology only accessible using IPv4-to-IPv6 translation services.
IPv6 adoption has now grown so much that it is becoming a requirement for new services. For example,Apple is requiring that all apps work in IPv6-only contexts.
Eventually, as user-side implementations of distributed web technologies improve, people may move to using the distributed web natively rather than through an HTTP gateway. Or they may not! By leveraging Cloudflare’s global network to give users access to new technologies based on distributed trust, we give these technologies a better chance at gaining adoption.
Happy Crypto Week
At Cloudflare, we always support new technologies that help make the Internet better. Part of helping make a better Internet is scaling the systems of trust that underpin web browsing and protect them from attack. We provide the tools to create better systems of assurance with fewer points of vulnerability. We work with academic researchers of security to get a vision of the future and engineer away vulnerabilities before they can become widespread. It’s a constant journey.
Cloudflare knows that none of this is possible without the work of researchers. From award-winning researcher publishing papers in top journals to the blog posts of clever hobbyists, dedicated and curious people are moving the state of knowledge of the world forward. However, the push to publish new and novel research sometimes holds researchers back from committing enough time and resources to fully realize their ideas. Great research can be powerful on its own, but it can have an even broader impact when combined with practical applications. We relish the opportunity to stand on the shoulders of these giants and use our engineering know-how and global reach to expand on their work to help build a better Internet.
So, to all of you dedicated researchers, thank you for your work! Crypto Week is yours as much as ours. If you’re working on something interesting and you want help to bring the results of your research to the broader Internet, please contact us at [email protected]. We want to help you realize your dream of making the Internet safe and trustworthy.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.