Tag Archives: 1.1.1.1

Zero Trust WARP: tunneling with a MASQUE

2024-03-06 Dan Hall

Post Syndicated from Dan Hall original https://blog.cloudflare.com/zero-trust-warp-with-a-masque

Slipping on the MASQUE

In June 2023, we told you that we were building a new protocol, MASQUE, into WARP. MASQUE is a fascinating protocol that extends the capabilities of HTTP/3 and leverages the unique properties of the QUIC transport protocol to efficiently proxy IP and UDP traffic without sacrificing performance or privacy

At the same time, we’ve seen a rising demand from Zero Trust customers for features and solutions that only MASQUE can deliver. All customers want WARP traffic to look like HTTPS to avoid detection and blocking by firewalls, while a significant number of customers also require FIPS-compliant encryption. We have something good here, and it’s been proven elsewhere (more on that below), so we are building MASQUE into Zero Trust WARP and will be making it available to all of our Zero Trust customers — at WARP speed!

This blog post highlights some of the key benefits our Cloudflare One customers will realize with MASQUE.

Before the MASQUE

Cloudflare is on a mission to help build a better Internet. And it is a journey we’ve been on with our device client and WARP for almost five years. The precursor to WARP was the 2018 launch of 1.1.1.1, the Internet’s fastest, privacy-first consumer DNS service. WARP was introduced in 2019 with the announcement of the 1.1.1.1 service with WARP, a high performance and secure consumer DNS and VPN solution. Then in 2020, we introduced Cloudflare’s Zero Trust platform and the Zero Trust version of WARP to help any IT organization secure their environment, featuring a suite of tools we first built to protect our own IT systems. Zero Trust WARP with MASQUE is the next step in our journey.

The current state of WireGuard

WireGuard was the perfect choice for the 1.1.1.1 with WARP service in 2019. WireGuard is fast, simple, and secure. It was exactly what we needed at the time to guarantee our users’ privacy, and it has met all of our expectations. If we went back in time to do it all over again, we would make the same choice.

But the other side of the simplicity coin is a certain rigidity. We find ourselves wanting to extend WireGuard to deliver more capabilities to our Zero Trust customers, but WireGuard is not easily extended. Capabilities such as better session management, advanced congestion control, or simply the ability to use FIPS-compliant cipher suites are not options within WireGuard; these capabilities would have to be added on as proprietary extensions, if it was even possible to do so.

Plus, while WireGuard is popular in VPN solutions, it is not standards-based, and therefore not treated like a first class citizen in the world of the Internet, where non-standard traffic can be blocked, sometimes intentionally, sometimes not. WireGuard uses a non-standard port, port 51820, by default. Zero Trust WARP changes this to use port 2408 for the WireGuard tunnel, but it’s still a non-standard port. For our customers who control their own firewalls, this is not an issue; they simply allow that traffic. But many of the large number of public Wi-Fi locations, or the approximately 7,000 ISPs in the world, don’t know anything about WireGuard and block these ports. We’ve also faced situations where the ISP does know what WireGuard is and blocks it intentionally.

This can play havoc for roaming Zero Trust WARP users at their local coffee shop, in hotels, on planes, or other places where there are captive portals or public Wi-Fi access, and even sometimes with their local ISP. The user is expecting reliable access with Zero Trust WARP, and is frustrated when their device is blocked from connecting to Cloudflare’s global network.

Now we have another proven technology — MASQUE — which uses and extends HTTP/3 and QUIC. Let’s do a quick review of these to better understand why Cloudflare believes MASQUE is the future.

Unpacking the acronyms

HTTP/3 and QUIC are among the most recent advancements in the evolution of the Internet, enabling faster, more reliable, and more secure connections to endpoints like websites and APIs. Cloudflare worked closely with industry peers through the Internet Engineering Task Force on the development of RFC 9000 for QUIC and RFC 9114 for HTTP/3. The technical background on the basic benefits of HTTP/3 and QUIC are reviewed in our 2019 blog post where we announced QUIC and HTTP/3 availability on Cloudflare’s global network.

Most relevant for Zero Trust WARP, QUIC delivers better performance on low-latency or high packet loss networks thanks to packet coalescing and multiplexing. QUIC packets in separate contexts during the handshake can be coalesced into the same UDP datagram, thus reducing the number of receive and system interrupts. With multiplexing, QUIC can carry multiple HTTP sessions within the same UDP connection. Zero Trust WARP also benefits from QUIC’s high level of privacy, with TLS 1.3 designed into the protocol.

MASQUE unlocks QUIC’s potential for proxying by providing the application layer building blocks to support efficient tunneling of TCP and UDP traffic. In Zero Trust WARP, MASQUE will be used to establish a tunnel over HTTP/3, delivering the same capability as WireGuard tunneling does today. In the future, we’ll be in position to add more value using MASQUE, leveraging Cloudflare’s ongoing participation in the MASQUE Working Group. This blog post is a good read for those interested in digging deeper into MASQUE.

OK, so Cloudflare is going to use MASQUE for WARP. What does that mean to you, the Zero Trust customer?

Proven reliability at scale

Cloudflare’s network today spans more than 310 cities in over 120 countries, and interconnects with over 13,000 networks globally. HTTP/3 and QUIC were introduced to the Cloudflare network in 2019, the HTTP/3 standard was finalized in 2022, and represented about 30% of all HTTP traffic on our network in 2023.

We are also using MASQUE for iCloud Private Relay and other Privacy Proxy partners. The services that power these partnerships, from our Rust-based proxy framework to our open source QUIC implementation, are already deployed globally in our network and have proven to be fast, resilient, and reliable.

Cloudflare is already operating MASQUE, HTTP/3, and QUIC reliably at scale. So we want you, our Zero Trust WARP users and Cloudflare One customers, to benefit from that same reliability and scale.

Connect from anywhere

Employees need to be able to connect from anywhere that has an Internet connection. But that can be a challenge as many security engineers will configure firewalls and other networking devices to block all ports by default, and only open the most well-known and common ports. As we pointed out earlier, this can be frustrating for the roaming Zero Trust WARP user.

We want to fix that for our users, and remove that frustration. HTTP/3 and QUIC deliver the perfect solution. QUIC is carried on top of UDP (protocol number 17), while HTTP/3 uses port 443 for encrypted traffic. Both of these are well known, widely used, and are very unlikely to be blocked.

We want our Zero Trust WARP users to reliably connect wherever they might be.

Compliant cipher suites

MASQUE leverages TLS 1.3 with QUIC, which provides a number of cipher suite choices. WireGuard also uses standard cipher suites. But some standards are more, let’s say, standard than others.

NIST, the National Institute of Standards and Technology and part of the US Department of Commerce, does a tremendous amount of work across the technology landscape. Of interest to us is the NIST research into network security that results in FIPS 140-2 and similar publications. NIST studies individual cipher suites and publishes lists of those they recommend for use, recommendations that become requirements for US Government entities. Many other customers, both government and commercial, use these same recommendations as requirements.

Our first MASQUE implementation for Zero Trust WARP will use TLS 1.3 and FIPS compliant cipher suites.

How can I get Zero Trust WARP with MASQUE?

Cloudflare engineers are hard at work implementing MASQUE for the mobile apps, the desktop clients, and the Cloudflare network. Progress has been good, and we will open this up for beta testing early in the second quarter of 2024 for Cloudflare One customers. Your account team will be reaching out with participation details.

Continuing the journey with Zero Trust WARP

Cloudflare launched WARP five years ago, and we’ve come a long way since. This introduction of MASQUE to Zero Trust WARP is a big step, one that will immediately deliver the benefits noted above. But there will be more — we believe MASQUE opens up new opportunities to leverage the capabilities of QUIC and HTTP/3 to build innovative Zero Trust solutions. And we’re also continuing to work on other new capabilities for our Zero Trust customers.
Cloudflare is committed to continuing our mission to help build a better Internet, one that is more private and secure, scalable, reliable, and fast. And if you would like to join us in this exciting journey, check out our open positions.

Remediating new DNSSEC resource exhaustion vulnerabilities

2024-02-29 Vicky Shrestha

Post Syndicated from Vicky Shrestha original https://blog.cloudflare.com/remediating-new-dnssec-resource-exhaustion-vulnerabilities

Cloudflare has been part of a multivendor, industry-wide effort to mitigate two critical DNSSEC vulnerabilities. These vulnerabilities exposed significant risks to critical infrastructures that provide DNS resolution services. Cloudflare provides DNS resolution for anyone to use for free with our public resolver 1.1.1.1 service. Mitigations for Cloudflare’s public resolver 1.1.1.1 service were applied before these vulnerabilities were disclosed publicly. Internal resolvers using unbound (open source software) were upgraded promptly after a new software version fixing these vulnerabilities was released.

All Cloudflare DNS infrastructure was protected from both of these vulnerabilities before they were disclosed and is safe today. These vulnerabilities do not affect our Authoritative DNS or DNS firewall products.

All major DNS software vendors have released new versions of their software. All other major DNS resolver providers have also applied appropriate mitigations. Please update your DNS resolver software immediately, if you haven’t done so already.

Background

Domain name system (DNS) security extensions, commonly known as DNSSEC, are extensions to the DNS protocol that add authentication and integrity capabilities. DNSSEC uses cryptographic keys and signatures that allow DNS responses to be validated as authentic. DNSSEC protocol specifications have certain requirements that prioritize availability at the cost of increased complexity and computational cost for the validating DNS resolvers. The mitigations for the vulnerabilities discussed in this blog require local policies to be applied that relax these requirements in order to avoid exhausting the resources of validators.

The design of the DNS and DNSSEC protocols follows the Robustness principle: “be conservative in what you do, be liberal in what you accept from others”. There have been many vulnerabilities in the past that have taken advantage of protocol requirements following this principle. Malicious actors can exploit these vulnerabilities to attack DNS infrastructure, in this case by causing additional work for DNS resolvers by crafting DNSSEC responses with complex configurations. As is often the case, we find ourselves having to create a pragmatic balance between the flexibility that allows a protocol to adapt and evolve and the need to safeguard the stability and security of the services we operate.

Cloudflare’s public resolver 1.1.1.1 is a privacy-centric public resolver service. We have been using stricter validations and limits aimed at protecting our own infrastructure in addition to shielding authoritative DNS servers operated outside our network. As a result, we often receive complaints about resolution failures. Experience shows us that strict validations and limits can impact availability in some edge cases, especially when DNS domains are improperly configured. However, these strict validations and limits are necessary to improve the overall reliability and resilience of the DNS infrastructure.

The vulnerabilities and how we mitigated them are described below.

Keytrap vulnerability (CVE-2023-50387)

Introduction

A DNSSEC signed zone can contain multiple keys (DNSKEY) to sign the contents of a DNS zone and a Resource Record Set (RRSET) in a DNS response can have multiple signatures (RRSIG). Multiple keys and signatures are required to support things like key rollover, algorithm rollover, and multi-signer DNSSEC. DNSSEC protocol specifications require a validating DNS resolver to try every possible combination of keys and signatures when validating a DNS response.

During validation, a resolver looks at the key tag of every signature and tries to find the associated key that was used to sign it. A key tag is an unsigned 16-bit number calculated as a checksum over the key’s resource data (RDATA). Key tags are intended to allow efficient pairing of a signature with the key which has supposedly created it. However, key tags are not unique, and it is possible that multiple keys can have the same key tag. A malicious actor can easily craft a DNS response with multiple keys having the same key tag together with multiple signatures, none of which might validate. A validating resolver would have to try every combination (number of keys multiplied by number of signatures) when trying to validate this response. This increases the computational cost of the validating resolver many-fold, degrading performance for all its users. This is known as the Keytrap vulnerability.

Variations of this vulnerability include using multiple signatures with one key, using one signature with multiple keys having colliding key tags, and using multiple keys with corresponding hashes added to the parent delegation signer record.

Mitigation

We have limited the maximum number of keys we will accept at a zone cut. A zone cut is where a parent zone delegates to a child zone, e.g. where the .com zone delegates cloudflare.com to Cloudflare nameservers. Even with this limit already in place and various other protections built for our platform, we realized that it would still be computationally costly to process a malicious DNS answer from an authoritative DNS server.

To address and further mitigate this vulnerability, we added a signature validations limit per RRSET and a total signature validations limit per resolution task. One resolution task might include multiple recursive queries to external authoritative DNS servers in order to answer a single DNS question. Clients queries exceeding these limits will fail to resolve and will receive a response with an Extended DNS Error (EDE) code 0. Furthermore, we added metrics which will allow us to detect attacks attempting to exploit this vulnerability.

NSEC3 iteration and closest encloser proof vulnerability (CVE-2023-50868)

Introduction

NSEC3 is an alternative approach for authenticated denial of existence. You can learn more about authenticated denial of existence here. NSEC3 uses a hash derived from DNS names instead of the DNS names directly in an attempt to prevent zone enumeration and the standard supports multiple iterations for hash calculations. However, because the full DNS name is used as input to the hash calculation, increasing hashing iterations beyond the initial doesn’t provide any additional value and is not recommended in RFC9276. This complication is further inflated while finding the closest enclosure proof. A malicious DNS response from an authoritative DNS server can set a high NSEC3 iteration count and long DNS names with multiple DNS labels to exhaust the computing resources of a validating resolver by making it do unnecessary hash computations.

Mitigation

For this vulnerability, we applied a similar mitigation technique as we did for Keytrap. We added a limit for total hash calculations per resolution task to answer a single DNS question. Similarly, clients queries exceeding this limit will fail to resolve and will receive a response with an EDE code 27. We also added metrics to track hash calculations allowing early detection of attacks attempting to exploit this vulnerability.

Timeline

Date and time in UTC	Event
2023-11-03 16:05	John Todd from Quad9 invites Cloudflare to participate in a joint task force to discuss a new DNS vulnerability.
2023-11-07 14:30	A group of DNS vendors and service providers meet to discuss the vulnerability during IETF 118. Discussions and collaboration continues in a closed chat group hosted at DNS-OARC
2023-12-08 20:20	Cloudflare public resolver 1.1.1.1 is fully patched to mitigate Keytrap vulnerability (CVE-2023-50387)
2024-01-17 22:39	Cloudflare public resolver 1.1.1.1 is fully patched to mitigate NSEC3 iteration count and closest encloser vulnerability (CVE-2023-50868)
2024-02-13 13:04	Unbound package is released
2024-02-13 23:00	Cloudflare internal CDN resolver is fully patched to mitigate both CVE-2023-50387 and CVE-2023-50868

Credits

We would like to thank Elias Heftrig, Haya Schulmann, Niklas Vogel, Michael Waidner from the German National Research Center for Applied Cybersecurity ATHENE, for discovering the Keytrap vulnerability and doing a responsible disclosure.

We would like to thank Petr Špaček from Internet Systems Consortium (ISC) for discovering the NSEC3 iteration and closest encloser proof vulnerability and doing a responsible disclosure.

We would like to thank John Todd from Quad9 and the DNS Operations Analysis and Research Center (DNS-OARC) for facilitating coordination amongst various stakeholders.

And finally, we would like to thank the DNS-OARC community members, representing various DNS vendors and service providers, who all came together and worked tirelessly to fix these vulnerabilities, working towards a common goal of making the internet resilient and secure.

1.1.1.1 lookup failures on October 4th, 2023

2023-10-04 Ólafur Guðmundsson

Post Syndicated from Ólafur Guðmundsson original http://blog.cloudflare.com/1-1-1-1-lookup-failures-on-october-4th-2023/

1.1.1.1 lookup failures on October 4th, 2023

On 4 October 2023, Cloudflare experienced DNS resolution problems starting at 07:00 UTC and ending at 11:00 UTC. Some users of 1.1.1.1 or products like WARP, Zero Trust, or third party DNS resolvers which use 1.1.1.1 may have received SERVFAIL DNS responses to valid queries. We’re very sorry for this outage. This outage was an internal software error and not the result of an attack. In this blog, we’re going to talk about what the failure was, why it occurred, and what we’re doing to make sure this doesn’t happen again.

Background

In the Domain Name System (DNS), every domain name exists within a DNS zone. The zone is a collection of domain names and host names that are controlled together. For example, Cloudflare is responsible for the domain name cloudflare.com, which we say is in the “cloudflare.com” zone. The .com top-level domain (TLD) is owned by a third party and is in the “com” zone. It gives directions on how to reach cloudflare.com. Above all of the TLDs is the root zone, which gives directions on how to reach TLDs. This means that the root zone is important in being able to resolve all other domain names. Like other important parts of the DNS, the root zone is signed with DNSSEC, which means the root zone itself contains cryptographic signatures.

The root zone is published on the root servers, but it is also common for DNS operators to retrieve and retain a copy of the root zone automatically so that in the event that the root servers cannot be reached, the information in the root zone is still available. Cloudflare’s recursive DNS infrastructure takes this approach as it also makes the resolution process faster. New versions of the root zone are normally published twice a day. 1.1.1.1 has a WebAssembly app called static_zone running on top of the main DNS logic that serves those new versions when they are available.

What happened

On 21 September, as part of a known and planned change in root zone management, a new resource record type was included in the root zones for the first time. The new resource record is named ZONEMD, and is in effect a checksum for the contents of the root zone.

The root zone is retrieved by software running in Cloudflare’s core network. It is subsequently redistributed to Cloudflare’s data centers around the world. After the change, the root zone containing the ZONEMD record continued to be retrieved and distributed as normal. However, the 1.1.1.1 resolver systems that make use of that data had problems parsing the ZONEMD record. Because zones must be loaded and served in their entirety, the system’s failure to parse ZONEMD meant the new versions of the root zone were not used in Cloudflare’s resolver systems. Some of the servers hosting Cloudflare's resolver infrastructure failed over to querying the DNS root servers directly on a request-by-request basis when they did not receive the new root zone. However, others continued to rely on the known working version of the root zone still available in their memory cache, which was the version pulled on 21 September before the change.

On 4 October 2023 at 07:00 UTC, the DNSSEC signatures in the version of the root zone from 21 September expired. Because there was no newer version that the Cloudflare resolver systems were able to use, some of Cloudflare’s resolver systems stopped being able to validate DNSSEC signatures and as a result started sending error responses (SERVFAIL). The rate at which Cloudflare resolvers generated SERVFAIL responses grew by 12%. The diagrams below illustrate the progression of the failure and how it became visible to users.

Incident timeline and impact

21 September 6:30 UTC: Last successful pull of the root zone
4 October 7:00 UTC: DNSSEC signatures in the root zone obtained on 21 September expired causing an increase in SERVFAIL responses to client queries.
7:57: First external reports of unexpected SERVFAILs started coming in.
8:03: Internal Cloudflare incident declared.
8:50: Initial attempt made at stopping 1.1.1.1 from serving responses using the stale root zone file with an override rule.
10:30: Stopped 1.1.1.1 from preloading the root zone file entirely.
10:32: Responses returned to normal.
11:02: Incident closed.

This below chart shows the timeline of impact along with the percentage of DNS queries that returned with a SERVFAIL error:

We expect a baseline volume of SERVFAIL errors for regular traffic during normal operation. Usually that percentage sits at around 3%. These SERVFAILs can be caused by legitimate issues in the DNSSEC chain, failures to connect to authoritative servers, authoritative servers taking too long to respond, and many others. During the incident the amount of SERVFAILs peaked at 15% of total queries, although the impact was not evenly distributed around the world and was mainly concentrated in our larger data centers like Ashburn, Virginia; Frankfurt, Germany; and Singapore.

Why this incident happened

Why parsing the ZONEMD record failed

DNS has a binary format for storing resource records. In this binary format the type of the resource record (TYPE) is stored as a 16-bit integer. The type of resource record determines how the resource data (RDATA) is parsed. When the record type is 1, this means it is an A record, and the RDATA can be parsed as an IPv4 address. Record type 28 is an AAAA record, whose RDATA can be parsed as an IPv6 address instead. When a parser runs into an unknown resource type it won’t know how to parse its RDATA, but fortunately it doesn’t have to: the RDLENGTH field indicates how long the RDATA field is, allowing the parser to treat it as an opaque data element.

                                   1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                      NAME                     /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

RFC 1035

The reason static_zone didn’t support the new ZONEMD record is because up until now we had chosen to distribute the root zone internally in its presentation format, rather than in the binary format. When looking at the text representation for a few resource records we can see there is a lot more variation in how different records are presented.

.			86400	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2023100400 1800 900 604800 86400
.			86400	IN	RRSIG	SOA 8 0 86400 20231017050000 20231004040000 46780 . J5lVTygIkJHDBt6HHm1QLx7S0EItynbBijgNlcKs/W8FIkPBfCQmw5BsUTZAPVxKj7r2iNLRddwRcM/1sL49jV9Jtctn8OLLc9wtouBmg3LH94M0utW86dKSGEKtzGzWbi5hjVBlkroB8XVQxBphAUqGxNDxdE6AIAvh/eSSb3uSQrarxLnKWvHIHm5PORIOftkIRZ2kcA7Qtou9NqPCSE8fOM5EdXxussKChGthmN5AR5S2EruXIGGRd1vvEYBrRPv55BAWKKRERkaXhgAp7VikYzXesiRLdqVlTQd+fwy2tm/MTw+v3Un48wXPg1lRPlQXmQsuBwqg74Ts5r8w8w==
.			518400	IN	NS	a.root-servers.net.
.			86400	IN	ZONEMD	2023100400 1 241 E375B158DAEE6141E1F784FDB66620CC4412EDE47C8892B975C90C6A102E97443678CCA4115E27195B468E33ABD9F78C

Example records taken from https://www.internic.net/domain/root.zone

When we run into an unknown resource record it’s not always easy to know how to handle it. Because of this, the library we use to parse the root zone at the edge does not make an attempt at doing so, and instead returns a parser error.

Why a stale version of the root zone was used

The static_zone app, tasked with loading and parsing the root zone for the purpose of serving the root zone locally (RFC 7706), stores the latest version in memory. When a new version is published it parses it and, when successfully done so, drops the old version. However, as parsing failed the static_zone app never switched to a newer version, and instead continued using the old version indefinitely. When the 1.1.1.1 service is first started the static_zone app does not have an existing version in memory. When it tries to parse the root zone it fails in doing so, but because it does not have an older version of the root zone to fall back on, it falls back on querying the root servers directly for incoming requests.

Why the initial attempt at disabling static_zone didn’t work

Initially we tried to disable the static_zone app through override rules, a mechanism that allows us to programmatically change some behavior of 1.1.1.1. The rule we deployed was:

phase = pre-cache set-tag rec_disable_static

For any incoming request this rule adds the tag rec_disable_static to the request. Inside the static_zone app we check for this tag and, if it’s set, we do not return a response from the cached, static root zone. However, to improve cache performance queries are sometimes forwarded to another node if the current node can’t find the response in its own cache. Unfortunately, the rec_disable_static tag is not included in the queries being forwarded to other nodes, which caused the static_zone app to continue replying with stale information until we eventually disabled the app entirely.

Why the impact was partial

Cloudflare regularly performs rolling reboots of the servers that host our services for tasks like kernel updates that can only take effect after a full system restart. At the time of this outage, resolver server instances that were restarted between the ZONEMD change and the DNSSEC invalidation did not contribute to impact. If they had restarted during this two-week period, they would have failed to load the root zone on startup and fallen back to resolving by sending DNS queries to root servers instead. In addition, the resolver uses a technique called serve stale (RFC 8767) with the purpose of being able to continue to serve popular records from a potentially stale cache to limit the impact. A record is considered to be stale once the TTL amount of seconds has passed since the record was retrieved from upstream. This prevented a total outage; impact was mainly felt in our largest data centers which had many servers that had not restarted the 1.1.1.1 service in that timeframe.

Remediation and follow-up steps

This incident had widespread impact, and we take the availability of our services very seriously. We have identified several areas of improvement and will continue to work on uncovering any other gaps that could cause a recurrence.

Here is what we are working on immediately:

Visibility: We’re adding alerts to notify when static_zone serves a stale root zone file. It should not have been the case that serving a stale root zone file went unnoticed for as long as it did. If we had been monitoring this better, with the caching that exists, there would have been no impact. It is our goal to protect our customers and their users from upstream changes.

Resilience: We will re-evaluate how we ingest and distribute the root zone internally. Our ingestion and distribution pipelines should handle new RRTYPEs seamlessly, and any brief interruption to the pipeline should be invisible to end users.

Testing: Despite having tests in place around this problem, including tests related to unreleased changes in parsing the new ZONEMD records, we did not adequately test what happens when the root zone fails to parse. We will improve our test coverage and the related processes.

Architecture: We should not use stale copies of the root zone past a certain point. While it’s certainly possible to continue to use stale root zone data for a limited amount of time, past a certain point there are unacceptable operational risks. We will take measures to ensure that the lifetime of cached root zone data is better managed as described in RFC 8806: Running a Root Server Local to a Resolver.

Conclusion

We are deeply sorry that this incident happened. There is one clear message from this incident: do not ever assume that something is not going to change! Many modern systems are built with a long chain of libraries that are pulled into the final executable, each one of those may have bugs or may not be updated early enough for programs to operate correctly when changes in input happen. We understand how important it is to have good testing in place that allows detection of regressions and systems and components that fail gracefully on changes to input. We understand that we need to always assume that “format” changes in the most critical systems of the internet (DNS and BGP) are going to have an impact.

We have a lot to follow up on internally and are working around the clock to make sure something like this does not happen again.

How Rust and Wasm power Cloudflare’s 1.1.1.1

2023-02-28 Anbang Wen

Post Syndicated from Anbang Wen original https://blog.cloudflare.com/big-pineapple-intro/

How Rust and Wasm power Cloudflare's 1.1.1.1

On April 1, 2018, Cloudflare announced the 1.1.1.1 public DNS resolver. Over the years, we added the debug page for troubleshooting, global cache purge, 0 TTL for zones on Cloudflare, Upstream TLS, and 1.1.1.1 for families to the platform. In this post, we would like to share some behind the scenes details and changes.

When the project started, Knot Resolver was chosen as the DNS resolver. We started building a whole system on top of it, so that it could fit Cloudflare’s use case. Having a battle tested DNS recursive resolver, as well as a DNSSEC validator, was fantastic because we could spend our energy elsewhere, instead of worrying about the DNS protocol implementation.

Knot Resolver is quite flexible in terms of its Lua-based plugin system. It allowed us to quickly extend the core functionality to support various product features, like DoH/DoT, logging, BPF-based attack mitigation, cache sharing, and iteration logic override. As the traffic grew, we reached certain limitations.

Lessons we learned

Before going any deeper, let’s first have a bird’s-eye view of a simplified Cloudflare data center setup, which could help us understand what we are going to talk about later. At Cloudflare, every server is identical: the software stack running on one server is exactly the same as on another server, only the configuration may be different. This setup greatly reduces the complexity of fleet maintenance.

The resolver runs as a daemon process, kresd, and it doesn’t work alone. Requests, specifically DNS requests, are load-balanced to the servers inside a data center by Unimog. DoH requests are terminated at our TLS terminator. Configs and other small pieces of data can be delivered worldwide by Quicksilver in seconds. With all the help, the resolver can concentrate on its own goal – resolving DNS queries, and not worrying about transport protocol details. Now let’s talk about 3 key areas we wanted to improve here – blocking I/O in plugins, a more efficient use of cache space, and plugin isolation.

Callbacks blocking the event loop

Knot Resolver has a very flexible plugin system for extending its core functionality. The plugins are called modules, and they are based on callbacks. At certain points during request processing, these callbacks will be invoked with current query context. This gives a module the ability to inspect, modify, and even produce requests / responses. By design, these callbacks are supposed to be simple, in order to avoid blocking the underlying event loop. This matters because the service is single threaded, and the event loop is in charge of serving many requests at the same time. So even just one request being held up in a callback means that no other concurrent requests can be progressed until the callback finishes.

The setup worked well enough for us until we needed to do blocking operations, for example, to pull data from Quicksilver before responding to the client.

Cache efficiency

As requests for a domain could land on any node inside a data center, it would be wasteful to repetitively resolve a query when another node already has the answer. By intuition, the latency could be improved if the cache could be shared among the servers, and so we created a cache module which multicasted the newly added cache entries. Nodes inside the same data center could then subscribe to the events and update their local cache.

The default cache implementation in Knot Resolver is LMDB. It is fast and reliable for small to medium deployments. But in our case, cache eviction shortly became a problem. The cache itself doesn’t track for any TTL, popularity, etc. When it’s full, it just clears all the entries and starts over. Scenarios like zone enumeration could fill the cache with data that is unlikely to be retrieved later.

Furthermore, our multicast cache module made it worse by amplifying the less useful data to all the nodes, and led them to the cache high watermark at the same time. Then we saw a latency spike because all the nodes dropped the cache and started over around the same time.

Module isolation

With the list of Lua modules increasing, debugging issues became increasingly difficult. This is because a single Lua state is shared among all the modules, so one misbehaving module could affect another. For example, when something went wrong inside the Lua state, like having too many coroutines, or being out of memory, we got lucky if the program just crashed, but the resulting stack traces were hard to read. It is also difficult to forcibly tear down, or upgrade, a running module as it not only has state in the Lua runtime, but also FFI, so memory safety is not guaranteed.

Hello BigPineapple

We didn’t find any existing software that would meet our somewhat niche requirements, so eventually we started building something ourselves. The first attempt was to wrap Knot Resolver’s core with a thin service written in Rust (modified edgedns).

This proved to be difficult due to having to constantly convert between the storage, and C/FFI types, and some other quirks (for example, the ABI for looking up records from cache expects the returned records to be immutable until the next call, or the end of the read transaction). But we learned a lot from trying to implement this sort of split functionality where the host (the service) provides some resources to the guest (resolver core library), and how we would make that interface better.

In the later iterations, we replaced the entire recursive library with a new one based around an async runtime; and a redesigned module system was added to it, sneakily rewriting the service into Rust over time as we swapped out more and more components. That async runtime was tokio, which offered a neat thread pool interface for running both non-blocking and blocking tasks, as well as a good ecosystem for working with other crates (Rust libraries).

After that, as the futures combinators became tedious, we started converting everything to async/await. This was before the async/await feature that landed in Rust 1.39, which led us to use nightly (Rust beta) for a while and had some hiccups. When the async/await stabilized, it enabled us to write our request processing routine ergonomically, similar to Go.

All the tasks can be run concurrently, and certain I/O heavy ones can be broken down into smaller pieces, to benefit from a more granular scheduling. As the runtime executes tasks on a threadpool, instead of a single thread, it also benefits from work stealing. This avoids a problem we previously had, where a single request taking a lot of time to process, that blocks all the other requests on the event loop.

Finally, we forged a platform that we are happy with, and we call it BigPineapple. The figure above shows an overview of its main components and the data flow between them. Inside BigPineapple, the server module gets inbound requests from the client, validates and transforms them into unified frame streams, which can then be processed by the worker module. The worker module has a set of workers, whose task is to figure out the answer to the question in the request. Each worker interacts with the cache module to check if the answer is there and still valid, otherwise it drives the recursor module to recursively iterate the query. The recursor doesn’t do any I/O, when it needs anything, it delegates the sub-task to the conductor module. The conductor then uses outbound queries to get the information from upstream nameservers. Through the whole process, some modules can interact with the Sandbox module, to extend its functionality by running the plugins inside.

Let’s look at some of them in more detail, and see how they helped us overcome the problems we had before.

Updated I/O architecture

A DNS resolver can be seen as an agent between a client and several authoritative nameservers: it receives requests from the client, recursively fetches data from the upstream nameservers, then composes the responses and sends them back to the client. So it has both inbound and outbound traffic, which are handled by the server and the conductor component respectively.

The server listens on a list of interfaces using different transport protocols. These are later abstracted into streams of “frames”. Each frame is a high level representation of a DNS message, with some extra metadata. Underneath, it can be a UDP packet, a segment of TCP stream, or the payload of a HTTP request, but they are all processed the same way. The frame is then converted into an asynchronous task, which in turn is picked up by a set of workers in charge of resolving these tasks. The finished tasks are converted back into responses, and sent back to the client.

This “frame” abstraction over the protocols and their encodings simplified the logic used to regulate the frame sources, such as enforcing fairness to prevent starving and controlling pacing to protect the server from being overwhelmed. One of the things we’ve learned with the previous implementations is that, for a service open to the public, a peak performance of the I/O matters less than the ability to pace clients fairly. This is mainly because the time and computational cost of each recursive request is vastly different (for example a cache hit from a cache miss), and it’s difficult to guess it beforehand. The cache misses in recursive service not only consume Cloudflare’s resources, but also the resources of the authoritative nameservers being queried, so we need to be mindful of that.

On the other side of the server is the conductor, which manages all the outbound connections. It helps to answer some questions before reaching out to the upstream: Which is the fastest nameserver to connect to in terms of latency? What to do if all the nameservers are not reachable? What protocol to use for the connection, and are there any better options? The conductor is able to make these decisions by tracking the upstream server’s metrics, such as RTT, QoS, etc. With that knowledge, it can also guess for things like upstream capacity, UDP packet loss, and take necessary actions, e.g. retry when it thinks the previous UDP packet didn’t reach the upstream.

Figure 3 shows a simplified data flow about the conductor. It is called by the exchanger mentioned above, with upstream requests as input. The requests will be deduplicated first: meaning in a small window, if a lot of requests come to the conductor and ask for the same question, only one of them will pass, the others are put into a waiting queue. This is common when a cache entry expires, and can reduce unnecessary network traffic. Then based on the request and upstream metrics, the connection instructor either picks an open connection if available, or generates a set of parameters. With these parameters, the I/O executor is able to connect to the upstream directly, or even take a route via another Cloudflare data center using our Argo Smart Routing technology!

The cache

Caching in a recursive service is critical as a server can return a cached response in under one millisecond, while it will be hundreds of milliseconds to respond on a cache miss. As the memory is a finite resource (and also a shared resource in Cloudflare’s architecture), more efficient use of space for cache was one of the key areas we wanted to improve. The new cache is implemented with a cache replacement data structure (ARC), instead of a KV store. This makes good use of the space on a single node, as less popular entries are progressively evicted, and the data structure is resistant to scans.

Moreover, instead of duplicating the cache across the whole data center with multicast, as we did before, BigPineapple is aware of its peer nodes in the same data center, and relays queries from one node to another if it cannot find an entry in its own cache. This is done by consistent hashing the queries onto the healthy nodes in each data center. So, for example, queries for the same registered domain go through the same subset of nodes, which not only increases the cache hit ratio, but also helps the infrastructure cache, which stores information about performance and features of nameservers.

Async recursive library

The recursive library is the DNS brain of BigPineapple, as it knows how to find the answer to the question in the query. Starting from the root, it breaks down the client query into subqueries, and uses them to collect knowledge recursively from various authoritative nameservers on the internet. The product of this process is the answer. Thanks to the async/await it can be abstracted as a function like such:

async fn resolve(Request, Exchanger) → Result<Response>;

The function contains all the logic necessary to generate a response to a given request, but it doesn’t do any I/O on its own. Instead, we pass an Exchanger trait(Rust interface) that knows how to exchange DNS messages with upstream authoritative nameservers asynchronously. The exchanger is usually called at various await points – for example, when a recursion starts, one of the first things it does is that it looks up the closest cached delegation for the domain. If it doesn’t have the final delegation in cache, it needs to ask what nameservers are responsible for this domain and wait for the response, before it can proceed any further.

Thanks to this design, which decouples the “waiting for some responses” part from the recursive DNS logic, it is much easier to test by providing a mock implementation of the exchanger. In addition, it makes the recursive iteration code (and DNSSEC validation logic in particular) much more readable, as it’s written sequentially instead of being scattered across many callbacks.

Fun fact: writing a DNS recursive resolver from scratch is not fun at all!

Not only because of the complexity of DNSSEC validation, but also because of the necessary “workarounds” needed for various RFC incompatible servers, forwarders, firewalls, etc. So we ported deckard into Rust to help test it. Additionally, when we started migrating over to this new async recursive library, we first ran it in “shadow” mode: processing real world query samples from the production service, and comparing differences. We’ve done this in the past on Cloudflare’s authoritative DNS service as well. It is slightly more difficult for a recursive service due to the fact that a recursive service has to look up all the data over the Internet, and authoritative nameservers often give different answers for the same query due to localization, load balancing and such, leading to many false positives.

In December 2019, we finally enabled the new service on a public test endpoint (see the announcement) to iron out remaining issues before slowly migrating the production endpoints to the new service. Even after all that, we continued to find edge cases with the DNS recursion (and DNSSEC validation in particular), but fixing and reproducing these issues has become much easier due to the new architecture of the library.

Sandboxed plugins

Having the ability to extend the core DNS functionality on the fly is important for us, thus BigPineapple has its redesigned plugin system. Before, the Lua plugins run in the same memory space as the service itself, and are generally free to do what they want. This is convenient, as we can freely pass memory references between the service and modules using C/FFI. For example, to read a response directly from cache without having to copy to a buffer first. But it is also dangerous, as the module can read uninitialized memory, call a host ABI using a wrong function signature, block on a local socket, or do other undesirable things, in addition the service doesn’t have a way to restrict these behaviors.

So we looked at replacing the embedded Lua runtime with JavaScript, or native modules, but around the same time, embedded runtimes for WebAssembly (Wasm for short) started to appear. Two nice properties of WebAssembly programs are that it allows us to write them in the same language as the rest of the service, and that they run in an isolated memory space. So we started modeling the guest/host interface around the limitations of WebAssembly modules, to see how that would work.

BigPineapple’s Wasm runtime is currently powered by Wasmer. We tried several runtimes over time like Wasmtime, WAVM in the beginning, and found Wasmer was simpler to use in our case. The runtime allows each module to run in its own instance, with an isolated memory and a signal trap, which naturally solved the module isolation problem we described before. In addition to this, we can have multiple instances of the same module running at the same time. Being controlled carefully, the apps can be hot swapped from one instance to another without missing a single request! This is great because the apps can be upgraded on the fly without a server restart. Given that the Wasm programs are distributed via Quicksilver, BigPineapple’s functionality can be safely changed worldwide within a few seconds!

To better understand the WebAssembly sandbox, several terms need to be introduced first:

Host: the program which runs the Wasm runtime. Similar to a kernel, it has full control through the runtime over the guest applications.
Guest application: the Wasm program inside the sandbox. Within a restricted environment, it can only access its own memory space, which is provided by the runtime, and call the imported Host calls. We call it an app for short.
Host call: the functions defined in the host that can be imported by the guest. Comparable to syscall, it’s the only way guest apps can access the resources outside the sandbox.
Guest runtime: a library for guest applications to easily interact with the host. It implements some common interfaces, so an app can just use async, socket, log and tracing without knowing the underlying details.

Now it’s time to dive into the sandbox, so stay awhile and listen. First let’s start from the guest side, and see what a common app lifespan looks like. With the help of the guest runtime, guest apps can be written similar to regular programs. So like other executables, an app begins with a start function as an entrypoint, which is called by the host upon loading. It is also provided with arguments as from the command line. At this point, the instance normally does some initialization, and more importantly, registers callback functions for different query phases. This is because in a recursive resolver, a query has to go through several phases before it gathers enough information to produce a response, for example a cache lookup, or making subrequests to resolve a delegation chain for the domain, so being able to tie into these phases is necessary for the apps to be useful for different use cases. The start function can also run some background tasks to supplement the phase callbacks, and store global state. For example – report metrics, or pre-fetch shared data from external sources, etc. Again, just like how we write a normal program.

But where do the program arguments come from? How could a guest app send log and metrics? The answer is, external functions.

In figure 5, we can see a barrier in the middle, which is the sandbox boundary, that separates the guest from the host. The only way one side can reach out to the other, is via a set of functions exported by the peer beforehand. As in the picture, the “hostcalls” are exported by the host, imported and called by the guest; while the “trampoline” are guest functions that the host has knowledge of.

It is called trampoline because it is used to invoke a function or a closure inside a guest instance that’s not exported. The phase callbacks are one example of why we need a trampoline function: each callback returns a closure, and therefore can’t be exported on instantiation. So a guest app wants to register a callback, it calls a host call with the callback address “hostcall_register_callback(pre_cache, #30987)”, when the callback needs to be invoked, the host cannot just call that pointer as it’s pointing to the guest’s memory space. What it can do instead is, to leverage one of the aforementioned trampolines, and give it the address of the callback closure: “trampoline_call(#30987)”.

Isolation overhead
Like a coin that has two sides, the new sandbox does come with some additional overhead. The portability and isolation that WebAssembly offers bring extra cost. Here, we’ll list two examples.

Firstly, guest apps are not allowed to read host memory. The way it works is the guest provides a memory region via a host call, then the host writes the data into the guest memory space. This introduces a memory copy that would not be needed if we were outside the sandbox. The bad news is, in our use case, the guest apps are supposed to do something on the query and/or the response, so they almost always need to read data from the host on every single request. The good news, on the other hand, is that during a request life cycle, the data won’t change. So we pre-allocate a bulk of memory in the guest memory space right after the guest app instantiates. The allocated memory is not going to be used, but instead serves to occupy a hole in the address space. Once the host gets the address details, it maps a shared memory region with the common data needed by the guest into the guest’s space. When the guest code starts to execute, it can just access the data in the shared memory overlay, and no copy is needed.

Another issue we ran into was when we wanted to add support for a modern protocol, oDoH, into BigPineapple. The main job of it is to decrypt the client query, resolve it, then encrypt the answers before sending it back. By design, this doesn’t belong to core DNS, and should instead be extended with a Wasm app. However, the WebAssembly instruction set doesn’t provide some crypto primitives, such as AES and SHA-2, which prevents it from getting the benefit of host hardware. There is ongoing work to bring this functionality to Wasm with WASI-crypto. Until then, our solution for this is to simply delegate the HPKE to the host via host calls, and we already saw 4x performance improvements, compared to doing it inside Wasm.

Async in Wasm
Remember the problem we talked about before that the callbacks could block the event loop? Essentially, the problem is how to run the sandboxed code asynchronously. Because no matter how complex the request processing callback is, if it can yield, we can put an upper bound on how long it is allowed to block. Luckily, Rust’s async framework is both elegant and lightweight. It gives us the opportunity to use a set of guest calls to implement the “Future”s.

In Rust, a Future is a building block for asynchronous computations. From the user’s perspective, in order to make an asynchronous program, one has to take care of two things: implement a pollable function that drives the state transition, and place a waker as a callback to wake itself up, when the pollable function should be called again due to some external event (e.g. time passes, socket becomes readable, and so on). The former is to be able to progress the program gradually, e.g. read buffered data from I/O and return a new state indicating the status of the task: either finished, or yielded. The latter is useful in case of task yielding, as it will trigger the Future to be polled when the conditions that the task was waiting for are fulfilled, instead of busy looping until it’s complete.

Let’s see how this is implemented in our sandbox. For a scenario when the guest needs to do some I/O, it has to do so via the host calls, as it is inside a restricted environment. Assuming the host provides a set of simplified host calls which mirror the basic socket operations: open, read, write, and close, the guest can have its pseudo poller defined as below:

fn poll(&mut self, wake: fn()) -> Poll {
	match hostcall_socket_read(self.sock, self.buffer) {
    	    HostOk  => Poll::Ready,
    	    HostEof => Poll::Pending,
	}
}

Here the host call reads data from a socket into a buffer, depending on its return value, the function can move itself to one of the states we mentioned above: finished(Ready), or yielded(Pending). The magic happens inside the host call. Remember in figure 5, that it is the only way to access resources? The guest app doesn’t own the socket, but it can acquire a “handle” via “hostcall_socket_open”, which will in turn create a socket on the host side, and return a handle. The handle can be anything in theory, but in practice using integer socket handles map well to file descriptors on the host side, or indices in a vector or slab. By referencing the returned handle, the guest app is able to remotely control the real socket. As the host side is fully asynchronous, it can simply relay the socket state to the guest. If you noticed that the waker function isn’t used above, well done! That’s because when the host call is called, it not only starts opening a socket, but also registers the current waker to be called then the socket is opened (or fails to do so). So when the socket becomes ready, the host task will be woken up, it will find the corresponding guest task from its context, and wakes it up using the trampoline function as shown in figure 5. There are other cases where a guest task needs to wait for another guest task, an async mutex for example. The mechanism here is similar: using host calls to register wakers.

All of these complicated things are encapsulated in our guest async runtime, with easy to use API, so the guest apps get access to regular async functions without having to worry about the underlying details.

(Not) The End

Hopefully, this blog post gave you a general idea of the innovative platform that powers 1.1.1.1. It is still evolving. As of today, several of our products, such as 1.1.1.1 for Families, AS112, and Gateway DNS, are supported by guest apps running on BigPineapple. We are looking forward to bringing new technologies into it. If you have any ideas, please let us know in the community or via email.

Protests spur Internet disruptions in Iran

2022-09-22 David Belson

Post Syndicated from David Belson original https://blog.cloudflare.com/protests-internet-disruption-ir/

Protests spur Internet disruptions in Iran

Over the past several days, protests and demonstrations have erupted across Iran in response to the death of Mahsa Amini. Amini was a 22-year-old woman from the Kurdistan Province of Iran, and was arrested on September 13, 2022, in Tehran by Iran’s “morality police”, a unit that enforces strict dress codes for women. She died on September 16 while in police custody.

Published reports indicate that the growing protests have resulted in at least eight deaths. Iran has a history of restricting Internet connectivity in response to protests, taking such steps in May 2022, February 2021, and November 2019. They have taken a similar approach to the current protests, including disrupting Internet connectivity, blocking social media platforms, and blocking DNS. The impact of these actions, as seen through Cloudflare’s data, are reviewed below.

Impact to Internet traffic

In the city of Sanandij in the Kurdistan Province, several days of anti-government protests took place after the death of Mahsa Amini. In response, the government reportedly disrupted Internet connectivity there on September 19. This disruption is clearly visible in the graph below, with traffic on TCI (AS58224), Iran’s fixed-line incumbent operator, in Sanandij dropping to zero between 1630 and 1925 UTC, except for a brief spike evident between 1715 and 1725 UTC.

On September 21, Internet disruptions started to become more widespread, with mobile networks effectively shut down nationwide. (Iran is a heavily mobile-centric country, with Cloudflare Radar reporting that 85% of requests are made from mobile devices.) Internet traffic from Iran Mobile Communications Company (AS197207) started to decline around 1530 UTC, and remained near zero until it started to recover at 2200 UTC, returning to “normal” levels by the end of the day.

Internet traffic from RighTel (AS57218) began to decline around 1630 UTC. After an outage lasting more than 12 hours, traffic returned at 0510 UTC.

Internet traffic from MTN Irancell (AS44244) began to drop just before 1700 UTC. After a 12-hour outage, traffic began recovering at 0450 UTC.

The impact of these disruptions is also visible when looking at traffic at both a regional and national level. In Tehran Province, HTTP request volume declined by approximately 70% around 1600 UTC, and continued to drop for the next several hours before seeing a slight recovery at 2200 UTC, likely related to the recovery also seen at that time on AS197207.

Similarly, Internet traffic volumes across the whole country began to decline just after 1600 UTC, falling approximately 40%. Nominal recovery at 2200 UTC is visible in this view as well, again likely from the increase in traffic from AS197207. More aggressive traffic growth is visible starting around 0500 UTC, after the remaining two mobile network providers came back online.

DNS blocking

In addition to shutting down mobile Internet providers within the country, Iran’s government also reportedly blocked access to social media platform Instagram, as well as blocking access to DNS-over-HTTPS from open DNS resolver services including Quad9, Google’s 8.8.8.8, and Cloudflare’s 1.1.1.1. Analysis of requests originating in Iran to 1.1.1.1 illustrates the impacts of these blocking attempts.

In analyzing DNS requests to Cloudflare’s resolver for domains associated with leading social media platforms, we observe that requests for instagram.com hostnames drop sharply at 1310 UTC, remaining lower for the rest of the day, except for a significant unexplained spike in requests between 1540 and 1610 UTC. Request volumes for hostnames associated with other leading social media platforms did not appear to be similarly affected.

In addition, it was reported that access to WhatsApp had also been blocked in Iran. This can be seen in resolution requests to Cloudflare’s resolver for whatsapp.com hostnames. The graph below shows a sharp decline in query traffic at 1910 UTC, dropping to near zero.

The Open Observatory for Network Interference (OONI), an organization that measures Internet censorship, reported in a Tweet that the cloudflare-dns.com domain name, used for DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) connections to Cloudflare’s DNS resolver, was blocked in Iran on September 20. This is clearly evident in the graph below, with resolution volume over DoH and DoT dropping to zero at 1940 UTC. The OONI tweet also noted that the 1.1.1.1 IP address “remains blocked on most networks.” The trend line for resolution over TCP or UDP (on port 53) in the graph below suggests that the IP address is not universally blocked, as there are still resolution requests reaching Cloudflare.

Interested parties can use Cloudflare Radar to monitor the impact of such government-directed Internet disruptions, and can follow @CloudflareRadar on Twitter for updates on Internet disruptions as they occur.

1.1.1.1 + WARP: More features, still private

2022-08-06 Mari Galicer

Post Syndicated from Mari Galicer original https://blog.cloudflare.com/geoexit-improving-warp-user-experience-larger-network/

1.1.1.1 + WARP: More features, still private

It’s a Saturday night. You open your browser, looking for nearby pizza spots that are open. If the search goes as intended, your browser will show you results that are within a few miles, often based on the assumed location of your IP address. At Cloudflare, we affectionately call this type of geolocation accuracy the “pizza test”. When you use a Cloudflare product that sits between you and the Internet (for example, WARP), it’s one of the ways we work to balance user experience and privacy. Too inaccurate and you’re getting pizza places from a neighboring country; too accurate and you’re reducing the privacy benefits of obscuring your location.

With that in mind, we’re excited to announce two major improvements to our 1.1.1.1 + WARP apps: first, an improvement to how we ensure search results and other geographically-aware Internet activity work without compromising your privacy, and second, a larger network with more locations available to WARP+ subscribers, powering even speedier connections to our global network.

A better Internet browsing experience for every WARP user

When we originally built the 1.1.1.1+ WARP mobile app, we wanted to create a consumer-friendly way to connect to our network and privacy-respecting DNS resolver.

What we discovered over time is that the topology of the Internet dictates a different type of experience for users in different locations. Why? Sometimes, because traffic congestion or technical issues route your traffic to a less congested part of the network. Other times, Internet Service Providers may not peer with Cloudflare or engage in traffic engineering to optimize their networks how they see fit, which could result in user traffic connecting to a location that doesn’t quite map to their locale or language.

Regardless of the cause, the impact is that your search results become less relevant, if not outright confusing. For example, in somewhere dense with country borders, like Europe, your traffic in Berlin could get routed to Amsterdam because your mobile operator chooses to not peer in-country, giving you results in Dutch instead of German. This can also be disruptive if you’re trying to stream content subject to licensing restrictions, such as a person in the UK trying to watch BBC iPlayer or a person in Brazil trying to watch the World Cup.

So we fixed this. We just rolled out a major update to the service that powers WARP that will give you a geographically accurate browsing experience without revealing your IP address to the websites you’re visiting. Instead, websites you visit will see a Cloudflare IP address instead, making it harder for them to track you directly.

How it works

Traditionally, consumer VPNs deliberately route your traffic through a server in another country, making your connection slow, and often getting blocked because of their ability to flout location-based content restrictions. We took a different approach when we first launched WARP in 2018, giving you the best possible performance by routing your traffic through the Cloudflare data center closest to you. However, because not every Internet Service Provider (ISP) peers with Cloudflare, users sometimes end up exiting the Cloudflare network from a more “random” data center – one that does not accurately represent their locale.

Websites and third party services often infer geolocation from your IP address, and now, 1.1.1.1 + WARP replaces your original IP address with one that consistently and accurately represents your approximate location.

Here’s how we did it:

We ran an analysis on a subset of our network traffic to find a rough approximation of how many users we have per city.
We divided that amongst our egress IPs, using an anycast architecture to be efficient with the number of additional IPs we had to allocate and advertise per metro area.
We then submitted geolocation information of those IPs to various geolocation database providers, ensuring third party services associate those Cloudflare egress IPs with an accurate approximate location.

It was important to us to provide the benefits of this location accuracy without compromising user privacy, so the app doesn’t ask for specific location permissions or log your IP address.

An even bigger network for WARP+ users

We also recently announced that we’ve expanded our network to over 275 cities in over 100 countries. This gave us an opportunity to revisit where we offered WARP, and how we could expand the number of locations users can connect to WARP with (in other words: an opportunity to make things faster).

From today, all WARP+ subscribers will benefit from a larger network with 20+ new cities: with no change in subscription pricing. A closer Cloudflare data center means less latency between your device and Cloudflare, which directly improves your download speed, thanks to what’s called the Bandwidth-Delay Product (put simply: lower latency, higher throughput!).

As a result, sites load faster, both for those on the Cloudflare network and those that aren’t. As we continue to expand our network, we’ll be revising this on a regular basis to ensure that all WARP and WARP+ subscribers continue to get great performance.

Speed, privacy, and relevance

Beyond being able to find pizza on a Saturday night, we believe everyone should be able to browse the Internet freely – and not have to sacrifice the speed, privacy, or relevance of their search results in order to do so.

In the near future, we’ll be investing in features to bring even more of the benefits of Cloudflare infrastructure to every 1.1.1.1 + WARP user. Stay tuned!

Area 1 threat indicators now available in Cloudflare Zero Trust

2022-06-20 Jesse Kipp

Post Syndicated from Jesse Kipp original https://blog.cloudflare.com/phishing-threat-indicators-in-zero-trust/

Area 1 threat indicators now available in Cloudflare Zero Trust

Over the last several years, both Area 1 and Cloudflare built pipelines for ingesting threat indicator data, for use within our products. During the acquisition process we compared notes, and we discovered that the overlap of indicators between our two respective systems was smaller than we expected. This presented us with an opportunity: as one of our first tasks in bringing the two companies together, we have started bringing Area 1’s threat indicator data into the Cloudflare suite of products. This means that all the products today that use indicator data from Cloudflare’s own pipeline now get the benefit of Area 1’s data, too.

Area 1 built a data pipeline focused on identifying new and active phishing threats, which now supplements the Phishing category available today in Gateway. If you have a policy that references this category, you’re already benefiting from this additional threat coverage.

How Cloudflare identifies potential phishing threats

Cloudflare is able to combine the data, procedures and techniques developed independently by both the Cloudflare team and the Area 1 team prior to acquisition. Customers are able to benefit from the work of both teams across the suite of Cloudflare products.

Cloudflare curates a set of data feeds both from our own network traffic, OSINT sources, and numerous partnerships, and applies custom false positive control. Customers who rely on Cloudflare are spared the software development effort as well as the operational workload to distribute and update these feeds. Cloudflare handles this automatically, with updates happening as often as every minute.

Cloudflare is able to go beyond this and work to proactively identify phishing infrastructure in multiple ways. With the Area 1 acquisition, Cloudflare is now able to apply the adversary-focused threat research approach of Area1 across our network. A team of threat researchers track state-sponsored and financially motivated threat actors, newly disclosed CVEs, and current phishing trends.

Cloudflare now operates mail exchange servers for hundreds of organizations around the world, in addition to its DNS resolvers, Zero Trust suite, and network services. Each of these products generates data that is used to enhance the security of all of Cloudflare’s products. For example, as part of mail delivery, the mail engine performs domain lookups, scores potential phishing indicators via machine learning, and fetches URLs. Data which can now be used through Cloudflare’s offerings.

How Cloudflare Area 1 identifies potential phishing threats

The Cloudflare Area 1 team operates a suite of web crawling tools designed to identify phishing pages, capture phishing kits, and highlight attacker infrastructure. In addition, Cloudflare Area 1 threat models assess campaigns based on signals gathered from threat actor campaigns; and the associated IOCs of these campaign messages are further used to enrich Cloudflare Area 1 threat data for future campaign discovery. Together these techniques give Cloudflare Area 1 a leg up on identifying the indicators of compromise for an attacker prior to their attacks against our customers. As part of this proactive approach, Cloudflare Area 1 also houses a team of threat researchers that track state-sponsored and financially motivated threat actors, newly disclosed CVEs, and current phishing trends. Through this research, analysts regularly insert phishing indicators into an extensive indicator management system that may be used for our email product or any other product that may query it.

Cloudflare Area 1 also collects information about phishing threats during our normal operation as the mail exchange server for hundreds of organizations across the world. As part of that role, the mail engine performs domain lookups, scores potential phishing indicators via machine learning, and fetches URLs. For those emails found to be malicious, the indicators associated with the email are inserted into our indicator management system as part of a feedback loop for subsequent message evaluation.

How Cloudflare data will be used to improve phishing detection

In order to support Cloudflare products, including Gateway and Page Shield, Cloudflare has a data pipeline that ingests data from partnerships, OSINT sources, as well as threat intelligence generated in-house at Cloudflare. We are always working to curate a threat intelligence data set that is relevant to our customers and actionable in the products Cloudflare supports. This is our North star: what data can we provide that enhances our customer’s security without requiring our customers to manage the complexity of data, relationships, and configuration. We offer a variety of security threat categories, but some major focus areas include:

Malware distribution
Malware and Botnet Command & Control
Phishing,
New and newly seen domains

Phishing is a threat regardless of how the potential phishing link gets entry into an organization, whether via email, SMS, calendar invite or shared document, or other means. As such, detecting and blocking phishing domains has been an area of active development for Cloudflare’s threat data team since almost its inception.

Looking forward, we will be able to incorporate that work into Cloudflare Area 1’s phishing email detection process. Cloudflare’s list of phishing domains can help identify malicious email when those domains appear in the sender, delivery headers, message body or links of an email.

Threat actors have long had an unfair advantage — and that advantage is rooted in the knowledge of their target, and the time they have to set up specific campaigns against their targets. That dimension of time allows threat actors to set up the right infrastructure, perform reconnaissance, stage campaigns, perform test probes, observe their results, iterate, improve and then launch their ‘production’ campaigns. This precise element of time gives us the opportunity to discover, assess and proactively filter out campaign infrastructure prior to campaigns reaching critical mass. But to do that effectively, we need visibility and knowledge of threat activity across the public IP space.

With Cloudflare’s extensive network and global insight into the origins of DNS, email or web traffic, combined with Cloudflare Area 1’s datasets of campaign tactics, techniques, and procedures (TTPs), seed infrastructure and threat models — we are now better positioned than ever to help organizations secure themselves against sophisticated threat actor activity, and regain the advantage that for so long has been heavily weighted towards the bad guys.

If you’d like to extend Zero Trust to your email security to block advanced threats, contact your Customer Success manager, or request a Phishing Risk Assessment here.

Dig through SERVFAILs with EDE

2022-05-25 Stanley Chiang

Post Syndicated from Stanley Chiang original https://blog.cloudflare.com/dig-through-servfails-with-ede/

Dig through SERVFAILs with EDE

It can be frustrating to get errors (SERVFAIL response codes) returned from your DNS queries. It can be even more frustrating if you don’t get enough information to understand why the error is occurring or what to do next. That’s why back in 2020, we launched support for Extended DNS Error (EDE) Codes to 1.1.1.1.

As a quick refresher, EDE codes are a proposed IETF standard enabled by the Extension Mechanisms for DNS (EDNS) spec. The codes return extra information about DNS or DNSSEC issues without touching the RCODE so that debugging is easier.

Now we’re happy to announce we will return more error code types and include additional helpful information to further improve your debugging experience. Let’s run through some examples of how these error codes can help you better understand the issues you may face.

To try for yourself, you’ll need to run the dig or kdig command in the terminal. For dig, please ensure you have v9.11.20 or above. If you are on macOS 12.1, by default you only have dig 9.10.6. Install an updated version of BIND to fix that.

Let’s start with the output of an example dig command without EDE support.

% dig @1.1.1.1 dnssec-failed.org +noedns

; <<>> DiG 9.18.0 <<>> @1.1.1.1 dnssec-failed.org +noedns
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8054
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; Query time: 23 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Thu Mar 17 10:12:57 PDT 2022
;; MSG SIZE  rcvd: 35

In the output above, we tried to do DNSSEC validation on dnssec-failed.org. It returns a SERVFAIL, but we don’t have context as to why.

Now let’s try that again with 1.1.1.1’s EDE support.

% dig @1.1.1.1 dnssec-failed.org +dnssec

; <<>> DiG 9.18.0 <<>> @1.1.1.1 dnssec-failed.org +dnssec
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34492
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; EDE: 9 (DNSKEY Missing): (no SEP matching the DS found for dnssec-failed.org.)
;; QUESTION SECTION:
;dnssec-failed.org.		IN	A

;; Query time: 15 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Fri Mar 04 12:53:45 PST 2022
;; MSG SIZE  rcvd: 103

We can see there is still a SERVFAIL. However, this time there is also an EDE Code 9 which stands for “DNSKey Missing”. Accompanying that, we also have additional information saying “no SEP matching the DS found” for dnssec-failed.org. That’s better!

Another nifty feature is that we will return multiple errors when appropriate, so you can debug each one separately. In the example below, we returned a SERVFAIL with three different error codes: “Unsupported DNSKEY Algorithm”, “No Reachable Authority”, and “Network Error”.

dig @1.1.1.1 [domain] +dnssec

; <<>> DiG 9.18.0 <<>> @1.1.1.1 [domain] +dnssec
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55957
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; EDE: 1 (Unsupported DNSKEY Algorithm): (no supported DNSKEY algorithm for [domain].)
; EDE: 22 (No Reachable Authority): (at delegation [domain].)
; EDE: 23 (Network Error): (135.181.58.79:53 rcode=REFUSED for [domain] A)
;; QUESTION SECTION:
;[domain].		IN	A

;; Query time: 1197 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Wed Mar 02 13:41:30 PST 2022
;; MSG SIZE  rcvd: 202

Here’s a list of the additional codes we now support:

Error Code Number	Error Code Name
1	Unsupported DNSKEY Algorithm
2	Unsupported DS Digest Type
5	DNSSEC Indeterminate
7	Signature Expired
8	Signature Not Yet Valid
9	DNSKEY Missing
10	RRSIGs Missing
11	No Zone Key Bit Set
12	NSEC Missing

We have documented all the error codes we currently support with additional information you may find helpful. Refer to our dev docs for more information.

Five Great (free!) Ways to Get Started With Cloudflare

2021-11-12 John Engates

Post Syndicated from John Engates original https://blog.cloudflare.com/five-free-ways-to-get-started-with-cloudflare/

Five Great (free!) Ways to Get Started With Cloudflare

I joined Cloudflare a few weeks ago, and as someone new to the company, there’s a ton of information to absorb. I have always learned best by doing, so I decided to use Cloudflare like a brand-new user. Cloudflare customers range from individuals with a simple website to companies in the Fortune 100. I’m currently exploring Cloudflare from the perspective of the individual, so I signed up for a free account and logged into the dashboard. Just like getting into a new car, I want to turn all the dials and push all the buttons. I looked for things that would be fun and easy to do and would deliver some immediate value. Now I want to share the best ones with you.

Here are my five ways to get started with Cloudflare. These should be easy for anyone, and they’re free. You’ll likely even save some money and improve your privacy and security in the process. Let’s go!

1. Transfer or register a domain with Cloudflare Registrar

If you’re like me, you’ve acquired a few (dozen) Internet domains for things like personalizing your email address, a web page for your nature photography hobby, or maybe a side business. You probably registered them at one or more of the popular domain name registrars, and you pay around $15 per year for each domain. I did an audit and found I was spending a shocking amount each year to maintain my domains, and they were spread across three different registrars.

Cloudflare makes it easy to transfer domains from other registrars and doesn’t charge a markup for domain registrar services. Let me say that again; there is zero price markup for domain registration with Cloudflare Registrar. You’ll pay exactly what Cloudflare pays. For example, a .com domain registered with Cloudflare currently costs half of what I was paying at other registrars.

Not only will you save on the domain registration, but Cloudflare doesn’t nickel-and-dime you like registrars who charge extra for WHOIS privacy and transfer lock and then sneakily bundle their website hosting services. It all adds up.

To get started registering or transferring a domain, log into the Cloudflare Dashboard, click “Add a Site,” and bring your domains to Cloudflare.

2. Configure DNS on Cloudflare DNS

DNS servers do the work of translating hostnames into IP addresses. To put a domain name to use on the Internet, you can create DNS records to point to your website and email provider. Every time someone wants to put a website or Internet application online, this process must happen so the rest of us can find it. Cloudflare’s DNS dashboard makes it simple to configure DNS records. For transfers, Cloudflare will even copy records from your existing DNS service to prevent any disruption.

The Cloudflare DNS dashboard will also improve security on your domains with DNSSEC, protect your domains from email spoofing with DMARC, and enforce other DNS best practices.

I’ve now moved all my domains to Cloudflare DNS, which is a big win for me for security and simplicity. I can see them all in one place, and I’m more confident with the increased level of control and protection I have for my domains.

3. Set up a blog with Cloudflare Pages

Once I moved my domains, I was eager to set up a new website. I have been thinking lately it would be fun to have a place to post my photos where they can stand out and won’t get lost in the stream of social media. It’s been a while since I’ve built a website from scratch, but it’s fun getting back to basics. In the old days, to host a website you’d set up a dedicated web server or use a shared web host to serve your site. Today, many web hosts provide ready-to-go templates for websites and make hosting as easy as one click to set up a new site.

I wanted to learn by doing, so I took the do-it-yourself route. What I discovered in the process is an architecture called Jamstack. It’s a bit different from the traditional way of building and hosting websites. With Jamstack, your site doesn’t live at a traditional hosting provider, nor is it dynamically generated from CGI scripts and a database. Your content is now stored on a code repository like GitHub. The site is pre-generated as a static site and then deployed and delivered directly from Cloudflare’s network.

I used a Jamstack static site generator called Hugo to build my photo blog, pushed it to GitHub, and used Cloudflare Pages to generate the content and host my site. Now that it’s configured, there’s zero work necessary to maintain it. Jamstack, combined with Pages, alleviates the regular updates required to keep up with security patches, and there are no web servers or database services to break. Delivered from Cloudflare’s edge network, the site scales effortlessly, and it’s blazingly fast from a user perspective.

By the way, you don’t need to register a domain to deploy to Pages. Cloudflare will generate a pages.dev site that you can use.

For extra credit, have a look at the Cloudflare Workers serverless platform. Workers will allow you to write and deploy even more advanced custom code and run it across Cloudflare’s globally distributed network.

4. Protect your network with Cloudflare for Teams

At first, it wasn’t evident to me how I was going to use Cloudflare for Teams. I initially thought it was only for larger organizations. After all, I’m sitting here in my home office, and I’m just a team of one. Digging into the product more, it became clear that Teams is about privacy and security for groups of any size.

We’ve discussed the impressive Cloudflare DNS infrastructure, and you can take advantage of the Cloudflare DNS resolver for your devices at home by simply configuring them to point to Cloudflare 1.1.1.1 DNS servers. But for more granular control and detailed logging, you should try the DNS infrastructure built into the Cloudflare for Teams Gateway feature.

When you point your home network to Cloudflare for Teams DNS servers, your dashboard will populate with logs of all DNS requests coming from your network. You can set up rules to block DNS requests for various categories, including known malware, phishing, adult sites, and other questionable content. You’ll see the logs instantly and can add or remove categories as needed. If you trigger one of the rules, Cloudflare will display a page that shows you’ve hit one of these blocked sites.

Malware can bypass DNS, so filtering DNS is no silver bullet. Think of DNS filtering as another layer of defense that may help you avoid nefarious sites in the first place. For example, known phishing sites sent as URLs via email won’t resolve and will be blocked before they affect you. Additionally, DNS logs should give you visibility into what’s happening on the network and that may lead you to implement even better security in other areas.

There’s so much more to Cloudflare for Teams than DNS filtering, but I wanted to give you just a little taste of what you can do with it quickly and for free.

5. Secure your traffic with the Cloudflare 1.1.1.1 app and WARP

Finally, let’s discuss the challenge of securing Internet communications on your mobile phones, tablets, and devices at home and while traveling. We know that the SSL/TLS encryption on secure websites provides a degree of protection, but the apps you use and sites you visit are still visible to your ISP and upstream network operators. Some providers sell this data or use it to target you with ads.

If you install the 1.1.1.1 app, Cloudflare will create an always-on, encrypted tunnel from your device to the nearest Cloudflare data center and secure your Internet traffic. We call this Cloudflare WARP. WARP not only encrypts your traffic but can even help accelerate it by routing intelligently across the Cloudflare network.

WARP is a compelling VPN replacement without the risks associated with some shady VPN providers who may also want to sell your data. Remember, Cloudflare will never sell your data!

The Cloudflare WARP client combined with Cloudflare for Teams gives you enhanced visibility into DNS queries and unlocks some advanced traffic management and filtering capabilities. And it’s all free for small teams.

Hopefully, my exploration of the Cloudflare product portfolio gives you some ideas of what you can do to make your life a little easier or your team more secure. I’m just scratching the surface, and I’m excited to keep learning what’s possible with Cloudflare. I’ll continue to share what I learn, and I encourage you to experiment with some of these capabilities yourself and let me know how it goes.

Cloudflare Tunnel for Content Teams

2021-10-25 Alice Bracchi

Post Syndicated from Alice Bracchi original https://blog.cloudflare.com/cloudflare-tunnel-for-content-teams/

Cloudflare Tunnel for Content Teams

A big part of the job of a technical writer is getting feedback on the content you produce. Writing and maintaining product documentation is a deeply collaborative and cyclical effort — through constant conversation with product managers and engineers, technical writers ensure the content is clear and serves the user in the most effective way. Collaboration with other technical writers is also important to keep the documentation consistent with Cloudflare’s content strategy.

So whether we’re documenting a new feature or overhauling a big portion of existing documentation, sharing our writing with stakeholders before it’s published is quite literally half the work.

In my experience as a technical writer, the feedback I’ve received has been exponentially more impactful when stakeholders could see my changes in context. This is especially true for bigger and more strategic changes. Imagine I’m changing the structure of an entire section of a product’s documentation, or shuffling the order of pages in the navigation bar. It’s hard to guess the impact of those changes just by looking at the markdown files.

We writers check those changes in context by building a development server on our local machines. But sharing what we see locally with our stakeholders has always been a pain point for us. We’ve sent screenshots (hardly a good idea). We’ve recorded our screens. We’ve asked stakeholders to check out our branches locally and build a development server on their own. Lately, we’ve added a GitHub action to our open-source cloudflare-docs repo that allows us to generate a preview link for all pull requests with a certain label. However, that requires us to open a pull request with our changes, and that is not ideal if we’re documenting a feature that’s yet to be announced, or if our work is still in its early stages.

So the question has always been: could there be a way for someone else to see what we see, as easily as we see it?

Enter Cloudflare Tunnel

I was working on a complete refresh of Cloudflare Tunnel’s documentation when I realized the product could very well answer that question for us as a technical writing team.

If you’re not familiar with the product, Cloudflare Tunnel provides a secure way to connect your local resources to the Cloudflare network without poking holes in your firewall. By running cloudflared in your environment, you can create outbound-only connections to Cloudflare’s edge, and ensure all traffic to your origins goes through Cloudflare and is protected from outside interference.

For our team, Cloudflare Tunnel could offer a way for our stakeholders to interact with what’s on our local environments in real-time, just like a customer would if the changes were published. To do that, we could expose our local environment to the edge through a tunnel, assign a DNS record to that tunnel, and then share that URL with our stakeholders.

So if each member in the technical writing team had their own tunnel that they could spin up every time they needed to get feedback, that would pretty much solve our long-standing problem.

Setting up the tunnel

To test out that this would work, I went ahead and tried it for myself.

First, I made sure to create a local branch of the cloudflare-docs repo, make local changes, and run a development server locally on port 8000.

Since I already had cloudflared installed on my machine, the next thing I needed to do was log into my team’s Cloudflare account, pick the zone I wanted to create tunnels for (I picked developers.cloudflare.com), and authorize Cloudflare Tunnel for that zone.

$ cloudflared login

Next, it was time to create the Named Tunnel.

$ cloudflared tunnel create alice
Tunnel credentials written to /Users/alicebracchi/.cloudflared/0e025819-6f12-4f49-8183-c678273feef4.json. cloudflared chose this file based on where your origin certificate was found. Keep this file secret. To revoke these credentials, delete the tunnel.

Created tunnel alice with id 0e025819-6f12-4f49-8183-c678273feef4

Alright, tunnel created. Next, I needed to assign a DNS record to it. I wanted it to be something readable and easily shareable with stakeholders (like abracchi.developers.cloudflare.com), so I ran the following command and specified the tunnel name first and then the desired subdomain:

$ cloudflared tunnel route dns alice abracchi

Next, I needed a way to tell the tunnel to serve traffic to my localhost:8000 port. For that, I created a configuration file in my default cloudflared directory and specified the following fields:

url: https://localhost:8000
tunnel: 0e025819-6f12-4f49-8183-c678273feef4
credentials-file: /Users/alicebracchi/.cloudflared/0e025819-6f12-4f49-8183-c678273feef4
.json

Time to run the tunnel. The following command established connections between my origin and the Cloudflare edge, telling the tunnel to serve traffic to my origin according to the parameters I’d specified in the config file:

$ cloudflared tunnel --config /Users/alicebracchi/.cloudflared/config.yml run alice
2021-10-18T09:39:54Z INF Starting tunnel tunnelID=0e025819-6f12-4f49-8183-c678273feef4
2021-10-18T09:39:54Z INF Version 2021.9.2
2021-10-18T09:39:54Z INF GOOS: darwin, GOVersion: go1.16.5, GoArch: amd64
2021-10-18T09:39:54Z INF Settings: map[cred-file:/Users/alicebracchi/.cloudflared/0e025819-6f12-4f49-8183-c678273feef4.json credentials-file:/Users/alicebracchi/.cloudflared/0e025819-6f12-4f49-8183-c678273feef4.json url:http://localhost:8000]
2021-10-18T09:39:54Z INF Generated Connector ID: 90a7e3a9-9d59-4d26-9b87-4b94ebf4d2a0
2021-10-18T09:39:54Z INF cloudflared will not automatically update when run from the shell. To enable auto-updates, run cloudflared as a service: https://developers.cloudflare.com/argo-tunnel/reference/service/
2021-10-18T09:39:54Z INF Initial protocol http2
2021-10-18T09:39:54Z INF Starting metrics server on 127.0.0.1:64193/metrics
2021-10-18T09:39:55Z INF Connection 13bf4c0c-b35b-4f9a-b6fa-f0a3dd001951 registered connIndex=0 location=MAD
2021-10-18T09:39:56Z INF Connection 38510c22-5256-45f2-abf8-72f1207ca242 registered connIndex=1 location=LIS
2021-10-18T09:39:57Z INF Connection 9ab0ea06-b1cf-483c-bd48-64a067a87c39 registered connIndex=2 location=MAD
2021-10-18T09:39:58Z INF Connection df079efe-8246-4e93-85f5-10caf8b7c354 registered connIndex=3 location=LIS

And sure enough, at abracchi.developers.cloudflare.com, my teammates could see what I was seeing on localhost:8000.

Securing the tunnel

After creating the tunnel, I needed to make sure only people within Cloudflare could access that tunnel. As it was, anyone with access to abracchi.developers.cloudflare.com could see what was in my local environment. To fix this, I set up an Access self-hosted application by navigating to Access > Applications on the Teams Dashboard. For this application, I then created a policy that restricts access to the tunnel to a user group that includes only Cloudflare employees and requires authentication via Google or One-time PIN (OTP).

This makes applications like my tunnel easily shareable between colleagues, but also safe from potential vulnerabilities.

Et voilà!

Back to the Tunnels page, this is what the content team’s Cloudflare Tunnel setup looks like after each writer completed the process I’ve outlined above. Every writer has their personal tunnel set up and their local environment exposed to the Cloudflare Edge:

What’s next

The team is now seamlessly sharing visual content with their stakeholders, but there’s still room for improvement. Cloudflare Tunnel is just the first step towards making the feedback loop easier for everyone involved. We’re currently exploring ways we can capture integrated feedback directly at the URL that’s shared with the stakeholders, to avoid back-and-forth on separate channels.

We’re also looking into bringing in Cloudflare Pages to make the entire deployment process faster. Stay tuned for future updates, and in the meantime, check out our developer docs.

Announcing WARP for Linux and Proxy Mode

2021-06-17 Kyle Krum

Post Syndicated from Kyle Krum original https://blog.cloudflare.com/announcing-warp-for-linux-and-proxy-mode/

Announcing WARP for Linux and Proxy Mode

Last October we released WARP for Desktop, bringing a safer and faster way to use the Internet to billions of devices for free. At the same time, we gave our enterprise customers the ability to use WARP with Cloudflare for Teams. By routing all an enterprise’s traffic from devices anywhere on the planet through WARP, we’ve been able to seamlessly power advanced capabilities such as Secure Web Gateway and Browser Isolation and, in the future, our Data Loss Prevention platforms.

Today, we are excited to announce Cloudflare WARP for Linux and, across all desktop platforms, the ability to use WARP with single applications instead of your entire device.

What is WARP?

WARP was built on the philosophy that even people who don’t know what “VPN” stands for should be able to still easily get the protection a VPN offers. It was also built for those of us who are unfortunately all too familiar with traditional corporate VPNs, and need an innovative, seamless solution to meet the challenges of an always-connected world.

Enter our own WireGuard implementation called BoringTun.

The WARP application uses BoringTun to encrypt traffic from your device and send it directly to Cloudflare’s edge, ensuring that no one in between is snooping on what you’re doing. If the site you are visiting is already a Cloudflare customer, the content is immediately sent down to your device. With WARP+, we use Argo Smart Routing to use the shortest path through our global network of data centers to reach whomever you are connecting to.

Bringing WARP to Linux

When we built out the foundations of our desktop client last year, we knew a Linux client was something we would deliver. If you have ever shipped software at this scale, you’ll know that maintaining a client across all major operating systems is a daunting (and error-prone) task. To avoid these pitfalls, we wrote the core of the product in Rust, which allows for 95% of the code to be shared across platforms.

Internally we refer to this common code as the shared Daemon (or Service, for Windows folks), and it allows our engineers to spend less time duplicating code across multiple platforms while ensuring most quality improvements hit everyone at the same time. The really cool thing about this is that millions of existing WARP users have already helped us solidify the code base for Linux!

The other 5% of code is split into two main buckets: UI and quirks of the operating system. For now, we are forgoing a UI on Linux and instead working to support three distributions:

Ubuntu
Red Hat Enterprise Linux
CentOS

We want to add more distribution support in the future, so if your favorite distro isn’t there, don’t despair — the client may in fact already work with other Debian and Redhat based distributions, so please give it a try. If we missed your favorite distribution, we’d love to hear from you in our Community Forums.

So without a UI — what’s the mechanism for controlling WARP? The command line, of course! Keen observers may have noticed an executable that already ships with each client called the warp-cli. This platform-agnostic interface is already the preferred mechanism of interacting with the daemon by some of our engineers and is the main way you’ll interact with WARP on Linux.

Installing Cloudflare WARP for Linux

Seasoned Linux developers can jump straight to https://pkg.cloudflareclient.com/install. After linking our repository, get started with either sudo apt install cloudflare-warp or sudo yum install cloudflare-warp, depending on your distribution.

For more detailed installation instructions head over to our WARP Client documentation.

Using the CLI

Once you’ve installed WARP, you can begin using the CLI with a single command:

warp-cli --help

The CLI will display the output below.

~$ warp-cli --help
WARP 0.2.0
Cloudflare
CLI to the WARP service daemon
 
USAGE:
    warp-cli [FLAGS] [SUBCOMMAND]
 
FLAGS:
        --accept-tos    Accept the Terms of Service agreement
    -h, --help          Prints help information
    -l                  Stay connected to the daemon and listen for status changes and DNS logs (if enabled)
    -V, --version       Prints version information
 
SUBCOMMANDS:
    register                    Registers with the WARP API, will replace any existing registration (must be run
                                before first connection)
    teams-enroll                Enroll with Cloudflare for Teams
    delete                      Deletes current registration
    rotate-keys                 Generates a new key-pair, keeping the current registration
    status                      Asks the daemon to send the current status
    warp-stats                  Retrieves the stats for the current WARP connection
    settings                    Retrieves the current application settings
    connect                     Asks the daemon to start a connection, connection progress should be monitored with
                                -l
    disconnect                  Asks the daemon to stop a connection
    enable-always-on            Enables always on mode for the daemon (i.e. reconnect automatically whenever
                                possible)
    disable-always-on           Disables always on mode
    disable-wifi                Pauses service on WiFi networks
    enable-wifi                 Re-enables service on WiFi networks
    disable-ethernet            Pauses service on ethernet networks
    enable-ethernet             Re-enables service on ethernet networks
    add-trusted-ssid            Adds a trusted WiFi network, for which the daemon will be disabled
    del-trusted-ssid            Removes a trusted WiFi network
    allow-private-ips           Exclude private IP ranges from tunnel
    enable-dns-log              Enables DNS logging, use with the -l option
    disable-dns-log             Disables DNS logging
    account                     Retrieves the account associated with the current registration
    devices                     Retrieves the list of devices associated with the current registration
    network                     Retrieves the current network information as collected by the daemon
    set-mode                    
    set-families-mode           
    set-license                 Attaches the current registration to a different account using a license key
    set-gateway                 Forces the app to use the specified Gateway ID for DNS queries
    clear-gateway               Clear the Gateway ID
    set-custom-endpoint         Forces the client to connect to the specified IP:PORT endpoint
    clear-custom-endpoint       Remove the custom endpoint setting
    add-excluded-route          Adds an excluded IP
    remove-excluded-route       Removes an excluded IP
    get-excluded-routes         Get the list of excluded routes
    add-fallback-domain         Adds a fallback domain
    remove-fallback-domain      Removes a fallback domain
    get-fallback-domains        Get the list of fallback domains
    restore-fallback-domains    Restore the fallback domains
    get-device-posture          Get the current device posture
    override                    Temporarily override MDM policies that require the client to stay enabled
    set-proxy-port              Set the listening port for WARP proxy (127.0.0.1:{port})
    help                        Prints this message or the help of the given subcommand(s)

You can begin connecting to Cloudflare’s network with just two commands. The first command, register, will prompt you to authenticate. The second command, connect, will enable the client, creating a WireGuard tunnel from your device to Cloudflare’s network.

~$ warp-cli register
Success
~$ warp-cli connect
Success

Once you’ve connected the client, the best way to verify it is working is to run our trace command:

~$ curl https://www.cloudflare.com/cdn-cgi/trace/

And look for the following output:

warp=on

Want to switch from encrypting all traffic in WARP to just using our 1.1.1.1 DNS resolver? Use the warp-cli set-mode command:

~$ warp-cli help set-mode
warp-cli-set-mode 
 
USAGE:
    warp-cli set-mode [mode]
 
FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
 
ARGS:
    <mode>     [possible values: warp, doh, warp+doh, dot, warp+dot, proxy]
~$ warp-cli set-mode doh
Success

Protecting yourself against malware with 1.1.1.1 for Families is just as easy, and it can be used with either WARP enabled or in straight DNS mode:

~$ warp-cli set-families-mode --help
warp-cli-set-families-mode 
 
USAGE:
    warp-cli set-families-mode [mode]
 
FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
 
ARGS:
    <mode>     [possible values: off, malware, full]
~$ warp-cli set-families-mode malware
Success

A note on Cloudflare for Teams support

Cloudflare for Teams support is on the way, and just like our other clients, it will ship in the same package. Stay tuned for an in-app update or reach out to your Account Executive to be notified when a beta is available.

We need feedback

If you encounter an error, send us feedback with the sudo warp-diag feedback command:

~$ sudo warp-diag feedback

For all other functionality check out warp-cli --help or see our documentation here.

WARP as a Local Proxy

When WARP launched in 2019, one of our primary goals was ease of use. You turn WARP on and all traffic from your device is encrypted to our edge. Through all releases of the client, we’ve kept that as a focus. One big switch to turn on and you are protected.

However, as we’ve grown, so have the requirements for our client. Earlier this year we released split tunnel and local domain fallback as a way for our Cloudflare for Teams customers to exclude certain routes from WARP. Our consumer customers may have noticed this stealthily added in the last release as well. We’ve heard from customers who want to deploy WARP in one additional mode: Single Applications. Today we are also announcing the ability for our customers to run WARP in a local proxy mode in all desktop clients.

When WARP is configured as a local proxy, only the applications that you configure to use the proxy (HTTPS or SOCKS5) will have their traffic sent through WARP. This allows you to pick and choose which traffic is encrypted (for instance, your web browser or a specific app), and everything else will be left open over the Internet.

Because this feature restricts WARP to just applications configured to use the local proxy, leaving all other traffic unencrypted over the Internet by default, we’ve hidden it in the advanced menu. To turn it on:

1. Navigate to Preferences -> Advanced and click the Configure Proxy button.

2. On the dialog that opens, check the box and configure the port you want to listen on.

3. This will enable a new mode you can select from:

To configure your application to use the proxy, you want to specify 127.0.0.1 for the address and the value you specified for a port (40000 by default). For example, if you are using Firefox, the configuration would look like this:

Download today

You can start using these capabilities right now by visiting https://one.one.one.one. We’re super excited to hear your feedback.

Introducing WARP for Desktop and Cloudflare for Teams

2020-10-14 Kyle Krum

Post Syndicated from Kyle Krum original https://blog.cloudflare.com/warp-for-desktop/

Introducing WARP for Desktop and Cloudflare for Teams

Cloudflare launched ten years ago to keep web-facing properties safe from attack and fast for visitors. Cloudflare customers owned Internet properties that they placed on our network. Visitors to those sites and applications enjoyed a faster experience, but that speed was not consistent for accessing Internet properties outside the Cloudflare network.

Over the last few years, we began building products that could help deliver a faster and safer Internet to everyone, not just visitors to sites on our network. We started with the first step to visiting any website, a DNS query, and released the world’s fastest public DNS resolver, 1.1.1.1. Any Internet user could improve the speed to connect to any website simply by changing their resolver.

While making the Internet faster for users, we also focused on making it more private. We built 1.1.1.1 to accelerate the last mile of connections, from user to our edge or other destinations on the Internet. Unlike other providers, we did not build it to sell ads.

Last year we went one step further to make the entire connection from a device both faster and safer when we launched Cloudflare WARP. With the push of a button, users could connect their mobile device to the entire Internet using a WireGuard tunnel through a Cloudflare data center near to them. Traffic to sites behind Cloudflare became even faster and a user’s experience with the rest of the Internet became more secure and private.

We brought that experience to desktops in beta earlier this year, and are excited to announce the general availability of Cloudflare WARP for desktop users today. The entire Internet can now be more secure and private regardless of how you connect.

Bringing the power of WARP to security teams everywhere

WARP made the Internet faster and more private for individual users everywhere. But as businesses embraced remote work models at scale, security teams struggled to extend the security controls they had enabled in the office to their remote workers. Today, we’re bringing everything our users have come to expect from WARP to security teams. The release also enables new functionality in our Cloudflare Gateway product.

Customers can use the Cloudflare WARP application to connect corporate desktops to Cloudflare Gateway for advanced web filtering. The Gateway features rely on the same performance and security benefits of the underlying WARP technology, now with security filtering available to the connection.

The result is a simple way for enterprises to protect their users wherever they are without requiring the backhaul of network traffic to a centralized security boundary. Instead, organizations can configure the WARP client application to securely and privately send remote users’ traffic through a Cloudflare data center near them. Gateway administrators apply policies to outbound Internet traffic proxied through the client, allowing organizations to protect users from threats on the Internet, and stop corporate data from leaving their organization.

Privacy, Security and Speed for Everyone

WARP was built on the philosophy that even people who don’t know what “VPN” stands for should be able to still easily get the protection a VPN offers. For those of us unfortunately very familiar with traditional corporate VPNs, something better was needed. Enter our own WireGuard implementation called BoringTun.

The WARP application uses BoringTun to encrypt all the traffic from your device and send it directly to Cloudflare’s edge, ensuring that no one in between is snooping on what you’re doing. If the site you are visiting is already a Cloudflare customer, the content is immediately sent down to your device. With WARP+ we use Argo Smart Routing to to devise the shortest path through our global network of data centers to reach whomever you are talking to.

Combined with the power of 1.1.1.1 (the world’s fastest public DNS resolver), WARP keeps your traffic secure, private and fast. Since nearly everything you do on the Internet starts with a DNS request, choosing the fastest DNS server across all your devices will accelerate almost everything you do online. Speed isn’t everything though, and while the connection between your application and a website may be encrypted, DNS lookups for that website were not. This allowed anyone, even your Internet Service Provider, to potentially snoop (and sell) on where you are going on the Internet.

Cloudflare will never snoop or sell your personal data. And if you use DNS-over-HTTPS or DNS-over-TLS to our 1.1.1.1 resolver, your DNS request will be sent over a secure channel. This means that if you use the 1.1.1.1 resolver then in addition to our privacy guarantees an eavesdropper can’t see your DNS requests. Don’t take our word for it though, earlier this year we published the results of a third-party privacy examination, something we’ll keep doing and wish others would do as well.

For Gateway customers, we are committed to privacy and trust and will never sell your personal data to third parties. While your administrator will have the ability to audit your organization’s traffic, create rules around how long data is retained, or create specific policies about where they can go, Cloudflare will never sell your personal data or use your personal data to retarget you with advertisements. Privacy and control of your organization’s data is in your hands.

Now integrated with Cloudflare Gateway

Traditionally, companies have used VPN solutions to gate access to corporate resources and keep devices secure with their filtering rules. These connections quickly became a point of failure (and intrusion vector) as organizations needed to manage and scale up VPN servers as traffic through their on premise servers grew. End users didn’t like it either. VPN servers were usually overwhelmed at peak times, the client was bulky and they were rarely made with performance in mind. And once a bad actor got in, they had access to everything.

In January 2020, we launched Cloudflare for Teams as a replacement to this model. Cloudflare for Teams is built around two core products. Cloudflare Access is a Zero Trust solution allowing organizations to connect internal (and now, SaaS) applications to Cloudflare’s edge and build security rules to enforce safe access to them. No longer were VPNs a single entry point to your organization; users could work from anywhere and still get access. Cloudflare Gateway’s first features focused on protecting users from threats on the Internet with a DNS resolver and policy engine built for enterprises.

The strength and power of WARP clients, used today by millions of users around the world, will enable incredible new use cases for security teams:

Encrypt all user traffic – Regardless of your users’ location, all traffic from their device is encrypted with WARP and sent privately to the nearest WARP endpoint. This keeps your users and your organizations protected from whomever may be snooping. If you still used a traditional VPN on top of Access to encrypt user traffic, that is no longer needed.
WARP+ – Cloudflare offers a premium WARP+ service for customers who want additional speed benefits. That now comes packaged into Teams deployments. Any Teams customer who deploys the Teams client applications will automatically receive the premium speed benefits of WARP+.
Gateway for remote workers – Until today, Gateway required that you keep track of all your users’ IP addresses and build policies per location. This made it difficult to enforce policy or provide malware protection when a user took their device to a new location. With the client installed, these policies can be enforced anywhere.
L7 Firewall and user based policies – Today’s announcement of Cloudflare Gateway SWG and Secure DNS allows your organization to enforce device authentication to your Teams account, enabling you to build user-specific policies and force all traffic through the firewall.
Device and User auditing – Along with user and device policies, administrators will also be able to audit specific user and device traffic. Used in conjunction with logpush, this will allow your organization to do detailed level tracing in case of a breach or audit.

Enroll your organization to use the WARP client with Cloudflare for Teams

We know how hard it can be to deploy another piece of software in your organization, so we’ve worked hard to make deployment easy. To get started, just navigate to our sign-up page and create an account. If you already have an active account, you can bypass this step and head straight to the Cloudflare for Teams dashboard where you’ll be dropped directly into our onboarding flow. After you have signed up and configured your team, setup a Gateway policy and then choose one of the three ways to install the clients to enforce that policy from below:

Self Install
If you are a small organization without an IT department, asking your users to download the client themselves and type in the required settings is the fastest way to get going.

Scripted Install
Our desktop installers support the ability to quickly script the installation. In the case of Windows, this is as easy as this command line:

Cloudflare_WARP_Release-x64.msi /quiet ORGANIZATION="<insert your org>" SERVICE_MODE="warp" ENABLE="true" GATEWAY_UNIQUE_ID="<insert your gateway DoH domain>" SUPPORT_URL=”<mailto or http of your support person>"

Managed Device
Organizations with MDM tools like Intune or JAMF can deploy WARP to their entire fleet of devices from a single operation. Just as you preconfigure all other device settings, WARP can be set so that all end users need to do is login with your team’s identity provider by clicking on the Cloudflare WARP client after it has been deployed.

For a complete list of the installation options, required fields and step by step instructions for all platforms see the WARP Client documentation.

What’s coming next

There is still more we want to build for both our consumer users of WARP and our Cloudflare for Teams customers. Here’s a sneak peek at some of the ones we are most excited about (and allowed to share):

New partner integrations with CrowdStrike and VMware Carbon Black (Tanium available today) will allow you to build even more comprehensive Cloudflare Access policies that check for device health before allowing users to connect to applications
Split Tunnel support will allow you or your organization to specify applications, sites or IP addresses that should be excluded from WARP. This will allow content like games, streaming services, or any application you choose to work outside the connection.
BYOD device support, especially for mobile clients. Enterprise users that are not on the clock should be able to easily toggle off “office mode,” so corporate policies don’t limit personal use of their personal devices.
We are still missing one major operating system from our client portfolio and Linux support is coming.

Download now

We are excited to finally share these applications with our customers. We’d especially like to thank our Cloudflare MVP’s, the 100,000+ beta users on desktop, and the millions of existing users on mobile who have helped grow WARP into what it is today.

You can download the applications right now from https://one.one.one.one

Cloudflare Gateway now protects teams, wherever they are

2020-10-14 Pete Zimmerman

Post Syndicated from Pete Zimmerman original https://blog.cloudflare.com/gateway-swg/

Cloudflare Gateway now protects teams, wherever they are

In January 2020, we launched Cloudflare for Teams—a new way to protect organizations and their employees globally, without sacrificing performance. Cloudflare for Teams centers around two core products – Cloudflare Access and Cloudflare Gateway.

In March 2020, Cloudflare launched the first feature of Cloudflare Gateway, a secure DNS filtering solution powered by the world’s fastest DNS resolver. Gateway’s DNS filtering feature kept users safe by blocking DNS queries to potentially harmful destinations associated with threats like malware, phishing, or ransomware. Organizations could change the router settings in their office and, in about five minutes, keep the entire team safe.

Shortly after that launch, entire companies began leaving their offices. Users connected from initially makeshift home offices that have become permanent in the last several months. Protecting users and data has now shifted from a single office-level setting to user and device management in hundreds or thousands of locations.

Security threats on the Internet have also evolved. Phishing campaigns and malware attacks have increased in the last six months. Detecting those types of attacks requires looking deeper than just the DNS query.

Starting today, we’re excited to announce two features in Cloudflare Gateway that solve those new challenges. First, Cloudflare Gateway now integrates with the Cloudflare WARP desktop client. We built WARP around WireGuard, a modern, efficient VPN protocol that is much more efficient and flexible than legacy VPN protocols.

Second, Cloudflare Gateway becomes a Secure Web Gateway and performs L7 filtering to inspect traffic for threats that hide below the surface. Like our DNS filtering and 1.1.1.1 resolver, both features are powered by everything we’ve learned by offering Cloudflare WARP to millions of users globally.

Securing the distributed workforce

Our customers are largely distributed workforces with employees split between corporate offices and their homes. Due to the pandemic, this is their operating environment for the foreseeable future.

The fact that users aren’t located at fixed, known locations (with remote workers allowed by exception) has created challenges for already overworked IT staff:

VPNs are an all-or-nothing approach to providing remote access to internal applications. We address this with Cloudflare Access and our Zero Trust approach to security for internal applications and now SaaS applications as well.
VPNs are slow and expensive. However, backhauling traffic to a centralized security boundary has been the primary approach to enforcing corporate content and security policies to protect roaming users. Cloudflare Gateway was created to tackle this problem for our customers.

Until today, Cloudflare Gateway has provided security for our customers through DNS filtering. While this provides a level of security and content control that’s application-agnostic, it still leaves our customers with a few challenges:

Customers need to register the source IP address of all locations that send DNS queries to Gateway, so their organization’s traffic can be identified for policy enforcement. This is tedious at best, if not intractable for larger organizations with hundreds of locations.
DNS policies are relatively coarse, with enforcement performed with an all-or-nothing approach per domain. Organizations lack the ability to, for example, allow access to a cloud storage provider but block the download of harmful files from known-malicious URLs.
Organizations that register IP addresses frequently use Network Address Translation (NAT) traffic in order to share public IP addresses across many users. This results in a loss of visibility into DNS activity logs at the individual user level. So while IT security admins can see that a malicious domain was blocked, they must leverage additional forensic tools to track down a potentially compromised device.

Starting today, we are taking Cloudflare Gateway beyond a secure DNS filtering solution by pairing the Cloudflare for Teams client with a cloud L7 firewall. Now our customers can toss out another hardware appliance in their centralized security boundary and provide enterprise-level security for their users directly from the Cloudflare edge.

Protecting users and preventing corporate data loss

DNS filtering provides a baseline level of security across entire systems and even networks, since it’s leveraged by all applications for Internet communications. However, application-specific protection offers granular policy enforcement and visibility into whether traffic should be classified as malicious.

Today we’re excited to extend the protection we offer through DNS filtering by adding an L7 firewall that allows our customers to apply security and content policies to HTTP traffic. This provides administrators with a better tool to protect users through granular controls within HTTP sessions, and with visibility into policy enforcement. Just as importantly, it also gives our customers greater control over where their data resides. By building policies, customers can specify whether to allow or block a request based on file type, on whether the request was to upload or download a file, or on whether the destination is an approved cloud storage provider for the organization.

Enterprises protect their users’ Internet traffic wherever they are by connecting to Cloudflare with the Cloudflare for Teams client. This client provides a fast, secure connection to the Cloudflare data center nearest them, and it relies on the same Cloudflare WARP application millions of users connect through globally. Because the client uses the same WARP application under the hood, enterprises can be sure it has been tested at scale to provide security without compromising on performance. Cloudflare WARP optimizes network performance by leveraging WireGuard for the connection to the Cloudflare edge.

The result is a secure, performant connection for enterprise users wherever they are without requiring the backhaul of network traffic to a centralized security boundary. By connecting to Cloudflare Gateway with the Cloudflare for Teams client, enterprise users are protected through filtering policies applied to all outbound Internet traffic–protecting users as they navigate the Internet and preventing the loss of corporate data.

Cloudflare Gateway now supports HTTP traffic filtering based on a variety of criteria including:

Criteria	Example
URL, path, and/or query string	https://www.myurl.com/path?query
HTTP method	GET, POST, etc.
HTTP response code	500
File type and file name	myfilename.zip
MIME type	application/zip
URL security or content category	Malware, phishing, adult themes

To complement DNS filtering policies, IT admins can now create L7 firewall rules to apply granular policies on HTTP traffic.

For example, an admin may want to allow users to navigate to useful parts of Reddit, but block undesirable subreddits.

Or to prevent data loss, an admin could create a rule that allows users to receive content from popular cloud storage providers but not upload select file types from corporate devices.

Another admin might want to prevent malicious files from being smuggled in through zip file downloads, so they may decide to configure a rule to block downloads of compressed file types.

Having used our DNS filtering categories to protect internal users, an admin may want to simply block security threats based on the classification of full URLs. Malware payloads are frequently disseminated from cloud storage and with DNS filtering an admin has to choose whether to allow or deny access to the entire domain for a given storage provider. URL filtering gives admins the ability to filter requests for the exact URLs where malware payloads reside, allowing customers to continue to leverage the usefulness of their chosen storage provider.

And because all of this is made possible with the Cloudflare for Teams client, distributed workforces with roaming clients receive this protection wherever they are through a secure connection to the Cloudflare data center nearest them.

We’re excited to protect teams as they browse the Internet by inspecting HTTP traffic, but what about non-HTTP traffic? Later this year, we will extend Cloudflare Gateway by adding support for IP, port, and protocol filtering with a cloud L4 firewall. This will allow administrators to apply rules to all Internet-bound traffic, like rules that allow outbound SSH, or rules that determine whether to send HTTP traffic arriving on a non-standard port to the L7 firewall for HTTP inspection.

At launch, Cloudflare Gateway will allow administrators to create policies that filter DNS and HTTP traffic across all users in an organization. This creates a great baseline for security. However, exceptions are part of reality: a one-size-fits-all approach to content and security policy enforcement rarely matches the specific needs of all users.

To address this, we’re working on supporting rules based on user and group identity by integrating Cloudflare Access with a customer’s existing identity provider. This will let administrators create granular rules that also leverage context around the user, such as:

Deny access to social media to all users. But if John Doe is in the marketing group, allow him to access these sites in order to perform his job role.
Only allow Jane Doe to connect to specific SaaS applications through Cloudflare Gateway, or a certain device posture.

The need for policy enforcement and logging visibility based on identity arises from the reality that users aren’t tied to fixed, known workplaces. We meet that need by integrating identity and protecting users wherever they are with the Cloudflare for Teams client.

What’s next

People do not start businesses to deal with the minutiae of information technology and security. They have a vision and a product or service they want to get out in the world, and we want to get them back to doing that. We can help eliminate the hard parts around implementing advanced security tools that are usually reserved for larger, more sophisticated organizations, and we want to make them available to teams regardless of size.

The launch of both the Cloudflare for Teams client and L7 firewall lays the foundation for an advanced Secure Web Gateway with integrations including anti-virus scanning, CASB, and remote browser isolation—all performed at the Cloudflare edge. We’re excited to share this glimpse of the future our team has built—and we’re just getting started.

Get started now

All of these new capabilities are ready for you to use today. The L7 firewall is available in Gateway standalone, Teams Standard, and Teams Enterprise plans. You can get started by signing up for a Gateway account and following the onboarding directions.

Slipping on the MASQUE

Before the MASQUE

The current state of WireGuard

Unpacking the acronyms

Proven reliability at scale

Connect from anywhere

Compliant cipher suites

How can I get Zero Trust WARP with MASQUE?

Continuing the journey with Zero Trust WARP

Background

Keytrap vulnerability (CVE-2023-50387)

Introduction

Mitigation

NSEC3 iteration and closest encloser proof vulnerability (CVE-2023-50868)

Introduction

Mitigation

Timeline

Credits

Background

What happened

Incident timeline and impact

Why this incident happened

Why parsing the ZONEMD record failed

Why a stale version of the root zone was used

Why the initial attempt at disabling static_zone didn’t work

Why the impact was partial

Remediation and follow-up steps

Conclusion

Lessons we learned

Callbacks blocking the event loop

Cache efficiency

Module isolation

Hello BigPineapple

Updated I/O architecture

The cache

Async recursive library

Sandboxed plugins

(Not) The End

Impact to Internet traffic

DNS blocking

A better Internet browsing experience for every WARP user

How it works

An even bigger network for WARP+ users

Speed, privacy, and relevance

How Cloudflare identifies potential phishing threats

How Cloudflare Area 1 identifies potential phishing threats

How Cloudflare data will be used to improve phishing detection

1+1 = 3: Greater dataset sharing between Cloudflare and Area 1

1. Transfer or register a domain with Cloudflare Registrar

2. Configure DNS on Cloudflare DNS

3. Set up a blog with Cloudflare Pages

4. Protect your network with Cloudflare for Teams

5. Secure your traffic with the Cloudflare 1.1.1.1 app and WARP

Enter Cloudflare Tunnel

Setting up the tunnel

Securing the tunnel

Et voilà!

What’s next

What is WARP?

Bringing WARP to Linux

Installing Cloudflare WARP for Linux

Using the CLI

A note on Cloudflare for Teams support

We need feedback

WARP as a Local Proxy

Download today

Bringing the power of WARP to security teams everywhere

Privacy, Security and Speed for Everyone

Now integrated with Cloudflare Gateway

Enroll your organization to use the WARP client with Cloudflare for Teams

What’s coming next

Download now

Securing the distributed workforce

Protecting users and preventing corporate data loss

What’s next

Get started now

The collective thoughts of the interwebz