Upgrading the Cloudflare China Network: better performance and security through product innovation and partnership

Post Syndicated from Patrick R. Donahue original https://blog.cloudflare.com/upgrading-the-cloudflare-china-network/

Upgrading the Cloudflare China Network: better performance and security through product innovation and partnership

Upgrading the Cloudflare China Network: better performance and security through product innovation and partnership

Core to Cloudflare’s mission of helping build a better Internet is making it easy for our customers to improve the performance, security, and reliability of their digital properties, no matter where in the world they might be. This includes Mainland China. Cloudflare has had customers using our service in China since 2015 and recently, we expanded our China presence through a partnership with JD Cloud, the cloud division of Chinese Internet giant, JD.com. We’ve also had a local office in Beijing for several years, which has given us a deep understanding of the Chinese Internet landscape as well as local customers.

The new Cloudflare China Network built in partnership with JD Cloud has been live for several months, with significant performance and security improvements compared to the previous in-country network. Today, we’re excited to describe the improvements we made to our DNS and DDoS systems, and provide data demonstrating the performance gains customers are seeing. All customers licensed to operate in China can now benefit from these innovations, with the click of a button in the Cloudflare dashboard or via the API.

Serving DNS inside China

With over 14% of all domains on the Internet using Cloudflare’s nameservers we are the largest DNS provider. Furthermore, we pride ourselves on consistently being among the fastest authoritative nameservers, answering about 12 million DNS queries per second on average (in Q2 2021). We achieve this scale and performance by running our DNS platform on our global network in more than 200 cities, in over 100 countries.

Not too long ago, a user in mainland China accessing a website using Cloudflare DNS did not fully benefit from these advantages. Their DNS queries had to leave the country and, in most cases, cross the Pacific Ocean to reach our nameservers outside of China. This network distance introduced latency and sometimes even packet drops, resulting in a poor user experience.

With the new China Network offering built on JD Cloud’s infrastructure, customers are now able to serve their DNS in mainland China. This means DNS queries are answered directly from one of the JD Cloud Points of Presence (PoPs), leading to faster response times and improved reliability.

Once a user signs up a domain and opts in to serve their DNS in China we will assign two nameservers, from two of the following three domains:

cf-ns.com
cf-ns.net
cf-ns.tech

We selected these Top Level Domains (TLDs) because they offer the best possible performance from within mainland China. They are chosen to always be different from the TLD of the domain using them. For example, example.com will be assigned nameservers using the .tech and .net TLD. This gives us “glueless delegations” for customers’ nameservers, allowing us to dynamically return nameserver IP addresses instead of static glue records.

A “glue record” (or just “glue”) is a mapping between nameservers and IPs that’s added by registrars to break circular lookup dependencies when a domain uses a nameserver with the same TLD. For example, imagine a resolver asks the .com TLD nameserver: “Where do I find the nameservers for example.com?” and this domain is using ns1.example.com and ns2.example.com as nameservers. If .com just replied: “Go and ask ns1.example.com or ns2.example.com.” the resolver would come back to .com with the same question and this would never stop. One solution is to add glue at .com, so the answer can be: “The nameservers for example.com are ns1.example.com and ns2.example.com, and they can be reached at 192.0.2.78 and 203.0.113.55.”.

By using different TLDs, as described above, we don’t need to rely on glue records for customers’ nameservers. This way, we can ensure that queries will always be answered from the nearest point of presence (PoP) leading to a faster DNS response. Another advantage of serving dynamic nameserver IPs is the ability to distribute queries across different PoPs, which helps to spread load more efficiently and mitigate attacks.

Mitigating DDoS attacks within China

Everywhere in the world except for China and India, we use a technique known as anycast routing to distribute DDoS attacks and absorb them in data centers as close to the traffic source as possible. But as we first wrote in 2015, the Internet in China works a bit differently than the rest of the world so anycast-based mitigation was not an option:

Unlike much of the rest of the world where network routing is open, in China core Internet access is largely controlled by two ISPs: China Telecom and China Unicom. [Today this list also includes China Mobile.] These ISPs control IP address allocation and routing inside the country. Even the Chinese Internet giants rarely own their own IP address allocations, or use BGP to control routing across the Chinese Internet. This makes BGP Anycast and many of the other routing techniques we use across Cloudflare’s network impossible inside of China.

The lack of anycast in China requires a different approach to mitigating attacks, and our expansion with JD Cloud pushed us to further improve the edge-based mitigation system we wrote about earlier this year. Most importantly, we pushed the detection and mitigation of application (L7) attacks to the edge, reducing our time to mitigate and improving the resiliency of the system by removing a dependency on other core data centers for instructions. In the first quarter of 2021, we mitigated 81% of all L7 attacks at the edge.

For the larger network-based (L3/L4) attacks, we worked closely with JD Cloud to augment our in-data center protections with remote signaling to China Telecom, China Unicom, and China Mobile. These integrations allow us to remotely — and automatically — signal from our edge-based mitigation systems when we want upstream filtering assistance from the ISP. Mitigating attacks at the edge is faster than relying on centralized data centers, and in the first quarter of 2021 98.6% of all L3/4 DDoS attacks were mitigated without centralized communication. Attacks exceeding certain thresholds can also be re-routed to large scrubbing centers, a technique that doesn’t make sense in an anycast world but is useful when unicast is the only option.

Beyond the improved mitigation controls, we also developed new traffic engineering processes to move traffic from overloaded data centers to locations with more spare resources. These controls are already used outside of China, but doing so within the country required integration with our DNS systems.

Lastly, because all of our data centers run the same software stack, the work we did to improve the underlying components of DDoS detection and mitigation systems within China has already made its way back to our data centers outside of China.

Improving performance

Cloudflare on JD Cloud is significantly faster than our previous in-country network, allowing us to accelerate the delivery of our customers’ web properties in China.

To compare the Cloudflare PoPs on JD Cloud vs. our previous in-country network, we deployed a test zone to simulate a customer website on both China networks. We tested each website with the same two origin networks. Both origins are commonly used public cloud providers. One site was hosted in the northwest region of the United States, and the other in Western Europe.

For both zones, we assigned DNS nameservers in China to reduce out-of-country latency incurred during DNS lookups (more details are on DNS below). To test our caching, we used a monitoring and benchmarking service with a wide variety of clients in various Chinese cities and provinces to download 100 kilobyte, 1 megabyte, and 10 megabyte files every 15 minutes over the course of 36 hours.

Latency, as measured by Round Trip Time (RTT) from the client to our JD Cloud PoPs, was reduced at least 30% across tests for all file sizes. This subsequently reduced our Time to First Byte (TTFB) metrics. Reducing latency — and making it more consistent, i.e., improving jitter — has the most impact on other performance metrics, as latency and the slow-start process is the bottleneck for the vast majority of TCP connections.

Our latency reduction comes from the quality of the JD Cloud network, their placement of the PoPs within China, and our ability to direct clients to the closest PoP. As we continue to add more capacity and PoPs in partnership with JD Cloud in the future, we only expect our latency metrics to get even better.

Dynamic Content

Upgrading the Cloudflare China Network: better performance and security through product innovation and partnership

Static Content

Upgrading the Cloudflare China Network: better performance and security through product innovation and partnership

DNS Response Time

Upgrading the Cloudflare China Network: better performance and security through product innovation and partnership

Looking forward and welcoming new customers in China

Cloudflare’s sustained product investments in China, in partnership with JD Cloud, have resulted in significant performance and security improvements over our previous in-country network first launched in 2015.

Specifically, innovations in DNS and DDoS mitigation technology, alongside an improved network design and distribution of PoPs, have resulted in better security for our customers and at least a 30% performance boost.

This new network is open for business, and interested customers should reach out to learn more.