Tag Archives: Magic Transit

DIY BYOIP: a new way to Bring Your Own IP prefixes to Cloudflare

Post Syndicated from Ash Pallarito original https://blog.cloudflare.com/diy-byoip/

When a customer wants to bring IP address space to Cloudflare, they’ve always had to reach out to their account team to put in a request. This request would then be sent to various Cloudflare engineering teams such as addressing and network engineering — and then the team responsible for the particular service they wanted to use the prefix with (e.g., CDN, Magic Transit, Spectrum, Egress). In addition, they had to work with their own legal teams and potentially another organization if they did not have primary ownership of an IP prefix in order to get a Letter of Agency (LOA) issued through hoops of approvals. This process is complex, manual, and  time-consuming for all parties involved — sometimes taking up to 4–6 weeks depending on various approvals. 

Well, no longer! Today, we are pleased to announce the launch of our self-serve BYOIP API, which enables our customers to onboard and set up their BYOIP prefixes themselves.

With self-serve, we handle the bureaucracy for you. We have automated this process using the gold standard for routing security — the Resource Public Key Infrastructure, RPKI. All the while, we continue to ensure the best quality of service by generating LOAs on our customers’ behalf, based on the security guarantees of our new ownership validation process. This ensures that customer routes continue to be accepted in every corner of the Internet.

Cloudflare takes the security and stability of the whole Internet very seriously. RPKI is a cryptographically-strong authorization mechanism and is, we believe, substantially more reliable than common practice which relies upon human review of scanned documents. However, deployment and availability of some RPKI-signed artifacts like the AS Path Authorisation (ASPA) object remains limited, and for that reason we are limiting the initial scope of self-serve onboarding to BYOIP prefixes originated from Cloudflare’s autonomous system number (ASN) AS 13335. By doing this, we only need to rely on the publication of Route Origin Authorisation (ROA) objects, which are widely available. This approach has the advantage of being safe for the Internet and also meeting the needs of most of our BYOIP customers. 

Today, we take a major step forward in offering customers a more comprehensive IP address management (IPAM) platform. With the recent update to enable multiple services on a single BYOIP prefix and this latest advancement to enable self-serve onboarding via our API, we hope customers feel empowered to take control of their IPs on our network.

An evolution of Cloudflare BYOIP

We want Cloudflare to feel like an extension of your infrastructure, which is why we originally launched Bring-Your-Own-IP (BYOIP) back in 2020

A quick refresher: Bring-your-own-IP is named for exactly what it does – it allows customers to bring their own IP space to Cloudflare. Customers choose BYOIP for a number of reasons, but the main reasons are control and configurability. An IP prefix is a range or block of IP addresses. Routers create a table of reachable prefixes, known as a routing table, to ensure that packets are delivered correctly across the Internet. When a customer’s Cloudflare services are configured to use the customer’s own addresses, onboarded to Cloudflare as BYOIP, a packet with a corresponding destination address will be routed across the Internet to Cloudflare’s global edge network, where it will be received and processed. BYOIP can be used with our Layer 7 services, Spectrum, or Magic Transit. 

A look under the hood: How it works

Today’s world of prefix validation

Let’s take a step back and take a look at the state of the BYOIP world right now. Let’s say a customer has authority over a range of IP addresses, and they’d like to bring them to Cloudflare. We require customers to provide us with a Letter of Authorization (LOA) and have an Internet Routing Registry (IRR) record matching their prefix and ASN. Once we have this, we require manual review by a Cloudflare engineer. There are a few issues with this process:

  • Insecure: The LOA is just a document—a piece of paper. The security of this method rests entirely on the diligence of the engineer reviewing the document. If the review is not able to detect that a document is fraudulent or inaccurate, it is possible for a prefix or ASN to be hijacked.

  • Time-consuming: Generating a single LOA is not always sufficient. If you are leasing IP space, we will ask you to provide documentation confirming that relationship as well, so that we can see a clear chain of authorisation from the original assignment or allocation of addresses to you. Getting all the paper documents to verify this chain of ownership, combined with having to wait for manual review can result in weeks of waiting to deploy a prefix!

Automating trust: How Cloudflare verifies your BYOIP prefix ownership in minutes

Moving to a self-serve model allowed us to rethink the manner in which we conduct prefix ownership checks. We asked ourselves: How can we quickly, securely, and automatically prove you are authorized to use your IP prefix and intend to route it through Cloudflare?

We ended up killing two birds with one stone, thanks to our two-step process involving the creation of an RPKI ROA (verification of intent) and modification of IRR or rDNS records (verification of ownership). Self-serve unlocks the ability to not only onboard prefixes more quickly and without human intervention, but also exercises more rigorous ownership checks than a simple scanned document ever could. While not 100% foolproof, it is a significant improvement in the way we verify ownership.

Tapping into the authorities

Regional Internet Registries (RIRs) are the organizations responsible for distributing and managing Internet number resources like IP addresses. They are composed of 5 different entities operating in different regions of the world (RIRs). Originally allocated address space from the Internet Assigned Numbers Authority (IANA), they in turn assign and allocate that IP space to Local Internet Registries (LIRs) like ISPs.

This process is based on RIR policies which generally look at things like legal documentation, existing database/registry records, technical contacts, and BGP information. End-users can obtain addresses from an LIR, or in some cases through an RIR directly. As IPv4 addresses have become more scarce, brokerage services have been launched to allow addresses to be leased for fixed periods from their original assignees.

The Internet Routing Registry (IRR) is a separate system that focuses on routing rather than address assignment. Many organisations operate IRR instances and allow routing information to be published, including all five RIRs. While most IRR instances impose few barriers to the publication of routing data, those that are operated by RIRs are capable of linking the ability to publish routing information with the organisations to which the corresponding addresses have been assigned. We believe that being able to modify an IRR record protected in this way provides a good signal that a user has the rights to use a prefix.

Example of a route object containing validation token (using the documentation-only address 192.0.2.0/24):

% whois -h rr.arin.net 192.0.2.0/24

route:          192.0.2.0/24
origin:         AS13335
descr:          Example Company, Inc.
                cf-validation: 9477b6c3-4344-4ceb-85c4-6463e7d2453f
admin-c:        ADMIN2521-ARIN
tech-c:         ADMIN2521-ARIN
tech-c:         CLOUD146-ARIN
mnt-by:         MNT-CLOUD14
created:        2025-07-29T10:52:27Z
last-modified:  2025-07-29T10:52:27Z
source:         ARIN

For those that don’t want to go through the process of IRR-based validation, reverse DNS (rDNS) is provided as another secure method of verification. To manage rDNS for a prefix — whether it’s creating a PTR record or a security TXT record — you must be granted permission by the entity that allocated the IP block in the first place (usually your ISP or the RIR).

This permission is demonstrated in one of two ways:

  • Directly through the IP owner’s authenticated customer portal (ISP/RIR).

  • By the IP owner delegating authority to your third-party DNS provider via an NS record for your reverse zone.

Example of a reverse domain lookup using dig command (using the documentation-only address 192.0.2.0/24):

% dig cf-validation.2.0.192.in-addr.arpa TXT

; <<>> DiG 9.10.6 <<>> cf-validation.2.0.192.in-addr.arpa TXT
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16686
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cf-validation.2.0.192.in-addr.arpa. IN TXT

;; ANSWER SECTION:
cf-validation.2.0.192.in-addr.arpa. 300 IN TXT "b2f8af96-d32d-4c46-a886-f97d925d7977"

;; Query time: 35 msec
;; SERVER: 127.0.2.2#53(127.0.2.2)
;; WHEN: Fri Oct 24 10:43:52 EDT 2025
;; MSG SIZE  rcvd: 150

So how exactly is one supposed to modify these records? That’s where the validation token comes into play. Once you choose either the IRR or Reverse DNS method, we provide a unique, single-use validation token. You must add this token to the content of the relevant record, either in the IRR or in the DNS. Our system then looks for the presence of the token as evidence that the request is being made by someone with authorization to make the requested modification. If the token is found, verification is complete and your ownership is confirmed!

The digital passport 🛂

Ownership is only half the battle; we also need to confirm your intention that you authorize Cloudflare to advertise your prefix. For this, we rely on the gold standard for routing security: the Resource Private Key Infrastructure (RPKI), and in particular Route Origin Authorization (ROA) objects.

A ROA is a cryptographically-signed document that specifies which Autonomous System Number (ASN) is authorized to originate your IP prefix. You can think of a ROA as the digital equivalent of a certified, signed, and notarised contract from the owner of the prefix.

Relying parties can validate the signatures in a ROA using the RPKI.You simply create a ROA that specifies Cloudflare’s ASN (AS13335) as an authorized originator and arrange for it to be signed. Many of our customers used hosted RPKI systems available through RIR portals for this. When our systems detect this signed authorization, your routing intention is instantly confirmed. 

Many other companies that support BYOIP require a complex workflow involving creating self-signed certificates and manually modifying RDAP (Registration Data Access Protocol) records—a heavy administrative lift. By embracing a choice of IRR object modification and Reverse DNS TXT records, combined with RPKI, we offer a verification process that is much more familiar and straightforward for existing network operators.

The global reach guarantee

While the new self-serve flow ditches the need for the “dinosaur relic” that is the LOA, many network operators around the world still rely on it as part of the process of accepting prefixes from other networks.

To help ensure your prefix is accepted by adjacent networks globally, Cloudflare automatically generates a document on your behalf to be distributed in place of a LOA. This document provides information about the checks that we have carried out to confirm that we are authorised to originate the customer prefix, and confirms the presence of valid ROAs to authorise our origination of it. In this way we are able to support the workflows of network operators we connect to who rely upon LOAs, without our customers having the burden of generating them.


Staying away from black holes

One concern in designing the Self-Serve API is the trade-off between giving customers flexibility while implementing the necessary safeguards so that an IP prefix is never advertised without a matching service binding. If this were to happen, Cloudflare would be advertising a prefix with no idea on what to do with the traffic when we receive it! We call this “blackholing” traffic. To handle this, we introduced the requirement of a default service binding — i.e. a service binding that spans the entire range of the IP prefix onboarded. 

A customer can later layer different service bindings on top of their default service binding via multiple service bindings, like putting CDN on top of a default Spectrum service binding. This way, a prefix can never be advertised without a service binding and blackhole our customers’ traffic.


Getting started

Check out our developer docs on the most up-to-date documentation on how to onboard, advertise, and add services to your IP prefixes via our API. Remember that onboardings can be complex, and don’t hesitate to ask questions or reach out to our professional services team if you’d like us to do it for you.

The future of network control

The ability to script and integrate BYOIP management into existing workflows is a game-changer for modern network operations, and we’re only just getting started. In the months ahead, look for self-serve BYOIP in the dashboard, as well as self-serve BYOIP offboarding to give customers even more control.

Cloudflare’s self-serve BYOIP API onboarding empowers customers with unprecedented control and flexibility over their IP assets. This move to automate onboarding empowers a stronger security posture, moving away from manually-reviewed PDFs and driving RPKI adoption. By using these API calls, organizations can automate complex network tasks, streamline migrations, and build more resilient and agile network infrastructures.

Your IPs, your rules: enabling more efficient address space usage

Post Syndicated from Mark Rodgers original https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/

IPv4 addresses have become a costly commodity, driven by their growing scarcity. With the original pool of 4.3 billion addresses long exhausted, organizations must now rely on the secondary market to acquire them. Over the years, prices have surged, often exceeding $30–$50 USD per address, with costs varying based on block size and demand. Given the scarcity, these prices are only going to rise, particularly for businesses that haven’t transitioned to IPv6. This rising cost and limited availability have made efficient IP address management more critical than ever. In response, we’ve evolved how we handle BYOIP (Bring Your Own IP) prefixes to give customers greater flexibility.

Historically, when customers onboarded a BYOIP prefix, they were required to assign it to a single service, binding all IP addresses within that prefix to one service before it was advertised. Once set, the prefix’s destination was fixed — to direct traffic exclusively to that service. If a customer wanted to use a different service, they had to onboard a new prefix or go through the cumbersome process of offboarding and re-onboarding the existing one.

As a step towards addressing this limitation, we’ve introduced a new level of flexibility: customers can now use parts of any prefix — whether it’s bound to Cloudflare CDN, Spectrum, or Magic Transit — for additional use with CDN or Spectrum. This enhancement provides much-needed flexibility, enabling businesses to optimize their IP address usage while keeping costs under control. 

The challenges of moving onboarded BYOIP prefixes between services

Migrating BYOIP prefixes dynamically between Cloudflare services is no trivial task, especially with thousands of servers capable of accepting and processing connections. The problem required overcoming several technical challenges related to IP address management, kernel-level bindings, and orchestration. 

Dynamic reallocation of prefixes across services

When configuring an IP prefix for a service, we need to update IP address lists and firewall rules on each of our servers to allow only the traffic we expect for that service, such as opening ports 80 and 443 to allow HTTP and HTTPS traffic for the Cloudflare CDN. We use Linux iptables and IP sets for this.

Migrating IP prefixes to a different service involves dynamically reassigning them to different IP sets and iptable rules. This requires automated updates across a large-scale distributed environment.

As prefixes shift between services, it is critical that servers update their IP sets and iptable rules dynamically to ensure traffic is correctly routed. Failure to do so could lead to routing loops or dropped connections.  

Updating Tubular – an eBPF-based IP and port binding service

Most web applications bind to a list of IP addresses at startup, and listen on only those IPs until shutdown. To allow customers to change the IPs bound to each service dynamically, we needed a way to add and remove IPs from a running service, without restarting it. Tubular is a BPF program we wrote that runs on Cloudflare servers that allows services to listen on a single socket, dynamically updating the list of addresses that are routed to that socket over the lifetime of the service, without requiring it to restart when those addresses change.

A significant engineering challenge was extending Tubular to support traffic destined for Cloudflare’s CDN.  Without this enhancement, customers would be unable to leverage dynamic reassignment to bind prefixes onboarded through Spectrum to the Cloudflare CDN, limiting flexibility across services.

Cloudflare’s CDN depends on each server running an NGINX ingress proxy to terminate incoming connections. Due to the scale and performance limitations of NGINX, we are actively working to replace it by 2026. In the interim, however, we still depend on the current ingress proxy to reliably handle incoming connections.

One limitation is that this ingress proxy does not support systemd socket activation, a mechanism Tubular relies on to integrate with other Cloudflare services on each server. For services that do support systemd socket activation, systemd independently starts the sockets for the owning service and passes them to Tubular, allowing Tubular to easily detect and route traffic to the correct terminating service.

Since this integration model is not feasible, an alternative solution was required. This was addressed by introducing a shared Unix domain socket between Tubular and the ingress proxy service on each server. Through this channel,  the ingress proxy service explicitly transmits socket information to Tubular, enabling it to correctly register the sockets in its datapath.

The final challenge was deploying the Tubular-ingress proxy integration across the fleet of servers without disrupting active connections. As of April 2025, Cloudflare handles an average of 71 million HTTP requests per second, peaking at 100 million. To safely deploy at this scale, the necessary Tubular and ingress proxy configuration changes were staged across all Cloudflare servers without disrupting existing connections. The final step involved adding bindings — IP addresses and ports corresponding to Cloudflare CDN prefixes — to the Tubular configuration. These bindings direct connections through Tubular via the Unix sockets registered during the previous integration step. To minimize risk, bindings were gradually enabled in a controlled rollout across the global fleet.

Tubular data plane in action

This high-level representation of the Tubular data plane binds together the Layer 4 protocol (TCP), prefix (192.0.2.0/24 – which is 254 usable IP addresses), and port number 0 (any port). When incoming packets match this combination, they are directed to the correct socket of the service — in this case, Spectrum.​


In the following example, TCP 192.0.2.200/32 has been upgraded to the Cloudflare CDN via the edge Service Bindings API. Tubular dynamically consumes this information, adding a new entry to its data plane bindings and socket table. Using Longest Prefix Match, all packets within the 192.0.2.0/24 range port 0 will be routed to Spectrum, except for 192.0.2.200/32 port 443, which will be directed to the Cloudflare CDN.


Coordination and orchestration at scale 

Our goal is to achieve a quick transition of IP address prefixes between services when initiated by customers, which requires a high level of coordination. We need to ensure that changes propagate correctly across all servers to maintain stability. Currently, when a customer migrates a prefix between services, there is a 4-6 hour window of uncertainty where incoming packets may be dropped due to a lack of guaranteed routing. To address this, we are actively implementing systems that will reduce this transition time from hours to just a matter of minutes, significantly improving reliability and minimizing disruptions.

Smarter IP address management

Service Bindings are mappings that control whether traffic destined for a given IP address is routed to Magic Transit, the CDN pipeline, or the Spectrum pipeline.

Consider the example in the diagram below. One of our customers, a global finance infrastructure platform, is using BYOIP and has a /24 range bound to Spectrum for DDoS protection of their TCP and UDP traffic. However, they are only using a few addresses in that range for their Spectrum applications, while the rest go unused. In addition, the customer is using Cloudflare’s CDN for their Layer 7 traffic and wants to set up Static IPs, so that their customers can allowlist a consistent set of IP addresses owned and controlled by their own network infrastructure team. Instead of using up another block of address space, they asked us whether they could carve out those unused sub-ranges of the /24 prefix.

From there, we set out to determine how to selectively map sub-ranges of the onboarded prefix to different services using service bindings:

  • 192.0.2.0/24 is already bound to Spectrum

    • 192.0.2.0/25 is updated and bound to CDN

    • 192.0.2.200/32 is also updated bound to CDN

Both the /25 and /32 are sub-ranges within the /24 prefix and will receive traffic directed to the CDN. All remaining IP addresses within the /24 prefix, unless explicitly bound, will continue to use the default Spectrum service binding.


As you can see in this example, this approach provides customers with greater control and agility over how their IP address space is allocated. Instead of rigidly assigning an entire prefix to a single service, users can now tailor their IP address usage to match specific workloads or deployment needs. Setting this up is straightforward — all it takes is a few HTTP requests to the Cloudflare API. You can define service bindings by specifying which IP addresses or subnets should be routed to CDN, Spectrum, or Magic Transit. This allows you to tailor traffic routing to match your architecture without needing to restructure your entire IP address allocation. The process remains consistent whether you’re configuring a single IP address or splitting up larger subnets, making it easy to apply across different parts of your network. The foundational technical work addressing the underlying architectural challenges outlined above made it possible to streamline what could have been a complex setup into a straightforward series of API interactions.

Conclusion

We envision a future where customers have granular control over how their traffic moves through Cloudflare’s global network, not just by service, but down to the port level. A single prefix could simultaneously power web applications on CDN, protect infrastructure through Magic Transit, and much more. This isn’t just flexible routing, but programmable traffic orchestration across different services. What was once rigid and static becomes dynamic and fully programmable to meet each customer’s unique needs. 

If you are an existing BYOIP customer using Magic Transit, CDN, or Spectrum, check out our configuration guide here. If you are interested in bringing your own IP address space and using multiple Cloudflare services on it, please reach out to your account team to enable setting up this configuration via API or reach out to [email protected] if you’re new to Cloudflare.

The backbone behind Cloudflare’s Connectivity Cloud

Post Syndicated from Shozo Moritz Takaya original https://blog.cloudflare.com/backbone2024


The modern use of “cloud” arguably traces its origins to the cloud icon, omnipresent in network diagrams for decades. A cloud was used to represent the vast and intricate infrastructure components required to deliver network or Internet services without going into depth about the underlying complexities. At Cloudflare, we embody this principle by providing critical infrastructure solutions in a user-friendly and easy-to-use way. Our logo, featuring the cloud symbol, reflects our commitment to simplifying the complexities of Internet infrastructure for all our users.

This blog post provides an update about our infrastructure, focusing on our global backbone in 2024, and highlights its benefits for our customers, our competitive edge in the market, and the impact on our mission of helping build a better Internet. Since the time of our last backbone-related blog post in 2021, we have increased our backbone capacity (Tbps) by more than 500%, unlocking new use cases, as well as reliability and performance benefits for all our customers.

A snapshot of Cloudflare’s infrastructure

As of July 2024, Cloudflare has data centers in 330 cities across more than 120 countries, each running Cloudflare equipment and services. The goal of delivering Cloudflare products and services everywhere remains consistent, although these data centers vary in the number of servers and amount of computational power.

These data centers are strategically positioned around the world to ensure our presence in all major regions and to help our customers comply with local regulations. It is a programmable smart network, where your traffic goes to the best data center possible to be processed. This programmability allows us to keep sensitive data regional, with our Data Localization Suite solutions, and within the constraints that our customers impose. Connecting these sites, exchanging data with customers, public clouds, partners, and the broader Internet, is the role of our network, which is managed by our infrastructure engineering and network strategy teams. This network forms the foundation that makes our products lightning fast, ensuring our global reliability, security for every customer request, and helping customers comply with data sovereignty requirements.

Traffic exchange methods

The Internet is an interconnection of different networks and separate autonomous systems that operate by exchanging data with each other. There are multiple ways to exchange data, but for simplicity, we’ll focus on two key methods on how these networks communicate: Peering and IP Transit. To better understand the benefits of our global backbone, it helps to understand these basic connectivity solutions we use in our network.

  1. Peering: The voluntary interconnection of administratively separate Internet networks that allows for traffic exchange between users of each network is known as “peering”. Cloudflare is one of the most peered networks globally. We have peering agreements with ISPs and other networks in 330 cities and across all major Internet Exchanges (IX’s). Interested parties can register to peer with us anytime, or directly connect to our network with a link through a private network interconnect (PNI).
  2. IP transit: A paid service that allows traffic to cross or “transit” somebody else’s network, typically connecting a smaller Internet service provider (ISP) to the larger Internet. Think of it as paying a toll to access a private highway with your car.

The backbone is a dedicated high-capacity optical fiber network that moves traffic between Cloudflare’s global data centers, where we interconnect with other networks using these above-mentioned traffic exchange methods. It enables data transfers that are more reliable than over the public Internet. For the connectivity within a city and long distance connections we manage our own dark fiber or lease wavelengths using Dense Wavelength Division Multiplexing (DWDM). DWDM is a fiber optic technology that enhances network capacity by transmitting multiple data streams simultaneously on different wavelengths of light within the same fiber. It’s like having a highway with multiple lanes, so that more cars can drive on the same highway. We buy and lease these services from our global carrier partners all around the world.

Backbone operations and benefits

Operating a global backbone is challenging, which is why many competitors don’t do it. We take this challenge for two key reasons: traffic routing control and cost-effectiveness.

With IP transit, we rely on our transit partners to carry traffic from Cloudflare to the ultimate destination network, introducing unnecessary third-party reliance. In contrast, our backbone gives us full control over routing of both internal and external traffic, allowing us to manage it more effectively. This control is crucial because it lets us optimize traffic routes, usually resulting in the lowest latency paths, as previously mentioned. Furthermore, the cost of serving large traffic volumes through the backbone is, on average, more cost-effective than IP transit. This is why we are doubling down on backbone capacity in regions such as Frankfurt, London, Amsterdam, and Paris and Marseille, where we see continuous traffic growth and where connectivity solutions are widely available and competitively priced.

Our backbone serves both internal and external traffic. Internal traffic includes customer traffic using our security or performance products and traffic from Cloudflare’s internal systems that shift data between our data centers. Tiered caching, for example, optimizes our caching delivery by dividing our data centers into a hierarchy of lower tiers and upper tiers. If lower-tier data centers don’t have the content, they request it from the upper tiers. If the upper tiers don’t have it either, they then request it from the origin server. This process reduces origin server requests and improves cache efficiency. Using our backbone to transport the cached content between lower and upper-tier data centers and the origin is often the most cost-effective method, considering the scale of our network. Magic Transit is another example where we attract traffic, by means of BGP anycast, to the Cloudflare data center closest to the end user and implement our DDoS solution. Our backbone transports the clean traffic to our customer’s data center, which they connect through a Cloudflare Network Interconnect (CNI).

External traffic that we carry on our backbone can be traffic from other origin providers like AWS, Oracle, Alibaba, Google Cloud Platform, or Azure, to name a few. The origin responses from these cloud providers are transported through peering points and our backbone to the Cloudflare data center closest to our customer. By leveraging our backbone we have more control over how we backhaul this traffic throughout our network, which results in more reliability and better performance and less dependency on the public Internet.

This interconnection between public clouds, offices, and the Internet with a controlled layer of performance, security, programmability, and visibility running on our global backbone is our Connectivity Cloud.

This map is a simplification of our current backbone network and does not show all paths

Expanding our network

As mentioned in the introduction, we have increased our backbone capacity (Tbps) by more than 500% since 2021. With the addition of sub-sea cable capacity to Africa, we achieved a big milestone in 2023 by completing our global backbone ring. It now reaches six continents through terrestrial fiber and subsea cables.

Building out our backbone within regions where Internet infrastructure is less developed compared to markets like Central Europe or the US has been a key strategy for our latest network expansions. We have a shared goal with regional ISP partners to keep our data flow localized and as close as possible to the end user. Traffic often takes inefficient routes outside the region due to the lack of sufficient local peering and regional infrastructure. This phenomenon, known as traffic tromboning, occurs when data is routed through more cost-effective international routes and existing peering agreements.

Our regional backbone investments in countries like India or Turkey aim to reduce the need for such inefficient routing. With our own in-region backbone, traffic can be directly routed between in-country Cloudflare data centers, such as from Mumbai to New Delhi to Chennai, reducing latency, increasing reliability, and helping us to provide the same level of service quality as in more developed markets. We can control that data stays local, supporting our Data Localization Suite (DLS), which helps businesses comply with regional data privacy laws by controlling where their data is stored and processed.

Improved latency and performance

This strategic expansion has not only extended our global reach but has also significantly improved our overall latency. One illustration of this is that since the deployment of our backbone between Lisbon and Johannesburg, we have seen a major performance improvement for users in Johannesburg. Customers benefiting from this improved latency can be, for example, a financial institution running their APIs through us for real-time trading, where milliseconds can impact trades, or our Magic WAN users, where we facilitate site-to-site connectivity between their branch offices.

The table above shows an example where we measured the round-trip time (RTT) for an uncached origin fetch, from an end-user in Johannesburg to various origin locations, comparing our backbone and the public Internet. By carrying the origin request over our backbone, as opposed to IP transit or peering, local users in Johannesburg get their content up to 22% faster. By using our own backbone to long-haul the traffic to its final destination, we are in complete control of the path and performance. This improvement in latency varies by location, but consistently demonstrates the superiority of our backbone infrastructure in delivering high performance connectivity.

Traffic control

Consider a navigation system using 1) GPS to identify the route and 2) a highway toll pass that is valid until your final destination and allows you to drive straight through toll stations without stopping. Our backbone works quite similarly.

Our global backbone is built upon two key pillars. The first is BGP (Border Gateway Protocol), the routing protocol for the Internet, and the second is Segment Routing MPLS (Multiprotocol label switching), a technique for steering traffic across predefined forwarding paths in an IP network. By default, Segment Routing provides end-to-end encapsulation from ingress to egress routers where the intermediate nodes execute no route lookup. Instead, they forward traffic across an end-to-end virtual circuit, or tunnel, called a label-switched path. Once traffic is put on a label-switched path, it cannot detour onto the public Internet and must continue on the predetermined route across Cloudflare’s backbone. This is nothing new, as many networks will even run a “BGP Free Core” where all the route intelligence is carried at the edge of the network, and intermediate nodes only participate in forwarding from ingress to egress.

While leveraging Segment Routing Traffic Engineering (SR-TE) in our backbone, we can automatically select paths between our data centers that are optimized for latency and performance. Sometimes the “shortest path” in terms of routing protocol cost is not the lowest latency or highest performance path.

Supercharged: Argo and the global backbone

Argo Smart Routing is a service that uses Cloudflare’s portfolio of backbone, transit, and peering connectivity to find the most optimal path between the data center where a user’s request lands and your back-end origin server. Argo may forward a request from one Cloudflare data center to another on the way to an origin if the performance would improve by doing so. Orpheus is the counterpart to Argo, and routes around degraded paths for all customer origin requests free of charge. Orpheus is able to analyze network conditions in real-time and actively avoid reachability failures. Customers with Argo enabled get optimal performance for requests from Cloudflare data centers to their origins, while Orpheus provides error self-healing for all customers universally. By mixing our global backbone using Segment Routing as an underlay with Argo Smart Routing and Orpheus as our connectivity overlay, we are able to transport critical customer traffic along the most optimized paths that we have available.

So how exactly does our global backbone fit together with Argo Smart Routing? Argo Transit Selection is an extension of Argo Smart Routing where the lowest latency path between Cloudflare data center hops is explicitly selected and used to forward customer origin requests. The lowest latency path will often be our global backbone, as it is a more dedicated and private means of connectivity, as opposed to third-party transit networks.

Consider a multinational Dutch pharmaceutical company that relies on Cloudflare’s network and services with our SASE solution to connect their global offices, research centers, and remote employees. Their Asian branch offices depend on Cloudflare’s security solutions and network to provide secure access to important data from their central data centers back to their offices in Asia. In case of a cable cut between regions, our network would automatically look for the best alternative route between them so that business impact is limited.

Argo measures every potential combination of the different provider paths, including our own backbone, as an option for reaching origins with smart routing. Because of our vast interconnection with so many networks, and our global private backbone, Argo is able to identify the most performant network path for requests. The backbone is consistently one of the lowest latency paths for Argo to choose from.

In addition to high performance, we care greatly about network reliability for our customers. This means we need to be as resilient as possible from fiber cuts and third-party transit provider issues. During a disruption of the AAE-1 (Asia Africa Europe-1) submarine cable, this is what Argo saw between Singapore and Amsterdam across some of our transit provider paths vs. the backbone.

The large (purple line) spike shows a latency increase on one of our third-party IP transit provider paths due to congestion, which was eventually resolved following likely traffic engineering within the provider’s network. We saw a smaller latency increase (yellow line) over other transit networks, but still one that is noticeable. The bottom (green) line on the graph is our backbone, where round-trip time more or less remains flat throughout the event, due to our diverse backbone connectivity between Asia and Europe. Throughout the fiber cut, we remained stable at around 200ms between Amsterdam and Singapore. There was no noticeable network hiccup as was seen on the transit provider paths, so Argo actively leveraged the backbone for optimal performance.

Call to action

As Argo improves performance in our network, Cloudflare Network Interconnects (CNIs) optimize getting onto it. We encourage our Enterprise customers to use our free CNI’s as on-ramps onto our network whenever practical. In this way, you can fully leverage our network, including our robust backbone, and increase overall performance for every product within your Cloudflare Connectivity Cloud. In the end, our global network is our main product and our backbone plays a critical role in it. This way we continue to help build a better Internet, by improving our services for everybody, everywhere.

If you want to be part of our mission, join us as a Cloudflare network on-ramp partner to offer secure and reliable connectivity to your customers by integrating directly with us. Learn more about our on-ramp partnerships and how they can benefit your business here.

Free network flow monitoring for all enterprise customers

Post Syndicated from Chris Draper original https://blog.cloudflare.com/free-network-monitoring-for-enterprise


A key component of effective corporate network security is establishing end to end visibility across all traffic that flows through the network. Every network engineer needs a complete overview of their network traffic to confirm their security policies work, to identify new vulnerabilities, and to analyze any shifts in traffic behavior. Often, it’s difficult to build out effective network monitoring as teams struggle with problems like configuring and tuning data collection, managing storage costs, and analyzing traffic across multiple visibility tools.

Today, we’re excited to announce that a free version of Cloudflare’s network flow monitoring product, Magic Network Monitoring, is available to all Enterprise Customers. Every Enterprise Customer can configure Magic Network Monitoring and immediately improve their network visibility in as little as 30 minutes via our self-serve onboarding process.

Enterprise Customers can visit the Magic Network Monitoring product page, click “Talk to an expert”, and fill out the form. You’ll receive access within 24 hours of submitting the request. Over the next month, the free version of Magic Network Monitoring will be rolled out to all Enterprise Customers. The product will automatically be available by default without the need to submit a form.

How it works

Cloudflare customers can send their network flow data (either NetFlow or sFlow) from their routers to Cloudflare’s network edge.

Magic Network Monitoring will pick up this data, parse it, and instantly provide insights and analytics on your network traffic. These analytics include traffic volume overtime in bytes and packets, top protocols, sources, destinations, ports, and TCP flags.

Dogfooding Magic Network Monitoring during the remediation of the Thanksgiving 2023 security incident

Let’s review a recent example of how Magic Network Monitoring improved Cloudflare’s own network security and traffic visibility during the Thanksgiving 2023 security incident. Our security team needed a lightweight method to identify malicious packet characteristics in our core data center traffic. We monitored for any network traffic sourced from or destined to a list of ASNs associated with the bad actor. Our security team setup Magic Network Monitoring and established visibility into our first core data center within 24 hours of the project kick-off. Today, Cloudflare continues to use Magic Network Monitoring to monitor for traffic related to bad actors and to provide real time traffic analytics on more than 1 Tbps of core data center traffic.

Magic Network Monitoring – Traffic Analytics

Monitoring local network traffic from IoT devices

Magic Network Monitoring also improves visibility on any network traffic that doesn’t go through Cloudflare. Imagine that you’re a network engineer at ACME Corporation, and it’s your job to manage and troubleshoot IoT devices in a factory that are connected to the factory’s internal network. The traffic generated by these IoT devices doesn’t go through Cloudflare because it is destined to other devices and endpoints on the internal network. Nonetheless, you still need to establish network visibility into device traffic over time to monitor and troubleshoot the system.

To solve the problem, you configure a router or other network device to securely send encrypted traffic flow summaries to Cloudflare via an IPSec tunnel. Magic Network Monitoring parses the data, and instantly provides you with insights and analytics on your network traffic. Now, when an IoT device goes down, or a connection between IoT devices is unexpectedly blocked, you can analyze historical network traffic data in Magic Network Monitoring to speed up the troubleshooting process.

Monitoring cloud network traffic

As cloud networking becomes increasingly prevalent, it is essential for enterprises to invest in visibility across their cloud environments. Let’s say you’re responsible for monitoring and troubleshooting your corporation’s cloud network operations which are spread across multiple public cloud providers. You need to improve visibility into your cloud network traffic to analyze and troubleshoot any unexpected traffic patterns like configuration drift that leads to an exposed network port.

To improve traffic visibility across different cloud environments, you can export cloud traffic flow logs from any virtual device that supports NetFlow or sFlow to Cloudflare. In the future, we are building support for native cloud VPC flow logs in conjunction with Magic Cloud Networking. Cloudflare will parse this traffic flow data and provide alerts plus analytics across all your cloud environments in a single pane of glass on the Cloudflare dashboard.

Improve your security posture today in less than 30 minutes

If you’re an existing Enterprise customer, and you want to improve your corporate network security, you can get started right away. Visit the Magic Network Monitoring product page, click “Talk to an expert”, and fill out the form. You’ll receive access within 24 hours of submitting the request. You can begin the self-serve onboarding tutorial, and start monitoring your first batch of network traffic in less than 30 minutes.

Over the next month, the free version of Magic Network Monitoring will be rolled out to all Enterprise Customers. The product will be automatically available by default without the need to submit a form.

If you’re interested in becoming an Enterprise Customer, and have more questions about Magic Network Monitoring, you can talk with an expert. If you’re a free customer, and you’re interested in testing a limited beta of Magic Network Monitoring, you can fill out this form to request access.

Simplifying how enterprises connect to Cloudflare with Express Cloudflare Network Interconnect

Post Syndicated from Ben Ritter original https://blog.cloudflare.com/announcing-express-cni


We’re excited to announce the largest update to Cloudflare Network Interconnect (CNI) since its launch, and because we’re making CNIs faster and easier to deploy, we’re calling this Express CNI. At the most basic level, CNI is a cable between a customer’s network router and Cloudflare, which facilitates the direct exchange of information between networks instead of via the Internet. CNIs are fast, secure, and reliable, and have connected customer networks directly to Cloudflare for years. We’ve been listening to how we can improve the CNI experience, and today we are sharing more information about how we’re making it faster and easier to order CNIs, and connect them to Magic Transit and Magic WAN.

Interconnection services and what to consider

Interconnection services provide a private connection that allows you to connect your networks to other networks like the Internet, cloud service providers, and other businesses directly. This private connection benefits from improved connectivity versus going over the Internet and reduced exposure to common threats like Distributed Denial of Service (DDoS) attacks.

Cost is an important consideration when evaluating any vendor for interconnection services. The cost of an interconnection is typically comprised of a fixed port fee, based on the capacity (speed) of the port, and the variable amount of data transferred. Some cloud providers also add complex inter-region bandwidth charges.

Other important considerations include the following:

  • How much capacity is needed?
  • Are there variable or fixed costs associated with the port?
  • Is the provider located in the same colocation facility as my business?
  • Are they able to scale with my network infrastructure?
  • Are you able to predict your costs without any unwanted surprises?
  • What additional products and services does the vendor offer?

Cloudflare does not charge a port fee for Cloudflare Network Interconnect, nor do we charge for inter-region bandwidth. Using CNI with products like Magic Transit and Magic WAN may even reduce bandwidth spending with Internet service providers. For example, you can deliver Magic Transit-cleaned traffic to your data center with a CNI instead of via your Internet connection, reducing the amount of bandwidth that you would pay an Internet service provider for.

To underscore the value of CNI, one vendor charges nearly \$20,000 a year for a 10 Gigabit per second (Gbps) direct connect port. The same 10 Gbps CNI on Cloudflare for one year is $0. Their cost also does not include any costs related to the amount of data transferred between different regions or geographies, or outside of their cloud. We have never charged for CNIs, and are committed to making it even easier for customers to connect to Cloudflare, and destinations beyond on the open Internet.

3 Minute Provisioning

Our first big announcement is a new, faster approach to CNI provisioning and deployment. Starting today, all Magic Transit and Magic WAN customers can order CNIs directly from their Cloudflare account. The entire process is about 3 clicks and takes less than 3 minutes (roughly the time to make coffee). We’re going to show you how simple it is to order a CNI.

The first step is to find out whether Cloudflare is in the same data center or colocation facility as your routers, servers, and network hardware. Let’s navigate to the new “Interconnects” section of the Cloudflare dashboard, and order a new Direct CNI.

Search for the city of your data center, and quickly find out if Cloudflare is in the same facility. I’m going to stand up a CNI to connect my example network located in Ashburn, VA.

It looks like Cloudflare is in the same facility as my network, so I’m going to select the location where I’d like to connect.

As of right now, my data center is only exchanging a few hundred Megabits per second of traffic on Magic Transit, so I’m going to select a 1 Gigabit per second interface, which is the smallest port speed available. I can also order a 10 Gbps link if I have more than 1 Gbps of traffic in a single location. Cloudflare also supports 100 Gbps CNIs, but if you have this much traffic to exchange with us, we recommend that you coordinate with your account team.

After selecting your preferred port speed, you can name your CNI, which will be referenceable later when you direct your Magic Transit or Magic WAN traffic to the interconnect. We are given the opportunity to verify that everything looks correct before confirming our CNI order.

Once we click the “Confirm Order” button, Cloudflare will provision an interface on our router for your CNI, and also assign IP addresses for you to configure on your router interface. Cloudflare will also issue you a Letter of Authorization (LOA) for you to order a cross connect with the local facility. Cloudflare will provision a port on our router for your CNI within 3 minutes of your order, and you will be able to ping across the CNI as soon as the interface line status comes up.

After downloading the Letter of Authorization (LOA) to order a cross connect, we’ll navigate back to our Interconnects area. Here we can see the point to point IP addressing, and the CNI name that is used in our Magic Transit or Magic WAN configuration. We can also redownload the LOA if needed.

Simplified Magic Transit and Magic WAN onboarding

Our second major announcement is that Express CNI dramatically simplifies how Magic Transit and Magic WAN customers connect to Cloudflare. Getting packets into Magic Transit or Magic WAN in the past with a CNI required customers to configure a GRE (Generic Routing Encapsulation) tunnel on their router. These configurations are complex, and not all routers and switches support these changes. Since both Magic Transit and Magic WAN protect networks, and operate at the network layer on packets, customers rightly asked us, “If I connect directly to Cloudflare with CNI, why do I also need a GRE tunnel for Magic Transit and Magic WAN?”

Starting today, GRE tunnels are no longer required with Express CNI. This means that Cloudflare supports standard 1500-byte packets on the CNI, and there’s no need for complex GRE or MSS adjustment configurations to get traffic into Magic Transit or Magic WAN. This significantly reduces the amount of configuration required on a router for Magic Transit and Magic WAN customers who can connect over Express CNI. If you’re not familiar with Magic Transit, the key takeaway is that we’ve reduced the complexity of changes you must make on your router to protect your network with Cloudflare.

What’s next for CNI?

We’re excited about how Express CNI simplifies connecting to Cloudflare’s network. Some customers connect to Cloudflare through our Interconnection Platform Partners, like Equinix and Megaport, and we plan to bring the Express CNI features to our partners too.

We have upgraded a number of our data centers to support Express CNI, and plan to upgrade many more over the next few months. We are rapidly expanding the number of global locations that support Express CNI as we install new network hardware. If you’re interested in connecting to Cloudflare with Express CNI, but are unable to find your data center, please let your account team know.

If you’re on an existing classic CNI today, and you don’t need Express CNI features, there is no obligation to migrate to Express CNI. Magic Transit and Magic WAN customers have been asking for BGP support to control how Cloudflare routes traffic back to their networks, and we expect to extend BGP support to Express CNI first, so keep an eye out for more Express CNI announcements later this year.

Get started with Express CNI today

As we’ve demonstrated above, Express CNI makes it fast and easy to connect your network to Cloudflare. If you’re a Magic Transit or Magic WAN customer, the new “Interconnects” area is now available on your Cloudflare dashboard. To deploy your first CNI, you can follow along with the screenshots above, or refer to our updated interconnects documentation.

Introducing Cloudflare’s new Network Analytics dashboard

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/network-analytics-v2-announcement/

Introducing Cloudflare’s new Network Analytics dashboard

Introducing Cloudflare’s new Network Analytics dashboard

We’re pleased to introduce Cloudflare’s new and improved Network Analytics dashboard. It’s now available to Magic Transit and Spectrum customers on the Enterprise plan.

The dashboard provides network operators better visibility into traffic behavior, firewall events, and DDoS attacks as observed across Cloudflare’s global network. Some of the dashboard’s data points include:

  1. Top traffic and attack attributes
  2. Visibility into DDoS mitigations and Magic Firewall events
  3. Detailed packet samples including full packets headers and metadata
Introducing Cloudflare’s new Network Analytics dashboard
Network Analytics – Drill down by various dimensions
Introducing Cloudflare’s new Network Analytics dashboard
Network Analytics – View traffic by mitigation system

This dashboard was the outcome of a full refactoring of our network-layer data logging pipeline. The new data pipeline is decentralized and much more flexible than the previous one — making it more resilient, performant, and scalable for when we add new mitigation systems, introduce new sampling points, and roll out new services. A technical deep-dive blog is coming soon, so stay tuned.

In this blog post, we will demonstrate how the dashboard helps network operators:

  1. Understand their network better
  2. Respond to DDoS attacks faster
  3. Easily generate security reports for peers and managers

Understand your network better

One of the main responsibilities network operators bare is ensuring the operational stability and reliability of their network. Cloudflare’s Network Analytics dashboard shows network operators where their traffic is coming from, where it’s heading, and what type of traffic is being delivered or mitigated. These insights, along with user-friendly drill-down capabilities, help network operators identify changes in traffic, surface abnormal behavior, and can help alert on critical events that require their attention — to help them ensure their network’s stability and reliability.

Starting at the top, the Network Analytics dashboard shows network operators their traffic rates over time along with the total throughput. The entire dashboard is filterable, you can drill down using select-to-zoom, change the time-range, and toggle between a packet or bit/byte view. This can help gain a quick understanding of traffic behavior and identify sudden dips or surges in traffic.

Cloudflare customers advertising their own IP prefixes from the Cloudflare network can also see annotations for BGP advertisement and withdrawal events. This provides additional context atop of the traffic rates and behavior.

Introducing Cloudflare’s new Network Analytics dashboard
The Network Analytics dashboard time series and annotations

Geographical accuracy

One of the many benefits of Cloudflare’s Network Analytics dashboard is its geographical accuracy. Identification of the traffic source usually involves correlating the source IP addresses to a city and country. However, network-layer traffic is subject to IP spoofing. Malicious actors can spoof (alter) their source IP address to obfuscate their origin (or their botnet’s nodes) while attacking your network. Correlating the location (e.g., the source country) based on spoofed IPs would therefore result in spoofed countries. Using spoofed countries would skew the global picture network operators rely on.

To overcome this challenge and provide our users accurate geoinformation, we rely on the location of the Cloudflare data center wherein the traffic was ingested. We’re able to achieve geographical accuracy with high granularity, because we operate data centers in over 285 locations around the world. We use BGP Anycast which ensures traffic is routed to the nearest data center within BGP catchment.

Introducing Cloudflare’s new Network Analytics dashboard
Traffic by Cloudflare data center country from the Network Analytics dashboard

Detailed mitigation analytics

The dashboard lets network operators understand exactly what is happening to their traffic while it’s traversing the Cloudflare network. The All traffic tab provides a summary of attack traffic that was dropped by the three mitigation systems, and the clean traffic that was passed to the origin.

Introducing Cloudflare’s new Network Analytics dashboard
The All traffic tab in Network Analytics

Each additional tab focuses on one mitigation system, showing traffic dropped by the corresponding mitigation system and traffic that was passed through it. This provides network operators almost the same level of visibility as our internal support teams have. It allows them to understand exactly what Cloudflare systems are doing to their traffic and where in the Cloudflare stack an action is being taken.

Introducing Cloudflare’s new Network Analytics dashboard
Introducing Cloudflare’s new Network Analytics dashboard
Data path for Magic Transit customers

Using the detailed tabs, users can better understand the systems’ decisions and which rules are being applied to mitigate attacks. For example, in the Advanced TCP Protection tab, you can view how the system is classifying TCP connection states. In the screenshot below, you can see the distribution of packets according to connection state. For example, a sudden spike in Out of sequence packets may result in the system dropping them.

Introducing Cloudflare’s new Network Analytics dashboard
The Advanced TCP Protection tab in Network Analytics

Note that the presence of tabs differ slightly for Spectrum customers because they do not have access to the Advanced TCP Protection and Magic Firewall tabs. Spectrum customers only have access to the first two tabs.

Respond to DDoS attacks faster

Cloudflare detects and mitigates the majority of DDoS attacks automatically. However, when a network operator responds to a sudden increase in traffic or a CPU spike in their data centers, they need to understand the nature of the traffic. Is this a legitimate surge due to a new game release for example, or an unmitigated DDoS attack? In either case, they need to act quickly to ensure there are no disruptions to critical services.

The Network Analytics dashboard can help network operators quickly pattern traffic by switching the time-series’ grouping dimensions. They can then use that pattern to drop packets using the Magic Firewall. The default dimension is the outcome indicating whether traffic was dropped or passed. But by changing the time series dimension to another field such as the TCP flag, Packet size, or Destination port a pattern can emerge.

In the example below, we have zoomed in on a surge of traffic. By setting the Protocol field as the grouping dimension, we can see that there is a 5 Gbps surge of UDP packets (totalling at 840 GB throughput out of 991 GB in this time period). This is clearly not the traffic we want, so we can hover and click the UDP indicator to filter by it.

Introducing Cloudflare’s new Network Analytics dashboard
Distribution of a DDoS attack by IP protocols

We can then continue to pattern the traffic, and so we set the Source port to be the grouping dimension. We can immediately see that, in this case, the majority of traffic (838 GB) is coming from source port 123. That’s no bueno, so let’s filter by that too.

Introducing Cloudflare’s new Network Analytics dashboard
The UDP flood grouped by source port

We can continue iterating to identify the main pattern of the surge. An example of a field that is not necessarily helpful in this case is the Destination port. The time series is only showing us the top five ports but we can already see that it is quite distributed.

Introducing Cloudflare’s new Network Analytics dashboard
The attack targets multiple destination ports

We move on to see what other fields can contribute to our investigation. Using the Packet size dimension yields good results. Over 771 GB of the traffic are delivered over 286 byte packets.

Introducing Cloudflare’s new Network Analytics dashboard
Zooming in on an UDP flood originating from source port 123 

Assuming that our attack is now sufficiently patterned, we can create a Magic Firewall rule to block the attack by combining those fields. You can combine additional fields to ensure you do not impact your legitimate traffic. For example, if the attack is only targeting a single prefix (e.g., 192.0.2.0/24), you can limit the scope of the rule to that prefix.

Introducing Cloudflare’s new Network Analytics dashboard
Creating a Magic Firewall rule directly from within the analytics dashboard
Introducing Cloudflare’s new Network Analytics dashboard
Creating a Magic Firewall rule to block a UDP flood

If needed for attack mitigation or network troubleshooting, you can also view and export packet samples along with the packet headers. This can help you identify the pattern and sources of the traffic.

Introducing Cloudflare’s new Network Analytics dashboard
Example of packet samples with one sample expanded
Introducing Cloudflare’s new Network Analytics dashboard
Example of a packet sample with the header sections expanded

Generate reports

Another important role of the network security team is to provide decision makers an accurate view of their threat landscape and network security posture. Understanding those will enable teams and decision makers to prepare and ensure their organization is protected and critical services are kept available and performant. This is where, again, the Network Analytics dashboard comes in to help. Network operators can use the dashboard to understand their threat landscape — which endpoints are being targeted, by which types of attacks, where are they coming from, and how does that compare to the previous period.

Introducing Cloudflare’s new Network Analytics dashboard
Dynamic, adaptive executive summary

Using the Network Analytics dashboard, users can create a custom report — filtered and tuned to provide their decision makers a clear view of the attack landscape that’s relevant to them.

Introducing Cloudflare’s new Network Analytics dashboard

In addition, Magic Transit and Spectrum users also receive an automated weekly Network DDoS Report which includes key insights and trends.

Extending visibility from Cloudflare’s vantage point

As we’ve seen in many cases, being unprepared can cost organizations substantial revenue loss, it can negatively impact their reputation, reduce users’ trust as well as burn out teams that need to constantly put out fires reactively. Furthermore, impact to organizations that operate in the healthcare industry, water, and electric and other critical infrastructure industries can cause very serious real-world problems, e.g., hospitals not being able to provide care for patients.

The Network Analytics dashboard aims to reduce the effort and time it takes network teams to investigate and resolve issues as well as to simplify and automate security reporting. The data is also available via GraphQL API and Logpush to allow teams to integrate the data into their internal systems and cross references with additional data points.

To learn more about the Network Analytics dashboard, refer to the developer documentation.

Introducing Advanced DDoS Alerts

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/advanced-ddos-alerts/

Introducing Advanced DDoS Alerts

Introducing Advanced DDoS Alerts

We’re pleased to introduce Advanced DDoS Alerts. Advanced DDoS Alerts are customizable and provide users the flexibility they need when managing many Internet properties. Users can easily define which alerts they want to receive — for which DDoS attack sizes, protocols and for which Internet properties.

This release includes two types of Advanced DDoS Alerts:

  1. Advanced HTTP DDoS Attack Alerts – Available to WAF/CDN customers on the Enterprise plan, who have also subscribed to the Advanced DDoS Protection service.
  2. Advanced L3/4 DDoS Attack Alerts – Available to Magic Transit and Spectrum BYOIP customers on the Enterprise plan.

Standard DDoS Alerts are available to customers on all plans, including the Free plan. Advanced DDoS Alerts are part of Cloudflare’s Advanced DDoS service.

Why alerts?

Distributed Denial of Service attacks are cyber attacks that aim to take down your Internet properties and make them unavailable for your users. As early as 2017, Cloudflare pioneered the Unmetered DDoS Protection to provide all customers with DDoS protection, without limits, to ensure that their Internet properties remain available. We’re able to provide this level of commitment to our customers thanks to our automated DDoS protection systems. But if the systems operate automatically, why even be alerted?

Well, to put it plainly, when our DDoS protection systems kick in, they insert ephemeral rules inline to mitigate the attack. Many of our customers operate business critical applications and services. When our systems make a decision to insert a rule, customers might want to be able to verify that all the malicious traffic is mitigated, and that legitimate user traffic is not. Our DDoS alerts begin firing as soon as our systems make a mitigation decision. Therefore, by informing our customers about a decision to insert a rule in real time, they can observe and verify that their Internet properties are both protected and available.

Managing many Internet properties

The standard DDoS Alerts alert you on DDoS attacks that target any and all of your Cloudflare-protected Internet properties. However, some of our customers may manage large numbers of Internet properties ranging from hundreds to hundreds of thousands. The standard DDoS Alerts would notify users every time one of those properties would come under attack — which could become very noisy.

The Advanced DDoS Alerts address this concern by allowing users to select the specific Internet properties that they want to be notified about; zones and hostnames for WAF/CDN customers, and IP prefixes for Magic Transit and Spectrum BYOIP customers.

Introducing Advanced DDoS Alerts
Creating an Advanced HTTP DDoS Attack Alert: selecting zones and hostnames
Introducing Advanced DDoS Alerts
Creating an Advanced L3/4 DDoS Attack Alert: selecting prefixes

One (attack) size doesn’t fit all

The standard DDoS Alerts alert you on DDoS attacks of any size. Well, almost any size. We implemented minimal alert thresholds to avoid spamming our customers’ email inboxes. Those limits are very small and not customer-configurable. As we’ve seen in the recent DDoS trends report, most of the attacks are very small — another reason why the standard DDoS Alert could become noisy for customers that only care about very large attacks. On the opposite end of the spectrum, choosing not to alert may become too quiet for customers that do want to be notified about smaller attacks.

The Advanced DDoS Alerts let customers choose their own alert threshold. WAF/CDN customers can define the minimum request-per-second rate of an HTTP DDoS attack alert. Magic Transit and Spectrum BYOIP customers can define the packet-per-second and Megabit-per-second rates of a L3/4 DDoS attack alert.

Introducing Advanced DDoS Alerts
Creating an Advanced HTTP DDoS Attack Alert: defining request rate
Introducing Advanced DDoS Alerts
Creating an Advanced L3/4 DDoS Attack Alert: defining packet/bit rate

Not all protocols are created equal

As part of the Advanced L3/4 DDoS Alerts, we also let our users define the protocols to be alerted on. If a Magic Transit customer manages mostly UDP applications, they may not care if TCP-based DDoS attacks target it. Similarly, if a Spectrum BYOIP customer only cares about HTTP/TCP traffic, other-protocol-based attacks could be of no concern to them.

Introducing Advanced DDoS Alerts
Introducing Advanced DDoS Alerts
Creating an Advanced L3/4 DDoS Attack Alert: selecting the protocols

Creating an Advanced DDoS Alert

We’ll show here how to create an Advanced HTTP DDoS Alert, but the process to create a L3/4 alert is similar. You can view a more detailed guide on our developers website.

First, click here or log in to your Cloudflare account, navigate to Notifications and click Add. Then select the Advanced HTTP DDoS Attack Alert or Advanced L3/4 DDoS Attack Alert (based on your eligibility). Give your alert a name, an optional description, add your preferred delivery method (e.g., Webhook) and click Next.

Introducing Advanced DDoS Alerts
Step 1: Creating an Advanced HTTP DDoS Attack Alert

Second, select the domains you’d like to be alerted on. You can also narrow it down to specific hostnames. Define the minimum request-per-second rate to be alerted on, click Save, and voilà.

Introducing Advanced DDoS Alerts
Step 2: Defining the Advanced HTTP DDoS Attack Alert conditions

Actionable alerts for making better decisions

Cloudflare Advanced DDoS Alerts aim to provide our customers with configurable controls to make better decisions for their own environments. Customers can now be alerted on attacks based on which domain/prefix is being attacked, the size of the attack, and the protocol of the attack. We recognize that the power to configure and control DDoS attack alerts should ultimately be left up to our customers, and we are excited to announce the availability of this functionality.

Want to learn more about Advanced DDoS Alerts? Visit our developer site.

Interested in upgrading to get Advanced DDoS Alerts? Contact your account team.

New to Cloudflare? Speak to a Cloudflare expert.

Introducing Cloudflare Adaptive DDoS Protection – our new traffic profiling system for mitigating DDoS attacks

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/adaptive-ddos-protection/

Introducing Cloudflare Adaptive DDoS Protection - our new traffic profiling system for mitigating DDoS attacks

Introducing Cloudflare Adaptive DDoS Protection - our new traffic profiling system for mitigating DDoS attacks

Every Internet property is unique, with its own traffic behaviors and patterns. For example, a website may only expect user traffic from certain geographies, and a network might only expect to see a limited set of protocols.

Understanding that the traffic patterns of each Internet property are unique is what led us to develop the Adaptive DDoS Protection system. Adaptive DDoS Protection joins our existing suite of automated DDoS defenses and takes it to the next level. The new system learns your unique traffic patterns and adapts to protect against sophisticated DDoS attacks.

Adaptive DDoS Protection is now generally available to Enterprise customers:

  • HTTP Adaptive DDoS Protection – available to WAF/CDN customers on the Enterprise plan, who have also subscribed to the Advanced DDoS Protection service.
  • L3/4 Adaptive DDoS Protection – available to Magic Transit and Spectrum customers on an Enterprise plan.

Adaptive DDoS Protection learns your traffic patterns

The Adaptive DDoS Protection system creates a traffic profile by looking at a customer’s maximal rates of traffic every day, for the past seven days. The profiles are recalculated every day using the past seven-day history. We then store the maximal traffic rates seen for every predefined dimension value. Every profile uses one dimension and these dimensions include the source country of the request, the country where the Cloudflare data center that received the IP packet is located, user agent, IP protocol, destination ports and more.

So, for example, for the profile that uses the source country as a dimension, the system will log the maximal traffic rates seen per country. e.g. 2,000 requests per second (rps) for Germany, 3,000 rps for France, 10,000 rps for Brazil, and so on. This example is for HTTP traffic, but Adaptive DDoS protection also profiles L3/4 traffic for our Magic Transit and Spectrum Enterprise customers.

Another note on the maximal rates is that we use the 95th percentile rates. This means that we take a look at the maximal rates and discard the top 5% of the highest rates. The purpose of this is to eliminate outliers from the calculations.

Calculating traffic profiles is done asynchronously — meaning that it does not induce any latency to our customers’ traffic. The system  then distributes a compact profile representation across our network that can be consumed by our DDoS protection systems to be used to detect and mitigate DDoS attacks in a much more cost-efficient manner.

In addition to the traffic profiles, the Adaptive DDoS Protection also leverages Cloudflare’s Machine Learning generated Bot Scores as an additional signal to differentiate between user and automated traffic. The purpose of using these scores is to differentiate between legitimate spikes in user traffic that deviates from the traffic profile, and a spike of automated and potentially malicious traffic.

Out of the box and easy to use

Adaptive DDoS Protection just works out of the box. It automatically creates the profiles, and then customers can tweak and tune the settings as they need via DDoS Managed Rules. Customers can change the sensitivity level, leverage expression fields to create overrides (e.g. exclude this type of traffic), and change the mitigation action to tailor the behavior of the system to their specific needs and traffic patterns.

Introducing Cloudflare Adaptive DDoS Protection - our new traffic profiling system for mitigating DDoS attacks

Adaptive DDoS Protection complements the existing DDoS protection systems which leverages dynamic fingerprinting to detect and mitigate DDoS attacks. The two work in tandem to protect our customers from DDoS attacks. When Cloudflare customers onboard a new Internet property to Cloudflare, the dynamic fingerprinting protects them automatically and out of the box — without requiring any user action. Once the Adaptive DDoS Protection learns their legitimate traffic patterns and creates a profile, users can turn it on to provide an extra layer of protection.

Rules included as part of the Adaptive DDoS Protection

As part of this release, we’re pleased to announce the following capabilities as part of Cloudflare’s Adaptive DDoS Protection:

Profiling Dimension Availability
WAF/CDN customers on the Enterprise plan with Advanced DDoS Magic Transit & Spectrum Enterprise customers
Origin errors
Client IP Country & region Coming soon
User Agent (globally, not per customer*)
IP Protocol
Combination of IP Protocol and Destination Port Coming soon

*The User-Agent-aware feature analyzes, learns and profiles all the top user agents that we see across the Cloudflare network. This feature helps us identify DDoS attacks that leverage legacy or wrongly configured user agents.

Excluding UA-aware DDoS Protection, Adaptive DDoS Protection rules are deployed in Log mode. Customers can observe the traffic that’s flagged, tweak the sensitivity if needed, and then deploy the rules in mitigation mode. You can follow the steps outlined in this guide to do so.

Making the impact of DDoS attacks a thing of the past

Our mission at Cloudflare is to help build a better Internet. The DDoS Protection team’s vision is derived from this mission: our goal is to make the impact of DDoS attacks a thing of the past. Cloudflare’s Adaptive DDoS Protection takes us one step closer to achieving that vision: making Cloudflare’s DDoS protection even more intelligent, sophisticated, and tailored to our customer’s unique traffic patterns and individual needs.

Want to learn more about Cloudflare’s Adaptive DDoS Protection? Visit our developer site.

Interested in upgrading to get access to Adaptive DDoS Protection? Contact your account team.

New to Cloudflare? Speak to a Cloudflare expert.

Integrating Network Analytics Logs with your SIEM dashboard

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/network-analytics-logs/

Integrating Network Analytics Logs with your SIEM dashboard

Integrating Network Analytics Logs with your SIEM dashboard

We’re excited to announce the availability of Network Analytics Logs. Magic Transit, Magic Firewall, Magic WAN, and Spectrum customers on the Enterprise plan can feed packet samples directly into storage services, network monitoring tools such as Kentik, or their Security Information Event Management (SIEM) systems such as Splunk to gain near real-time visibility into network traffic and DDoS attacks.

What’s included in the logs

By creating a Network Analytics Logs job, Cloudflare will continuously push logs of packet samples directly to the HTTP endpoint of your choice, including Websockets. The logs arrive in JSON format which makes them easy to parse, transform, and aggregate. The logs include packet samples of traffic dropped and passed by the following systems:

  1. Network-layer DDoS Protection Ruleset
  2. Advanced TCP Protection
  3. Magic Firewall

Note that not all mitigation systems are applicable to all Cloudflare services. Below is a table describing which mitigation service is applicable to which Cloudflare service:

Mitigation System Cloudflare Service
Magic Transit Magic WAN Spectrum
Network-layer DDoS Protection Ruleset
Advanced TCP Protection
Magic Firewall

Packets are processed by the mitigation systems in the order outlined above. Therefore, a packet that passed all three systems may produce three packet samples, one from each system. This can be very insightful when troubleshooting and wanting to understand where in the stack a packet was dropped. To avoid overcounting the total passed traffic, Magic Transit users should only take into consideration the passed packets from the last mitigation system, Magic Firewall.

An example of a packet sample log:

{"AttackCampaignID":"","AttackID":"","ColoName":"bkk06","Datetime":1652295571783000000,"DestinationASN":13335,"Direction":"ingress","IPDestinationAddress":"(redacted)","IPDestinationSubnet":"/24","IPProtocol":17,"IPSourceAddress":"(redacted)","IPSourceSubnet":"/24","MitigationReason":"","MitigationScope":"","MitigationSystem":"magic-firewall","Outcome":"pass","ProtocolState":"","RuleID":"(redacted)","RulesetID":"(redacted)","RulesetOverrideID":"","SampleInterval":100,"SourceASN":38794,"Verdict":"drop"}

All the available log fields are documented here: https://developers.cloudflare.com/logs/reference/log-fields/account/network_analytics_logs/

Setting up the logs

In this walkthrough, we will demonstrate how to feed the Network Analytics Logs into Splunk via Postman. At this time, it is only possible to set up Network Analytics Logs via API. Setting up the logs requires three main steps:

  1. Create a Cloudflare API token.
  2. Create a Splunk Cloud HTTP Event Collector (HEC) token.
  3. Create and enable a Cloudflare Logpush job.

Let’s get started!

1) Create a Cloudflare API token

  1. Log in to your Cloudflare account and navigate to My Profile.
  2. On the left-hand side, in the collapsing navigation menu, click API Tokens.
  3. Click Create Token and then, under Custom token, click Get started.
  4. Give your custom token a name, and select an Account scoped permission to edit Logs. You can also scope it to a specific/subset/all of your accounts.
  5. At the bottom, click Continue to summary, and then Create Token.
  6. Copy and save your token. You can also test your token with the provided snippet in Terminal.

When you’re using an API token, you don’t need to provide your email address as part of the API credentials.

Integrating Network Analytics Logs with your SIEM dashboard

Read more about creating an API token on the Cloudflare Developers website: https://developers.cloudflare.com/api/tokens/create/

2) Create a Splunk token for an HTTP Event Collector

In this walkthrough, we’re using a Splunk Cloud free trial, but you can use almost any service that can accept logs over HTTPS. In some cases, if you’re using an on-premise SIEM solution, you may need to allowlist Cloudflare IP address in your firewall to be able to receive the logs.

  1. Create a Splunk Cloud account. I created a trial account for the purpose of this blog.
  2. In the Splunk Cloud dashboard, go to Settings > Data Input.
  3. Next to HTTP Event Collector, click Add new.
  4. Follow the steps to create a token.
  5. Copy your token and your allocated Splunk hostname and save both for later.
Integrating Network Analytics Logs with your SIEM dashboard

Read more about using Splunk with Cloudflare Logpush on the Cloudflare Developers website: https://developers.cloudflare.com/logs/get-started/enable-destinations/splunk/

Read more about creating an HTTP Event Collector token on Splunk’s website: https://docs.splunk.com/Documentation/Splunk/8.2.6/Data/UsetheHTTPEventCollector

3) Create a Cloudflare Logpush job

Creating and enabling a job is very straightforward. It requires only one API call to Cloudflare to create and enable a job.

To send the API calls I used Postman, which is a user-friendly API client that was recommended to me by a colleague. It allows you to save and customize API calls. You can also use Terminal/CMD or any other API client/script of your choice.

One thing to notice is Network Analytics Logs are account-scoped. The API endpoint is therefore a tad different from what you would normally use for zone-scoped datasets such as HTTP request logs and DNS logs.

This is the endpoint for creating an account-scoped Logpush job:

https://api.cloudflare.com/client/v4/accounts/{account-id}/logpush/jobs

Your account identifier number is a unique identifier of your account. It is a string of 32 numbers and characters. If you’re not sure what your account identifier is, log in to Cloudflare, select the appropriate account, and copy the string at the end of the URL.

https://dash.cloudflare.com/{account-id}

Then, set up a new request in Postman (or any other API client/CLI tool).

To successfully create a Logpush job, you’ll need the HTTP method, URL, Authorization token, and request body (data). The request body must include a destination configuration (destination_conf), the specified dataset (network_analytics_logs, in our case), and the token (your Splunk token).

Integrating Network Analytics Logs with your SIEM dashboard

Method:

POST

URL:

https://api.cloudflare.com/client/v4/accounts/{account-id}/logpush/jobs

Authorization: Define a Bearer authorization in the Authorization tab, or add it to the header, and add your Cloudflare API token.

Body: Select a Raw > JSON

{
"destination_conf": "{your-unique-splunk-configuration}",
"dataset": "network_analytics_logs",
"token": "{your-splunk-hec-tag}",
"enabled": "true"
}

If you’re using Splunk Cloud, then your unique configuration has the following format:

{your-unique-splunk-configuration}=splunk://{your-splunk-hostname}.splunkcloud.com:8088/services/collector/raw?channel={channel-id}&header_Authorization=Splunk%20{your-splunk–hec-token}&insecure-skip-verify=false

Definition of the variables:

{your-splunk-hostname}= Your allocated Splunk Cloud hostname.

{channel-id} = A unique ID that you choose to assign for.`{your-splunk–hec-token}` = The token that you generated for your Splunk HEC.

An important note is that customers should have a valid SSL/TLS certificate on their Splunk instance to support an encrypted connection.

After you’ve done that, you can create a GET request to the same URL (no request body needed) to verify that the job was created and is enabled.

The response should be similar to the following:

{
    "errors": [],
    "messages": [],
    "result": {
        "id": {job-id},
        "dataset": "network_analytics_logs",
        "frequency": "high",
        "kind": "",
        "enabled": true,
        "name": null,
        "logpull_options": null,
        "destination_conf": "{your-unique-splunk-configuration}",
        "last_complete": null,
        "last_error": null,
        "error_message": null
    },
    "success": true
}

Shortly after, you should start receiving logs to your Splunk HEC.

Integrating Network Analytics Logs with your SIEM dashboard

Read more about enabling Logpush on the Cloudflare Developers website: https://developers.cloudflare.com/logs/reference/logpush-api-configuration/examples/example-logpush-curl/

Reduce costs with R2 storage

Depending on the amount of logs that you read and write, the cost of third party cloud storage can skyrocket — forcing you to decide between managing a tight budget and being able to properly investigate networking and security issues. However, we believe that you shouldn’t have to make those trade-offs. With R2’s low costs, we’re making this decision easier for our customers. Instead of feeding logs to a third party, you can reap the cost benefits of storing them in R2.

To learn more about the R2 features and pricing, check out the full blog post. To enable R2, contact your account team.

Cloudflare logs for maximum visibility

Cloudflare Enterprise customers have access to detailed logs of the metadata generated by our products. These logs are helpful for troubleshooting, identifying network and configuration adjustments, and generating reports, especially when combined with logs from other sources, such as your servers, firewalls, routers, and other appliances.

Network Analytics Logs joins Cloudflare’s family of products on Logpush: DNS logs, Firewall events, HTTP requests, NEL reports, Spectrum events, Audit logs, Gateway DNS, Gateway HTTP, and Gateway Network.

Not using Cloudflare yet? Start now with our Free and Pro plans to protect your websites against DDoS attacks, or contact us for comprehensive DDoS protection and firewall-as-a-service for your entire network.

Cloudflare partners with Kentik to enhance on-demand DDoS protection

Post Syndicated from Matt Lewis original https://blog.cloudflare.com/kentik-and-magic-transit/

Cloudflare partners with Kentik to enhance on-demand DDoS protection

We are excited to announce that as of today, network security teams can procure and use Magic Transit, Cloudflare’s industry-leading DDoS mitigation solution, and Kentik’s network observability as an integrated solution. We are excited to help our customers not just with technical simplicity, but business process simplicity as well.

Cloudflare partners with Kentik to enhance on-demand DDoS protection

Why monitoring and mitigation?

Distributed Denial of Service (DDoS) attacks are highly disruptive to businesses everywhere. According to the Cloudflare DDoS Attack Trends report, in the first half of 2021 the world witnessed massive ransomware and ransom DDoS attack campaigns that interrupted critical infrastructure, including oil pipelines, healthcare, and financial services. In the second half, we saw a growing swarm of attacks, including one of the most powerful botnets deployed (Meris), with record-breaking network-layer attacks observed on the Cloudflare network.

Along with an increase in severity, there is a proliferation of automated toolkits that make it simple and cheap for anyone to launch these attacks. Detecting and stopping these attacks manually is not effective, and network security engineers are increasingly turning to automated tools to help ensure network and application availability.

DDoS protection has evolved over the years from appliances to hybrid models to fully Internet-native solutions, like Cloudflare’s Magic Transit. Cloudflare has been protecting millions of Internet properties against DDoS attacks, ensuring they are available at all times. Magic Transit extends Cloudflare’s industry-leading DDoS protection to shield entire IP subnets from DDoS attacks, while also accelerating network traffic, ensuring your data centers, cloud services and corporate networks are always reachable from the Internet. Our powerful global network spanning 250+ cities and 121 Tbps of capacity ensures that customers can have always-on DDoS protection without impacting network latency and application performance. Magic Transit also supports on-demand mode, which allows customers to activate DDoS protection when they need it most.

Network observability becomes critical to understand what normal looks like for your environment so that DDoS attacks are readily detected. Flow-based monitoring helps you understand not only how much traffic is flowing over your network, but also where it came from, where it’s going, and what applications are consuming bandwidth.

Magic Transit protection for every network configuration

Magic Transit is one of the most powerful DDoS mitigation platforms available today. We have worked hard to ensure Magic Transit is flexible enough for the most demanding network architectures. We need to fit into your world, not the other way around. And that involves partnering with leading network observability vendors to give you multiple options for how you choose to protect your network.

With this new partnership, customers can now consume Cloudflare’s Magic Transit service in one of three modes:

  • Always On — Customers looking for fast mitigation and traffic acceleration can deploy Magic Transit in Always On mode.
  • On Demand — Customers can choose to turn on Magic Transit response to a DDoS attack via Cloudflare’s UI or Cloudflare’s Magic Transit API.
  • On Demand + Flow-based Monitoring — Customers can now purchase and deploy an integrated network observability and DDoS protection solution consisting of Cloudflare Magic Transit On Demand and Kentik Protect from a single vendor.

In each configuration, Magic Transit is seamlessly paired with Magic Firewall — our cloud-native firewall-as-a-service.

Why Kentik’s flow-based monitoring?

At Cloudflare, we continuously take feedback from our customers on both our product and on what other tools they use. Customer feedback helps us build our products and how we grow Cloudflare’s Technology Partner Program.

For our Magic Transit customers, we found that many of our customers who chose Magic Transit On Demand have adopted solutions from Kentik, the network observability company with one of the leading flow-based monitoring tools in the ecosystem. Kentik empowers network professionals to plan, run, and fix any network with observability into all their traffic.

Simplifying network security

Cloudflare strives to simplify how customers can shield their network from cybersecurity threats like DDoS attacks. Magic Transit gives network security professionals the confidence that their network resources are immune from DDoS-related outages. We have now extended that same simplicity to this joint solution, making it simple for our customers to procure, provision, and integrate Magic Transit and Kentik. Our end goal is always creating the best experience possible for our customers, with Cloudflare’s services fitting seamlessly into their existing technology stack.

Kentik’s powerful network observability cloud collects flow logs from your network components and continuously learns network behavior, detecting anomalies such as DDoS attacks. Using our native API integration, the Kentik platform can trigger Magic Transit to start attracting network traffic when there’s an attack underway. Magic Transit’s autonomous DDoS mitigation automatically analyzes incoming traffic and filters out DDoS traffic across the entire Cloudflare network, protecting your network from unwanted traffic and avoiding service availability issues and outages.

Together, Kentik and Cloudflare have created a well-supported integration and a more streamlined procurement process to combine Kentik’s best-of-breed network observability and Cloudflare’s industry-leading DDoS protection in Magic Transit. Customers can now receive the best DDoS protection and network observability in a completely SaaS-based offering.

Cloudflare partners with Kentik to enhance on-demand DDoS protection

“We are excited to partner with Cloudflare to make it easier for our mutual customers to integrate our leading technology solutions and deploy industry-leading DDoS protection in a fully SaaS-based environment”, said Mike Mooney, CRO at Kentik.

Conclusion

Now, customers seeking to combine purpose-built, best-of-breed network observability and visualization from Kentik with Cloudflare’s Magic Transit On Demand can do so through a single vendor agreement and an integrated solution.

If you’d like to learn more DDoS attack trends and how Kentik plus Cloudflare combine to provide the leading SaaS-based DDoS protection solution with over 121 Tbps of capacity, review our developer documentation and join our upcoming webinar on April 28 to learn more.

Optimizing Magic Firewall’s IP lists

Post Syndicated from Jordan Griege original https://blog.cloudflare.com/magic-firewall-optimizing-ip-lists/

Optimizing Magic Firewall’s IP lists

Optimizing Magic Firewall’s IP lists

Magic Firewall is Cloudflare’s replacement for network-level firewall hardware. It evaluates gigabits of traffic every second against user-defined rules that can include millions of IP addresses. Writing a firewall rule for each IP address is cumbersome and verbose, so we have been building out support for various IP lists in Magic Firewall—essentially named groups that make the rules easier to read and write. Some users want to reject packets based on our growing threat intelligence of bad actors, while others know the exact set of IPs they want to match, which Magic Firewall supports via the same API as Cloudflare’s WAF.

With all those IPs, the system was using more of our memory budget than we’d like. To understand why, we need to first peek behind the curtain of our magic.

Life inside a network namespace

Magic Transit and Magic WAN enable Cloudflare to route layer 3 traffic, and they are the front door for Magic Firewall. We have previously written about how Magic Transit uses network namespaces to route packets and isolate customer configuration. Magic Firewall operates inside these namespaces, using nftables as the primary implementation of packet filtering.

Optimizing Magic Firewall’s IP lists

When a user makes an API request to configure their firewall, a daemon running on every server detects the change and makes the corresponding changes to nftables. Because nftables configuration is stored within the namespace, each user’s firewall rules are isolated from the next. Every Cloudflare server routes traffic for all of our customers, so it’s important that the firewall only applies a customer’s rules to their own traffic.

Using our existing technologies, we built IP lists using nftables sets. In practice, this means that the wirefilter expression

ip.geoip.country == “US”

is implemented as

table ip mfw {
	set geoip.US {
		typeof ip saddr
		elements = { 192.0.2.0, … }
	}
	chain input {
		ip saddr @geoip.US block
	}
}

This design for the feature made it very easy to extend our wirefilter syntax so that users could write rules using all the different IP lists we support.

Scaling to many customers

We chose this design since sets fit easily into our existing use of nftables. We shipped quickly with just a few, relatively small lists to a handful of customers. This approach worked well at first, but we eventually started to observe a dramatic increase in our system’s memory use. It’s common practice for our team to enable a new feature for just a few customers at first. Here’s what we observed as those customers started using IP lists:

Optimizing Magic Firewall’s IP lists

It turns out you can’t store millions of IPs in the kernel without someone noticing, but the real problem is that we pay that cost for each customer that uses them. Each network namespace gets a complete copy of all the lists that its rules reference. Just to make it perfectly clear, if 10 users have a rule that uses the anonymizer list, then a server has 10 copies of that list stored in the kernel. Yikes!

And it isn’t just the storage of these IPs that caused alarm; the sheer size of the nftables configuration causes nft (the command-line program for configuring nftables) to use quite a bit of compute power.

Optimizing Magic Firewall’s IP lists

Can you guess when the nftables updates take place?  Now the CPU usage wasn’t particularly problematic, as the service is already quite lean. However, those spikes also create competition with other services that heavily use netlink sockets. At one point, a monitoring alert for another service started firing for high CPU, and a fellow development team went through the incident process only to realize our system was the primary contributor to the problem. They made use of the -terse flag to prevent nft from wasting precious time rendering huge lists of IPs, but degrading another team’s service was a harsh way to learn that lesson.

We put quite a bit of effort into working around this problem. We tried doing incremental updates of the sets, only adding or deleting the elements that had changed since the last update. It was helpful sometimes, but the complexity ended up not being worth the small savings. Through a lot of profiling, we found that splitting an update across multiple statements improved the situation some. This means we changed

add element ip mfw set0 { 192.0.2.0, 192.0.2.2, …, 198.51.100.0, 198.51.100.2, … }

to

add element ip mfw set0 { 192.0.2.0, … }
add element ip mfw set0 { 198.51.100.0, … }

just to squeeze out a second or two of reduced time that nft took to process our configuration. We were spending a fair amount of development cycles looking for optimizations, and we needed to find a way to turn the tides.

One more step with eBPF

As we mentioned on our blog recently, Magic Firewall leverages eBPF for some advanced packet-matching. And it may not be a big surprise that eBPF can match an IP in a list, but it wasn’t a tool we thought to reach for a year ago. What makes eBPF relevant to this problem is that eBPF maps exist regardless of a network namespace. That’s right; eBPF gives us a way to share this data across all the network namespaces we create for our customers.

Since several of our IP lists contain ranges, we can’t use a simple BPF_MAP_TYPE_HASH or BPF_MAP_TYPE_ARRAY to perform a lookup. Fortunately, Linux already has a data structure that we can use to accommodate ranges: the BPF_MAP_TYPE_LPM_TRIE.

Given an IP address, the trie’s implementation of bpf_map_lookup_elem() will return a non-null result if the IP falls within any of the ranges already inserted into the map. And using the map is about as simple as it gets for eBPF programs. Here’s an example:

typedef struct {
	__u32 prefix_len;
	__u8 ip[4];
} trie_key_t;

SEC("maps")
struct bpf_map_def ip_list = {
    .type = BPF_MAP_TYPE_LPM_TRIE,
    .key_size = sizeof(trie_key_t),
    .value_size = 1,
    .max_entries = 1,
    .map_flags = BPF_F_NO_PREALLOC,
};

SEC("socket/saddr_in_list")
int saddr_in_list(struct __sk_buff *skb) {
	struct iphdr iph;
	trie_key_t trie_key;

	bpf_skb_load_bytes(skb, 0, &iph, sizeof(iph));

	trie_key.prefix_len = 32;
	trie_key.ip[0] = iph.saddr & 0xff;
	trie_key.ip[1] = (iph.saddr >> 8) & 0xff;
	trie_key.ip[2] = (iph.saddr >> 16) & 0xff;
	trie_key.ip[3] = (iph.saddr >> 24) & 0xff;

	void *val = bpf_map_lookup_elem(&ip_list, &trie_key);
	return val == NULL ? BPF_OK : BPF_DROP;
}

nftables befriends eBPF

One of the limitations of our prior work incorporating eBPF into the firewall involves how the program is loaded into the nftables ruleset. Because the nft command-line program does not support calling a BPF program from a rule, we formatted the Netlink messages ourselves. While this allowed us to insert a rule that executes an eBPF program, it came at the cost of losing out on atomic rule replacements.

So any native nftables rules could be applied in one transaction, but any eBPF-based rules would have to be inserted afterwards. This introduces some headaches for handling failures—if inserting an eBPF rule fails, should we simply retry, or perhaps roll back the nftables ruleset as well? And what if those follow-up operations fail?

In other words, we went from a relatively simple operation—the new firewall configuration either fully applied or didn’t—to a much more complex one. We didn’t want to keep using more and more eBPF without solving this problem. After hacking on the nftables repo for a few days, we came out with a patch that adds a BPF keyword to the nftables configuration syntax.

I won’t cover all the changes, which we hope to upstream soon, but there were two key parts. The first is implementing struct expr_ops and some methods required, which is how nft’s parser knows what to do with each keyword in its configuration language (like the “bpf” keyword we introduced). The second is just a different approach for what we solved previously. Instead of directly formatting struct xt_bpf_info_v1 into a byte array as iptables does, nftables uses the nftnl library to create the netlink messages.

Once we had our footing working in a foreign codebase, that code turned out fairly simple.

static void netlink_gen_bpf(struct netlink_linearize_ctx *ctx,
                            const struct expr *expr,
                            enum nft_registers dreg)
{
       struct nftnl_expr *nle;

       nle = alloc_nft_expr("match");
       nftnl_expr_set_str(nle, NFTNL_EXPR_MT_NAME, "bpf");
       nftnl_expr_set_u8(nle, NFTNL_EXPR_MT_REV, 1);
       nftnl_expr_set(nle, NFTNL_EXPR_MT_INFO, expr->bpf.info, sizeof(struct xt_bpf_info_v1));
       nft_rule_add_expr(ctx, nle, &expr->location);
}

With the patch finally built, it became possible for us to write configuration like this where we can provide a path to any pinned eBPF program.

# nft -f - <<EOF
table ip mfw {
  chain input {
    bpf pinned "/sys/fs/bpf/filter" drop
  }
}
EOF

And now we can mix native nftables matches with eBPF programs without giving up the atomic nature of nft commands. This unlocks the opportunity for us to build even more advanced packet inspection and protocol detection in eBPF.

Results

This plot shows kernel memory during the time of the deployment – the change was rolled out to each namespace over a few hours.

Optimizing Magic Firewall’s IP lists

Though at this point, we’ve merely moved memory out of this slab. Fortunately, the eBPF map memory now appears in the cgroup for our Golang process, so it’s relatively easy to report. Here’s the point at which we first started populating the maps:

Optimizing Magic Firewall’s IP lists

This change was pushed a couple of weeks before the maps were actually used in the firewall (i.e. the graph just above). And it has remained constant ever since—unlike our original design, the number of customers is no longer a predominant factor in this feature’s resource usage.  The new model makes us more confident about how the service will behave as our product continues to grow.

Other than being better positioned to use eBPF more in the future, we made a significant improvement on the efficiency of the product today. In any project that grows rapidly for a while, an engineer can often see a few pesky pitfalls looming on the horizon. It is very satisfying to dive deep into these technologies and come out with a creative, novel solution before experiencing too much pain. We love solving hard problems and improving our systems to handle rapid growth in addition to pushing lots of new features.

Does that sound like an environment you want to work in? Consider applying!

Protect all network traffic with Cloudflare

Post Syndicated from Annika Garbers original https://blog.cloudflare.com/protect-all-network-traffic/

Protect all network traffic with Cloudflare

Protect all network traffic with Cloudflare

Magic Transit protects customers’ entire networks—any port/protocol—from DDoS attacks and provides built-in performance and reliability. Today, we’re excited to extend the capabilities of Magic Transit to customers with any size network, from home networks to offices to large cloud properties, by offering Cloudflare-maintained and Magic Transit-protected IP space as a service.

What is Magic Transit?

Magic Transit extends the power of Cloudflare’s global network to customers, absorbing all traffic destined for your network at the location closest to its source. Once traffic lands at the closest Cloudflare location, it flows through a stack of security protections including industry-leading DDoS mitigation and cloud firewall. Detailed Network Analytics, alerts, and reporting give you deep visibility into all your traffic and attack patterns. Clean traffic is forwarded to your network using Anycast GRE or IPsec tunnels or Cloudflare Network Interconnect. Magic Transit includes load balancing and automatic failover across tunnels to steer traffic across the healthiest path possible, from everywhere in the world.

Protect all network traffic with Cloudflare
Magic Transit architecture: Internet BGP advertisement attracts traffic to Cloudflare’s network, where attack mitigation and security policies are applied before clean traffic is forwarded back to customer networks with an Anycast GRE tunnel or Cloudflare Network Interconnect.

The “Magic” is in our Anycast architecture: every server across our network runs every Cloudflare service, so traffic can be processed wherever it lands. This means the entire capacity of our network—121+Tbps as of this post—is available to block even the largest attacks. It also drives huge benefits for performance versus traditional “scrubbing center” solutions that route traffic to specialized locations for processing, and makes onboarding much easier for network engineers: one tunnel to Cloudflare automatically connects customer infrastructure to our entire network in over 250 cities worldwide.

What’s new?

Historically, Magic Transit has required customers to bring their own IP addresses—a minimum of a class C IP block, or /24—in order to use this service. This is because a /24 is the minimum prefix length that can be advertised via BGP on the public Internet, which is how we attract traffic for customer networks.

But not all customers have this much IP space; we’ve talked to many of you who want IP layer protection for a smaller network than we’re able to advertise to the Internet on your behalf. Today, we’re extending the availability of Magic Transit to customers with smaller networks by offering Magic Transit-protected, Cloudflare-managed IP space. Starting now, you can direct your network traffic to dedicated static IPs and receive all the benefits of Magic Transit including industry leading DDoS protection, visibility, performance, and resiliency.

Let’s talk through some new ways you can leverage Magic Transit to protect and accelerate any network.

Consistent cross-cloud security

Organizations adopting a hybrid or poly-cloud strategy have struggled to maintain consistent security controls across different environments. Where they used to manage a single firewall appliance in a datacenter, security teams now have a myriad of controls across different providers—physical, virtual, and cloud-based—all with different capabilities and control mechanisms.

Cloudflare is the single control plane across your hybrid cloud deployment, allowing you to manage security policies from one place, get uniform protection across your entire environment, and get deep visibility into your traffic and attack patterns.

Protect all network traffic with Cloudflare

Protecting branches of any size

As DDoS attack frequency and variety continues to grow, attackers are getting more creative with angles to target organizations. Over the past few years, we have seen a consistent rise in attacks targeted at corporate infrastructure including internal applications. As the percentage of a corporate network dependent on the Internet continues to grow, organizations need consistent protection across their entire network.

Now, you can get any network location covered—branch offices, stores, remote sites, event venues, and more—with Magic Transit-protected IP space. Organizations can also replace legacy hardware firewalls at those locations with our built-in cloud firewall, which filters bidirectional traffic and propagates changes globally within seconds.

Protect all network traffic with Cloudflare

Keeping streams alive without worrying about leaked IPs

Generally, DDoS attacks target a specific application or network in order to impact the availability of an Internet-facing resource. But you don’t have to be hosting anything in order to get attacked, as many gamers and streamers have unfortunately discovered. The public IP associated with a home network can easily be leaked, giving attackers the ability to directly target and take down a live stream.

As a streamer, you can now route traffic from your home network through a Magic Transit-protected IP. This means no more worrying about leaking your IP: attackers targeting you will have traffic blocked at the closest Cloudflare location to them, far away from your home network. And no need to worry about impact to your game: thanks to Cloudflare’s globally distributed and interconnected network, you can get protected without sacrificing performance.

Protect all network traffic with Cloudflare

Get started today

This solution is available today; learn more or contact your account team to get started.

How to customize your layer 3/4 DDoS protection settings

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/l34-ddos-managed-rules/

How to customize your layer 3/4 DDoS protection settings

How to customize your layer 3/4 DDoS protection settings

After initially providing our customers control over the HTTP-layer DDoS protection settings earlier this year, we’re now excited to extend the control our customers have to the packet layer. Using these new controls, Cloudflare Enterprise customers using the Magic Transit and Spectrum services can now tune and tweak their L3/4 DDoS protection settings directly from the Cloudflare dashboard or via the Cloudflare API.

The new functionality provides customers control over two main DDoS rulesets:

  1. Network-layer DDoS Protection ruleset — This ruleset includes rules to detect and mitigate DDoS attacks on layer 3/4 of the OSI model such as UDP floods, SYN-ACK reflection attacks, SYN Floods, and DNS floods. This ruleset is available for Spectrum and Magic Transit customers on the Enterprise plan.
  2. Advanced TCP Protection ruleset — This ruleset includes rules to detect and mitigate sophisticated out-of-state TCP attacks such as spoofed ACK Floods, Randomized SYN Floods, and distributed SYN-ACK Reflection attacks. This ruleset is available for Magic Transit customers only.

To learn more, review our DDoS Managed Ruleset developer documentation. We’ve put together a few guides that we hope will be helpful for you:

  1. Onboarding & getting started with Cloudflare DDoS protection
  2. Handling false negatives
  3. Handling false positives
  4. Best practices when using VPNs, VoIP, and other third-party services
  5. How to simulate a DDoS attack

Cloudflare’s DDoS Protection

A Distributed Denial of Service (DDoS) attack is a type of cyberattack that aims to disrupt the victim’s Internet services. There are many types of DDoS attacks, and they can be generated by attackers at different layers of the Internet. One example is the HTTP flood. It aims to disrupt HTTP application servers such as those that power mobile apps and websites. Another example is the UDP flood. While this type of attack can be used to disrupt HTTP servers, it can also be used in an attempt to disrupt non-HTTP applications. These include TCP-based and UDP-based applications, networking services such as VoIP services, gaming servers, cryptocurrency, and more.

How to customize your layer 3/4 DDoS protection settings

To defend organizations against DDoS attacks, we built and operate software-defined systems that run autonomously. They automatically detect and mitigate DDoS attacks across our entire network. You can read more about our autonomous DDoS protection systems and how they work in our deep-dive technical blog post.

How to customize your layer 3/4 DDoS protection settings

Unmetered and unlimited DDoS Protection

The level of protection that we offer is unmetered and unlimited — It is not bounded by the size of the attack, the number of the attacks, or the duration of the attacks. This is especially important these days because as we’ve recently seen, attacks are getting larger and more frequent. Consequently, in Q3, network-layer attacks increased by 44% compared to the previous quarter. Furthermore, just recently, our systems automatically detected and mitigated a DDoS attack that peaked just below 2 Tbps — the largest we’ve seen to date.

How to customize your layer 3/4 DDoS protection settings
Mirai botnet launched an almost 2 Tbps DDoS attack

Read more about recent DDoS trends.

Managed Rulesets

You can think of our autonomous DDoS protection systems as groups (rulesets) of intelligent rules. There are rulesets of HTTP DDoS Protection rules, Network-layer DDoS Protection rules and Advanced TCP Protection rules. In this blog post, we will cover the latter two rulesets. We’ve already covered the former in the blog post How to customize your HTTP DDoS protection settings.

How to customize your layer 3/4 DDoS protection settings
Cloudflare L3/4 DDoS Managed Rules

In the Network-layer DDoS Protection rulesets, each rule has a unique set of conditional fingerprints, dynamic field masking, activation thresholds, and mitigation actions. These rules are managed (by Cloudflare), meaning that the specifics of each rule is curated in-house by our DDoS experts. Before deploying a new rule, it is first rigorously tested and optimized for mitigation accuracy and efficiency across our entire global network.

In the Advanced TCP Protection ruleset, we use a novel TCP state classification engine to identify the state of TCP flows. The engine powering this ruleset is flowtrackd — you can read more about it in our announcement blog post. One of the unique features of this system is that it is able to operate using only the ingress (inbound) packet flows. The system sees only the ingress traffic and is able to drop, challenge, or allow packets based on their legitimacy. For example, a flood of ACK packets that don’t correspond to open TCP connections will be dropped.

How attacks are detected and mitigated

Sampling

Initially, traffic is routed through the Internet via BGP Anycast to the nearest Cloudflare edge data center. Once the traffic reaches our data center, our DDoS systems sample it asynchronously allowing for out-of-path analysis of traffic without introducing latency penalties. The Advanced TCP Protection ruleset needs to view the entire packet flow and so it sits inline for Magic Transit customers only. It, too, does not introduce any latency penalties.

Analysis & mitigation

The analysis for the Advanced TCP Protection ruleset is straightforward and efficient. The system qualifies TCP flows and tracks their state. In this way, packets that don’t correspond to a legitimate connection and its state are dropped or challenged. The mitigation is activated only above certain thresholds that customers can define.

The analysis for the Network-layer DDoS Protection ruleset is done using data streaming algorithms. Packet samples are compared to the conditional fingerprints and multiple real-time signatures are created based on the dynamic masking. Each time another packet matches one of the signatures, a counter is increased. When the activation threshold is reached for a given signature, a mitigation rule is compiled and pushed inline. The mitigation rule includes the real-time signature and the mitigation action, e.g., drop.

How to customize your layer 3/4 DDoS protection settings

​​​​Example

As a simple example, one fingerprint could include the following fields: source IP, source port, destination IP, and the TCP sequence number. A packet flood attack with a fixed sequence number would match the fingerprint and the counter would increase for every packet match until the activation threshold is exceeded. Then a mitigation action would be applied.

However, in the case of a spoofed attack where the source IP addresses and ports are randomized, we would end up with multiple signatures for each combination of source IP and port. Assuming a sufficiently randomized/distributed attack, the activation thresholds would not be met and mitigation would not occur. For this reason, we use dynamic masking, i.e. ignoring fields that may not be a strong indicator of the signature. By masking (ignoring) the source IP and port, we would be able to match all the attack packets based on the unique TCP sequence number regardless of how randomized/distributed the attack is.

Configuring the DDoS Protection Settings

For now, we’ve only exposed a handful of the Network-layer DDoS protection rules that we’ve identified as the ones most prone to customizations. We will be exposing more and more rules on a regular basis. This shouldn’t affect any of your traffic.

How to customize your layer 3/4 DDoS protection settings
Overriding the sensitivity level and mitigation action

For the Network-layer DDoS Protection ruleset, for each of the available rules, you can override the sensitivity level (activation threshold), customize the mitigation action, and apply expression filters to exclude/include traffic from the DDoS protection system based on various packet fields. You can create multiple overrides to customize the protection for your network and your various applications.

How to customize your layer 3/4 DDoS protection settings
Configuring expression fields for the DDoS Managed Rules to match on

In the past, you’d have to go through our support channels to customize the rules. In some cases, this may have taken longer to resolve than desired. With today’s announcement, you can tailor and fine-tune the settings of our autonomous edge system by yourself to quickly improve the accuracy of the protection for your specific network needs.

For the Advanced TCP Protection ruleset, for now, we’ve only exposed the ability to enable or disable it as a whole in the dashboard. To enable or disable the ruleset per IP prefix, you must use the API. At this time, when initially onboarding to Cloudflare, the Cloudflare team must first create a policy for you. After onboarding, if you need to change the sensitivity thresholds, use Monitor mode, or add filter expressions you must contact Cloudflare Support. In upcoming releases, this too will be available via the dashboard and API without requiring help from our Support team.

How to customize your layer 3/4 DDoS protection settings

Pre-existing customizations

If you previously contacted Cloudflare Support to apply customizations, your customizations have been preserved, and you can visit the dashboard to view the settings of the Network-layer DDoS Protection ruleset and change them if you need. If you require any changes to your Advanced TCP Protection customizations, please reach out to Cloudflare Support.

If so far you didn’t have the need to customize this protection, there is no action required on your end. However, if you would like to view and customize your DDoS protection settings, follow this dashboard guide or review the API documentation to programmatically configure the DDoS protection settings.

Helping Build a Better Internet

At Cloudflare, everything we do is guided by our mission to help build a better Internet. The DDoS team’s vision is derived from this mission: our goal is to make the impact of DDoS attacks a thing of the past. Our first step was to build the autonomous systems that detect and mitigate attacks independently. Done. The second step was to expose the control plane over these systems to our customers (announced today). Done. The next step will be to fully automate the configuration with an auto-pilot feature — training the systems to learn your specific traffic patterns to automatically optimize your DDoS protection settings. You can expect many more improvements, automations, and new capabilities to keep your Internet properties safe, available, and performant.

Not using Cloudflare yet? Start now.

Magic Firewall gets Smarter

Post Syndicated from Achiel van der Mandele original https://blog.cloudflare.com/magic-firewall-gets-smarter/

Magic Firewall gets Smarter

Magic Firewall gets Smarter

Today, we’re very excited to announce a set of updates to Magic Firewall, adding security and visibility features that are key in modern cloud firewalls. To improve security, we’re adding threat intel integration and geo-blocking. For visibility, we’re adding packet captures at the edge, a way to see packets arrive at the edge in near real-time.

Magic Firewall is our network-level firewall which is delivered through Cloudflare to secure your enterprise. Magic Firewall covers your remote users, branch offices, data centers and cloud infrastructure. Best of all, it’s deeply integrated with Cloudflare, giving you a one-stop overview of everything that’s happening on your network.

A brief history of firewalls

We talked a lot about firewalls on Monday, including how our firewall-as-a-service solution is very different from traditional firewalls and helps security teams that want sophisticated inspections at the Application Layer. When we talk about the Application Layer, we’re referring to OSI Layer 7. This means we’re applying security features using semantics of the protocol. The most common example is HTTP, the protocol you’re using to visit this website. We have Gateway and our WAF to protect inbound and outbound HTTP requests, but what about Layer 3 and Layer 4 capabilities? Layer 3 and 4 refer to the packet and connection levels. These security features aren’t applied to HTTP requests, but instead to IP packets and (for example) TCP connections. A lot of folks in the CIO organization want to add extra layers of security and visibility without resorting to decryption at Layer 7. We’re excited to talk to you about two sets of new features that will make your lives easier: geo-blocking and threat intel integration to improve security posture, and packet captures to get you better visibility.

Threat Intel and IP Lists

Magic Firewall is great if you know exactly what you want to allow and block. You can put in rules that match exactly on IP source and destination, as well as bitslicing to verify the contents of various packets. However, there are many situations in which you don’t exactly know who the bad and good actors are: is this IP address that’s trying to access my network a perfectly fine consumer, or is it part of a botnet that’s trying to attack my network?

The same goes the other way: whenever someone inside your network is trying to create a connection to the Internet, how do you know whether it’s an obscure blog or a malware website? Clearly, you don’t want to play whack-a-mole and try to keep track of every malicious actor on the Internet by yourself. For most security teams, it’s nothing more than a waste of time! You’d much rather rely on a company that makes it their business to focus on this.

Today, we’re announcing Magic Firewall support for our in-house Threat Intelligence feed. Cloudflare sees approximately 28 million HTTP requests each second and blocks 76 billion cyber threats each day. With almost 20% of the top 10 million Alexa websites on Cloudflare, we see a lot of novel threats pop up every day. We use that data to detect malicious actors on the Internet and turn it into a list of known malicious IPs. And we don’t stop there: we also integrate with a number of third party vendors to augment our coverage.

To match on any of the threat intel lists, just set up a rule in the UI as normal:

Magic Firewall gets Smarter

Threat intel feed categories include Malware, Anonymizer and Botnet Command-and-Control centers. Malware and Botnet lists cover properties on the Internet distributing malware and known command and control centers. Anonymizers contain a list of known forward proxies that allow attackers to hide their IP addresses.

In addition to the managed lists, you also have the flexibility of creating your own lists, either to add your own known set of malicious IPs or to make management of your known good network endpoints easier. As an example, you may want to create a list of all your own servers. That way, you can easily block traffic to and from it from any rule, without having to replicate the list each time.

Another particularly gnarly problem that many of our customers deal with is geo restrictions. Many are restricted in where they are allowed (or want to) accept traffic from and to. The challenge here is that nothing about an IP address tells you anything about the geolocation of it. And even worse, IP addresses regularly change hands, moving from one country to the other.

As of today, you can easily block or allow traffic to any country, without the management hassle that comes with maintaining lists yourself. Country lists are kept up to date entirely by Cloudflare, all you need to do is set up a rule matching on the country and we’ll take care of the rest.

Magic Firewall gets Smarter

Packet captures at the edge

Finally, we’re releasing a very powerful feature: packet captures at the edge. A packet capture is a pcap file that contains all packets that were seen by a particular network box (usually a firewall or router) during a specific time frame. Packet captures are useful if you want to debug your network: why can’t my users connect to a particular website? Or you may want to get better visibility into a DDoS attack, so you can put up better firewall rules.

Traditionally, you’d log into your router or firewall and start up something like tcpdump. You’d set up a filter to only match on certain packets (packet capture files can quickly get very big) and grab the file. But what happens if you want coverage across your entire network: on-premises, offices and all your cloud environments? You’ll likely have different vendors for each of those locations and have to figure out how to get packet captures from all of them. Even worse, some of them might not even support grabbing packet captures.

With Magic Firewall, grabbing packet captures across your entire network becomes simple: because you run a single network-firewall-as-a-service, you can grab packets across your entire network in one go. This gets you instant visibility into exactly where that particular IP is interacting with your network, regardless of physical or virtual location. You have the option of grabbing all network traffic (warning, it might be a lot!) or set a filter to only grab a subset. Filters follow the same Wireshark syntax that Magic Firewall rules use:

(ip.src in $cf.anonymizer)

We think these are great additions to Magic Firewall, giving you powerful primitives to police traffic and tooling to gain visibility into what’s actually going on in your network. Threat Intel, geo blocking and IP lists are all available today — reach out to your account team to have them activated. Packet captures is entering early access later in December. Similarly, if you’re interested, please reach out to your account team!

How We Used eBPF to Build Programmable Packet Filtering in Magic Firewall

Post Syndicated from Chris J Arges original https://blog.cloudflare.com/programmable-packet-filtering-with-magic-firewall/

How We Used eBPF to Build Programmable Packet Filtering in Magic Firewall

How We Used eBPF to Build Programmable Packet Filtering in Magic Firewall

Cloudflare actively protects services from sophisticated attacks day after day. For users of Magic Transit, DDoS protection detects and drops attacks, while Magic Firewall allows custom packet-level rules, enabling customers to deprecate hardware firewall appliances and block malicious traffic at Cloudflare’s network. The types of attacks and sophistication of attacks continue to evolve, as recent DDoS and reflection attacks against VoIP services targeting protocols such as Session Initiation Protocol (SIP) have shown. Fighting these attacks requires pushing the limits of packet filtering beyond what traditional firewalls are capable of. We did this by taking best of class technologies and combining them in new ways to turn Magic Firewall into a blazing fast, fully programmable firewall that can stand up to even the most sophisticated of attacks.

Magical Walls of Fire

Magic Firewall is a distributed stateless packet firewall built on Linux nftables. It runs on every server, in every Cloudflare data center around the world. To provide isolation and flexibility, each customer’s nftables rules are configured within their own Linux network namespace.

How We Used eBPF to Build Programmable Packet Filtering in Magic Firewall

This diagram shows the life of an example packet when using Magic Transit, which has Magic Firewall built in. First, packets go into the server and DDoS protections are applied, which drops attacks as early as possible. Next, the packet is routed into a customer-specific network namespace, which applies the nftables rules to the packets. After this, packets are routed back to the origin via a GRE tunnel. Magic Firewall users can construct firewall statements from a single API, using a flexible Wirefilter syntax. In addition, rules can be configured via the Cloudflare dashboard, using friendly UI drag and drop elements.

Magic Firewall provides a very powerful syntax for matching on various packet parameters, but it is also limited to the matches provided by nftables. While this is more than sufficient for many use cases, it does not provide enough flexibility to implement the advanced packet parsing and content matching we want. We needed more power.

Hello eBPF, meet Nftables!

When looking to add more power to your Linux networking needs, Extended Berkeley Packet Filter (eBPF) is a natural choice. With eBPF, you can insert packet processing programs that execute in the kernel, giving you the flexibility of familiar programming paradigms with the speed of in-kernel execution. Cloudflare loves eBPF and this technology has been transformative in enabling many of our products. Naturally, we wanted to find a way to use eBPF to extend our use of nftables in Magic Firewall. This means being able to match, using an eBPF program within a table and chain as a rule. By doing this we can have our cake and eat it too, by keeping our existing infrastructure and code, and extending it further.

If nftables could leverage eBPF natively, this story would be much shorter; alas, we had to continue our quest. To get us started in our search, we know that iptables integrates with eBPF. For example, one can use iptables and a pinned eBPF program for dropping packets with the following command:

iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/match -j DROP

This clue helped to put us on the right path. Iptables uses the xt_bpf extension to match on an eBPF program. This extension uses the BPF_PROG_TYPE_SOCKET_FILTER eBPF program type, which allows us to load the packet information from the socket buffer and return a value based on our code.

Since we know iptables can use eBPF, why not just use that? Magic Firewall currently leverages nftables, which is a great choice for our use case due to its flexibility in syntax and programmable interface. Thus, we need to find a way to use the xt_bpf extension with nftables.

How We Used eBPF to Build Programmable Packet Filtering in Magic Firewall

This diagram helps explain the relationship between iptables, nftables and the kernel. The nftables API can be used by both the iptables and nft userspace programs, and can configure both xtables matches (including xt_bpf) and normal nftables matches.

This means that given the right API calls (netlink/netfilter messages), we can embed an xt_bpf match within an nftables rule. In order to do this, we need to understand which netfilter messages we need to send. By using tools such as strace, Wireshark, and especially using the source we were able to construct a message that could append an eBPF rule given a table and chain.

NFTA_RULE_TABLE table
NFTA_RULE_CHAIN chain
NFTA_RULE_EXPRESSIONS | NFTA_MATCH_NAME
	NFTA_LIST_ELEM | NLA_F_NESTED
	NFTA_EXPR_NAME "match"
		NLA_F_NESTED | NFTA_EXPR_DATA
		NFTA_MATCH_NAME "bpf"
		NFTA_MATCH_REV 1
		NFTA_MATCH_INFO ebpf_bytes	

The structure of the netlink/netfilter message to add an eBPF match should look like the above example. Of course, this message needs to be properly embedded and include a conditional step, such as a verdict, when there is a match. The next step was decoding the format of `ebpf_bytes` as shown in the example below.

 struct xt_bpf_info_v1 {
	__u16 mode;
	__u16 bpf_program_num_elem;
	__s32 fd;
	union {
		struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR];
		char path[XT_BPF_PATH_MAX];
	};
};

The bytes format can be found in the kernel header definition of struct xt_bpf_info_v1. The code example above shows the relevant parts of the structure.

The xt_bpf module supports both raw bytecodes, as well as a path to a pinned ebpf program. The later mode is the technique we used to combine the ebpf program with nftables.

With this information we were able to write code that could create netlink messages and properly serialize any relevant data fields. This approach was just the first step, we are also looking into incorporating this into proper tooling instead of sending custom netfilter messages.

Just Add eBPF

Now we needed to construct an eBPF program and load it into an existing nftables table and chain. Starting to use eBPF can be a bit daunting. Which program type do we want to use? How do we compile and load our eBPF program? We started this process by doing some exploration and research.

First we constructed an example program to try it out.

SEC("socket")
int filter(struct __sk_buff *skb) {
  /* get header */
  struct iphdr iph;
  if (bpf_skb_load_bytes(skb, 0, &iph, sizeof(iph))) {
    return BPF_DROP;
  }

  /* read last 5 bytes in payload of udp */
  __u16 pkt_len = bswap_16(iph.tot_len);
  char data[5];
  if (bpf_skb_load_bytes(skb, pkt_len - sizeof(data), &data, sizeof(data))) {
    return BPF_DROP;
  }

  /* only packets with the magic word at the end of the payload are allowed */
  const char SECRET_TOKEN[5] = "xyzzy";
  for (int i = 0; i < sizeof(SECRET_TOKEN); i++) {
    if (SECRET_TOKEN[i] != data[i]) {
      return BPF_DROP;
    }
  }

  return BPF_OK;
}

The excerpt mentioned is an example of an eBPF program that only accepts packets that have a magic string at the end of the payload. This requires checking the total length of the packet to find where to start the search. For clarity, this example omits error checking and headers.

Once we had a program, the next step was integrating it into our tooling. We tried a few technologies to load the program, like BCC, libbpf, and we even created a custom loader. Ultimately, we ended up using cilium’s ebpf library, since we are using Golang for our control-plane program and the library makes it easy to generate, embed and load eBPF programs.

# nft list ruleset
table ip mfw {
	chain input {
		#match bpf pinned /sys/fs/bpf/mfw/match drop
	}
}

Once the program is compiled and pinned, we can add matches into nftables using netlink commands. Listing the ruleset shows the match is present. This is incredible! We are now able to deploy custom C programs to provide advanced matching inside a Magic Firewall ruleset!

More Magic

With the addition of eBPF to our toolkit, Magic Firewall is an even more flexible and powerful way to protect your network from bad actors. We are now able to look deeper into packets and implement more complex matching logic than nftables alone could provide. Since our firewall is running as software on all Cloudflare servers, we can quickly iterate and update features.

One outcome of this project is SIP protection, which is currently in beta. That’s only the beginning. We are currently exploring using eBPF for protocol validations, advanced field matching, looking into payloads, and supporting even larger sets of IP lists.

We welcome your help here, too! If you have other use cases and ideas, please talk to your account team. If you find this technology interesting, come join our team!

Introducing Cloudflare’s Technology Partner Program

Post Syndicated from Matt Lewis original https://blog.cloudflare.com/technology-partner-program/

Introducing Cloudflare’s Technology Partner Program

The Internet is built on a series of shared protocols, all working in harmony to deliver the collective experience that has changed the way we live and work. These open standards have created a platform such that a myriad of companies can build unique services and products that work together seamlessly. As a steward and supporter of an open Internet, we aspire to provide an interoperable platform that works with all the complementary technologies that our customers use across their technology stack. This has been the guiding principle for the multiple partnerships we have launched over the last few years.  

One example is our Bandwidth Alliance — launched in 2018, this alliance with 18 cloud and storage providers aims to reduce egress fees, also known as data transfer fees, for our customers. The Bandwidth Alliance has broken the norms of the cloud industry so that customers can move data more freely. Since then, we have launched several technology partner programs with over 40+ partners, including:

  • Analytics — Visualize Cloudflare logs and metrics easily, and help customers better understand events and trends from websites and applications on the Cloudflare network.
  • Network Interconnect — Partnerships with best-in-class Interconnection platforms offer private, secure, software-defined links with near instant-turn-up of ports.
  • Endpoint Protection Partnerships — With these integrations, every connection to our customer’s corporate application gets an additional layer of identity assurance without the need to connect to VPN.
  • Identity Providers — Easily integrate your organization’s single sign-on provider and benefit from the ease-of-use and functionality of Cloudflare Access.
Introducing Cloudflare’s Technology Partner Program

These partner programs have helped us serve our customers better alongside our partners with our complementary solutions. The integrations we have driven have made it easy for thousands of customers to use Cloudflare with other parts of their stack.

We aim to continue expanding the Cloudflare Partner Network to make it seamless for our customers to use Cloudflare. To support our growing ecosystem of partners, we are excited to launch our Technology Partner Program.

Announcing Cloudflare’s Technology Partner Program

Cloudflare’s Technology Partner Program facilitates innovative integrations that create value for our customers, our technology partners, and Cloudflare. Our partners not only benefit from technical integrations with us, but also have the opportunity to drive sales and marketing efforts to better serve mutual customers and prospects.

This program offers a guiding structure so that our partners can benefit across three key areas:

  • Build with Cloudflare: Sandbox access to Cloudflare enterprise features and APIs to build and test integrations. Opportunity to collaborate with Cloudflare’s product teams to build innovative solutions.
  • Market with Cloudflare: Develop joint solution brief and host joint events to drive awareness and adoption of integrations. Leverage a range of our partners tools and resources to bring our joint solutions to market.
  • Sell with Cloudflare: Align with our sales teams to jointly target relevant customer segments across geographies.

Technology Partner Tiers

Depending on the maturity of the integration and fit with Cloudflare’s product portfolio, we have two types of partners:

  • Strategic partners: Strategic partners have mature integrations across the Cloudflare product suite. They are leaders in their industries and have a significant overlap with our customer base. These partners are strategically aligned with our sales and marketing efforts, and they collaborate with our product teams to bring innovative solutions to market.
  • Integration partners: Integration partners are early participants in Cloudflare’s partnership ecosystem. They already have or are on a path to build validated, functional integrations with Cloudflare. These partners have programmatic access to resources that will help them experiment with and build integrations with Cloudflare.

Work with Us

If you are interested in working with our Technology Partnerships team to develop and bring to market a joint solution, we’d love to hear from you!  Partners can complete the application on our Technology Partner Program website and we will reach out quickly to discuss how we can help build solutions for our customers together.

Magic makes your network faster

Post Syndicated from Annika Garbers original https://blog.cloudflare.com/magic-makes-your-network-faster/

Magic makes your network faster

Magic makes your network faster

We launched Magic Transit two years ago, followed more recently by its siblings Magic WAN and Magic Firewall, and have talked at length about how this suite of products helps security teams sleep better at night by protecting entire networks from malicious traffic. Today, as part of Speed Week, we’ll break down the other side of the Magic: how using Cloudflare can automatically make your entire network faster. Our scale and interconnectivity, use of data to make more intelligent routing decisions, and inherent architecture differences versus traditional networks all contribute to performance improvements across all IP traffic.

What is Magic?

Cloudflare’s “Magic” services help customers connect and secure their networks without the cost and complexity of maintaining legacy hardware. Magic Transit provides connectivity and DDoS protection for Internet-facing networks; Magic WAN enables customers to replace legacy WAN architectures by routing private traffic through Cloudflare; and Magic Firewall protects all connected traffic with a built-in firewall-as-a-service. All three share underlying architecture principles that form the basis of the performance improvements we’ll dive deeper into below.

Anycast everything

In contrast to traditional “point-to-point” architecture, Cloudflare uses Anycast GRE or IPsec (coming soon) tunnels to send and receive traffic for customer networks. This means that customers can set up a single tunnel to Cloudflare, but effectively get connected to every single Cloudflare location, dramatically simplifying the process to configure and maintain network connectivity.

Magic makes your network faster

Every service everywhere

In addition to being able to send and receive traffic from anywhere, Cloudflare’s edge network is also designed to run every service on every server in every location. This means that incoming traffic can be processed wherever it lands, which allows us to block DDoS attacks and other malicious traffic within seconds, apply firewall rules, and route traffic efficiently and without bouncing traffic around between different servers or even different locations before it’s dispatched to its destination.

Zero Trust + Magic: the next-gen network of the future

With Cloudflare One, customers can seamlessly combine Zero Trust and network connectivity to build a faster, more secure, more reliable experience for their entire corporate network. Everything we’ll talk about today applies even more to customers using the entire Cloudflare One platform – stacking these products together means the performance benefits multiply (check out our post on Zero Trust and speed from today for more on this).

More connectivity = faster traffic

So where does the Magic come in? This part isn’t intuitive, especially for customers using Magic Transit in front of their network for DDoS protection: how can adding a network hop subtract latency?

The answer lies in Cloudflare’s network architecture — our web of connectivity to the rest of the Internet. Cloudflare has invested heavily in building one of the world’s most interconnected networks (9800 interconnections and counting, including with major ISPs, cloud services, and enterprises). We’re also continuing to grow our own private backbone and giving customers the ability to directly connect with us. And our expansive connectivity to last mile providers means we’re just milliseconds away from the source of all your network traffic, regardless of where in the world your users or employees are.

This toolkit of varying connectivity options means traffic routed through the Cloudflare network is often meaningfully faster than paths across the public Internet alone, because more options available for BGP path selection mean increased ability to choose more performant routes. Imagine having only one possible path between your house and the grocery store versus ten or more – chances are, adding more options means better alternatives will be available. A cornucopia of connectivity methods also means more resiliency: if there’s an issue on one of the paths (like construction happening on what is usually the fastest street), we can easily route around it to avoid impact to your traffic.

One common comparison customers are interested in is latency for inbound traffic. From the end user perspective, does routing through Cloudflare speed up or slow down traffic to networks protected by Magic Transit? Our response: let’s test it out and see! We’ve repeatedly compared Magic Transit vs standard Internet performance for customer networks across geographies and industries and consistently seen really exciting results. Here’s an example from one recent test where we used third-party probes to measure the ping time to the same customer network location (their data center in Qatar) before and after onboarding with Magic Transit:

Probe location RTT w/o Magic (ms) RTT w/ Magic (ms) Difference (ms) Difference (% improvement)
Dubai 27 23 4 13%
Marseille 202 188 13 7%
Global (results averaged across 800+ distributed probes) 194 124 70 36%

All of these results were collected without the use of Argo Smart Routing for Packets, which we announced on Tuesday. Early data indicates that networks using Smart Routing will see even more substantial gains.

Modern architecture eliminates traffic trombones

In addition to the performance boost available for traffic routed across the Cloudflare network versus the public Internet, customers using Magic products benefit from a new architecture model that totally removes up to thousands of miles worth of latency.

Traditionally, enterprises adopted a “hub and spoke” model for granting employees access to applications within and outside their network. All traffic from within a connected network location was routed through a central “hub” where a stack of network hardware (e.g. firewalls) was maintained. This model worked great in locations where the hub and spokes were geographically close, but started to strain as companies became more global and applications moved to the cloud.

Now, networks using hub and spoke architecture are often backhauling traffic thousands of miles, between continents and across oceans, just to apply security policies before packets are dispatched to their final destination, which is often physically closer to where they started! This creates a “trombone” effect, where precious seconds are wasted bouncing traffic back and forth across the globe, and performance problems are amplified by packet loss and instability along the way.

Magic makes your network faster

Network and security teams have tried to combat this issue by installing hardware at more locations to establish smaller, regional hubs, but this quickly becomes prohibitively expensive and hard to manage. The price of purchasing multiple hardware boxes and dedicated private links adds up quickly, both in network gear and connectivity itself as well as the effort required to maintain additional infrastructure. Ultimately, this cost usually outweighs the benefit of the seconds regained with shorter network paths.

The “hub” is everywhere

There’s a better way — with the Anycast architecture of Magic products, all traffic is automatically routed to the closest Cloudflare location to its source. There, security policies are applied with single-pass inspection before traffic is routed to its destination. This model is conceptually similar to a hub and spoke, except that the hub is everywhere: 95% of the entire Internet-connected world is within 50 ms of a Cloudflare location (check out this week’s updates on our quickly-expanding network presence for the latest). This means instead of tromboning traffic between locations, it can stop briefly at a Cloudflare hop in-path before it goes on its way: dramatically faster architecture without compromising security.

To demonstrate how this architecture shift can make a meaningful difference, we created a lab to mirror the setup we’ve heard many customers describe as they’ve explained performance issues with their existing network. This example customer network is headquartered in South Carolina and has branch office locations on the west coast, in California and Oregon. Traditionally, traffic from each branch would be backhauled through the South Carolina “hub” before being sent on to its destination, either another branch or the public Internet.

In our alternative setup, we’ve connected each customer network location to Cloudflare with an Anycast GRE tunnel, simplifying configuration and removing the South Carolina trombone. We can also enforce network and application-layer filtering on all of this traffic, ensuring that the faster network path doesn’t compromise security.

Magic makes your network faster

Here’s a summary of results from performance tests on this example network demonstrating the difference between the traditional hub and spoke setup and the Magic “global hub” — we saw up to 70% improvement in these tests, demonstrating the dramatic impact this architecture shift can make.

LAX <> OR (ms)
ICMP round-trip for “Regular” (hub and spoke) WAN 127
ICMP round-trip for Magic WAN 38
Latency savings for Magic WAN vs “Regular” WAN 70%

This effect can be amplified for networks with globally distributed locations — imagine the benefits for customers who are used to delays from backhauling traffic between different regions across the world.

Getting smarter

Adding more connectivity options and removing traffic trombones provide a performance boost for all Magic traffic, but we’re not stopping there. In the same way we leverage insights from hundreds of billions of requests per day to block new types of malicious traffic, we’re also using our unique perspective on Internet traffic to make more intelligent decisions about routing customer traffic versus relying on BGP alone. Earlier this week, we announced updates to Argo Smart Routing including the brand-new Argo Smart Routing for Packets. Customers using Magic products can enable it to automatically boost performance for any IP traffic routed through Cloudflare (by 10% on average according to results so far, and potentially more depending on the individual customer’s network topology) — read more on this in the announcement blog.

What’s next?

The modern architecture, well-connected network, and intelligent optimizations we’ve talked about today are just the start. Our vision is for any customer using Magic to connect and protect their network to have the best performance possible for all of their traffic, automatically. We’ll keep investing in expanding our presence, interconnections, and backbone, as well as continuously improving Smart Routing — but we’re also already cooking up brand-new products in this space to deliver optimizations in new ways, including WAN Optimization and Quality of Service functions. Stay tuned for more Magic coming soon, and get in touch with your account team to learn more about how we can help make your network faster starting today.

Making Magic Transit health checks faster and more responsive

Post Syndicated from Meyer Zinn original https://blog.cloudflare.com/making-magic-transit-health-checks-faster-and-more-responsive/

Making Magic Transit health checks faster and more responsive

Making Magic Transit health checks faster and more responsive

Magic Transit advertises our customer’s IP prefixes directly from our edge network, applying DDoS mitigation and firewall policies to all traffic destined for the customer’s network. After the traffic is scrubbed, we deliver clean traffic to the customer over GRE tunnels (over the public Internet or Cloudflare Network Interconnect). But sometimes, we experience inclement weather on the Internet: network paths between Cloudflare and the customer can become unreliable or go down. Customers often configure multiple tunnels through different network paths and rely on Cloudflare to pick the best tunnel to use if, for example, some router on the Internet is having a stormy day and starts dropping traffic.

Making Magic Transit health checks faster and more responsive

Because we use Anycast GRE, every server across Cloudflare’s 200+ locations globally can send GRE traffic to customers. Every server needs to know the status of every tunnel, and every location has completely different network routes to customers. Where to start?

In this post, I’ll break down my work to improve the Magic Transit GRE tunnel health check system, creating a more stable experience for customers and dramatically reducing CPU and memory usage at Cloudflare’s edge.

Everybody has their own weather station

To decide where to send traffic, Cloudflare edge servers need to periodically send health checks to each customer tunnel endpoint.

When Magic Transit was first launched, every server sent a health check to every tunnel once per minute. This naive, “shared-nothing” approach was simple to implement and served customers well, but would occasionally deliver less than optimal health check behavior in two specific ways.

Way #1: Inconsistent weather reports

Sometimes a server just runs into bad luck, and a check randomly fails. From there, the server would mark the tunnel as degraded and immediately start shifting traffic towards a fallback tunnel. Imagine you and I were standing right next to each other under a clear sky, and I felt a single drop of water and declared, “It’s raining!” whereas you felt no raindrops and declared, “It’s all clear!”

With relatively minimal data per server, it means that health determinations can be imprecise. It also means that individual servers could overreact to individual failures. From a customer’s point of view, it’s like Cloudflare detected a problem with the primary tunnel. But, in reality, the server just got a bad weather forecast and made a different judgement call.

Way #2: Slow to respond to storms

Even when tunnel states are consistent across servers, they can be slow to respond. In this case, if a server runs a health check which succeeds, but a second later the tunnel goes down, the next health check won’t happen for another 59 seconds. Until that next health check fails, the server has no idea anything is wrong, so it keeps sending traffic over unhealthy tunnels, leading to packet loss and latency for the customer.

Much like how a live, up-to-the-minute rain forecast helps you decide when to leave to avoid the rain, servers that send tunnel checks more frequently get a finer view of the Internet weather and can respond faster to localized storms. But if every server across Cloudflare’s edge sent health checks too frequently, we would very quickly start to overwhelm our customers’ networks.

All of the weather stations nearby start sharing observations

Clearly, we needed to hammer out some kinks. We wanted servers in the same location to come to the same conclusions about where to send traffic, and we wanted faster detection of issues without increasing the frequency of tunnel checks.

Health checks sent from servers in the same data center take the same route across the Internet. Why not share the results among them?

Instead of a single raindrop causing me to declare that it’s raining, I’d tell you about the raindrop I felt, and you’d tell me about the clear sky you’re looking at. Together, we come to the same conclusion: there isn’t enough rain to open an umbrella.

There is even a special networking protocol that allows us to easily share information between servers in the same private network. From the makers of Unicast and Anycast, now presenting: Multicast!

A single IP address does not necessarily represent a single machine in a network. The Internet Protocol specifies a way to send one message that gets delivered to a group of machines, like writing to an email list. Every machine has to opt into the group—we can’t just enroll people at random for our email list—but once a machine joins, it receives a copy of any message sent to the group’s address.

Making Magic Transit health checks faster and more responsive

The servers in a Cloudflare edge data center are part of the same private network, so for “version 2” of our health check system, we had each server in a data center join a multicast group and share their health check results with one another. Each server still made an independent assessment for each tunnel, but that assessment was based on data collected by all servers in the same location.

This second version of tunnel health checks resulted in more consistent  tunnel health determinations by servers in the same data center. It also resulted in faster response times—especially in large data centers where servers receive updates from their peers very rapidly.

However, we started seeing scaling problems. As we added more customers, we added more tunnel endpoints where we need to check the weather. In some of our larger data centers, each server was receiving close to half a billion messages per minute.

Imagine it’s not just you and me telling each other about the weather above us. You’re in a crowd of hundreds of people, and now everyone is shouting the weather updates for thousands of cities around the world!

One weather station to rule them all

As an engineering intern on the Magic Transit team, my project this summer has been developing a third approach. Rather than having every server infrequently check the weather for every tunnel and shouting the observation to everyone else, now every server tunnel can frequently check the weather for a few tunnels. With this new approach, servers would then only tell the others about the overall weather report—not every individual measurement.

That scenario sounds more efficient, but we need to distribute the task of sending tunnel health checks across all the servers in a location so one server doesn’t get an overwhelming amount of work. So how can we assign tunnels to servers in a way that doesn’t require a centralized orchestrator or shared database? Enter consistent hashing, the single coolest distributed computing concept I got to apply this summer.

Every server sends a multicast “heartbeat” every few seconds. Then, by listening for multicast heartbeats, each server can construct a list of the IP addresses of peers known to be alive, including its own address, sorted by taking the hash of each address. Every server in a data center has the same list of peers in the same order.

When a server needs to decide which tunnels it is responsible for sending health checks to, the server simply hashes each tunnel to an integer and searches through the list of peer addresses to find the peer with the smallest hash greater than the tunnel’s hash, wrapping around to the first peer if no peer is found. The server is responsible for sending health checks to the tunnel when the assigned peer’s address equals the server’s address.

If a server stops sending messages for a long enough period of time, the server gets removed from the known peers list. As a consequence, the next time another server tries to hash a tunnel the removed peer was previously assigned, the tunnel simply gets reassigned to the next peer in the list.

And like magic, we have devised a scheme to consistently assign tunnels to servers in a way that is resilient to server failures and does not require any extra coordination between servers beyond heartbeats. Now, the assigned server can send health checks way more frequently, compose more precise weather forecasts, and share those forecasts without being drowned out by the crowd.

Results

Releasing the new health check system globally reduced Magic Transit’s CPU usage by over 70% and memory usage by nearly 85%.

Memory usage (measured in terabytes):

Making Magic Transit health checks faster and more responsive

CPU usage (measured in CPU-seconds per two minute interval, averaged over two days):

Making Magic Transit health checks faster and more responsive

Reducing the number of multicast messages means that servers can now keep up with the Internet weather, even in the largest data centers. We’re now poised for the next stage of Magic Transit’s growth, just in time for our two-year anniversary.

If you want to help build the future of networking, join our team.

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

Post Syndicated from Marwan Fayed original https://blog.cloudflare.com/pangea/

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

Half of the world’s population has no access to the Internet, with many more limited to poor, expensive, and unreliable connectivity. This problem persists despite large levels of public investment, private infrastructure, and effort by local organizers.

Today, Cloudflare is excited to announce Project Pangea: a piece of the puzzle to help solve this problem. We’re launching a program that provides secure, performant, reliable access to the Internet for community networks that support underserved communities, and we’re doing it for free1 because we want to help build an Internet for everyone.

What is Cloudflare doing to help?

Project Pangea is Cloudflare’s project to help bring underserved communities secure connectivity to the Internet through Cloudflare’s global and interconnected network.

Cloudflare is offering our suite of network services — Cloudflare Network Interconnect, Magic Transit, and Magic Firewall — for free to nonprofit community networks, local networks, or other networks primarily focused on providing Internet access to local underserved or developing areas. This service would dramatically reduce the cost for communities to connect to the Internet, with industry leading security and performance functions built-in:

  • Cloudflare Network Interconnect provides access to Cloudflare’s edge in 200+ cities across the globe through physical and virtual connectivity options.
  • Magic Transit acts as a conduit to and from the broader Internet and protects community networks by mitigating DDoS attacks within seconds at the edge.
  • Magic Firewall gives community networks access to a network-layer firewall as a service, providing further protection from malicious traffic.

We’ve learned from working with customers that pure connectivity is not enough to keep a network sustainably connected to the Internet. Malicious traffic, such as DDoS attacks, can target a network and saturate Internet service links, which can lead to providers aggressively rate limiting or even entirely shutting down incoming traffic until the attack subsides. This is why we’re including our security services in addition to connectivity as part of Project Pangea: no attacker should be able to keep communities closed off from accessing the Internet.

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

What is a community network?

Community networks have existed almost as long as commercial subscribership to the Internet that began with dial-up service. The Internet Society, or ISOC, describes community networks as happening “when people come together to build and maintain the necessary infrastructure for Internet connection.”

Most often, community networks emerge from need, and in response to the lack or absence of available Internet connectivity. They consistently demonstrate success where public and private-sector initiatives have either failed or under-deliver. We’re not talking about stop-gap solutions here, either — community networks around the world have been providing reliable, sustainable, high-quality connections for years.

Many will operate only within their communities, but many others can grow, and have grown, to regional or national scale. The most common models of governance and operation are as not-for-profits or cooperatives, models that ensure reinvestment within the communities being served. For example, we see networks that reinvest their proceeds to replace Wi-Fi infrastructure with fibre-to-the-home.

Cloudflare celebrates these networks’ successes, and also the diversity of the communities that these networks represent. In that spirit, we’d like to dispel myths that we encountered during the launch of this program — many of which we wrongly assumed or believed to be true — because the myths turn out to be barriers that communities so often are forced to overcome.  Community networks are built on knowledge sharing, and so we’re sharing some of that knowledge, so others can help accelerate community projects and policies, rather than rely on the assumptions that impede progress.

Myth #1: Only very rural or remote regions are underserved and in need. It’s true that remote regions are underserved. It is also true that underserved regions exist within 10 km (about six miles) of large city centers, and even within the largest cities themselves, as evidenced by the existence of some of our launch partners.

Myth #2: Remote, rural, or underserved is also low-income. This might just be the biggest myth of all. Rural and remote populations are often thriving communities that can afford service, but have no access. In contrast, the need for urban community networks are often egalitarian, and emerge because the access that is available is unaffordable to many.

Myth #3: Service is necessarily more expensive. This myth is sometimes expressed by statements such as, “if large service providers can’t offer affordable access, then no one can.”  More than a myth, this is a lie. Community networks (including our launch partners) use novel governance and cost models to ensure that subscribers pay rates similar to the wider market.

Myth #4: Technical expertise is a hard requirement and is unavailable. There is a rich body of evidence and examples showing that, with small amounts of training and support, communities can build their own local networks cheaply and reliably with commodity hardware and non-specialist equipment.

These myths aside, there is one truth: the path to sustainability is hard. The start and initial growth of community networks often consists of volunteer time or grant funding, which are difficult to sustain in the long-term. Eventually the starting models need to transition to models of “willing to charge and willing to pay” — Project Pangea is designed to help fill this gap.

What is the problem?

Communities around the world can and have put up Wi-Fi antennas and laid their own fibre. Even so, and however well-connected the community is to itself, Internet services are prohibitively expensive — if they can be found at all.

Two elements are required to connect to the Internet, and each incurs its own cost:

  • Backhaul connections to an interconnection point — the connection point may be anything from a local cabinet to a large Internet exchange point (IXP).
  • Internet Services are provided by a network that interfaces with the wider Internet, and agrees to route traffic to and from on behalf of the community network.

These are distinct elements. Backhaul service carries data packets along a physical link (a fibre cable or wireless medium). Internet service is separate and may be provided over that link, or at its endpoint.

The cost of Internet service for networks is both dominant and variable (with usage), so in most cases it is cheaper to purchase both as a bundle from service providers that also own or operate their own physical network. Telecommunications and energy companies are prime examples.

However, the operating costs and complexity of long-distance backhaul is significantly lower than the costs of Internet service. If reliable, high capacity service were affordable, then community networks could extend their knowledge and governance models sustainably to also provide their own backhaul.

For all that community networks can build, establish, and operate, the one element entirely outside their control is the cost of Internet service — a problem that Project Pangea helps to solve.

Why does the problem persist?

On this subject, I — Marwan — can only share insights drawn from prior experience as a computer science professor, and a co-founder of HUBS c.i.c., launched with talented professors and a network engineer. HUBS is a not-for-profit backhaul and Internet provider in Scotland. It is a cooperative of more than a dozen community networks — some that serve communities with no roads in or out — across thousands of square kilometers along Scotland’s West Coast and Borders regions. As is true of many community networks, not least some of Pangea’s launch partners, HUBS’ is award-winning, and engages in advocacy and policy.

During that time my co-founders and I engaged with research funders, economic development agencies, three levels of government, and so many communities that I lost track. After all that, the answer to the question is still far from clear. There are, however, noteworthy observations and experiences that stood out, and often came from surprising places:

  • Cables on the ground get chewed by animals that, small or large, might never be seen.
  • Burying power and Ethernet cables, even 15 centimeters below soil, makes no difference because (we think) animals are drawn by the electrical current.
  • Property owners sometimes need to be convinced that 8 to 10 square meters to build a small tower in exchange for free Internet and community benefit is a good thing.
  • The raising of small towers, even that no one will see, is sometimes blocked by legislation or regulation that assumes private non-residential structures can only be a shed, or never taller than a shed.
  • Private fibre backbone installations installed with public funds are often inaccessible, or are charged by distance even though the cost to light 100 meters of fibre is identical to the cost of lighting 1 km of fibre.
  • Civil service agencies may be enthusiastic, but are also cautious, even in the face of evidence. Be patient, suffer frustration, be more patient, and repeat. Success is possible.
  • If and where possible, it’s best to avoid attempts to deliver service where national telecommunications companies have plans to do so.
  • Never underestimate tidal fading — twice a day, wireless signals over water will be amazing, and will completely disappear. We should have known!

All anecdotes aside, the best policies and practices are non-trivial — but because of so many prior community efforts, and organizations such as ISOC, the APC, the A4AI, and more, the challenges and solutions are better understood than ever before.

How does a community network reach the Internet?

First, we’d like to honor the many organisations we’ve learned from who might say that there are no technical barriers to success. Connections within the community networks may be shaped by geographical features or regional regulations. For example, wireless lines of sight between antenna towers on personal property are guided by hills or restricted by regulations. Similarly, Ethernet cables and fibre deployments are guided by property ownership, digging rights, and the presence or migration of grazing animals that dig into soil and gnaw at cables — yes, they do, even small rabbits.

Once the community establishes its own area network, the connections to meet Internet services are more conventional, more familiar. In part, the choice is influenced or determined by proximity to Internet exchanges, PoPs, or regional fibre cabinet installations. The connections with community networks fall into three broad categories.

Colocation. A community network may be fortunate enough to have service coverage that overlaps with, or is near to, an Internet eXchange Point (IXP), as shown in the figure below. In this case a natural choice is to colocate a router within the exchange, near to the Internet service provider’s router (labeled as Cloudflare in the figure). Our launch partner NYC Mesh connects in this manner. Unfortunately, being that exchanges are most often located in urban settings, colocation is unavailable to many, if not most, community networks.

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

Conventional point-to-point backhaul. Community networks that are remote must establish a point-to-point backhaul connection to the Internet exchange. This connection method is shown in the figure below in which the community network in the previous figure has moved to the left, and is joined by a physical long-distance link to the Internet service router that remains in the exchange on the right.

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

Point-to-point backhaul is familiar. If the infrastructure is available — and this is a big ‘if’ — then backhaul is most often available from a utility company, such as a telecommunications or energy provider, that may also bundle Internet service as a way to reduce total costs. Even bundled, the total cost is variable and unaffordable to individual community networks, and is exacerbated by distance. Some community networks have succeeded in acquiring backhaul through university, research and education, or publicly-funded networks that are compelled or convinced to offer the service in the public interest. On the west coast of Scotland, for example, Tegola launched with service from the University of Highlands and Islands and the University of Edinburgh.

Start a backhaul cooperative for point-to-point and colocation. The last connection option we see among our launch partners overcomes the prohibitive costs by forming a cooperative network in which the individual subscriber community networks are also members. The cooperative model can be seen in the figure below. The exchange remains on the right. On the left the community network in the previous figure is now replaced by a collection of community networks that may optionally connect with each other (for example, to establish reliable routing if any link fails). Either directly or indirectly via other community networks, each of these community networks has a connection to a remote router at the near-end of the point-to-point connection. Crucially, the point-to-point backhaul service — as well as the co-located end-points — are owned and operated by the cooperative. In this manner, an otherwise expensive backhaul service is made affordable by being a shared cost.

Announcing Project Pangea: Helping Underserved Communities Expand Access to the Internet For Free

Two of our launch partners, Guifi.net and HUBS c.i.c., are organised this way and their 10+ years in operation demonstrate both success and sustainability. Since the backhaul provider is a cooperative, the community network members have a say in the ways that revenue is saved, spent, and — best of all — reinvested back into the service and infrastructure.

Why is Cloudflare doing this?

Cloudflare’s mission is to help build a better Internet, for everyone, not just those with privileged access based on their geographical location. Project Pangea aligns with this mission by extending the Internet we’re helping to build — a faster, more reliable, more secure Internet — to otherwise underserved communities.

How can my community network get involved?

Check out our landing page to learn more and apply for Project Pangea today.

The ‘community’ in Cloudflare

Lastly, in a blog post about community networks, we feel it is appropriate to acknowledge the ‘community’ at Cloudflare: Project Pangea is the culmination of multiple projects, and multiple peoples’ hours, effort, dedication, and community spirit. Many, many thanks to all.
______

1For eligible networks, free up to 5Gbps at p95 levels.

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Post Syndicated from Matt Lewis original https://blog.cloudflare.com/interconnect-anywhere/

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Customers choose Cloudflare for our network performance, privacy and security.  Cloudflare Network Interconnect is the best on-ramp for our customers to utilize our diverse product suite. In the past, we’ve talked about Cloudflare’s physical footprint in over 200+ data centers, and how Cloudflare Network Interconnect enabled companies in those data centers to connect securely to Cloudflare’s network. Today, Cloudflare is excited to announce expanded partnerships that allows customers to connect to Cloudflare from their own Layer 2 service fabric. There are now over 1,600 locations where enterprise security and network professionals have the option to connect to Cloudflare securely and privately from their existing fabric.

Interconnect Anywhere is a journey

Since we launched Cloudflare Network Interconnect (CNI) in August 2020, we’ve been focused on extending the availability of Cloudflare’s network to as many places as possible. The initial launch opened up 150 physical locations alongside 25 global partner locations. During Security Week this year, we grew that availability by adding data center partners to our CNI Partner Program. Today, we are adding even more connectivity options by expanding Cloudflare availability to all of our partners’ locations, as well as welcoming CoreSite Open Cloud Exchange (OCX) and Infiny by Epsilon into our CNI Partner Program. This totals 1,638 locations where our customers can now connect securely to the Cloudflare network. As we continue to expand, customers are able to connect the fabric of their choice to Cloudflare from a growing list of data centers.

Fabric Partner Enabled Locations
PacketFabric 180+
Megaport 700+
Equinix Fabric 46+
Console Connect 440+
CoreSite Open Cloud Exchange 22+
Infiny by Epsilon 250+

“We are excited to expand our partnership with Cloudflare to ensure that our mutual customers can benefit from our carrier-class Software Defined Network (SDN) and Cloudflare’s network security in all Packetfabric locations. Now customers can easily connect from wherever they are located to access best of breed security services alongside Packetfabric’s Cloud Connectivity options.”
Alex Henthorn-Iwane, PacketFabric’s Chief Marketing Officer

“With the significant rise in DDoS attacks over the past year, it’s becoming ever more crucial for IT and Operations teams to prevent and mitigate network security threats. We’re thrilled to enable Cloudflare Interconnect everywhere on Megaport’s global Software Defined Network, which is available in over 700 enabled locations in 24 countries worldwide. Our partnership will give organizations the ability to reduce their exposure to network attacks, improve customer experiences, and simplify their connectivity across our private on-demand network in a matter of a few mouse clicks.”
Misha Cetrone, Megaport VP of Strategic Partnerships

“Expanding the connectivity options to Cloudflare ensures that customers can provision hybrid architectures faster and more easily, leveraging enterprise-class network services and automation on the Open Cloud Exchange. Simplifying the process of establishing modern networks helps achieve critical business objectives, including reducing total cost of ownership, and improving business agility as well as interoperability.”
Brian Eichman, CoreSite Senior Director of Product Development

“Partner accessibility is key in our cloud enablement and interconnection strategy. We are continuously evolving to offer our customers and partners simple and secure connectivity no matter where their network resides in the world. Infiny enables access to Cloudflare from across a global footprint while delivering high-quality cloud connectivity solutions at scale. Customers and partners gain an innovative Network-as-a-Service SDN platform that supports them with programmable and automated connectivity for their cloud and networking needs.”
Mark Daley, Epsilon Director of Digital Strategy

Uncompromising security and increased reliability from your choice of network fabric

Now, companies can connect to Cloudflare’s suite of network and security products without traversing shared public networks by taking advantage of software-defined networking providers. No matter where a customer is connected to one of our fabric partners, Cloudflare’s 200+ data centers ensure that world-class network security is close by and readily available via a secure, low latency, and cost-effective connection. An increased number of locations further allows customers to have multiple secure connections to the Cloudflare network, increasing redundancy and reliability. As we further expand our global network and increase the number of data centers where Cloudflare and our partners are connected, latency becomes shorter and customers will reap the benefits.

Let’s talk about how a customer can use Cloudflare Network Interconnect to improve their security posture through a fabric provider.

Plug and Play Fabric connectivity

Acme Corp is an example company that wants to deliver highly performant digital services to their customers and ensure employees can securely connect to their business apps from anywhere. They’ve purchased Magic Transit and Cloudflare Access and are evaluating Magic WAN to secure their network while getting the performance Cloudflare provides. They want to avoid potential network traffic congestion and latency delays, so they have designed a network architecture with their software-defined network fabric and Cloudflare using Cloudflare Network Interconnect.

With Cloudflare Network Interconnect, provisioning this connection is simple. Acme goes to their partner portal, and requests a virtual layer 2 connection to Cloudflare with the bandwidth of their choice. Cloudflare accepts the connection, provides the BGP session establishment information and organizes a turn up call if required. Easy!

Let’s talk about how Cloudflare and our partners have worked together to simplify the interconnectivity experience for the customer.

With Cloudflare Network interconnect, availability only gets better

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations
Connection Established with Cloudflare Network Interconnect

When a customer uses CNI to establish a connection between a fabric partner and Cloudflare, that connection runs over layer 2, configured via the partner user interface. Our new partnership model allows customers to connect privately and securely to Cloudflare’s network even when the customer is not located in the same data center as Cloudflare.

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations
Shorter Customer Path with New Partner Locations‌‌

The diagram above shows a shorter customer path thanks to an incremental partner-enabled location. Every time Cloudflare brings a new data center online and connects with a partner fabric, all customers in that region immediately benefit from the closer proximity and reduced latency.

Network fabrics in action

For those who want to self-serve, we’ve published documentation that details the steps in provisioning these connections. You can find the steps for each partner below:

As we expand our network, it’s critical we provide more ways to allow our customers to connect easily. We will continue to shorten the time it takes to set up a new interconnection, drive down costs, strengthen security and improve customer experience with all of our Network On-Ramp partners.

If you are using one of our software-defined networking partners and would like to connect to Cloudflare via their fabric, contact your fabric partner account team or reach out to us using the Cloudflare Network Interconnect page. If you are not using a fabric today, but would like to take advantage of software-defined networking to connect to Cloudflare, reach out to your account team.