All posts by Omer Yoachimik

Moobot vs. Gatebot: Cloudflare Automatically Blocks Botnet DDoS Attack Topping At 654 Gbps

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/moobot-vs-gatebot-cloudflare-automatically-blocks-botnet-ddos-attack-topping-at-654-gbps/

On July 3, Cloudflare’s global DDoS protection system, Gatebot, automatically detected and mitigated a UDP-based DDoS attack that peaked at 654 Gbps. The attack was part of a ten-day multi-vector DDoS campaign targeting a Magic Transit customer and was mitigated without any human intervention. The DDoS campaign is believed to have been generated by Moobot, a Mirai-based botnet. No downtime, service degradation, or false positives were reported by the customer.

[Figure: Moobot Targets 654 Gbps towards a Magic Transit Customer]

Over those ten days, our systems automatically detected and mitigated over 5,000 DDoS attacks against this one customer, mainly UDP floods, SYN floods, ACK floods, and GRE floods. The largest DDoS attack was a UDP flood and lasted a mere 2 minutes. This attack targeted only one IP address but hit multiple ports. The attack originated from 18,705 unique IP addresses, each believed to be a Moobot-infected IoT device.

[Figure: Attack Distribution by Country – From 100 Countries]

The attack was observed in Cloudflare’s data centers in 100 countries around the world. Approximately 89% of the attack traffic originated from just 10 countries with the US leading at 41%, followed by South Korea and Japan in second place (12% each), and India in third (10%). What this likely means is that the malware has infected at least 18,705 devices in 100 countries around the world.

[Figure: Attack Distribution by Country – Top 10]

Moobot – Self-Propagating Malware

‘Moobot’ sounds like a cute name, but there’s nothing cute about it. According to Netlab 360, Moobot is the codename of a self-propagating Mirai-based malware first discovered in 2019. It infects IoT (Internet of Things) devices using remotely exploitable vulnerabilities or weak default passwords. IoT is a term used to describe smart devices such as security hubs and cameras, smart TVs, smart speakers, smart lights, sensors, and even refrigerators that are connected to the Internet.

Once a device is infected by Moobot, control of the device is transferred to the operator of the command and control (C2) server, who can issue commands remotely such as attacking a target and locating additional vulnerable IoT devices to infect (self-propagation).

Moobot is a Mirai-based botnet and has capabilities (modules) similar to Mirai’s:

  1. Self-propagation – The self-propagation module is in charge of the botnet’s growth. After an IoT device is infected, it randomly scans the Internet for open telnet ports and reports back to the C2 server. Once the C2 server gains knowledge of open telnet ports around the world, it tries to leverage known vulnerabilities or brute force its way into the IoT devices with common or default credentials.

[Figure: Self-propagation]

  2. Synchronized attacks – The C2 server orchestrates a coordinated flood of packets or HTTP requests with the goal of creating a denial of service event for the target’s website or service.

[Figure: Synchronized attacks]

The botnet operator may use multiple C2 servers in various locations around the world in order to reduce the risk of exposure. Infected devices may be assigned to different C2 servers varying by region and module; one server for self-propagation and another for launching attacks. Thus if a C2 server is compromised and taken down by law enforcement authorities, only parts of the botnet are deactivated.

Why this attack was not successful

This is the second large-scale attack we’ve observed on Cloudflare’s network in the past few months. The previous one peaked at 754M packets per second and attempted to take down our routers with a high packet rate. Despite the high packet rate, the 754Mpps attack peaked at a mere 253 Gbps.

As opposed to the high packet rate attack, this attack was a high bit rate attack, peaking at 654 Gbps. Due to the high bit rates of this attack, it seems as though the attacker tried (and failed) to cause a denial of service event by saturating our Internet link capacity. So let’s explore why this attack was not successful.

Cloudflare’s global network capacity is over 42 Tbps and growing. Our network spans more than 200 cities in over 100 countries, including 17 cities in mainland China. It interconnects with over 8,800 networks globally, including major ISPs, cloud services, and enterprises. This level of interconnectivity along with the use of Anycast ensures that our network can easily absorb even the largest attacks.

[Figure: The Cloudflare Network]

After traffic arrives at an edge data center, it is load-balanced efficiently by Unimog, our in-house Layer 4 load balancer, which uses appliance health and other metrics to distribute traffic intelligently within a data center and avoid overwhelming any single server.
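
As a rough illustration of health-aware balancing (a toy sketch for intuition, not Unimog’s actual algorithm, which is considerably more sophisticated), a server’s chance of receiving a new connection can be weighted by its reported health. The Server type and health scores below are hypothetical:

package main

import (
	"fmt"
	"math/rand"
)

// Server is a hypothetical view of one machine in a data center.
type Server struct {
	Name   string
	Health float64 // 0.0 (unhealthy) to 1.0 (fully healthy)
}

// pick selects a server at random, weighted by health, so that degraded
// machines receive proportionally less new traffic.
func pick(servers []Server) Server {
	total := 0.0
	for _, s := range servers {
		total += s.Health
	}
	r := rand.Float64() * total
	for _, s := range servers {
		r -= s.Health
		if r <= 0 {
			return s
		}
	}
	return servers[len(servers)-1] // guard against floating-point rounding
}

func main() {
	servers := []Server{{"edge-1", 1.0}, {"edge-2", 0.9}, {"edge-3", 0.2}}
	for i := 0; i < 5; i++ {
		fmt.Println(pick(servers).Name) // edge-3 shows up far less often
	}
}

Health-Weighted Selection Sketch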

Besides using Anycast for inter-data center load balancing and Unimog for intra-data center load balancing, we also use various forms of traffic engineering to deal with sudden changes in traffic loads across our network. Traffic engineering can be performed automatically, or manually by our 24/7/365 Site Reliability Engineering (SRE) team.

These combined factors significantly reduce the likelihood of a denial of service event due to link saturation or appliances being overwhelmed — and as seen in this attack, no link saturation occurred.

Detecting & Mitigating DDoS Attacks

Once traffic arrives at our edge, it encounters our three software-defined DDoS protection systems:

  1. Gatebot – Cloudflare’s centralized DDoS protection system for detecting and mitigating globally distributed volumetric DDoS attacks. Gatebot runs in our network’s core data centers. It receives samples from every one of our edge data centers, analyzes them, and automatically sends mitigation instructions when attacks are detected. Gatebot also monitors the health of each of our customers’ web servers and triggers mitigation accordingly.
  2. dosd (denial of service daemon) – Cloudflare’s decentralized DDoS protection system. dosd runs autonomously on each server in every Cloudflare data center around the world, analyzing traffic and applying local mitigation rules when needed. Besides being able to detect and mitigate attacks at super-fast speeds, dosd significantly improves our network resilience by delegating detection and mitigation capabilities to the edge.
  3. flowtrackd (flow tracking daemon) – Cloudflare’s TCP state tracking machine for detecting and mitigating the most randomized and sophisticated TCP-based DDoS attacks in unidirectional routing topologies (such as the case for Magic Transit). flowtrackd is able to identify the state of a TCP connection and then drops, challenges, or rate-limits packets that don’t belong to a legitimate connection.
[Figure: Cloudflare DDoS Protection Lifecycle]

The three DDoS protection systems collect traffic samples in order to detect DDoS attacks. The types of traffic data that they sample include:

  1. Packet fields such as the source IP, source port, destination IP, destination port, protocol, TCP flags, sequence number, options, and packet rate.
  2. HTTP request metadata such as HTTP headers, user agent, query-string, path, host, HTTP method, HTTP version, TLS cipher version, and request rate.
  3. HTTP response metrics such as error codes returned by customers’ origin servers and their rates.

Our systems then crunch these sample data points together to form a real-time view of our network’s security posture and our customers’ origin server health. They look for attack patterns and traffic anomalies. When one is found, a mitigation rule with a dynamically crafted attack signature is generated in real time. Rules are propagated to the most optimal place for cost-effective mitigation. For example, an L7 HTTP flood might be dropped at L4 to reduce CPU consumption.
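
To make that concrete, here is a minimal sketch of signature extraction, assuming a hypothetical Sample type that carries just a few of the fields listed above: samples are grouped by their attributes, and any group whose count crosses a threshold yields a drop rule.

package main

import "fmt"

// Sample is a hypothetical flow sample carrying a few of the packet
// fields described above; real samples include many more attributes.
type Sample struct {
	Protocol string
	DstPort  uint16
	TCPFlags string
}

// Rule is a hypothetical, dynamically generated mitigation rule.
type Rule struct {
	Match  Sample
	Action string // e.g. "drop" or "rate-limit"
}

// detect groups samples by their attributes and emits a drop rule for
// any signature whose sampled count crosses the threshold.
func detect(samples []Sample, threshold int) []Rule {
	counts := make(map[Sample]int)
	for _, s := range samples {
		counts[s]++
	}
	var rules []Rule
	for sig, n := range counts {
		if n >= threshold {
			rules = append(rules, Rule{Match: sig, Action: "drop"})
		}
	}
	return rules
}

func main() {
	samples := []Sample{
		{"udp", 53, ""}, {"udp", 53, ""}, {"udp", 53, ""},
		{"tcp", 443, "SYN"},
	}
	for _, r := range detect(samples, 3) {
		fmt.Printf("%s %+v\n", r.Action, r.Match)
	}
}

Signature Extraction Sketch

In production the samples arrive continuously and the thresholds are adaptive, but the core idea is the same: turn a cluster of similar-looking packets into a rule that matches the cluster.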

Rules that are generated by dosd and flowtrackd are propagated within a single data center for rapid mitigation. Gatebot’s rules are propagated to all of the edge data centers which then take priority over dosd’s rules for an even and optimal mitigation. Even if the attack is detected in a subset of edge data centers, Gatebot propagates the mitigation instructions to all of Cloudflare’s edge data centers — effectively sharing the threat intelligence across our network as a form of proactive protection.

In the case of this attack, dosd generated rules in each edge data center to mitigate the attack promptly. Then, as Gatebot received and analyzed samples from the edge, it determined that this was a globally distributed attack. Gatebot propagated unified mitigation instructions to the edge, preparing each and every one of our 200+ data centers to tackle the attack, since the attack traffic could shift to a different data center due to Anycast or traffic engineering.

No inflated bills

DDoS attacks obviously pose the risk of an outage and service disruption. But there is another risk to consider — the cost of mitigation. During these ten days, more than 65 terabytes of attack traffic were generated by the botnet. However, as part of Cloudflare’s unmetered DDoS protection guarantee, Cloudflare mitigated and absorbed the attack traffic without billing the customer. The customer doesn’t need to submit a retroactive credit request; attack traffic is automatically excluded from our billing system. We eliminated the financial risk.

flowtrackd: DDoS Protection with Unidirectional TCP Flow Tracking

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/announcing-flowtrackd/

Magic Transit is Cloudflare’s L3 DDoS scrubbing service for protecting network infrastructure. As part of our ongoing investment in Magic Transit and our DDoS protection capabilities, we’re excited to talk about a new piece of software helping to protect Magic Transit customers: flowtrackd. flowtrackd is a software-defined DDoS protection system that significantly improves our ability to automatically detect and mitigate even the most complex TCP-based DDoS attacks. If you are a Magic Transit customer, this feature will be enabled by default at no additional cost on July 29, 2020.

TCP-Based DDoS Attacks

In the first quarter of 2020, one out of every two L3/4 DDoS attacks Cloudflare mitigated was an ACK Flood, and over 66% of all L3/4 attacks were TCP based. Most types of DDoS attacks can be mitigated by finding unique characteristics that are present in all attack packets and using that to distinguish ‘good’ packets from the ‘bad’ ones. This is called “stateless” mitigation, because any packet that has these unique characteristics can simply be dropped without remembering any information (or “state”) about the other packets that came before it. However, when attack packets have no unique characteristics, then “stateful” mitigation is required, because whether a certain packet is good or bad depends on the other packets that have come before it.
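
Here is a minimal sketch of the stateless case, assuming a hypothetical Packet type and a made-up signature: each packet is judged purely on its own attributes, with no memory of what came before.

package main

import "fmt"

// Packet is a hypothetical view of the fields a stateless filter inspects.
type Packet struct {
	Protocol string
	DstPort  uint16
	Length   int
	TTL      uint8
}

// statelessDrop returns true when a packet matches a fixed attack
// signature. No state about previous packets is consulted, which makes
// the check cheap and trivially parallel.
func statelessDrop(p Packet) bool {
	// Made-up signature: UDP packets of one exact size and TTL aimed at
	// a single port, the kind of uniformity an attack tool might emit.
	return p.Protocol == "udp" && p.DstPort == 53 && p.Length == 512 && p.TTL == 64
}

func main() {
	fmt.Println(statelessDrop(Packet{"udp", 53, 512, 64})) // true: matches
	fmt.Println(statelessDrop(Packet{"udp", 53, 100, 64})) // false: wrong length
}

Stateless Mitigation Sketch

Stateful mitigation, by contrast, must consult per-connection state before deciding, which is exactly what flow tracking provides.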

The most sophisticated types of TCP flood require stateful mitigation, where every TCP connection must be tracked in order to know whether any particular TCP packet is part of an active connection. That kind of mitigation is called “flow tracking”, and it is typically implemented in Linux by the iptables conntrack module. However, DDoS protection with conntrack is not as simple as flipping the iptables switch, especially at the scale and complexity at which Cloudflare operates. If you’re interested in learning more, we talk about the technical challenges of implementing iptables conntrack in this blog post.

Complex TCP DDoS attacks pose a threat because they can be harder to detect and mitigate. They therefore have the potential to cause service degradation and outages, or, if mitigation rules are inaccurate, increased false positives. So how does Cloudflare block patternless DDoS attacks without affecting legitimate traffic?

Bidirectional TCP Flow Tracking

Using Cloudflare’s traditional products, HTTP applications can be protected by the WAF service, and TCP/UDP applications can be protected by Spectrum. These services are “reverse proxies”, meaning that traffic passes through Cloudflare in both directions. In this bidirectional topology, we see the entire TCP flow (i.e., segments sent by both the client and the server) and can therefore track the state of the underlying TCP connection. This way, we know if a TCP packet belongs to an existing flow or if it is an “out of state” TCP packet. Out of state TCP packets look just like regular TCP packets, but they don’t belong to any real connection between a client and a server. These packets are most likely part of an attack and are therefore dropped.

[Figure: Reverse Proxy: What Cloudflare Sees]

While not trivial, tracking TCP flows can be done when we serve as a proxy between the client and server, allowing us to absorb and mitigate out of state TCP floods. However it becomes much more challenging when we only see half of the connection: the ingress flow. This visibility into ingress but not egress flows is the default deployment method for Cloudflare’s Magic Transit service, so we had our work cut out for us in identifying out of state packets.

The Challenge With Unidirectional TCP Flows

With Magic Transit, Cloudflare receives inbound internet traffic on behalf of the customer, scrubs DDoS attacks, and routes the clean traffic to the customer’s data center over a tunnel. The data center then responds directly to the eyeball client using a technique known as Direct Server Return (DSR).

[Figure: Magic Transit: Asymmetric L3 Routing]

Using DSR, when a TCP handshake is initiated by an eyeball client, it sends a SYN packet that gets routed via Cloudflare to the origin data center. The origin then responds with a SYN-ACK directly to the client, bypassing Cloudflare. Finally, the client responds with an ACK that once again routes to the origin via Cloudflare and the connection is then considered established.

[Figure: L3 Routing: What Cloudflare Sees]

In a unidirectional flow we don’t see the SYN-ACK sent from the origin to the eyeball client, and therefore can’t utilize our existing flow tracking capabilities to identify out of state packets.

Unidirectional TCP Flow Tracking

To overcome the challenges of unidirectional flows, we recently completed the development and rollout of a new system, codenamed flowtrackd (“flow tracking daemon”). flowtrackd is a state machine that hooks into the network interface. It tracks unidirectional TCP flows using only the ingress traffic that routes through Cloudflare to determine the state of the TCP connection. flowtrackd is then able to determine if a packet is part of a new connection, an open one, a connection that is closing, one that is closed, or if it’s an out of state packet. Once a high volume of out-of-state packets is detected, flowtrackd will either challenge (force RST) or drop the packets.
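
The following is a rough sketch of the idea rather than flowtrackd’s actual state machine: with only the ingress half of the conversation visible, the tracker treats a client SYN as the start of a flow, interprets a later client ACK as evidence that the origin’s (unseen) SYN-ACK arrived, and flags packets with no matching flow as out of state. Types and transitions are simplified.

package main

import "fmt"

type State int

const (
	None State = iota
	SynSeen
	Established
	Closing
)

// Flow keys a TCP connection by its ingress 4-tuple (types simplified).
type Flow struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
}

// Tracker is a toy unidirectional state machine: it only ever sees
// client-to-origin packets.
type Tracker struct{ flows map[Flow]State }

func NewTracker() *Tracker { return &Tracker{flows: make(map[Flow]State)} }

// Observe returns false when a packet is out of state, e.g. an ACK for a
// flow in which no SYN was ever seen.
func (t *Tracker) Observe(f Flow, syn, ack, fin bool) bool {
	state := t.flows[f]
	switch {
	case syn && !ack:
		t.flows[f] = SynSeen // handshake started; the SYN-ACK will bypass us (DSR)
		return true
	case ack && state == SynSeen:
		t.flows[f] = Established // a client ACK implies the origin's SYN-ACK arrived
		return true
	case state == Established && !fin:
		return true
	case fin && state == Established:
		t.flows[f] = Closing
		return true
	default:
		return false // no matching flow: likely part of an ACK flood
	}
}

func main() {
	t := NewTracker()
	f := Flow{"198.51.100.7", "203.0.113.1", 40000, 443}
	fmt.Println(t.Observe(f, true, false, false)) // SYN: true
	fmt.Println(t.Observe(f, false, true, false)) // ACK completing handshake: true
	g := Flow{"198.51.100.9", "203.0.113.1", 40001, 443}
	fmt.Println(t.Observe(g, false, true, false)) // bare ACK, no SYN seen: false
}

Unidirectional Flow Tracking Sketch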

[Figure: Snapshot of what flowtrackd sees]

The state machine that determines the state of the flows was developed in-house and complements Gatebot and dosd. Together, Gatebot, dosd, and flowtrackd provide comprehensive, multi-layered DDoS protection.

Releasing flowtrackd to the Wild

And it works! Less than a day after releasing flowtrackd to an early access customer, flowtrackd automatically detected and mitigated an ACK flood that peaked at 6 million packets per second. No downtime, service disruption, or false positives were reported.

[Figure: flowtrackd Mitigates 6M pps Flood]

Cloudflare’s DDoS Protection – Delivered From Every Data Center

As opposed to legacy scrubbing center providers with limited network infrastructures, Cloudflare provides DDoS protection from every one of our data centers in over 200 locations around the world. We write our own software-defined DDoS protection systems. Notice I say systems, because unlike vendors that rely on a dedicated third-party appliance, we’re able to write and spin up whatever software we need and deploy it in the optimal location in our tech stack, so we are neither dependent on other vendors nor limited to the capabilities of one appliance.

flowtrackd joins the Cloudflare DDoS protection family, which includes our veteran Gatebot and the younger and energetic dosd. flowtrackd will be available in every one of our data centers, with a total mitigation capacity of over 37 Tbps, protecting our Magic Transit customers against the most complex TCP DDoS attacks.

New to Magic Transit? Replace your legacy provider with Magic Transit and pay nothing until your current contract expires. Offer expires September 1, 2020. Click here for details.

No Humans Involved: Mitigating a 754 Million PPS DDoS Attack Automatically

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/no-humans-involved-mitigating-a-754-million-pps-ddos-attack-automatically/

On June 21, Cloudflare automatically mitigated a highly volumetric DDoS attack that peaked at 754 million packets per second. The attack was part of an organized four day campaign starting on June 18 and ending on June 21: attack traffic was sent from over 316,000 IP addresses towards a single Cloudflare IP address that was mostly used for websites on our Free plan. No downtime or service degradation was reported during the attack, and no charges accrued to customers due to our unmetered mitigation guarantee.

The attack was detected and handled automatically by Gatebot, our global DDoS detection and mitigation system, without any manual intervention by our teams. Notably, because our automated systems were able to mitigate the attack without issue, no alerts or pages were sent to our on-call teams and no humans were involved at all.

[Figure: Attack Snapshot – Peaking at 754 Mpps. The two colors in the graph represent two separate systems dropping packets.]

During those four days, the attack utilized a combination of three attack vectors over the TCP protocol: SYN floods, ACK floods and SYN-ACK floods. The attack campaign was sustained for multiple hours at rates exceeding 400-600 million packets per second, and peaked multiple times above 700 million packets per second, with a top peak of 754 million packets per second. Despite the high and sustained packet rates, our edge continued serving our customers during the attack without impacting performance at all.

The Three Types of DDoS: Bits, Packets & Requests

Attacks with high bits per second rates aim to saturate the Internet link by sending more bandwidth per second than the link can handle. Mitigating a bit-intensive flood is similar to a dam blocking gushing water in a canal with limited capacity, allowing just a portion through.

[Figure: Bit Intensive DDoS Attacks as a Gushing River Blocked By Gatebot]

In such cases, the Internet service provider may block or throttle the traffic above the allowance resulting in denial of service for legitimate users that are trying to connect to the website but are blocked by the service provider. In other cases, the link is simply saturated and everything behind that connection is offline.

[Figure: Swarm of Mosquitoes as a Packet Intensive DDoS Attack]

However, in this DDoS campaign the attack peaked at a mere 250 Gbps (I say mere, but ¼ Tbps is enough to knock pretty much anything offline if it isn’t behind some DDoS mitigation service), so it does not seem as though the attacker intended to saturate our Internet links, perhaps because they know that our global capacity exceeds 37 Tbps. Instead, it appears the attacker attempted (and failed) to overwhelm our routers and data center appliances with high packet rates reaching 754 million packets per second. As opposed to water rushing towards a dam, a flood of packets can be thought of as a swarm of millions of mosquitoes that you need to zap one by one.

[Figure: Zapping Mosquitoes with Gatebot]

Depending on the ‘weakest link’ in a data center, a packet intensive DDoS attack may impact the routers, switches, web servers, firewalls, DDoS mitigation devices or any other appliance that is used in-line. A high packet rate can overflow memory buffers and thus void the router’s ability to process additional packets. This is because there is a small fixed CPU cost to handling each packet, so by sending a lot of small packets an attacker can take a link offline not by filling it, but by overwhelming the hardware that handles it.
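
To make the fixed-cost point concrete, here is a back-of-the-envelope calculation. The 100 ns per-packet cost is purely an assumption for illustration, not a measured figure:

package main

import "fmt"

func main() {
	const pps = 754e6       // attack peak in packets per second
	const nsPerPacket = 100 // assumed fixed CPU cost to handle one packet
	busyCores := pps * nsPerPacket / 1e9
	fmt.Printf("~%.0f CPU cores fully busy just handling packets\n", busyCores)
}

Per-Packet CPU Cost Sketch

At that assumed cost, the peak of this attack alone would consume roughly 75 CPU cores’ worth of processing before a single byte of payload is examined, which is why packet-intensive floods target hardware rather than bandwidth.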

Another form of DDoS attack is one with a high HTTP request per second rate. An HTTP request intensive DDoS attack aims to overwhelm a web server’s resources with more HTTP requests per second than the server can handle. The goal of a DDoS attack with a high request per second rate is to max out the CPU and memory utilization of the server in order to crash it or prevent it from being able to respond to legitimate requests. Request intensive DDoS attacks allow the attacker to generate much less bandwidth, as opposed to bit intensive attacks, and still cause a denial of service.

Automated DDoS Detection & Mitigation

So how did we handle 754 million packets per second? First, Cloudflare’s network utilizes BGP Anycast to spread attack traffic globally across our fleet of data centers. Second, we built our own DDoS protection systems, Gatebot and dosd, which drop packets inside the Linux kernel for maximum efficiency in order to handle massive floods of packets. And third, we built our own L4 load-balancer, Unimog, which uses our appliances’ health and other various metrics to load-balance traffic intelligently within a data center.

In 2017, we published a blog post introducing Gatebot, one of our two DDoS protection systems. The post was titled Meet Gatebot – a bot that allows us to sleep, and that’s exactly what happened during this attack. The attack surface was spread out globally by Anycast, Gatebot detected and mitigated the attack automatically without human intervention, and traffic inside each data center was load-balanced intelligently to avoid overwhelming any one machine. As promised in the blog title, the attack peak did in fact occur while our London team was asleep.

So how does Gatebot work? Gatebot asynchronously samples traffic from every one of our data centers in over 200 locations around the world. It also monitors our customers’ origin server health. It then analyzes the samples to identify patterns and traffic anomalies that can indicate attacks. Once an attack is detected, Gatebot sends mitigation instructions to the edge data centers.

To complement Gatebot, last year we released a new system codenamed dosd (denial of service daemon) which runs in every one of our data centers around the world in over 200 cities. Similarly to Gatebot, dosd detects and mitigates attacks autonomously but in the scope of a single server or data center. You can read more about dosd in our recent blog.

The DDoS Landscape

While in recent months we’ve observed a decrease in the size and duration of DDoS attacks, highly volumetric and globally distributed DDoS attacks such as this one still persist. Regardless of the size, type or sophistication of the attack, Cloudflare offers unmetered DDoS protection to all customers and plan levels—including the Free plans.

Network-Layer DDoS Attack Trends for Q1 2020

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/network-layer-ddos-attack-trends-for-q1-2020/

As we wrapped up the first quarter of 2020, we set out to understand if and how DDoS attack trends have shifted during this unprecedented time of global shelter in place. Since then, traffic levels have increased by over 50% in many countries, but have DDoS attacks increased as well?

Traffic increases are often observed during holiday seasons. During holidays, people may spend more time online; whether shopping, ordering food, playing online games or a myriad of other online activities. This higher usage translates into higher revenue per minute for the companies that provide those various online services.

Downtime or service degradation during these peak times could result in user churn and loss of significant revenue in a very short time. ITIC estimates that the average cost of an outage is $5,600 per minute, which extrapolates to well over $300K per hour. It is therefore no surprise that attackers capitalize on the opportunity by launching a higher number of DDoS attacks during the holiday seasons.

The current pandemic has a similar cause and effect. People are forced to stay home, and they have become more reliant on online services to accomplish their daily tasks, which has generated a surge in Internet traffic and DDoS attacks.

The Rise of Smaller, Shorter Attacks

Most of the attacks that we observed in Q1 2020 were relatively small, as measured by their bit rates. As shown in the figure below, in Q1 2020, 92% of the attacks were under 10 Gbps, compared to 84% in Q4 2019.

Diving deeper, an interesting shift can be observed in the distribution of attacks below 10 Gbps in Q1, as compared to the previous quarter. In Q4, 47% of network-layer DDoS attacks peaked below 500 Mbps; in Q1, that share increased to 64%.

From a packet rate perspective, the majority of the attacks peaked below 1 million packets per second (pps). This, along with their bit rates, indicates that attackers are no longer focusing their efforts and resources on generating floods with high bit or packet rates.

However, it’s not only the packet and bit rates that are decreasing, but also the attack durations. The figure below illustrates that 79% of DDoS attacks in Q1 lasted 30 to 60 minutes, compared to 60% in Q4, a 19 percentage-point increase.

These three trends could be explained by the following:

  • Launching DDoS attacks is cheap and you don’t need much technical background. DDoS-as-a-service tools have provided a possible avenue for bad actors with little to no technical expertise to launch DDoS attacks quickly, easily, in a cost-effective manner and with limited bandwidth. According to Kaspersky, DDoS attack services can cost as little as $5 for a 300-second attack (5 minutes). Additionally, amateur attackers can also easily leverage free tools to generate floods of packets. As we’ll see in the next section, 13.5% of all DDoS attacks in Q1 were generated using variations of the publicly available Mirai code.
  • While an attack under 10 Gbps might seem small, it can still be enough to affect underprotected Internet properties. Smaller and quicker attacks might prove to deliver a high ROI for attackers seeking to extort a ransom from companies in exchange for not disrupting the availability of their Internet properties.

Larger Attacks Still Persist, Albeit in Smaller Numbers

While the majority of the attacks were under 10 Gbps, larger attacks are still prevalent. The graph below shows the trend in the largest bit rates of the network-layer DDoS attacks that Cloudflare observed and mitigated in Q4 2019 and Q1 2020. The largest attack of the quarter was observed during March and peaked just above 550 Gbps.

If At First You Don’t Succeed, Try, Try Again

A persistent attacker is one that does not give up when their attacks fail; they try and try again. They launch multiple attacks on their target, often utilizing multiple attack vectors. In the Q4 2019 holiday season, attackers persisted and launched as many as 523 DDoS attacks in one day against a single Cloudflare IP. Each Cloudflare IP under attack was targeted by as many as 4.6 DDoS attacks every day on average.

During Q1, as the world entered COVID-19 lockdown, we observed a significant increase in the number of attacks compared to the monthly average. The last time we saw such an increase was in the Q4 2019 holiday season. However, an interesting difference is that attackers seem less persistent now than during the holidays. In Q1 2020, the average persistence rate dropped as low as 2.2 attacks per Cloudflare IP address per day, with a maximum of 311 attacks on a single IP; 40% less than the previous holiday quarter.

Throughout the past two quarters, the average number of attack vectors employed in DDoS attacks per IP per day has been mostly steady at approximately 1.4, with a maximum of 10.

Over the past quarter, we’ve seen over 34 different types of attack vectors on L3/4. ACK attacks formed the majority (50.1%) in Q1, followed by SYN attacks with 16.6%, and in third place, Mirai, which still represents a significant portion of the attacks (15.4%). Together, SYN & ACK DDoS attacks (TCP) form 66% of all L3/4 attack vectors in Q1.

Top Attack Vectors

[Figure: Top attack vectors in Q1 2020]

All Attack Vectors

Attack Vector     Percent in Q1
ACK               50.121%
SYN               16.636%
Mirai             15.404%
UDP                5.714%
LDAP               2.898%
SSDP               2.833%
DNS                2.677%
Other              0.876%
QUIC               0.527%
NTP                0.373%
RST                0.353%
Memcached          0.296%
ChargeGen          0.236%
WS Discovery       0.221%
ACK-PSH            0.208%
SNMP               0.159%
VSE                0.081%
MSSQL              0.079%
ICMP               0.072%
Bittorrent         0.056%
OpenVPN            0.046%
Dahua              0.032%
GRE                0.022%
TFTP               0.014%
LOIC               0.014%
STUN               0.011%
Lantronix          0.009%
CoAP               0.008%
Jenkins            0.006%
VXWorks            0.005%
Ubiquity           0.005%
TeamSpeak          0.004%
XMAS               0.003%
SPSS               0.001%

A Crisis is Unfortunately Sometimes a Malevolent Opportunity

The number of DDoS attacks in March 2020 increased as compared to January and February. Attackers found the crisis period to be an opportune time to launch an increased number of DDoS attacks, as illustrated below.

Furthermore, as various government authorities started mandating lockdowns and shelter-in-place orders, attackers resorted to increasing the number of large-sized attacks in the latter half of March. There were 55% more attacks observed in the second half of the month (March 16-31) as compared to the first half (March 1-15). Additionally, 94% of attacks peaking at 300-400 Gbps were launched in the month of March.

Stop DDoS Attacks, Large or Small, Closer To The Source

With the ever-shifting DDoS landscape, it is important to have a DDoS protection solution that is comprehensive and adaptive. In the context of the attack insights illustrated above, here’s how Cloudflare stays ahead of these shifts to protect our customers.

  • As attacks shrink in rate and duration, Time To Mitigate SLAs as long as 15 minutes, as provided by legacy vendors, are just not practical anymore. Cloudflare mitigates network-layer DDoS attacks in under 10 seconds in most cases, which is especially critical for the increasingly shorter attacks. Read more about the recent enhancements to our DDoS detection and mitigation systems that allow us to automatically detect and mitigate DDoS attacks so quickly at scale.
  • An increasing number of DDoS attacks are localized, which means that legacy DDoS solutions built around a scrubbing center approach are not feasible: they are limited in their global coverage and act as choke points, since DDoS traffic needs to be hauled back and forth to them. Cloudflare’s unique distributed architecture empowers every one of its data centers, spanning more than 200 cities globally, to provide full DDoS mitigation capabilities.
  • Large distributed volumetric attacks still exist and are employed by resourceful attackers when the opportunity is ripe. An attack exceeding 1 Tbps can be expected in the future, so the ability to mitigate large DDoS attacks is a key aspect of today’s DDoS solutions. Cloudflare has one of the most interconnected networks in the world, with a capacity of over 35 Tbps, which allows it to mitigate even the largest DDoS attacks. This massive network capacity, combined with the globally distributed architecture, allows Cloudflare to mitigate attacks, both small and large, closer to the source.

To learn more about Cloudflare’s DDoS solution contact us or get started.

Rolling With The Punches: Shifting Attack Tactics & Dropping Packets Faster & Cheaper At The Edge

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/rolling-with-the-punches-shifting-attack-tactics-dropping-packets-faster-cheaper-at-the-edge/

On Cloudflare’s 8th birthday in 2017, we announced free unmetered DDoS protection as part of all of our plans, regardless of whether you’re an independent blogger using WordPress on Cloudflare’s Free plan or part of a large enterprise operating global network infrastructure. Our DDoS protection covers attack vectors on Layers 3-7, whether highly distributed and volumetric (rate-intensive) or small and sneaky. We protect over 26 million Internet properties, and at this scale, identifying small and sneaky DDoS attacks can be challenging, especially at L7. In this post, we discuss this challenge along with trends that we’ve seen, interesting DDoS attacks, and how we’ve responded to them so that you don’t have to worry.

When analyzing attacks on the Cloudflare network, we’ve seen a steady decline in the proportion of L3/L4 DDoS attacks that exceed a rate of 30 Gbps in recent months. From September 2019 to March 2020, attacks peaking over 30 Gbps decreased by 82%, and in March 2020, more than 95% of all network-layer DDoS attacks peaked below 30 Gbps. Over the same time period, the average size of a DDoS attack also steadily decreased by 53%, to just 11.88 Gbps. Yet very large attacks have not disappeared: we’re still seeing attacks with intensive rates peaking at 330 Gbps on average and up to 400 million packets per second. Some of our customers are being targeted with as many as 890 DDoS attacks in a single day and 1,750 DDoS attacks in a month.

As the average rate of these L3/L4 attacks has decreased, they have become more localized and less geographically distributed. Increasingly, we’re seeing attacks hit just one or two of our data centers, which means that these hyper-localized attacks were launched in the catchment of the data center; otherwise, our Anycast network would have spread the attack surface across our global fleet of data centers. Counterintuitively, these hyper-localized floods can be more difficult to detect on a global scale, as the attack samples get diluted when aggregated from all of our data centers in the core. Therefore we’ve had to change our tactics and systems to roll with the change in attacker behavior.

Keeping things interesting in the penthouse floor of the OSI Model, over the same time period we’ve also observed some of the most rate-intensive and highly distributed L7 HTTP DDoS attacks we’ve ever seen. These attacks have pushed our engineering teams to invent even more efficient and intelligent ways to defend our network and our customers at scale. Let’s take a look at some of these trends and attacks.

Centrally Analyzed, Edge Enforced DDoS Mitigations

Before we released dosd late last year, the primary automated system responsible for protecting Cloudflare and our customers against distributed rate-intensive attacks was Gatebot. Gatebot works by ingesting samples of flow data from routers and samples of HTTP requests from servers. It then analyzes these samples for anomalies, and when attacks are detected, pushes mitigation instructions automatically to the edge.

Gatebot requires a lot of computational power to analyze these samples, and correlate them across all the data centers, so it runs centrally in our “core” data centers, rather than at the edge. It does a terrific job at mitigating large attacks, and on average stops over 4,000 L3/L4 DDoS attacks every month.

Edge Analyzed, Edge Enforced Mitigations

The persistent increase we’ve observed in smaller, more localized attacks was one of the main factors that drove us to develop a new, complementary system to Gatebot. We call this new system our denial of service daemon, or “dosd”, and this past month alone it mitigated 281,746 L3/4 DDoS attacks. This figure is roughly 6 times greater than what Gatebot dropped over the same period, thanks to dosd’s ability to detect smaller network attacks that would previously have flown under the radar (or taken longer to mitigate).

To complement the computationally heavy, centralized deployments of Gatebot, dosd was architected as a decentralized system that runs on every single server in every one of our data centers. Each instance detects and mitigates attacks independent of the other instances, or any sort of centralized data center whatsoever. As a result, the system is much faster than Gatebot, and can detect and mitigate attacks within 0-3 seconds (and less than 10 seconds on average). The speed of dosd enables it to generate real-time rules to quickly protect our customers at the data center. Then Gatebot, which samples traffic globally, can determine a mitigation that applies to all data centers if needed. In such a case, Gatebot will push rules to the data centers which will take priority over dosd’s rules.
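
A minimal sketch of that precedence, with hypothetical Rule and source labels: locally generated dosd rules apply immediately, but a Gatebot rule for the same signature supersedes them.

package main

import "fmt"

// Rule is a hypothetical mitigation rule tagged with its origin.
type Rule struct {
	Signature string
	Source    string // "dosd" (local) or "gatebot" (global)
}

// effective merges local and global rules: when both systems produced a
// rule for the same signature, the globally computed Gatebot rule wins.
func effective(local, global []Rule) []Rule {
	bySig := make(map[string]Rule)
	for _, r := range local {
		bySig[r.Signature] = r
	}
	for _, r := range global {
		bySig[r.Signature] = r // overwrites any local rule
	}
	out := make([]Rule, 0, len(bySig))
	for _, r := range bySig {
		out = append(out, r)
	}
	return out
}

func main() {
	local := []Rule{{"udp dst 53 len 512", "dosd"}}
	global := []Rule{{"udp dst 53 len 512", "gatebot"}}
	fmt.Println(effective(local, global)) // the gatebot version wins
}

Rule Precedence Sketch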

dosd is also a leaner piece of software, consumes less memory and CPU, and significantly improves the resiliency of our network by removing the need to communicate with our core data centers to mitigate attacks. dosd detects and mitigates attacks using a similar logic to Gatebot’s methods, but in the scope of a single server, across a subset of servers in the same data center, or even across the entire data center.

Our automated Gatebot system is also tasked with mitigating L7 HTTP floods using request attributes as anomaly indicators. Mitigations can come in the form of actions such as JavaScript challenges, CAPTCHAs, Rate Limits (429), or Blocks (403) which are served back to the client as an error or challenge page. This form of mitigation at L7 allows the request to pass through TCP and TLS to the HTTP web server. During very rate-intensive attacks our servers can waste a lot of CPU and bandwidth as seen in the attack examples below.

Example #1 – Highly Distributed DDoS Attack Targeting A Customer Website

In July 2019, Cloudflare mitigated an HTTP DDoS attack that peaked at 1.4M requests per second. While this isn’t the most rate-intensive attack that we’ve seen, what is interesting is that the attack originated from almost 1.1M unique IP addresses. These were actual clients with the ability to complete a TCP and HTTPS handshake; they were not spoofed IP addresses. As it turns out, responding (rather than dropping at the network level) to over a million clients at a max rate of 1.4M requests per second can be quite costly.

Example #2 – Rate-Intensive DDoS Attack Targeting A Customer Website

The second attack took place in September 2019. We mitigated an HTTP DDoS attack that peaked and persisted just below 5M requests per second for a little over an hour. What’s interesting is the sustained capability of the attacker to reach those rates from only 371K unique IPs (also not spoofed).

These attacks highlighted what needed to be optimized, and consequently drove us to improve our L7 mitigations even further and significantly reduce the cost of mitigating an attack.

Using IP Jails to Reduce the Cost of Mitigation

With the goal of reducing the computational cost to Cloudflare of mitigating rate-intensive attacks, we recently rolled out a new Gatebot capability called IP Jails. IP Jails excels at efficiently mitigating extremely rate-intensive and distributed HTTP DDoS attacks. It is triggered when an attack exceeds a certain request rate and then pushes the mitigation from the application layer (L7 in the OSI model) to the transport layer (L4). Therefore instead of responding with an error or challenge page from the proxy, we simply drop the connection for that IP. Mitigating at L4 is more computationally efficient, it reduces our CPU and memory consumption in addition to saving bandwidth. It allows us to keep mitigating the largest of attacks without sacrificing performance.
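
Here is a toy sketch of the concept, not Cloudflare’s implementation: request rates are counted per IP at L7, and once an IP exceeds the threshold it is “jailed”, so its subsequent traffic is dropped at L4 instead of being answered with an error page. The window and threshold values are assumptions.

package main

import (
	"fmt"
	"time"
)

// jail counts per-IP HTTP request rates in fixed windows and remembers
// offenders so their later connections can be dropped at L4.
type jail struct {
	threshold   int
	window      time.Duration
	windowStart time.Time
	counts      map[string]int
	jailed      map[string]bool
}

func newJail(window time.Duration, threshold int) *jail {
	return &jail{threshold: threshold, window: window,
		counts: make(map[string]int), jailed: make(map[string]bool)}
}

// Admit is called per request; once it returns false for an IP, the L4
// layer should drop that IP's connections outright.
func (j *jail) Admit(ip string, now time.Time) bool {
	if now.Sub(j.windowStart) > j.window {
		j.windowStart, j.counts = now, make(map[string]int) // new window
	}
	if j.jailed[ip] {
		return false
	}
	j.counts[ip]++
	if j.counts[ip] > j.threshold {
		j.jailed[ip] = true // from now on, drop at L4 rather than respond at L7
		return false
	}
	return true
}

func main() {
	j := newJail(time.Second, 3)
	now := time.Now()
	for i := 0; i < 5; i++ {
		fmt.Println(j.Admit("198.51.100.7", now)) // true x3, then false
	}
}

IP Jails Concept Sketch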

IP Jails in action

In the first graph below, you can see an HTTP flood peaking just below 8M rps before the IPs are ‘jailed’ for misbehaving. In the second graph, you can see that same attack being dropped as packets at L4.

[Figure: HTTP flood peaking just below 8M rps before IPs are jailed]
[Figure: The same attack dropped as packets at L4]

The flood requests generated over 130 Gbps in responses. IP Jails slashed it by a factor of 10.

[Figure: Response bandwidth before and after IP Jails]

Similarly, you can see a spike in the attack mitigation CPU usage which then drops back to normal after IP Jails kicks in.

[Figure: Attack mitigation CPU usage returning to normal after IP Jails kicks in]

Using Origin Errors to Catch Low-Rate Attacks

We see one or two of these rate-intensive attacks every month. But the vast majority of attacks we observe are mostly of a lower request rate, trying to sneak under the radar. To tackle these low-rate attacks better, last month we completed the rollout of a new capability that synchronizes Gatebot’s detection sensitivity with our customers’ origin server health. Gatebot uses the origin’s error response codes as an additional adaptive feedback signal.

When we take a step back and think about what a DDoS attack actually is, we usually think of a malicious actor that directs traffic at a specific website or IP address with the intent to degrade performance or cause an outage. However, malicious attackers are not the only threat to your application’s availability.

As more functionality migrates to the edge, the cloud becomes smarter and more powerful, which often allows administrators to scale down their origin servers and infrastructure, leaving the origin weaker and under-configured. In practice, there are many cases where an origin was taken down by small floods of traffic that were neither malicious nor generated with bad intentions. These floods may be generated by an overly excited good bot or even faulty client applications calling home too frequently. Fixing a homesick client application or strengthening a server can be a lengthy and costly process, during which the origin remains susceptible. And if a website is taken offline, no matter the reason, the end users still experience it as if it were an attack.

Therefore this new capability not only protects our customers against DDoS attacks, but also protects the origin against all kinds of unwanted floods. It is designed to protect every one of our customers; big or small. It’s available on all of our plans including the Free plan.

When an origin responds to Cloudflare with an increasing rate of errors from the 500 range (Internal Server Error), Gatebot automatically initiates and analyzes traffic to reduce or eliminate the impact on the origin even faster than before. The current error rate is also compared to the average error rate to minimize false positives. Once an attack is detected, dynamically generated, ephemeral mitigation rules are propagated to Cloudflare’s edge data centers to mitigate the flood. Mitigation rules may use a block action (403), rate-limit (429), or even a challenge, based on the fingerprint logic and confidence.
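
A minimal sketch of such a feedback signal, where the smoothing factor and spike thresholds are assumptions for illustration: the current 5xx error rate is compared against an exponentially weighted moving average, and a large spike triggers analysis.

package main

import "fmt"

// errorRateMonitor compares the current share of 5xx responses from an
// origin with a slowly moving average of that share.
type errorRateMonitor struct {
	avg   float64 // exponentially weighted moving average of the 5xx rate
	alpha float64 // smoothing factor, e.g. 0.1
}

// Observe ingests one interval's error rate (0.0 to 1.0) and reports
// whether it spikes well above the historical average; the absolute
// floor guards against false positives on tiny baselines.
func (m *errorRateMonitor) Observe(rate float64) bool {
	spike := rate > 3*m.avg && rate > 0.05
	m.avg = m.alpha*rate + (1-m.alpha)*m.avg
	return spike
}

func main() {
	m := &errorRateMonitor{avg: 0.01, alpha: 0.1}
	fmt.Println(m.Observe(0.02)) // mildly elevated: false
	fmt.Println(m.Observe(0.40)) // 5xx storm: true
}

Origin Error-Rate Feedback Sketch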

In March 2020, we mitigated 812 HTTP DDoS attacks on average every day, and approximately 20,000 HTTP DDoS attacks in total.

[Figure: HTTP DDoS attacks mitigated in March 2020]

Don’t Take Our Word For It, See For Yourself

Whether it’s Gatebot or dosd that mitigated L3/4 DDoS attacks, you can see both types of attack events for yourself in our new Network Analytics dashboard.

[Figure: Network Analytics dashboard]

Today this dashboard provides Magic Transit & BYOIP customers real-time visibility into L3/4 traffic and DDoS attacks, and in the future we plan to expand access to customers of our other products.

Visibility into L7 DDoS attacks is available to our WAF/CDN customers that have access to the Firewall Analytics dashboard.

Unmetered DDoS Protection For All

Whether you’re part of a large global enterprise, or use Cloudflare for your personal site on the Free plan, we want to make sure that you’re protected and also have the visibility that you need.

DDoS Protection is included as part of every Cloudflare service; from Magic Transit at L3, through Spectrum at L4, to the WAF/CDN service at L7. Our mission is to help build a better Internet – and this means a safer, faster, and more reliable Internet. For everyone.

If you’re a Cloudflare customer of any plan (Free, Pro, Business or Enterprise), these new protections are now enabled by default at no additional charge.

Announcing Network Analytics

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/announcing-network-analytics/

Our Analytics Platform

Back in March 2019, we released Firewall Analytics, which provides insights into HTTP security events across all of Cloudflare’s protection suite: Firewall rule matches, HTTP DDoS attacks, the Site Security Level which harnesses Cloudflare’s threat intelligence, and more. It helps customers tailor their security configurations more effectively. The initial release was for Enterprise customers, but we believe that everyone should have access to powerful tools, not just large enterprises, and so in December 2019 we extended those same enterprise-level analytics to our Business and Pro customers.

Since then, we’ve built on top of our analytics platform: improved the usability, added more functionality, and extended it to additional Cloudflare services in the form of Account Analytics, DNS Analytics, Load Balancing Analytics, Monitoring Analytics and more.

Our entire analytics platform harnesses the powerful GraphQL framework which is also available to customers that want to build, export and share their own custom reports and dashboards.

Extending Visibility From L7 To L3

Until recently, all of our dashboards were mostly HTTP-oriented and provided visibility into HTTP attributes such as the user agent, hosts, cached resources, etc. This is valuable to customers that use Cloudflare to protect and accelerate HTTP applications, mobile apps, or similar. We’re able to provide them visibility into the application layer (Layer 7 in the OSI model) because we proxy their traffic at L7.

[Figure: DDoS Protection for Layer 3-7]

However with Magic Transit, we don’t proxy traffic at L7 but rather route it at L3 (network layer). Using BGP Anycast, customer traffic is routed to the closest point of presence of Cloudflare’s network edge where it is filtered by customer-defined network firewall rules and automatic DDoS mitigation systems. Clean traffic is then routed via dynamic GRE Anycast tunnels to the customer’s data-centers. Routing at L3 means that we have limited visibility into the higher layers. So in order to provide Magic Transit customers visibility into traffic and attacks, we needed to extend our analytics platform to the packet-layer.

[Figure: Magic Transit Traffic Flow]

On January 16, 2020, we released the Network Analytics dashboard for Magic Transit customers and Bring Your Own IP (BYOIP) customers. This packet and bit oriented dashboard provides near real-time visibility into network- and transport-layer traffic patterns and DDoS attacks that are blocked at the Cloudflare edge in over 200 cities around the world.

[Figure: Network Analytics – Packets & Bits Over Time]

Analytics For A Year

The way we’ve architected the analytics data-stores enables us to provide our customers a full year’s worth of insights. Traffic is sampled at the edge data-centers. From those samples we structure IP flow logs, which are similar to sFlow, and bucket one minute’s worth of traffic, grouped by destination IP, port and protocol. IP flows include multiple packet attributes such as TCP flags, source IPs and ports, Cloudflare data-center, etc. The source IP is considered PII data and is therefore only stored for 30 days, after which the source IPs are discarded and logs are rolled up into one-hour groups, and then one-day groups. The one-hour roll-ups are stored for 6 months and the one-day roll-ups for 1 year.
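
A simplified sketch of the one-minute bucketing, with illustrative field names (real IP flow logs carry many more attributes): each sampled packet is folded into a bucket keyed by minute, destination IP, port and protocol. Rolling up into one-hour or one-day groups is the same operation with a coarser time key.

package main

import (
	"fmt"
	"time"
)

// FlowKey is the per-minute grouping key described above.
type FlowKey struct {
	Minute   time.Time
	DstIP    string
	DstPort  uint16
	Protocol string
}

// FlowBucket aggregates one minute's worth of sampled traffic for a key.
type FlowBucket struct {
	Packets, Bytes uint64
}

// addSample folds one sampled packet into its minute bucket. Replacing
// time.Minute with time.Hour yields the hourly roll-up.
func addSample(buckets map[FlowKey]*FlowBucket, ts time.Time,
	dstIP string, dstPort uint16, proto string, size uint64) {
	k := FlowKey{ts.Truncate(time.Minute), dstIP, dstPort, proto}
	b := buckets[k]
	if b == nil {
		b = &FlowBucket{}
		buckets[k] = b
	}
	b.Packets++
	b.Bytes += size
}

func main() {
	buckets := make(map[FlowKey]*FlowBucket)
	now := time.Now()
	addSample(buckets, now, "203.0.113.1", 443, "tcp", 1500)
	addSample(buckets, now, "203.0.113.1", 443, "tcp", 60)
	for k, b := range buckets {
		fmt.Printf("%s %s:%d -> %d pkts, %d bytes\n",
			k.Protocol, k.DstIP, k.DstPort, b.Packets, b.Bytes)
	}
}

IP Flow Bucketing Sketch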

Similarly, attack logs are also stored efficiently. Attacks are stored as summaries with start/end timestamps, min/max/average/total bits and packets per second, attack type, action taken and more. A DDoS attack could easily consist of billions of packets which could impact performance due to the number of read/write calls to the data-store. By storing attacks as summary logs, we’re able to overcome these challenges and therefore provide attack logs for up to 1 year back.

Network Analytics via GraphQL API

We built this dashboard on the same analytics platform, meaning that our packet-level analytics are also available by GraphQL. As an example, below is an attack report query that would show the top attacker IPs, the data-center cities and countries where the attack was observed, the IP version distribution, the ASNs that were used by the attackers and the ports. The query is done at the account level, meaning it would provide a report for all of your IP ranges. In order to narrow the report down to a specific destination IP or port range, you can simply add additional filters. The same filters also exist in the UI.

{
  viewer {
    accounts(filter: { accountTag: $accountTag }) {
      topNPorts: ipFlows1mGroups(
        limit: 5
        filter: $portFilter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: sourcePort
          ipProtocol
          __typename
        }
        __typename
      }
      topNASN: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: sourceIPAsn
          description: sourceIPASNDescription
          __typename
        }
        __typename
      }
      topNIPs: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: sourceIP
          __typename
        }
        __typename
      }
      topNColos: ipFlows1mGroups(
        limit: 10
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: coloCity
          coloCode
          __typename
        }
        __typename
      }
      topNCountries: ipFlows1mGroups(
        limit: 10
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: coloCountry
          __typename
        }
        __typename
      }
      topNIPVersions: ipFlows1mGroups(
        limit: 2
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: ipVersion
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}
Attack Report Query Example

After running the query using Altair GraphQL Client, the response is returned in JSON format:

[Figure: JSON response returned by the GraphQL API]
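
If you’d rather script the report than use a GraphQL client, here is a minimal sketch that posts the query above to Cloudflare’s GraphQL endpoint. The account tag and filter variables are placeholders you’d fill in for your own account:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	query := `...paste the attack report query from above here...`
	body, _ := json.Marshal(map[string]interface{}{
		"query": query,
		"variables": map[string]interface{}{
			"accountTag": "YOUR_ACCOUNT_TAG", // placeholder
			// "filter" and "portFilter" select the time range, IPs and
			// ports to report on, mirroring the dashboard's filters.
		},
	})
	req, err := http.NewRequest("POST",
		"https://api.cloudflare.com/client/v4/graphql", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("CF_API_TOKEN"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // the same JSON report shape as above
}

GraphQL Report Script Sketch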

What Do Customers Want?

As part of our product definition and design research stages, we interviewed internal customer-facing teams including Customer Support, Solution Engineering and more. I consider these stakeholders super-user-aggregators because they’re customer-facing teams that are constantly engaging with and helping our users. After the internal research phase, we expanded externally to customers and prospects, particularly network and security engineers and leaders. We wanted to know how they expect the dashboard to fit into their workflow, what their use cases are, and how we can tailor the dashboard to their needs. Long story short, we identified two main use cases: Incident Response and Reporting. Let’s go into each of these use cases in more detail.

Incident Response

I started off by asking them a simple question: “what do you do when you’re paged?” We wanted to better understand their incident response process; specifically, how they’d expect to use this dashboard when responding to an incident and which metrics matter to them when making quick, calculated decisions.

You’ve Just Been Paged

Let’s say that you’re a security operations engineer. It’s Black Friday. You’re on call. You’ve just been paged. Traffic levels to one of your data-centers have exceeded a safe threshold. Boom. What do you do? Responding quickly and resolving the issue as soon as possible is key.

If your workflows are similar to our customers’, then your objective is to resolve the page as soon as possible. However, before you can resolve it, you need to determine if there is any action that you need to take. For instance, is this a legitimate rise in traffic from excited Black Friday shoppers, perhaps a new game release or maybe an attack that hasn’t been mitigated? Do you need to shift traffic to another data-center or are levels still stable? Our customers tell us that these are the metrics that matter the most:

  1. Top Destination IP and port – helps understand what services are being impacted
  2. Top source IPs, port, ASN, data-center and data-center Country – helps identify the source of the traffic
  3. Real-time packet and bit rates – helps understand the traffic levels
  4. Protocol distribution – helps understand what type of traffic is abnormal
  5. TCP flag distribution – an abnormal distribution could indicate an attack
  6. Attack Log – shows what types of traffic are being dropped/rate-limited
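If you wanted to pull the first of these metrics over the GraphQL API, you could add a top-destination-ports group to the query shown earlier, in the same style as topNIPs. This is only a sketch: the destinationPort dimension name is an assumption modeled on the sourceIP dimension used above, and the __typename fields are omitted for brevity:

      topNPorts: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        }
        dimensions {
          metric: destinationPort
        }
      }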
Announcing Network Analytics
Customizable DDoS Attack Log

As network and transport layer attacks can be highly distributed and packet attributes can be easily spoofed, it’s usually not practical to block source IPs. Instead, the dashboard enables you to quickly identify patterns such as an increased frequency of a specific TCP flag or increased traffic from a specific country. Identifying these patterns brings you one step closer to resolving the issue. Once you’ve identified the patterns, packet-level filtering can be applied to drop or rate-limit the malicious traffic. If an attack was automatically mitigated by Cloudflare’s systems, you’ll see it immediately in the activity log along with the attack’s attributes. By filtering on the Attack ID, you turn the entire dashboard into an attack report.
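To illustrate, here is a sketch of what the $filter variable for such an attack report could look like. The attackId field name is an assumption based on the dashboard’s Attack ID filter rather than a confirmed API field, and the account tag, attack ID, and dates are placeholders:

{
  "filter": {
    "AND": [
      { "accountTag": "YOUR_ACCOUNT_ID" },
      { "datetime_geq": "2020-07-03T00:00:00Z" },
      { "datetime_leq": "2020-07-04T00:00:00Z" },
      { "attackId": "YOUR_ATTACK_ID" }
    ]
  }
}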

Announcing Network Analytics
Packet/Bit Distribution by Source & Destination
Announcing Network Analytics
TCP Flag Distribution

Reporting

During our interviews with security and network engineers, we also asked what metrics and insights they need when creating reports for their managers, C-levels, and colleagues, and when providing evidence to law-enforcement agencies. After all, processing data and creating reports can consume over a third (36%) of a security team’s time (~3 hours a day), and reporting is also one of the capabilities our DDoS customers ask for most frequently.

Announcing Network Analytics
Add filters, select-time range, print and share

On top of all of these customizable insights, we also wanted to provide a one-line summary reflecting your recent activity. The one-liner is dynamic: it changes based on your activity, telling you whether you’re currently under attack and how many attacks were blocked. If your CISO asks for a security update, you can simply copy-paste it to convey the efficiency of the service:

Announcing Network Analytics
Dynamic Summary

Our customers say that they want to demonstrate the value of Cloudflare to their managers and peers (a query sketch follows the list):

  1. How much potential downtime and bandwidth did Cloudflare spare me?
  2. What are my top attacked IPs and ports?
  3. Where are the attacks coming from? What types and what are the trends?
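The first two questions could be answered with the same ipFlows1mGroups node by ordering on bits instead of packets and grouping by destination IP. In this sketch, bits, sum_bits_DESC, and destinationIP are assumed field names that follow the pattern of the query shown earlier:

      topAttackedIPs: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_bits_DESC]
      ) {
        sum {
          count: bits
        }
        dimensions {
          metric: destinationIP
        }
      }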

The Secret To Creating Good Reports

What does everyone love? Cool maps! The key to a good report is adding a map showing where the attack came from. But given that packet attributes, including the source IP, can be easily spoofed, it won’t do us any good to plot a map based on the locations of the source IPs; it would show spoofed source countries and therefore be useless data. Instead, we decided to show the geographic distribution of packets and bits based on the Cloudflare data-center in which they were ingested. As opposed to legacy scrubbing center solutions with limited network infrastructures, Cloudflare has data-centers in more than 200 cities around the world. This enables us to provide a precise geographic distribution with high confidence, making your reports accurate.

Announcing Network Analytics
Packet/Bit Distribution by geography: Data-center City & Country

Tailored For You

One of the main challenges that both we and our customers struggle with is how to process all of these data points and generate actionable insights from them. This is especially critical when responding to an incident. With that in mind, we built this dashboard with the purpose of speeding up your reporting and investigation processes. By tailoring it to your needs, we hope to make you more efficient and help you get the most out of Cloudflare’s services. Got any feedback or questions? Post them below in the comments section.

If you’re an existing Magic Transit or BYOIP customer, then the dashboard is already available to you. Not a customer yet? Click here to learn more.

Who DDoS’d Austin?

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/who-ddosd-austin/

Who DDoS'd Austin?

It was a scorching Monday on July 22 as temperatures soared above 37°C (99°F) in Austin, TX, the live music capital of the world. Only hours earlier, the last crowds had dispersed from the historic East 6th Street entertainment district. A few blocks away, Cloudflarians were starting to make their way to the office. Little did those early arrivals know that they would soon be participating in a Cloudflare time-honored tradition of dogfooding new services before releasing them to the wild.

East 6th Street, Austin, Texas

Who DDoS'd Austin?
(A photo I took on a night out with the team while visiting the Cloudflare Austin office)

Dogfooding is when an organization uses its own products. In this case, we dogfed our newest cloud service, Magic Transit, which both protects and accelerates our customers’ entire network infrastructure—not just their web properties or TCP/UDP applications. With Magic Transit, Cloudflare announces your IP prefixes via BGP, attracts (routes) your traffic to our global network edge, blocks bad packets, and delivers good packets to your data centers via Anycast GRE.

Who DDoS'd Austin?

We decided to use Austin’s network because we wanted to test the new service on a live network with real traffic from real people and apps. With the target identified, we began onboarding the Austin office in an always-on routing topology.

In an always-on routing mode, Cloudflare data centers constantly advertise Austin’s prefix, resulting in faster, almost immediate mitigation. As opposed to traditional on-demand scrubbing center solutions with limited networks, Cloudflare operates within 100 milliseconds of 99% of the Internet-connected population in the developed world. For our customers, this means that always-on DDoS mitigation doesn’t sacrifice performance due to suboptimal routing. On the contrary, Magic Transit can actually improve your performance due to our network’s reach.

Cloudflare’s Global Network

Who DDoS'd Austin?

DDoS’ing Austin

Now that we’d completed onboarding Austin to Magic Transit, all we needed was a motivated attacker to launch a DDoS attack. Luckily, we found more than a few willing volunteers on our Site Reliability Engineering (SRE) team to execute the attack. While the teams were still assembling in multiple locations around the world, our SRE volunteer started firing packets at our target from an undisclosed location.

Who DDoS'd Austin?

Without Magic Transit, the Austin office would’ve been hit directly with the packet flood. Two things could have happened in this case (not mutually exclusive):

  1. Austin’s on-premises equipment (routers, firewalls, servers, etc.) would have been overwhelmed and failed
  2. Austin’s service providers would have dropped packets that exceeded the office’s bandwidth allowance

Both cases would result in a very bad day for everyone.

Cloudflare DDoS Mitigation

Instead, when our SRE attacker launched the flood, the packets were automatically routed via BGP to Cloudflare’s network. The packets reached the closest data center via Anycast and encountered multiple defenses in the form of XDP, eBPF and iptables. Those defenses are populated with pre-configured static firewall rules as well as dynamic rules generated by our DDoS mitigation systems.

Static rules can vary from straightforward IP blocking and rate-limiting to more sophisticated expressions that match against specific packet attributes. Dynamic rules, on the other hand, are generated automatically in real time. To play fair with our attacker, we didn’t pre-configure any special rules against the attack; we wanted to give our attacker a fair opportunity to take Austin down. Due to our multi-layered protection approach, though, the odds were never actually in their favor.

Who DDoS'd Austin?
Source: https://imgflip.com

Generating Dynamic Rules

As part of our multi-layered protection approach, dynamic rules are generated on-the-fly by analyzing the packets that route through our network. While the packets are being routed, flow data is asynchronously sampled, collected, and analyzed by two main detection systems. The first is called Gatebot and runs across the entire Cloudflare network; the second is our newly deployed DoSD (denial of service daemon), which operates locally within each data center. DoSD is an exciting improvement that we’ve just recently rolled out, and we look forward to writing more about its technical details here soon. DoSD samples at a much higher rate (1 in 100 packets) than Gatebot (roughly 1 in 8,000 packets), allowing it to detect even more attacks and block them faster. At the ~250,000 pps attack rate described below, for example, a 1-in-100 sample rate yields about 2,500 samples per second versus roughly 30 for Gatebot, so DoSD gathers enough evidence to act much sooner.

The asynchronous attack detection lifecycle is represented as the dotted lines in the diagram below. Attacks are detected out of path to ensure that we don’t add any latency, and mitigation rules are pushed inline and removed as needed.

Who DDoS'd Austin?

Multiple packet attributes and correlations are taken into consideration during analysis and detection. Gatebot and DoSD search for both new network anomalies and already known attacks. Once an attack is detected, rules are automatically generated, propagated, and applied in the optimal location within 10 seconds or less. Just to give you an idea of the scale, we’re talking about hundreds of thousands of dynamic rules that are applied and removed every second across the entire Cloudflare network.

One of the beauties of Gatebot and DoSD is that they don’t require a traffic learning period. Once a customer is onboarded, they’re protected immediately. They don’t need to sample traffic for weeks before kicking in. While we can always apply specific firewall rules if requested by the customer, no manual configuration is required by the customer or our teams. It just works.

What this mitigation process looks like in practice

Let’s look at what happened when one of our SREs tried to DDoS Austin and failed. During one of the first attempts, before DoSD had been rolled out globally, Austin employees on video calls noticed a few seconds of degraded audio and video quality before Gatebot kicked in. As soon as Gatebot kicked in, the quality was immediately restored. If we hadn’t had Magic Transit inline, the degradation of service would’ve worsened to the point of full denial of service. Austin would have been offline, and our Austin colleagues wouldn’t have had a very productive day.

On a subsequent attack attempt which took place after DoSD was deployed, our SRE launched a SYN flood on Austin. The attack targeted multiple IP addresses in Austin’s prefix and peaked just above 250,000 packets per second. DoSD detected the attack and blocked it in approximately 3 seconds. DoSD’s quick response resulted in no degradation of service for the Austin team.

Attack Snapshot

Who DDoS'd Austin?
Green line = Attack traffic to Cloudflare edge, Yellow line = clean traffic from Cloudflare to origin over GRE

What We Learned

Dogfooding Magic Transit served as a valuable experiment for us, with lots of lessons learned on both the engineering and procedural fronts. On the engineering front, we fine-tuned our mitigations and optimized our routing. On the procedural front, we drilled members of multiple teams, including the Security Operations Center and Solution Engineering teams, to help refine our run-books. By doing so, we reduced the onboarding duration from days to hours, ensuring a quick and smooth onboarding experience for our customers.

Want To Learn More?

Request a demo and learn how you can protect and accelerate your network with Cloudflare.